What Is A Distributed Database

Uploaded by

gaurav621.961

What is a distributed database?

In the most basic terms, a distributed database is a database that stores data in multiple locations
instead of one location. This means that rather than putting all data on one server or on one
computer, data is placed on multiple servers or in a cluster of computers consisting of individual
nodes. These nodes are oftentimes geographically separate and may be physical computers or virtual
machines within a cloud database.

Illustration of cluster and nodes

The MongoDB cluster example above is one of many configuration possibilities available when
creating a distributed database. However, unlike traditional centralized databases, all distributed
databases share the common characteristic of spreading data across multiple locations (physical
and/or virtual) which improves data resiliency and availability. Sharding the data across multiple
locations allows for horizontal scaling as well.

Distributed database types

There are two distinct types of distributed databases: homogeneous databases and heterogeneous
databases.

Homogeneous distributed databases

In a homogeneous distributed database, the machines, nodes, servers, or sites store the same data,
use the same data model, work with the same operating system, and share the same distributed
database management system (DDBMS) or occasionally multiple types of DDBMS from the same
vendor.

Within homogeneous distributed databases, there are two subsets: autonomous and
non-autonomous.

 Autonomous distributed databases: In an autonomous distributed database, nodes work on
their own with their own complete set of data, only requiring an application to facilitate
universal updates across all nodes or messaging between nodes.

 Non-autonomous distributed databases: In non-autonomous distributed databases, nodes
rely on a centralized database management system (DBMS) to coordinate data distribution,
communications, and all updates.

As a rule, homogeneous distributed databases offer significant data protection through redundancy
and simplified management due to the similarity of all nodes.

Heterogeneous distributed databases

In a heterogeneous distributed database, different machines or sites may house different data sets,
use different operating systems, contain different data schemas, and require software to facilitate
communication between machines. Further, different sites may not even be aware of the existence
of other sites.

Within heterogeneous distributed databases, there are two subsets: federated and unfederated.

 Federated distributed databases: In a federated distributed database, multiple nodes,
which are able to function completely on their own and may contain different data, can
work together and function as one entity. This means that when a query occurs, the system
determines which node is best equipped to respond and passes the query appropriately. This
process is sometimes referred to as data virtualization.

 Unfederated distributed databases: In an unfederated distributed database, each node
operates individually and there is a central application that manages the access to each
database in each node.

While more complex to manage, heterogeneous distributed databases offer more flexibility in terms
of data models, schema choices, and the types of data that can be stored than homogeneous
distributed databases.

How do distributed databases work?

As previously discussed, nodes are the individual machines that make up a distributed database
system (e.g., physical computers, virtual machines, or servers that share no physical components).
Each node stores a set of data and runs distributed database management system (DDBMS)
software. Deciding which data is stored on which nodes is the task of data distribution.

Data distribution

Proper data distribution is critical to the efficiency, security, and optimal user access in a distributed
database. This process, sometimes referred to as data partitioning, can be accomplished using two
different methods.

 Horizontal partitioning: Horizontal partitioning splits a table by rows, so each node stores
a subset of the rows with all of their columns.

 Vertical partitioning: Vertical partitioning splits a table by columns, so each node stores
a subset of the columns for every row.

(Source: Hazelcast.com, 2023)

The resulting data sets from horizontal or vertical partitioning of the original table are sometimes
referred to as shards.
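
The two partitioning methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample table, the function names, and the modulo placement rule are hypothetical and are not taken from any particular database product.

```python
from collections import defaultdict

# Hypothetical sample table: each row is a dict keyed by column name.
ROWS = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Grace", "city": "New York"},
    {"id": 3, "name": "Linus", "city": "Helsinki"},
    {"id": 4, "name": "Edsger", "city": "Austin"},
]


def horizontal_partition(rows, num_nodes, key="id"):
    """Assign whole rows to nodes; every node keeps all columns."""
    shards = defaultdict(list)
    for row in rows:
        # Modulo placement is an assumption chosen for simplicity.
        shards[row[key] % num_nodes].append(row)
    return dict(shards)


def vertical_partition(rows, column_groups, key="id"):
    """Split columns into groups; each group keeps the key so rows can be rejoined."""
    return [
        [{col: row[col] for col in (key, *group)} for row in rows]
        for group in column_groups
    ]


h_shards = horizontal_partition(ROWS, num_nodes=2)
v_shards = vertical_partition(ROWS, column_groups=[("name",), ("city",)])
```

Here `h_shards` maps a node number to the full rows placed on that node, while each entry of `v_shards` holds every row but only one group of columns plus the key.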

Distributed database system communication

While nodes are able to fully function on their own, it is necessary for them to communicate with
other nodes as well since, unlike centralized databases, they do not share the same physical
components or even the same data sets. There are three types of distributed database
communication:

 Broadcast communication: One message is sent to all other nodes within the distributed
database system.

 Multicast communication: One message is sent to some but not all other nodes within the
distributed database system.

 Unicast communication: A message is sent from an individual node to one other individual
node.
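
The difference between the three communication types can be shown with a small in-memory model. The `Node` and `Cluster` classes below are hypothetical stand-ins for illustration; a real system would send these messages over a network.

```python
class Node:
    """Hypothetical in-memory node that records the messages it receives."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, sender, message):
        self.inbox.append((sender, message))


class Cluster:
    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}

    def unicast(self, sender, target, message):
        """One sender, exactly one receiver."""
        self.nodes[target].receive(sender, message)

    def multicast(self, sender, targets, message):
        """One sender, a chosen subset of receivers."""
        for target in targets:
            if target != sender:
                self.nodes[target].receive(sender, message)

    def broadcast(self, sender, message):
        """One sender, every other node in the cluster."""
        self.multicast(sender, list(self.nodes), message)


cluster = Cluster(["a", "b", "c", "d"])
cluster.broadcast("a", "schema-changed")
cluster.multicast("a", ["b", "c"], "replicate-shard-7")
cluster.unicast("a", "d", "ping")
```

After these three calls, nodes b and c have each received the broadcast and the multicast, node d has received the broadcast and the unicast, and node a has received nothing.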

Transaction management

Distributed databases must often support distributed transactions, where one transaction can
involve more than one node. Such transactions are expected to uphold the ACID properties
(atomicity, consistency, isolation, durability) across the distributed database system. Key
elements of ACID properties include:

(Source: Dev.to, 2020)

 Atomicity means that a transaction is treated as a single unit: either all of its operations
are applied or none are. A transaction that cannot complete is rejected as an error, which
protects data integrity.

 Consistency is maintained in distributed database systems by enforcing predefined rules and
data constraints. If the state, nature, or content of a transaction violates these rules, the
transaction will not be ingested and stored in the distributed system.

 Isolation involves the separation of each transaction from the other transactions to prevent
data conflicts and maintain data integrity. In addition, this benefits operations when
managing multiple distributed data records that may exist across local data stores, virtual
machines via cloud computing, and multiple database nodes which may be located across
multiple sites.

 Durability ensures that stored data is preserved in the event of a system failure. A
transactional distributed database management system accomplishes this in a variety of ways,
including the fault tolerance processes described in the following sections.
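
Atomicity and consistency, as described in the list above, can be illustrated on a single node with Python's built-in `sqlite3` module. This is a simplified sketch, not a distributed transaction: the `accounts` table, the `transfer` helper, and the balance rule are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # the consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds in one transaction; True on commit, False on rollback."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit is undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterward because the whole transaction is rolled back as one unit.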

Fault tolerance

Because distributed database systems have more components that can fail or be interrupted than
centralized databases (e.g., multiple sites or a suboptimal file system), strong fault tolerance
processes are essential to maintain access reliability and effective database operations. With
that said, because data and processing are spread across many individual components, the failure
of any one of them need not take down the whole system, which greatly reduces the risk of a
single point of failure.

Some common fault tolerance processes include data replication, backup protocols, continuous
failure detection, data checksums, load balancing, and query optimization.

Data replication

Data replication is the process by which multiple copies of data are maintained across different
nodes, servers, or sites. There are different types of database replication schema to choose from,
including:

 Full replication: In full replication, a complete, functional copy of the entire database is sent
to all sites within the distributed database system. Database copy updates are provided on a
routine schedule. There are two subtypes of full replication, as well.

 Transactional replication: In transactional replication, a full and complete database
copy is provided to each node, and then data changes are updated to that copy as
transaction processing occurs, often in real-time.

 Snapshot replication: Using snapshot replication, a copy of the database at a specific
point in time is captured. This snapshot is then distributed across nodes and the user
base as needed, but it is not updated as the data changes afterward. For this reason,
snapshot database replication is only recommended for infrequently changing
content.

 Partial replication: In some cases, certain nodes only require specific portions of the
database, so a defined portion of the database is replicated to a select group. In this type of
data replication, any number of nodes or sites can receive the replication.

 Merge replication: As its name indicates, merge database replication is the merging of two
databases into one. This is the most complex of the database replication types.
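
The contrast between transactional and snapshot replication can be sketched as follows. The `Primary`, `TransactionalReplica`, and `SnapshotReplica` classes are hypothetical and keep everything in memory; they only illustrate when each kind of copy sees a change.

```python
import copy


class Primary:
    """Hypothetical primary node that logs every change it applies."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class TransactionalReplica:
    """Starts from a full copy, then replays changes as they are logged."""

    def __init__(self, primary):
        self.data = copy.deepcopy(primary.data)
        self.applied = len(primary.log)  # log position already reflected

    def sync(self, primary):
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)


class SnapshotReplica:
    """Holds a point-in-time copy and ignores later changes."""

    def __init__(self, primary):
        self.data = copy.deepcopy(primary.data)


primary = Primary()
primary.write("stock:widget", 10)
tx_replica = TransactionalReplica(primary)
snap_replica = SnapshotReplica(primary)

primary.write("stock:widget", 7)
tx_replica.sync(primary)
```

After the second write, the transactional replica reports 7 while the snapshot replica still reports 10, which is why snapshots suit content that changes infrequently.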

Backup protocols

Through a consistent program of automated data backups, data integrity and database systems
availability can be maintained without overburdening organizational employees. Some of the most
common solutions in the marketplace include backup software from Veeam, Druva, and Commvault.

Three of the most used types of backup for distributed databases include:

 Full backup: The entire database is copied and stored every time a database backup is
executed.

 Differential backup: Only the changes made since the last full backup are copied and stored.

 Incremental backup: Only the changes made since the most recent backup of any type (full,
differential, or incremental) are copied and stored. Restoring still requires the last full
backup plus the backups taken after it.
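
The three backup types differ only in which reference point they compare against. The sketch below assumes a hypothetical catalogue of file names and last-modified times in arbitrary ticks; it does not reflect how any of the backup products named above work internally.

```python
def full_backup(files):
    """Copy every file, regardless of when it changed."""
    return set(files)


def differential_backup(files, last_full_time):
    """Copy files modified since the last full backup."""
    return {name for name, mtime in files.items() if mtime > last_full_time}


def incremental_backup(files, last_backup_time):
    """Copy files modified since the most recent backup of any type."""
    return {name for name, mtime in files.items() if mtime > last_backup_time}


# Hypothetical catalogue: file name -> last-modified time (arbitrary ticks).
files = {"users.db": 5, "orders.db": 12, "logs.db": 18}

# Assumed schedule: full backup at t=10, an incremental backup at t=15.
full_set = full_backup(files)
differential_set = differential_backup(files, last_full_time=10)
incremental_set = incremental_backup(files, last_backup_time=15)
```

With this schedule the full backup copies all three files, the differential copies the two files changed after t=10, and the incremental copies only the one file changed after t=15.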

Continuous failure detection

As with any system, it is critical for distributed database systems to be continuously monitored for
system failures — whether they be technical issues, natural disasters, or cyberattacks. Just a few of
the ways this monitoring is accomplished include:

 Heartbeating: In heartbeating, each node sends out a signal (heartbeat) to other nodes to
verify it's operational. If that signal isn't received, a failure message is created and further
investigation of that node's operations by system administration is undertaken.

 Watchdog timers: Individual nodes will have watchdog timers that are focused on a specific
activity or process. If the timer expires without the activity or process being completed, a
failure message is generated indicating further investigation is required.

 Data checksums: In order to identify data tampering or other issues with data transmission,
the sender computes a value (a checksum) from the contents of each transmission. When the
transmission is received, the receiver computes a checksum from what actually arrived. By
using software to verify that the two checksums match, issues with data
transmission integrity can be quickly identified.
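
Heartbeating and checksum verification can both be sketched briefly. The `HeartbeatMonitor` class, the timeout value, and the use of SHA-256 are assumptions made for this example; timestamps are passed in explicitly so the behavior is easy to follow.

```python
import hashlib


class HeartbeatMonitor:
    """Tracks the last heartbeat per node; times are supplied by the caller."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def beat(self, node, now):
        self.last_seen[node] = now

    def suspected_failures(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items() if now - seen > self.timeout
        )


def checksum(payload: bytes) -> str:
    """Digest computed from the payload contents."""
    return hashlib.sha256(payload).hexdigest()


def verify(payload: bytes, expected: str) -> bool:
    """Receiver side: recompute the digest and compare with the sender's."""
    return checksum(payload) == expected


monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat("node-a", now=0.0)
monitor.beat("node-b", now=0.0)
monitor.beat("node-a", now=4.0)  # node-b stays silent
suspects = monitor.suspected_failures(now=5.0)

sent = b"row 42: balance=70"
digest = checksum(sent)
intact = verify(sent, digest)
corrupted = verify(b"row 42: balance=90", digest)
```

A plain hash like this catches accidental corruption; guarding against deliberate tampering would typically also require a keyed MAC or a digital signature, since an attacker who can alter the data can recompute an unkeyed checksum.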

Load Balancing

Load balancing techniques distribute user requests and queries evenly across database nodes. This
not only improves performance but also ensures that the failure of one node does not cause an
overload on others.

Usually, load balancing software is deployed as the intermediary between the applications or
database users and the database nodes. When a query is received, the load balancer will evaluate
the request and determine which node(s) are best equipped to respond. During this evaluation,
such factors as proximity, current load, and other predetermined system rules will be
considered. This evaluation and assignment helps the system avoid overload and inefficiency,
which can result in long wait times for users.
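
One possible selection rule is sketched below: pick the least-loaded healthy node within a latency budget. The `NodeStatus` fields, the latency threshold, and the tie-breaking rule are all assumptions for illustration; real load balancers apply their own policies.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    name: str
    active_queries: int  # current load
    latency_ms: float    # stand-in for proximity to the client
    healthy: bool = True


def choose_node(nodes, max_latency_ms=100.0):
    """Pick the least-loaded healthy node within the latency budget."""
    candidates = [n for n in nodes if n.healthy and n.latency_ms <= max_latency_ms]
    if not candidates:
        raise RuntimeError("no eligible node available")
    # Ties on load are broken by lower latency.
    return min(candidates, key=lambda n: (n.active_queries, n.latency_ms))


nodes = [
    NodeStatus("eu-west", active_queries=12, latency_ms=20.0),
    NodeStatus("us-east", active_queries=3, latency_ms=90.0),
    NodeStatus("ap-south", active_queries=1, latency_ms=220.0),
    NodeStatus("eu-north", active_queries=3, latency_ms=35.0),
]
chosen = choose_node(nodes)
```

In this sample, ap-south has the lightest load but is excluded for being too far away, so the query goes to eu-north, which ties with us-east on load and wins on latency.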

Query optimization

Distributed databases use query optimization techniques to distribute queries efficiently across
nodes while minimizing data transfer traffic between nodes. One of the ways this is accomplished is
through cost-based query optimization. This form of query optimization estimates the cost of the
candidate execution plans and selects the cheapest, weighing such factors as query complexity,
available data, and the location of the site containing that data.
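
A toy version of cost-based selection is sketched below. The `Plan` fields, the cost weights, and the three candidate plans are hypothetical; real optimizers derive their estimates from table statistics and far richer cost models.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    rows_scanned: int      # local work at the sites holding the data
    rows_transferred: int  # rows shipped between nodes


# Assumed relative weights: moving a row over the network is treated as far
# more expensive than scanning it locally.
SCAN_COST = 1.0
TRANSFER_COST = 20.0


def estimated_cost(plan):
    return plan.rows_scanned * SCAN_COST + plan.rows_transferred * TRANSFER_COST


def choose_plan(plans):
    """Return the candidate plan with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


plans = [
    Plan("ship both tables to coordinator", rows_scanned=11_000, rows_transferred=11_000),
    Plan("filter at each site, ship matches", rows_scanned=11_000, rows_transferred=150),
    Plan("ship small table to large table's site", rows_scanned=11_000, rows_transferred=1_000),
]
best = choose_plan(plans)
```

All three plans scan the same number of rows, so the choice comes down to how much data crosses the network, which is the traffic the optimizer tries to minimize.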

Benefits and challenges distributed databases offer

As with any type of database solution, there are both benefits and challenges. Here is a brief
summary to consider when researching distributed databases for your organization.

Distributed database benefits

 Flexibility: Flexibility of data structures and schemas used within a distributed database (e.g.,
heterogeneous) is a significant benefit for organizations with a variety of data asset types
and processing requirements.

 Resiliency: Because distributed databases locate data across multiple nodes in the
distributed system, the risk of a single point of failure is significantly reduced.

 Scalability: Distributed databases can scale out (or in) by adjusting the
number of nodes in the database, making them well suited to growing organizations.

 Improved performance: Distributed databases are able to use load balancing and query
optimization to improve overall database performance while reducing user wait times.

 High availability: Fault tolerance (e.g., data replication, continuous failure detection) provides
high system availability for users.

Distributed database challenges

 Complexity: Because there are more moving parts to distributed databases vs. centralized
databases, they can be more complex to both design and manage. The Atlas developer data
platform simplifies this dramatically by providing a single UI/API to control and manage
secure MongoDB distributed systems at scale.

 Latency: If not managed properly, latency can occur when users query data from multiple
nodes.

 Data consistency: Since distributed databases are able to employ multiple data schemas and
structures, maintaining data consistency requires more effort than in traditional databases. In
addition, if there is a hardware or network failure, data restoration can be more complex.

 Cost: Distributed databases can be more expensive due to the added complexity that their
greater flexibility brings. In addition, there may be additional networking costs since they
tend to have more sites and hardware than traditional databases.
