Distributed DBMS - Quick Guide
Distributed DBMS - Quick Guide
SQLProgramming
HTML Languages
CSS Javascript Python
Python Technologies
Software Testing
Cyber Security
All Categories
Distributed DBMS -
Concepts
For proper functioning of any organization,
there’s a need for a well-maintained
database. In the recent past, databases used
to be centralized in nature. However, with the
increase in globalization, organizations tend
to be diversified across the globe. They may
choose to distribute data over local servers
instead of a central database. Thus, arrived
the concept of Distributed Databases.
MySQL
Oracle
SQL Server
dBASE
FoxPro
PostgreSQL, etc.
Database Schemas
A database schema is a description of the
database which is specified during database
design and subject to infrequent alterations.
It defines the organization of the data, the
relationships among them, and the
constraints associated with them.
Types of DBMS
There are four types of DBMS.
Hierarchical DBMS
Network DBMS
Relational DBMS
Distributed DBMS
Operations on DBMS
UPDATE STUDENT
SET STREAM = 'ELECTRONICS AND COMMUNICAT
WHERE STREAM = 'ELECTRONICS';
Distributed DBMS -
Distributed Databases
This chapter introduces the concept of
DDBMS. In a distributed database, there are a
number of databases that may be
geographically distributed all over the world.
A distributed DBMS manages the distributed
database in a manner so that it appears as
one single database to users. In the later part
of the chapter, we go on to study the factors
that lead to distributed databases, its
advantages and disadvantages.
Features
A distributed database
incorporates transaction
processing, but it is not
synonymous with a transaction
processing system.
Features
Distributed Nature of
Organizational Units − Most
organizations in the current times
are subdivided into multiple units
that are physically distributed over
the globe. Each unit requires its
own set of local data. Thus, the
overall database of the
organization becomes distributed.
Advantages of Distributed
Databases
Following are the advantages of distributed
databases over centralized databases.
Adversities of Distributed
Databases
Following are some of the adversities
associated with distributed databases.
Distributed DBMS -
Database Environments
In this part of the tutorial, we will study the
different aspects that aid in designing
distributed database environments. This
chapter starts with the types of distributed
databases. Distributed databases can be
classified into homogeneous and
heterogeneous databases having further
divisions. The next section of this chapter
discusses the distributed architectures namely
client – server, peer – to – peer and multi –
DBMS. Finally, the different design
alternatives like replication and fragmentation
are introduced.
Homogeneous Distributed
Databases
Non-autonomous − Data is
distributed across the
homogeneous nodes and a central
or master DBMS co-ordinates data
updates across the sites.
Heterogeneous Distributed
Databases
Types of Heterogeneous
Distributed Databases
Architectural Models
Some of the common architectural models are
−
Multi-database Conceptual
Level − Depicts integrated multi-
database that comprises of global
logical multi-database structure
definitions.
Design Alternatives
The distribution design alternatives for the
tables in a DDBMS are as follows −
Fully replicated
Partially replicated
Fragmented
Mixed
Fully Replicated
Partially Replicated
Fragmented
Vertical fragmentation
Horizontal fragmentation
Hybrid fragmentation
Mixed Distribution
Data Replication
Data replication is the process of storing
separate copies of the database at two or
more sites. It is a popular fault tolerance
technique of distributed databases.
Simpler Transactions −
Transactions require less number
of joins of tables located at
different sites and minimal
coordination across the network.
Thus, they become simpler in
nature.
Increased Storage
Requirements − Maintaining
multiple copies of data is
associated with increased storage
costs. The storage space required
is in multiples of the storage
required for a centralized system.
Undesirable Application –
Database coupling − If complex
update mechanisms are not used,
removing data inconsistency
requires complex co-ordination at
application level. This results in
undesirable application – database
coupling.
Snapshot replication
Near-real-time replication
Pull replication
Fragmentation
Fragmentation is the task of dividing a table
into a set of smaller tables. The subsets of the
table are called fragments. Fragmentation
can be of three types: horizontal, vertical, and
hybrid (combination of horizontal and
vertical). Horizontal fragmentation can further
be classified into two techniques: primary
horizontal fragmentation and derived
horizontal fragmentation.
Advantages of Fragmentation
Disadvantages of Fragmentation
In case of recursive
fragmentations, the job of
reconstruction will need expensive
techniques.
Vertical Fragmentation
In vertical fragmentation, the fields or
columns of a table are grouped into
fragments. In order to maintain
reconstructiveness, each fragment should
contain the primary key field(s) of the table.
Vertical fragmentation can be used to enforce
privacy of data.
STUDENT
Horizontal Fragmentation
Horizontal fragmentation groups the tuples of
a table in accordance to values of one or more
fields. Horizontal fragmentation should also
confirm to the rule of reconstructiveness.
Each horizontal fragment must have all
columns of the original base table.
CREATE COMP_STD AS
SELECT * FROM STUDENT
WHERE COURSE = "Computer Science";
Hybrid Fragmentation
In hybrid fragmentation, a combination of
horizontal and vertical fragmentation
techniques are used. This is the most flexible
fragmentation technique since it generates
fragments with minimal extraneous
information. However, reconstruction of the
original table is often an expensive task.
DDBMS - Distribution
Transparency
Distribution transparency is the property of
distributed databases by the virtue of which
the internal details of the distribution are
hidden from the users. The DDBMS designer
may choose to fragment tables, replicate the
fragments and store them at different sites.
However, since users are oblivious of these
details, they find the distributed database
easy to use like any centralized database.
Location transparency
Fragmentation transparency
Replication transparency
Location Transparency
Location transparency ensures that the user
can query on any table(s) or fragment(s) of a
table as if they were stored locally in the
user’s site. The fact that the table or its
fragments are stored at remote site in the
distributed database system, should be
completely oblivious to the end user. The
address of the remote site(s) and the access
mechanisms are completely hidden.
Fragmentation Transparency
Fragmentation transparency enables users to
query upon any table as if it were
unfragmented. Thus, it hides the fact that the
table the user is querying on is actually a
fragment or union of some fragments. It also
conceals the fact that the fragments are
located at diverse sites.
Replication Transparency
Replication transparency ensures that
replication of databases are hidden from the
users. It enables users to query upon a table
as if only a single copy of the table exists.
Combination of Transparencies
In any distributed database system, the
designer should ensure that all the stated
transparencies are maintained to a
considerable extent. The designer may choose
to fragment tables, replicate them and store
them at different sites; all oblivious to the end
user. However, complete distribution
transparency is a tough task and requires
considerable design efforts.
Distributed DBMS -
Database Control
Database control refers to the task of
enforcing regulations so as to provide correct
data to authentic users and applications of a
database. In order that correct data is
available to users, all data should conform to
the integrity constraints defined in the
database. Besides, data should be screened
away from unauthorized users so as to
maintain security and privacy of the database.
Database control is one of the primary tasks
of the database administrator (DBA).
Authentication
Access rights
Integrity constraints
Authentication
In a distributed database system,
authentication is the process through which
only legitimate users can gain access to the
data resources.
Access Rights
A user’s access rights refers to the privileges
that the user is given regarding DBMS
operations such as the rights to create a
table, drop a table, add/delete/update tuples
in a table or query upon the table.
Query Processing
Query processing is a set of all activities
starting from query placement to displaying
the results of the query. The steps are as
shown in the following diagram −
Relational Algebra
Relational algebra defines the basic set of
operations of relational database model. A
sequence of relational algebra operations
forms a relational algebra expression. The
result of this expression represents the result
of a database query.
Projection
Selection
Union
Intersection
Minus
Join
Projection
STUDENT
Amit
2 BCA 1 Male
Prasad
Varsha
4 BCA 1 Fema
Tiwari
Joe
6 MCA 1 Male
Wallace
Shivani
8 BCA 1 Fema
Iyengar
Selection
πName
(σGender="Female"AND Course="BCA" (ST U DEN T )
)
F emaleBCAStudent
← σGender="Female"AND Course="BCA" (ST U DEN T )
Result ← πName (F emaleBCAStudent)
Union
BCAStudent
← σCourse="BCA" (ST U DEN T )
Intersection
EMPLOYEE
PROJECT
Minus
P rojectDept
← πDepartment (σStatus="ongoing" (P ROJECT )
)
Join
CUSTOMER
BRANCH
Result ← CU ST OM ER
⋈Customer.BranchID=Branch.BranchID BRAN CH
Examples
EMPLOYEE
PROJECT
WORKS
Example 1
σSalary<AvgSal> EM P LOY EE
Example 2
ArunEmpID
← πEmpID
(σName="ArunKumar" (EM P LOY EE))
ArunP ID
← πPID (σEmpID="ArunEmpID" (W ORKS))
Result
← πPID,Status (σPID="ArunPID" (P ROJECT )
)
Operator type
Available memory
Disk structures
Computation of Selection
Computation of selection operation depends
upon the complexity of the selection condition
and the availability of indexes on the
attributes of the table.
Computation of Joins
When we want to join two tables, say P and
Q, each tuple in P has to be compared with
each tuple in Q to test if the join condition is
satisfied. If the condition is satisfied, the
corresponding tuples are concatenated,
eliminating duplicate fields and appended to
the result relation. Consequently, this is the
most expensive operation.
Nested-loop Approach
Sort-merge Approach
Hash-join Approach
Query Optimization in
Centralized Systems
Once the alternative access paths for
computation of a relational algebra expression
are derived, the optimal access path is
determined. In this chapter, we will look into
query optimization in centralized system while
in the next chapter we will study query
optimization in a distributed system.
Increase parallelism.
EMPLOYEE
DEPARTMENT
Example 1
Example 2
Query Optimization in
Distributed Systems
This chapter discusses query optimization in
distributed database system.
PROJECT
Query trading.
Query Trading
DDBMS - Transaction
Processing Systems
This chapter discusses the various aspects of
transaction processing. We’ll also study the
low level tasks included in a transaction, the
transaction states and properties of a
transaction. In the last portion, we will look
over schedules and serializability of
schedules.
Transactions
A transaction is a program including a
collection of database operations, executed as
a logical unit of data processing. The
operations performed in a transaction include
one or more of database operations like
insert, delete, update or retrieve data. It is an
atomic process that is either performed into
completion entirely or is not performed at all.
A transaction involving only data retrieval
without any data update is called read-only
transaction.
begin_transaction − A marker
that specifies start of transaction
execution.
read_item or write_item −
Database operations that may be
interleaved with main memory
operations as a part of
transaction.
end_transaction − A marker
that specifies end of transaction.
Transaction States
A transaction may go through a subset of five
states, active, partially committed,
committed, failed and aborted.
Desirable Properties of
Transactions
Any transaction must maintain the ACID
properties, viz. Atomicity, Consistency,
Isolation, and Durability.
Consistency − A transaction
should take the database from one
consistent state to another
consistent state. It should not
adversely affect any data item in
the database.
Durability − If a committed
transaction brings about a change,
that change should be durable in
the database and not lost in case
of any failure.
Types of Schedules
Conflicts in Schedules
Serializability
A serializable schedule of ‘n’ transactions is
a parallel schedule which is equivalent to a
serial schedule comprising of the same ‘n’
transactions. A serializable schedule contains
the correctness of serial schedule while
ascertaining better CPU utilization of parallel
schedule.
Equivalence of Schedules
Distributed DBMS -
Controlling Concurrency
Concurrency controlling techniques ensure
that multiple transactions are executed
simultaneously while maintaining the ACID
properties of the transactions and
serializability in the schedules.
Distributed Timestamp
Concurrency Control
Conflict Graphs
Distributed DBMS -
Deadlock Handling
This chapter overviews deadlock handling
mechanisms in database systems. We’ll study
the deadlock handling mechanisms in both
centralized and distributed database system.
Deadlock prevention.
Deadlock avoidance.
Deadlock Prevention
Deadlock Avoidance
Transaction Location
Transaction Control
Distributed Wound-Die
If T1 is younger than
T2, T1 is aborted. The
concurrency control at
Site P sends a
message to all sites
where T1 has visited
to abort T1. The
controlling site notifies
the user when T1 has
been successfully
aborted in all the sites.
Distributed Wait-Wait
If T1 is younger than
T1, T1 is allowed to
wait. T1 can resume
execution after Site P
receives a message
that T2 has completed
processing.
Distributed DBMS -
Replication Control
This chapter looks into replication control,
which is required to maintain consistent data
in all sites. We will study the replication
control techniques and the algorithms
required for replication control.
Begin T1:
Update PROJECT Set PLocation = 'Mumba
Where PLocation = 'Bombay';
End T1;
Begin T1:
Begin T1A :
Update PROJECT Set PLocation = 'Mu
Where PLocation = 'Bombay';
End T1A;
Begin T2A :
Update PROJECT Set PLocation = 'Mu
Where PLocation = 'Bombay';
End T2A;
End T1;
Transaction
acceptance/rejection phase −
When a transaction enters the
transaction monitor of a slave site,
the slave site sends a request to
the master site. The master site
checks for conflicts. If there aren’t
any conflicts, the master sends an
“ACK+” message to the slave site
which then starts the transaction
application phase. Otherwise, the
master sends an “ACK-” message
to the slave which then rejects the
transaction.
Distributed transaction
acceptance phase − When a
transaction enters the transaction
manager of a site, it sends a
transaction request to all other
sites. On receiving a request, a
peer site resolves conflicts using
priority based voting rules. If all
the peer sites are “OK” with the
transaction, the requesting site
starts application phase. If any of
the peer sites does not “OK” a
transaction, the requesting site
rejects the transaction.
Distributed transaction
application phase − Upon
entering this phase, the site where
the transaction has entered,
broadcasts a request to all slaves
for executing the transaction. On
receiving the requests, the peer
slaves execute the transaction and
send an “ACK” message to the
requesting slave on completion.
After the requesting slave has
received “ACK” messages from all
its peers, it lets the transaction
manager know that the
transaction has been completed.
Transaction
acceptance/rejection phase −
If the requesting site receives a
majority “OK” on the transaction,
it accepts the transaction and
broadcasts “ACCEPT” to all the
sites. Otherwise, it broadcasts
“REJECT” to all the sites and
rejects the transaction.
Soft Failure
Soft failure is the type of failure that causes
the loss in volatile memory of the computer
and not in the persistent storage. Here, the
information stored in the non-persistent
storage like main memory, buffers, caches or
registers, is lost. They are also known as
system crash. The various types of soft
failures are as follows −
Power failure.
Hard Failure
A hard failure is the type of failure that causes
loss of data in the persistent or non-volatile
storage like disk. Disk failure may cause
corruption of data in some disk blocks or
failure of the total disk. The causes of a hard
failure are −
Power failure.
Faults in media.
Read-write malfunction.
Network Failure
Network failures are prevalent in distributed
or network databases. These comprises of the
errors induced in the database system due to
the distributed nature of the data and
transferring data over the network. The
causes of network failure are as follows −
Network congestion.
Site failures.
Network partitioning.
Commit Protocols
Any database system should guarantee that
the desirable properties of a transaction are
maintained even after failures. If a failure
occurs during the execution of a transaction,
it may happen that all the changes brought
about by the transaction are not committed.
This makes the database inconsistent.
Commit protocols prevent this scenario using
either transaction undo (rollback) or
transaction redo (roll forward).
Commit Point
Transaction Undo
Transaction Redo
Transaction Log
A transaction log is a sequential file that
keeps track of transaction operations on
database items. As the log is sequential in
nature, it is processed sequentially either
from the beginning or from the end.
Distributed DBMS -
Database Recovery
In order to recuperate from database failure,
database management systems resort to a
number of recovery management techniques.
In this chapter, we will study the different
approaches for database recovery.
Checkpointing
Checkpoint is a point of time at which a
record is written onto the database from the
buffers. As a consequence, in case of a
system crash, the recovery manager does not
have to redo the transactions that have been
committed before checkpoint. Periodical
checkpointing shortens the recovery process.
Consistent checkpointing
Fuzzy checkpointing
Consistent Checkpointing
Fuzzy Checkpointing
Example of Checkpointing
One-phase commit
Two-phase commit
Three-phase commit
Threats in a Database
Confidentiality loss −
Confidentiality loss occurs due to
unauthorized or unintentional
disclosure of confidential
information. It may result in illegal
actions, security threats and loss
in public confidence.
Measures of Control
The measures of control can be broadly
divided into the following categories −
What is Cryptography?
Cryptography is the science of encoding
information before sending via unreliable
communication paths so that only an
authorized receiver can decode and use it.
Digital Signatures
A Digital Signature (DS) is an authentication
technique based on public key cryptography
used in e-commerce applications. It
associates a unique mark to an individual
within the body of his message. This helps
others to authenticate valid senders of
messages.
DDBMS - Security in
Distributed Databases
A distributed system needs additional security
measures than centralized system, since
there are many users, diversified data,
multiple sites and distributed control. In this
chapter, we will look into the various facets of
distributed database security.
Communications Security
In a distributed database, a lot of data
communication takes place owing to the
diversified location of data, users and
transactions. So, it demands secure
communication between users and databases
and between the different database
environments.
Data Security
In distributed systems, it is imperative to
adopt measure to secure data apart from
communications. The data security measures
are −
Authentication and
authorization − These are the
access control measures adopted
to ensure that only authentic
users can use the database. To
provide authentication digital
certificates are used. Besides,
login is restricted through
username/password combination.
Internal to distributed
database approach:
The user applications
encrypt the data and
then store the
encrypted data in the
database. For using
the stored data, the
applications fetch the
encrypted data from
the database and then
decrypt it.
External to distributed
database: The
distributed database
system has its own
encryption capabilities.
The user applications
store data and retrieve
them without realizing
that the data is stored
in an encrypted form
in the database.
Data Auditing
A database security system needs to detect
and monitor security violations, in order to
ascertain the security measures it should
adopt. It is often very difficult to detect
breach of security at the time of occurrences.
One method to identify security violations is
to examine audit logs. Audit logs contain
information such as −
Advertisements
TOP TUTORIALS
Python Tutorial
Java Tutorial
C++ Tutorial
C Programming Tutorial
C# Tutorial
PHP Tutorial
R Tutorial
HTML Tutorial
CSS Tutorial
JavaScript Tutorial
SQL Tutorial
TRENDING TECHNOLOGIES
Git Tutorial
Docker Tutorial
Kubernetes Tutorial
DSA Tutorial
SDLC Tutorial
Unix Tutorial
CERTIFICATIONS
DevOps Certification
Online C Compiler
Online C# Compiler