0% found this document useful (0 votes)
10 views32 pages

P01 - IntroToDistributedDatabase

The document provides an introduction to Distributed Database Systems (DDBS), detailing their structure, benefits, and challenges. It explains the integration of database systems with computer networks, emphasizing the importance of transparency, reliability, and performance in distributed environments. Additionally, it outlines various network topologies and the complexities involved in managing distributed databases.

Uploaded by

hendy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views32 pages

P01 - IntroToDistributedDatabase

The document provides an introduction to Distributed Database Systems (DDBS), detailing their structure, benefits, and challenges. It explains the integration of database systems with computer networks, emphasizing the importance of transparency, reliability, and performance in distributed environments. Additionally, it outlines various network topologies and the complexities involved in managing distributed databases.

Uploaded by

hendy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 32

1

Introduction
to Distributed
Database
Indra Maryati, S.Kom.
maryati@stts.edu
Agenda
» Distributed data processing
» What is a distributed database system
» Promises of distributed database
systems

2
DDBS = Database + Network
» The technology of computer networks,
promotes a mode of work that goes against all
centralization efforts and facilitates distributed
computing
» Distributed database system technology is the
union of what appear to be diametrically
opposed approaches to data processing:
Database System, Computer Network
technologies
» A database system aims at integrating the
operational data of an enterprise, and to provide
a centralized and controlled access to that data 3
Distributed Computing System
» A distributed computing system consists
of
˃ a number of autonomous processing
elements (not necessarily homogeneous)
˃ interconnected by a computer network
˃ cooperate in performing their assigned tasks
» What is distributed?
All these are necessary
˃ Processing Logic and important for
˃ Function distributed database
technology
˃ Data 4
˃ Control
What is a True Distributed System?
» “A system that runs on a collection of
machines that do not have shared
memory, yet looks to its users like a
single computer”

5
Computer Networks
» Computers (hosts or sites) which are capable
of performing autonomous work are
connected by a communication network
» The communication network constitutes of
communication links of various kinds
(telephone lines, coaxial cables, satellite links,
etc.) and computers dedicated to
communication function (not hosts)
» A communication network facilitates a
process running at one site to send a message
to a process running at any other site of the 6
network (point to point)
Computer Networks
» A communication network facilitates any two sites
of the network to communicate independent of
its structure
» Parameters of communication network that are
relevant for distributed DBMSs
˃ Delay : the amount of time taken for a message to
reach its destination from the time it was sent from its
source
˃ Cost : consists of a fixed cost plus a variable cost
depending on the length of the message
˃ Reliability : of the network; i.e., the probability that a
message is correctly delivered at its destination
7
» Broadcast is the ability of a site to send the same
message to processes at all other sites.
Network Topology - Mesh
» Fully connected
» High cost, direct communication
between sites and high reliability

8
Network Topology - Irregular
» Partially connected
» Lower cost, not all sites directly
connected, less reliable than fully
connected

9
Network Topology - Tree
» Hierarchical
» Sites organized as a tree with
each node having an unique
parent and some children
» Lower cost than partially
connected topology
» Inter-node communication
through ancestor nodes
» If a parent fails, further
communication among
children is not possible
» Failure of any node or link
partitions the network into 10
disjoint subtrees
Network Topology - Star
» One site (central site)
connected to all the sites
» Cost is linear in the
number of sites
» All communication
through the central site
» Reliability depends on
the central site, since if it
fails, the network is
completely partitioned
11
Network Technology - Ring
» Each site is connected to
exactly two sites (neighbours)
» The ring can be unidirectional
or bidirectional
» Cost is linear in the number
of sites
» Communication cost can be
high
» In a unidirectional ring,
failure of one site or a link
partitions the network
» A bidirectional ring can 12
withstand two link failures
Network Topology – Multiaccess Bus
» A single shared link (the
bus) accessed by all the
sites
» Sites communicate with
each other through the
link
» Cost is linear in the
number of sites
» Communication cost is
low
» If the link fails, the
network is completely 13

partitioned
Distributed System Communications
» A distributed system relies on
communication, generally involving:
˃ Message Passing
+ When a client want to communicate with a server it
sends a message. The server replies with a response
message passing mechanism may be:
˃ Remote Procedure Calls
+ A reliable send followed by a get is very like a
procedure call A remote procedure call can be
provided by including a stub with the server and the
client The stub is responsible for dealing with the
communication between systems The use of the
stubs makes the client call just a local procedure call 14
and the server routine is called locally
Local Area Networks
» LANs usually cover a small geographical
area like a building or a few adjacent
buildings
» The communication links tend to be
high-speed (Gigabytes/sec) and a low
error rate
» Links are usually twisted pair, coaxial
cable or fiber optic
» Messages are usually sent in packets
15
Wide Area Networks
» WAN connects sites
that are physically
distributed over a wide
geographical area
» Typical communication
links are telephone
lines, microwave links
and satellite links
» WANs are generally
slower than LANs and 16
are more error prone
DS: Design Issues
» Must be transparent
» Provide flexibility
» Be reliable
˃ Design should not require the simultaneous functioning of a
substantial number of critical components
˃ More redundancy greater availability and greater inconsistency
˃ Fault tolerance, the ability to mask failures from the user
» Good performance
˃ The rest are useless without this
˃ Hard to measure
˃ Balance number of messages and grain size of distributed
computations
» Scalable
˃ A maxim for developing distributed systems
˃ Avoid centralized components, tables and algorithms 17
˃ Only decentralized algorithms should be used
Agenda
» Distributed data processing
» What is a distributed database system
» Promises of distributed database
systems

18
Distributed Database Systems
» A distributed database is a collection of
multiple, logically interrelated databases
distributed over a computer network;
stores data on multiple computers
(nodes) over the network and permits
access from any node to the joint data
» A distributed database management
system (DDBMS) is a software system
that permits the management of the
distributed databases and makes the 19
distribution transparent to the users.
Reasons for Data Distribution
» Several factors have led to the
development of DDBS:
˃ Distributed nature of some database
applications
˃ Increased reliability and availability
˃ Allowing data sharing while maintaining
some measure of local control
˃ Improved performance

20
Distributed DBMS Environment

21
What is not a Distributed Database
System?
» A DDBS is not a “collection of files'' that
can be individually stored at each node
of a computer network
˃ files are not logically related
˃ no access via common interface

22
Centralized DBMS on a Network
» data resides only at one node
» the database management is no
different from centralized DBMS
» remote processing, single server-
multiple clients

23
Distributed Database System
Technology
» The key is integration, not centralization
» Distributed database technology
attempts to achieve integration without
centralization

24
Example
» Multinational manufacturing company:
˃ head quarters in New York
˃ manufacturing plants in Chicago and
Montreal
˃ warehouses in Phoenix and Edmonton
˃ R&D facilities in San Francisco
» Data and Information:
˃ employee records (working location)
˃ projects (R&D)
˃ engineering data (manufacturing plants, R&D)
25
˃ inventory (manufacturing, warehouse)
Agenda
» Distributed data processing
» What is a distributed database system
» Promises of distributed database
systems

26
Promises of Distributed DBMS
» transparent management of
distributed, fragmented, and replicated
data
» improved reliability and availability
through distributed transactions
» improved performance
» higher system extendibility

27
Transparency
» Transparency refers to separation of the
higher-level semantics of a system from
lower-level implementation details.
» From data independence in centralized
DBMS to fragmentation transparency in
DDBMS.
» Issues
˃ Who should provide transparency?
˃ What is the state of the art in the industry?
28
Improved Reliability
» Distributed DBMS can use replicated
components to eliminate single point
failure.
» The users can still access part of the
distributed database with “proper care”
even though some of the data is
unreachable.
» Distributed transactions facilitate
maintenance of consistent database
state even when failures occur. 29
Improved Performance
» Since each site handles only a portion of
a database, the contention for CPU and
I/O resources is not that severe. Data
localization reduces communication
overheads.

30
Easier System Expansion
» Ability to add new sites, data, and users
over time without major restructuring.
» New applications (such as, supply chain)
are naturally distributed - centralized
systems will just not work.

31
Disadvantages of DDBSs
» Lack of Experience
˃ No operating true distributed database systems in existence
» Complexity
˃ DDBS problems are inherently more complex than
centralized DBMS ones
» Cost
˃ More hardware, software and people costs
» Distribution of control
˃ Problems of synchronization and coordination to maintain
data consistency
» Security
˃ Database security + network security
» Difficult to convert 32
˃ No tools to convert centralized DBMSs to DDBSs

You might also like