Chapter 1
Chapter 1
1
Definition of a Distributed System
Goals of a Distributed System
Types of Distributed Systems
2
A collection of independent computers that appears to its users as a single
coherent system.
Features:
◦ No shared memory – message-based communication
◦ Each runs its own local OS
◦ Heterogeneity
Ideal: to present a single-system image:
◦ The distributed system “looks like” a single computer rather than a
collection of separate computers.
3
To present a single-system image:
◦ Hide internal organization, communication details
◦ Provide uniform interface
Easily expandable (scalability)
◦ Adding new computers is hidden from users
Continuous availability (availability)
◦ Failures in one component can be covered by other components
Fault-Tolerant: The goal of fault tolerance is to avoid failures in the system
even in the presence of faults to provide uninterrupted service.
4
Redundancy: Redundancy is defined as those parts of the system that are not
needed for its correct functioning.
Hardware redundancy is achieved by adding extra hardware components to
system which take over the role of failed components in case some faults occur
in them.
Software redundancy includes extra instructions and code included for managing
the extra hardware components, and using them correctly for uninterrupted
service, in case of some component failure
time redundancy the same instruction is executed many times. This is used to
handle temporary faults in the system
5
Security: Distributed systems should allow communication between
programs/users/ resources on different computers by enforcing necessary
security arrangements.
The security features are mainly intended to provide confidentiality, integrity
and availability.
Confidentiality (privacy) is protection against disclosure to unauthorized person.
Ensuring that no one can read the message except the intended receiver.
Integrity provides protection against alteration and corruption.
Assuring that the original message has not been altered.
Availability keeps the resource accessible.
Supported by middleware
6
The middleware enables computers to coordinate their activities and to share
the resources of the system, so that users perceive the system as a single,
integrated computing facility.
Thus, middleware is the bridge that connects distributed applications across
dissimilar physical locations, with dissimilar hardware platforms, network
technologies, operating systems, and programming languages.
7
CORBA (Common Object Request Broker Architecture)
DCOM (Distributed Component Object Management) – being replaced by .net
Sun’s ONC RPC (Remote Procedure Call)
RMI (Remote Method Invocation)
SOAP (Simple Object Access Protocol)
All of the above examples support communication across a network:
They provide protocols that allow a program running on one kind of computer, using
one kind of operating system, to call a program running on another computer with a
different operating system
◦ The communicating programs must be running the same middleware.
8
Resource Accessibility
Distribution Transparency
Openness
Scalability
9
Goal 2 – Distribution Transparency
Software hides some of the details of the distribution of system
resources.
◦ Makes the system more user friendly.
A distributed system that appears to its users & applications to be a
single computer system is said to be transparent.
◦ Users & apps should be able to access remote resources in the same
way they access local resources.
Transparency has several dimensions.
10
Transparency Description
Access Hide differences in data representation & resource access
(enables interoperability)
Location Hide location of resource (can use resource without
knowing its location)
Migration Hide possibility that a system may change location of
resource (no effect on access)
Replication Hide the possibility that multiple copies of the resource
exist (for reliability and/or availability)
Concurrency Hide the possibility that the resource may be shared
concurrently
Failure Hide failure and recovery of the resource.
11
Goal 3 - Openness
An open distributed system “…offers services according to standard rules
that describe the syntax and semantics of those services.” In other words,
the interfaces to the system are clearly specified and freely available.
Interface Definition Languages (IDL): used to describe the interfaces
between software components, usually in a distributed system
Open Systems Support …
◦ Interoperability: the ability of two different systems or applications to
work together
12
◦ Portability: an application designed to run on one distributed system
can run on another system which implements the same interface.
◦ Extensibility: Easy to add new components, features
Goal 4 - Scalability
Dimensions that may scale:
dimensions.
13
Scalability is negatively affected when the system is based on
◦ Centralized server: one for all users
◦ Centralized data: a single data base for all users
◦ Centralized algorithms: one site collects all information, processes it,
distributes the results to all sites.
14
Scalability affects performance more than anything else.
Three techniques to improve scalability:
◦ Hiding communication latencies, Distribution, Replication.
1. Hiding communication latencies
Structure applications to use asynchronous communication (no blocking
for replies)
While waiting for one answer, do something else; e.g., create one thread to
wait for the reply and let other threads continue to process or schedule
an other task
15
2. Distribution
Instead of one centralized service, divide into parts and distribute
geographically.
3. Replication
Replication: multiple identical copies of something
Replication
◦ Increases availability
◦ Improves performance through load balancing
◦ May avoid latency by improving proximity of resource
16
Distributed Computing Systems
◦ Clusters
◦ Grids
◦ Clouds
Distributed Information Systems
◦ Transaction Processing Systems
◦ Enterprise Application Integration
Distributed Embedded Systems
◦ Home systems
◦ Health care systems
◦ Sensor networks
17
Cluster Computing
A collection of similar processors (PCs, workstations) running the
same operating system, connected by a high-speed LAN.
Parallel computing capabilities using inexpensive PC hardware
Replace big parallel computers
18
High Performance Clusters (HPC)
◦ run large parallel programs
◦ Scientific, military, engineering apps; e.g., weather modeling
Load Balancing Clusters
◦ Front end processor distributes incoming requests
◦ server farms (e.g., at banks or popular web site)
High Availability Clusters (HA)
◦ Provide redundancy – back up systems
◦ May be more fault tolerant than large mainframes
19
Highly heterogeneous with respect to hardware, software, networks,
security policies, etc.
Grids support virtual organizations: a collaboration of users who
pool resources (servers, storage, databases) and share them
Grid software is concerned with managing sharing across
administrative domains.
20
Similar to clusters but processors are more loosely coupled, tend to be
heterogeneous, and are not all in a central location.
Can handle workloads similar to those on supercomputers, but grid
computers connect over a network (Internet) and supercomputers’
CPUs connect to a high-speed internal bus/network
Problems are broken up into parts and distributed across multiple
computers in the grid – less communication betw parts than in
clusters.
21
Provides scalable services as a utility over the Internet.
Often built on a computer grid
Users buy services from the cloud
◦ Grid users may develop and run their own software
Cluster/grid/cloud distinctions blur at the edges!
22
Business-oriented
Systems to make a number of separate network applications interoperable
and build “enterprise-wide information systems”.
Two types discussed here:
◦ Transaction processing systems
◦ Enterprise application integration (EAI)
23
Provide a highly structured client-server approach for database applications
Transactions are the communication model
Obey the ACID properties:
◦ Atomic: all or nothing
◦ Consistent: invariants are preserved
◦ Isolated (serializable)
◦ Durable:committed operations can’t be undone
24
Example primitives for transactions.
25
Transaction processing may be centralized (traditional client/server
system) or distributed.
A distributed database is one in which the data storage is distributed –
connected to separate processors.
26
A nested transaction is a transaction within another transaction (a sub-
transaction)
◦ Example: a transaction may ask for two things (e.g., airline reservation
info and hotel info) which would spawn two nested transactions
Primary transaction waits for the results.
27
28
Less structured than transaction-based systems
EA components communicate directly
Communication mechanisms to support this include CORBA, Remote
Procedure Call (RPC) and Remote Method Invocation (RMI)
29
Middleware as a communication facilitator in enterprise application integration.
30
This type of system is likely to incorporate small, battery-powered, mobile
devices
◦ Home systems
◦ Electronic health care systems – patient monitoring
◦ Sensor networks – data collection, surveillance
31
Built around one or more PCs, but can also include other electronic
devices:
◦ Automatic control of lighting
◦ Network enabled appliances
◦ PDAs and smart phones, etc.
32
Monitoring a person in a pervasive electronic health care system, using (a) a local hub
or (b) a continuous wireless connection.
33
A collection of geographically distributed nodes consisting of a comm.
device, a power source, some kind of sensor, a small processor…
Purpose: to collectively monitor sensory data (temperature, sound, moisture
etc.,) and transmit the data to a base station
“smart environment” – the nodes may do some rudimentary processing of
the data in addition to their communication responsibilities.
34
Organizing a sensor network database, while storing and processing data … or (b)
only at the sensors.
35
Questions?
36