Dr. Sanjay P. Ahuja, Ph.D. FIS Distinguished Professor of CIS School of Computing UNF
The motivation for distributed systems stems from the need to share resources:
hardware (disks, laser printers, etc.), software (programs), and data (files,
databases, and other data objects).
The Internet is a very large distributed system that provides services such as the
WWW, email, ftp, telnet, etc. The set of services is open-ended in that it can
be extended by the addition of server computers and software components.
1. Economics
Computers harnessed together give a better price/performance ratio than
mainframes.
2. Speed
A distributed system may have more total computing power than a mainframe.
4. Reliability
If one machine crashes, the system as a whole can still survive if you have
multiple server machines and multiple storage devices (redundancy).
Security is a major hazard since easy access to data means easy access to
secret data as well.
Parallel and Distributed Systems (MIMD) are classified into:
1) Multiprocessors or Shared memory systems
These are also referred to as tightly coupled systems. There is a
single, system-wide address space shared by all processors.
The API to TCP/IP differs from platform to platform, e.g. the BSD Socket API
for UNIX/SPARC platforms and the Winsock API for WINTEL platforms.
Both CORBA and Java RMI are examples of middleware. RMI is Java
specific while CORBA is language-neutral.
b. Mask Failures: Some failures that have been detected can be masked/hidden or
made less severe. E.g. messages can be retransmitted when they fail to be acknowledged. This
might not help if the network is severely congested, in which case even the retransmission may
not get through before the timeout. Another example: file data can be written to a pair of disks so
that if one is corrupted, the other may still be correct (redundancy to achieve fault-tolerance).
c. Tolerate Failures: Most of the services on the Internet exhibit failures and it is
not practical to detect or mask all the possible kinds of failures. In such cases, clients can be
designed to tolerate failures. E.g. when a web browser cannot reach a web server, it does not make
the user wait forever. It gives a message indicating that the server is unreachable and the user can
try later.
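The masking and tolerating strategies above can be combined in a few lines. The following is only a minimal sketch under stated assumptions: the names send_with_retries, make_lossy_channel and fetch are hypothetical, and the "network" is a simulated channel that drops a fixed number of transmissions. It retransmits an unacknowledged message a bounded number of times (masking) and, if that still fails, reports the failure to the user instead of waiting forever (tolerating).

```python
class Unreachable(Exception):
    """Raised when every transmission attempt has gone unacknowledged."""


def send_with_retries(send, message, max_attempts=3):
    """Mask lost messages by retransmitting; give up after max_attempts.

    `send` is a hypothetical callable that returns True when the message
    is acknowledged and False when the acknowledgement times out.
    """
    for attempt in range(1, max_attempts + 1):
        if send(message):
            return attempt  # number of transmissions that were needed
    raise Unreachable(f"no acknowledgement after {max_attempts} attempts")


def make_lossy_channel(losses):
    """Simulated channel (an assumption) that drops the first `losses` sends."""
    state = {"remaining": losses}

    def send(message):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return False  # acknowledgement never arrived
        return True

    return send


def fetch(send, message):
    """Tolerate the failure: tell the user rather than waiting forever."""
    try:
        attempts = send_with_retries(send, message)
        return f"delivered after {attempts} attempt(s)"
    except Unreachable:
        return "server unreachable, please try again later"
```

With a channel that drops two transmissions, fetch succeeds on the third attempt; with a channel that drops five, the retries are exhausted and the user sees the "unreachable" message. The bound on attempts reflects the point made above that retransmission does not help when the network is severely congested.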
5. Failure handling (contd.)
Recovery from failures: This involves the design of software so that the
state of permanent data can be recovered or “rolled back” after a server
has crashed. E.g. database servers have a transaction handling ability
that enables them to roll back a transaction that was not completed.
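The roll-back behaviour can be illustrated with Python's built-in sqlite3 module. This is a sketch, not a model of a real server crash: the accounts table, the transfer function and the fail_midway flag are hypothetical, and the "crash" is simulated by raising an exception between the two updates of a transfer. The connection's context manager commits if the block completes and rolls back if an exception escapes it.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                # Simulated failure after the debit but before the credit.
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
```

Calling transfer(conn, "a", "b", 30, fail_midway=True) leaves both balances unchanged, because the incomplete transaction is rolled back; the same call without fail_midway moves the 30 units.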
Redundancy:
a. There should be at least two different routes between any two routers
in the Internet.
b. In the DNS, every name table is replicated in at least two different
servers.
c. A database may be replicated in several servers to ensure that the data
remains accessible after the failure of a single server; the servers can be
designed to detect faults in their peers; when a fault is detected in one
server, clients are redirected to the remaining servers.
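Case c above, redirecting clients to the remaining servers, can be sketched as follows. The Replica class and read_with_failover function are hypothetical in-memory stand-ins, assumed only for illustration; a real system would detect faults through timeouts or health checks rather than a simple flag.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve requests."""


class Replica:
    """Hypothetical in-memory stand-in for one database server."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = dict(data)  # each replica holds its own copy of the data
        self.up = up

    def read(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in order; return (value, name of serving replica)."""
    for replica in replicas:
        try:
            return replica.read(key), replica.name
        except ReplicaDown:
            continue  # fault detected: redirect to the next replica
    raise ReplicaDown("all replicas are down")
```

As long as at least one replica is up, the read succeeds and the client never sees the failure. Only when every replica is down does the error reach the caller.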
7. Transparency
This is defined as the concealment from the user of the separation of components in
a distributed system, so that the system is perceived as a whole rather than as a
collection of independent components. There are many kinds of transparency: