Distributed Computing, An Introduction: 1.1 Definitions

Download as pdf or txt
Download as pdf or txt
You are on page 1of 46

CHAPTER

Distributed Computing,
An Introduction
This book addresses distributed computing. In this chapter, we will begin by clar-
ifying what is meant by distributed computing in the context of this book. We will
do so by looking at the history of distributed computing and by comparing this
type of computing with other forms of computing. We will then present some
basic concepts in the disciplines of operating systems, networks, and software
engineering, concepts that you will need to be familiar with in order to under-
stand the material presented in later chapters.
1.1 Definitions
Asource of confusion in the field of distributed computing is the lack of a uni-
versal vocabulary, perhaps because of the breathtaking pace with which new
ideas evolve in the field. Following are the definitions of some of the key terms
used in the context of this book. When you read the book, please keep these def-
initions in mind, and be aware that some of these terms may not have the same
definition in other contexts.
Early computing was performed on a single processor. A uni-processor, or
monolithic computing, makes use of a single central processing unit (CPU) to
execute one or more programs for each application.
Adistributed system is a collection of independent computers, interconnected
via a network, that are capable of collaborating on a task. Computers are con-
sidered independent if they do not share memory or program execution space.
21
22 CHAPTER 1 Distributed Computing, An Introduction
Such computers are called loosely coupled computers, as opposed to tightly
coupled computers; the latter can share data using common memory space.
Distributed computing is computing performed in a distributed system. In this
book, we explore the ways that computer programs, running on independent
computers, collaborate with each other to perform computing such as network
services and Web-based applications.
Anetwork service is a service provided by a special kind of program known
as a server on a network. The World Wide Web is such a service, as is elec-
tronic mail (email) and file transfer (FIP). Aserver program is just half of the
story in the so-called client-server model of distributed computing. Client-
server will be studied extensively in later chapters of this book.
Anetwork application is an application that runs on networked computers
for end users. Network applications range from enterprise applications such
as online shopping carts and electronic auction sites to noncommercial
applications such as chatrooms and network games.
The distinction between network services and network applications is not
always clear-cut, and the terms are often used interchangeably.
Request for C:omments
are specifications pro-
posed by Internet engi
neers to invite public
comments. Over the
years, thousands of such
specifications have
arisen, and they are
archived and accessible
in a number of Web
sites, including The
Intemet RFC/STD/FYI/BCP
Archive' [faq,.org, 5J.
The ARPANET, initiated
in 1970, was the prede.
cessor of the Internet.
1.2 The History of Distributed Computing
In the beginning there were stand-alone computers, each of which was capable
of executing stored programs. Connecting stand-alone computers so that data
could be exchanged among them was a natural progression. Rudimentary con-
nection of computers using cables for file sharing was practiced as early as the
1960s. However, such an undertaking requires manual intervention and cannot
be called a computer application when one or more computer programs execute
autonomously to carry out a task. Such a computer application requires data
communication, whereby two computers spontaneously exchange data using
software and hardware in order to carry out the tasks inherent in the application.
The first Internet Request for Comments (RFC), RFC I, is a proposal that speci-
fies how participating hosts can exchange information with each other through
messages. Whereas there may have been individual attempts to create network
applications on a small scale (perhaps involving two or more computers con-
nected via cables), the earliest network application was electronic mail, or
email, by which the first message was reportedly sent in 1972 on a four-node
ARPANET. (A node on a network is a computer, or host, that participates in the
network.) Automated file transfer mechanisms, which allowed data files to be
exchanged between hosts, were another natural progression, and as early as
1971 there was a proposal for such a mechanism (see RFC 114 and RFC 141). To
this day, email and file transfer remain two of the most popular network serv-
ices. The best-known network service, however, is undoubtedly the World Wide
Web (WWW). The Web was originally conceived in the late 1980s by scientists
at the Swiss research institute CERN in Geneva as an application that could sup-
port the access of hypertext over a network. The WWW has since become a plat-
form for network applications and services, including email, search engines,
and electronic commerce (e-commerce).
1.2 The History of Distributed Computing 23
The WWW was responsible for an explosion in the scale of the Internet. Until
J990, ARPA ET, the predecessor of the Internet as we know it, was primarily a
data network used by scientists, researchers, and academicians. Spurred by the
popularity of the WWW.thenetworkgrewspectacularlyintheJ990s.asillus-
trated in Figures 1. J and 1.2.
If you are interested in the history of network computing, some Web sites that
are well worth visiting are [vlmp.museophile.com, J], [zakon.org, 2), and
[isoeorg, 38]. In addition, [Hafner and Lyon, 4] is a fascinating account of the
early development of the Internet, including the people and the organizations
involved in it.
Hypertext, a term
coined by visionary Ted
Nelson, refers to textual
documents in which
associative paths may be
followed to access addi-
tional documents. The
bestknown example of
hypertext is a Web page
that contains Web links.
160,000,000
1'0,000,000
120,000,000
100,000,000

II 80,000,000
:z:
..
60,000,000
40,000,000
20,000.000
0
Hobbes' Intemet Timeline Copyright C2lD2 Robert Hlakon
htlpJIwww zakan org/robertlintemetllimelinel
pm Hom I
pm 80m
"i .ev Survey
'7
lZl69

I 05/82 .JS Old SUrvey


06/70

I 08/83 S62
-

10/70 11 I 10/84 1,024


unO 13 I 10/85 1,961

04/71 23 I 02/86 2,308


10/72 31 I 11/86 S,089

01/73 35 I 12/87 28,1'4 .


06/74 6. I 07/88 33,000

03/71 111 I 10/B8 56,000


12/79 ..8 I 07/89 130,000

.,
08/81 213 I 10/89 159,000

1
.f
f
Figure 1.1 The growth of Internet hosts [zakon.org, 2] (reprinted by permission).
An Internet domain is
part of the naming
scheme of resources on
the Internet.


0
0
;;;
N N N M



1
m
i
m m m
1
m m m m m
,;

.. c c
"
.,
.b

"

"
.. c c
,

, A

,
15

,

,
'"
0
'"
<
z <
'"


0
'" '"
<
, ,
Hobbes' Inlemel Timeline Copynght C2OJ2 Robert Hlaken
,
..... .....
1.600.000

1,400,000

1.100,000
i 1,000,000 ...
800,000 ..
J

400,000 ...
100,000
,.,.,.'"""....... mnm.._ll
Figure 1.2 Internet domains [zakon.org, 2) (reprinted by permission).
24 CHAPTER 1 Distributed Computing, An Introduction
HISTORICAL TnR... E... N""D.... S'-- _
by Richard Gabriel and jim Waldo, Sun MicroSystems
(Excerpted from http://www.sun.com/jini/overview/[7])
Reprinted by permission of lim Waldo.
How have we arrived at a place where
connected services and devices are the
driving forces for the next wave of com-
puting?
The most significant reason is our bet
ter understanding of physics, chemistry,
the physical bases for computation, and
chip manufacturing process. Today, a 5i9
nificantly powerful computer can be built
from one or two small chips and an entire
computer system can be built on one
small board.
There were three dimensions of
improvement: size, cost, and computa-
tional power. Since the 19605, size and
cost of computers have decreased dra-
matically while computational power has
gone through the roof.
The mainframe of the 1960s was a col-
lection of boxes in a large room-it cost
millions of dollars and set the bar for
computational power. Only a company
could afford one.
The minicomputer became possible
when the functionality of a mainframe
could be put in a few boxes. It had the
computational power of the previous
mainframe generation and could be
bought by a single department. Most
minicomputers were connected to inter-
active terminals-the beginnings of com-
puter-based culture, a community.
When a computer with the power of a
mini shrank to a box that fit beside a desk,
we got the workstation. A department
could afford to buy one for a couple of
professionals. A workstation had enough
computational power to support sophisti-
cated design, engineering, and scientific
applications, and to provide the graphical
support for them.
The personal computer was small
enough to fit on a desk and powerful
enough to support intuitive graphical user
interfaces, individuals could afford them,
and companies bought them for every
employee.
Eventually processors became small
enough and cheap enough to put one in
a car in place of an ignition system, or in
a TV instead of discrete electronics.
Today's cars can have frfty or more proces-
sors, the home over a hundred.
The computational power dimension
has another fallout. Tho overall trend
toward smaller, faster, cheaper processors
meant that fewer people had to share a
CPU, but it also meant that people in the
organization could become isolated.
When a tool is shared, it creates a com-
munity; as the tool shrinks, fewer people
use it together, and the community dis-
perses. But a community is hard to give
up. Fortunately, computational power
kept pace with the shrinking processor,
and as the community served by a single
computer system shrank, there was
enough power to support communica-
tion between systems. Thus, for example,
workstations became successful once
they could communicate and exchange
data.
The final stretch of the computational
power dimension is that now processors
are powerful enough to support a high-
level, object-oriented programming lan-
guage in such a way to support moving
objects between them. And such a
processor is small enough and cheap
enough to sit in the simplest devices.
Once there is sufficient computational
power, the ability to connect and com-
municate is the dominant factor deter-
mining value. Today for most people. a
computer runs only a few applications
1.3 Different Forms of Computing 25
and mainly facilitates communication:
email, the Web. Recall how fast Internet
popularity soared first with email and,
more recently, once the Web and
browsers became prevalent
1.3 Different Forms of Computing
To understand what is meant by distributed computing in the context of this
book, it is instructive to look at various forms of computing using computers.
Monolithic Computing
In the simplest form of computing, a single computer, such as a personal com-
puter (PC), is used for computing. The computer is not connected to any net-
work, and thus it may use only those resources within its immediate access. This
form of computing may be called monolithic computing. In the most basic
monolithic computing, the computer is used by a single user at a time. The user
oms applications on the system with no access to resources beyond those avail
able with the system. When you use applications such as a word processing pro-
gram or a spreadsheet on a PC, you are practicing this form of computing,
which may be called single-user monolithic computing.
Multiple users can engage in monolithic computing. This form of computing
(see Figure 1.3a), where the resources of a single computer can be shared by con-
current users using a technique known as timesharing, was popular in the
1970s and 1980s. The computer that proVides the centralized resource is usually
called a mainframe to differentiate it from smaller computers such as minicom-
puters and .microcomputers. Through devices known as terminals, users (who
may be geographically dispersed) can be connected to the mainframe computer
and interact with it during a terminal session. Some widely used mainframe
computers include the IBM 360 series and the Univac 1100 series. Applications
using this form of computing are typically separate programs designed to per-
form a single function, such as payroll or billing for a firm or a university.
Distributed Compl,lting
In contrast, distributed computing involves computing performed among
multiple network-connected computers, each of which has its own processor(s)
and other resources (see Figure 1.3b). Auser, using a workstation, has full use of
the resources on the local computer to whiCh its workstation is connected. In
26 CHAPTER 1 Distributed Computing, An Introduction
addition, through the interaction of the local computer and the remote com
puters, the user may access resources on the remote computers. The World Wide
Web is an excelient example of this type of computing. When you use a browser
to visit a Web site, a program such as Netscape or Internet Explorer runs on your
local system and interacts with a program (known as a Web server) running on
a remote system to fetch a file that may reside on yet another remote system.
Terminal
(a)
Mainframe computt.,'
(b)
Workstation
Figure 1.3 Centralized computing (a) versus distributed computing (b).
Parallel Computing
Similar to but distinct from distributed computing is a form of computing
known as parallel computing or parallel processing, which uses more than
one processor simultaneously to execute a single program. "Ideally, paraUel pro-
cessing makes a program run faster because there are more engines (CPUs) run-
ning it. In practice, it is often difficult to divide a program in such a way that
separate CPUs can execute different portions without interfering with each
other" [Koniges, 9]. Parallel computing is typically performed on a single com-
puter that has multiple CPUs, but, according to Koniges, it is also possible to
"perform paraliel processing by connecting the computers in a network.
However, this type of paraliei processing requires very sophisticated software
calied distributed processing software" [Koniges, 9].
Using paraliel co;nputing, one can solve problems that are otherwise impossible
to solve on one computer or solve computing-intensive problems that are oth-
erwise economicaliy untenable. Today, parallel computing is primarily used in
large-scale scientific computing in areas such as biology, aerospace, weather
forecasting, and semiconductor design. Although a fascinating subject, parallel
computing is not within the scope of this book.
1.3 Different Forms of Computing 27
WHERE _
Seller submits
information
UUNet
Many
mirrored
font-end
L.,.,' .."" .,"'!,..,-' servers
Sprint
Planned new
Starfire server
O
Buyer
submits bid
Anaconda
EBay is planning to add another Starfke attached to
the final data disks, mirroring Anaconda.
by Joseph Menn
(From La' Angeles Times, La, Angeles, Calif., Dec. 2, 1999, Joseph Menn.
Copyright Cl 1995, La, Angele, Times.)
Reprinted with permission.
EBay users rarely think about the bidding process-
until the site crashes. Behind the scenes, the online
auctioneer has a number of safeguards that rely
increasingly on duplicated, or mirrored, technolo-
gies in case one piece of machinery or software fails.
Buth the information must ,till pas, through many
different companies and type' of equipment for
everything to work properly.
o Bidder at home registers and submits an elec-
tronic bid from a personal computer.
a The bid travel, from the con,umer', Internet
service provider, through switches and routers, to
the ISP company', 'ervers.
(] The bid is sent through the Internet backbone.
The bid travels to one of EBay's ISPs, most likely
Sprint or UUNet, and through pipes to EBay.
11 The bid passes through EBay' Cisco switche,
and routers.
mThe Information reaches one of about 200
front-line Compaq servers running on Windows NT.
The servers are mirrored, so that if anyone fails, the
others pick up the slack.
a The bid is passed along to one of Sun
Microsystems' Starfire servers, named Bull and Bear,
that mirror each other.
[J The bid is added to two information-storage
databases running Oracle software, where it is
matched with the seller's information.
mThe information flow is reversed back out of
EBay, into e-mails sent to both the 'eller and poten-
tial buyers who are outbid. Confirmation is also sent
to the bidder.
IIi] From Bull, the bid amount and other details
are sent to another Starfire server, called Anaconda,
and recorded on mirrored storage disks.
Sources: Times staff, EBay
28 CHAPTER 1 Distributed Computing, An Introduction
An interested computer
owner will download a
free piece of software
(for example, a screen
saver) from SETI@home.
Then, when his or her
computer is idle while
online, the software
downloads a data file
from an Internet site for
analysis on his or her
computer. The results of
the analysis are sent
back to the Internet site
where they are com-
bined with those can
tributed by other
SETI@home participants
and used to help in the
search for extraterrestrial
signals.
Cooperative Computing
Recently, the term distributed computing has also been appIJed to cooperative com-
puting projects such as the Search for Extraterrestrial Intelligence (SETI) [setiath-
ome.ssl.berkeley.edu, IOJ and distributed.net [distributed.net, 33J. These are
projects that parcel out large-scale computing to workstations on Internet hosts,
making use of surplus CPU cycles, as described in the sidebar. (Note: Further dis-
cussion of this type of computing is not within the scope of this book.)
1.4 The Strengths and Weaknesses of -
Distributed Computing
Prior to the appearance of the World Wide Web, monolithic computing, such as
business applications running on a mainframe computer, or a single user using
a personal computer to perform word processing or spreadsheet functions, was
the dominant form of computing. Thomas Watson, the founder of IBM, was
said to have made the following statement in 1943: "I think there is a world
market for maybe five computers." Since the 1980s, however, distributed com-
puting has become as important as-if not more importailt than-monolithic
computing.
There are a number of reasons for the popularity of distributed computing:
The affordability of computers and availability of network access. Today's
personal computer has computing power superior to that of the mainframe
computers of the early days, at a fraction of the size and the cost. Coupled
with the fact that connectivity to the Internet has become universally avail-
able and generally affordable, the large number of interconnected computers
makes for an ideal community for distributed computing.
Resource sharing. The architecture of distributed computing mirrors the
computing architecture of modern organizations. Each organjzation inde-
pendently maintains computers and resources that are local to the organiza-
tion while sharing resources over the network. Using distributed computing,
organizations can pool their resources very effectively. The Web, for example,
is a powerful platform for sharing documents and other resources within and
among organizations.
Scalabiljty. With monolithic computing, the available resources are limited
to the capacity of one computer. By contrast, distributed computing provides
scalability in that increasing demand for resources can be addressed effec-
lively with additional resources. For example, more computers providing a
service such as email can be added to the network to satisfy an increase in the
demand for that service.
Fault tolerance. Compared to monolithic computing, distributed computing
provides the opportunity for fault tolerance in that a resource can be repli-
1.4 The Strengths and Weaknesses of Distributed Computing 29
cated (or mirrored) to sustain its availability in the presence of failures. For
example, backup copies of a database can be maintained on different systems
on the network, so that when one system fails, other copies can be accessed
without disrupting the service. Although it is not possible to build a distrib-
uted system that is completely reliable in the presence of failures [Fischer,
Lynch, and Paterson, 30j, it is the responsibility of a developer, when design-
ing and implementing such a system, to maximize its fault tolerance. Fault
tolerance in distributed computing is a complex topic that has received
extensive attention in the research community. Interested readers may want
to refer to sources such as the work of Pankaj Jalote Ualote, 31].
In any form of computing, there is always a trade-off between advantages and
disadvantages. The advantages already mentioned are offset by disadvantages.
Some of the most significant ones are:
Multiple points of failure. There are more points of failure in distributed
computing. Since multiple computers are involved, all of which depend on
the network for communication, the failure of one or more computers, or
one or more network links, can spell trouble for a distributed computing sys-
tem. There is a popular quote, attributed to noted computer scientist Leslie
Lamport, which says that "a distributed system is one in which the failure of
a computer you didn't even know existed can render your own computer
unusable."
Security concerns. In a distributed system, there are more opportunities for
unauthorized attack. Whereas in a centralized system all the computers and
resources are typically under the control of a single administration, in a dis-
tributed system management is decentralized, often involving a large l1um-
ber of independent organizations. The decentralization makes it difficult to
implement and enforce security policies; hence distributed computing is vul-
nerable to security breaches and unauthorized access, which unfortunately
can affect all participants on the system. This problem is clearly illustrated by
wellknown attacks on tht: Internet, such 35 worms and viruses [Eichen and
Rochlis, 21; Zetter, 221.
Because of its importance, computer security is a widely researched and
studied topic, and successful techniques have been developed for writing and
deploying secure applications. Such techniques include encryption, keys,
certificates, digital Signatures, sandboxes, authentication, and authorization.
Security is a broad topic that is beyond the scope of this book. Readers are
encouraged to pursue the topiC in references such as lOaks, 321.
Now that we have clarified the objective of this book, let's next look at some of
the basic concepts in three related disciplines in computer science: operating
systems, networks, and software engineering. Although no in-depth knowledge
of these disciplines is required as a prerequisite for this course, this book does
refer to some concepts. and terminologies associated with these disciplines. In
the rest of this chapter we will introduce these concepts and terminologies,
30 CHAPTER 1 Distributed Computing, An Introduction
WEB ATTACKS MIGHT HAVE MANY SOURCES
by Matt Richtel and Sara Robinson (NYT), Feb. 11, 2000
(reprinted with permission of the New York Times)
SAN FRANCISCO, Feb. 1o-Computer
security experts said today that evidence
now suggests that the three days of attacks
on leading Web sites may have been the
work of more than one person or group.
The analysis that more than one group
was at work called into question the con-
clusion of some security experts who were
initially skeptical that following Monday's
attack on Yahoo, multiple vandals would
have been able to muster large "copy cat"
assaults on other sites.
And while the Internet community
searched aggressively for leads, computer
experts said that it would.be difficult even
to determine which computers initiated
the attacks, let alone find the responsible
parties.
CERT, a federally financed computer
security organization formerly known as
the Computer Emergency Response
Team, said today that it was no longer
seeing an unusual number of reports of
attacks. From Monday through
Wednesday, service on several leading
Web sites, including those of the Yahoo
portal, the PTrade Group brokerage firm,
the eBay auction company and Time
Warner's CNN.com news site, were d i s ~
rupted and in some cases halted by
assaults involving dozens or more com-
puters flooding them with data feeds.
But security experts said that Web sites
and the Internet in general would remain
vulnerable for the near future because so
many organizations were failing to take
steps to prevent their computers from
being used by vandals to initiate the
attacks.
One government official said today
that tougher laws might be necessary to
combat such attacks. 'We don't consider
this a prank," Deputy Attorney General
Eric Holder said. "These are very serious
matters."
Also today, it was disclosed that more
major Web sites than had been previously
known were hit on Wednesday, the last
day of the assaults. Those included
Excite@Home, a provider of high-speed
access over cable modems, which was
attacked early Wednesday evening despite
having taken precautions to defend its
network.
At least two other major e-commerce
companies were hit with attacks on
Wednesday, according to IFsec, a com-
puter security firm in New York, though it
declined to name the companies, saying
that one of them was a client.
'We're seeing more of these than have
appeared in the popular media," said
David M. Remnitz, the chief executive of
IFsec.
In addition, users of Intemet Relay
Chat, or J.R.C., said that the forum had
been under irotense fire in the last two
weeks by attacks similar to those levied at
the e-commerce companies.
Meanwhile, network service providers
and investigators continued analyzing evi-
dence, including the packets of data that
had been used to overwhelm and paralyze
the victim sites.
Computer security experts at Stanford
University in Palo Alto, Calif., said that the
preliminary evidence suggested the
attacks might have been the work of more
than one person or group.
David J. Brumley, assistant computer
security officer for Stanford, said the
type of data included in the packets
1.4 The Strengths and Weaknesses of Distributed Computing 31
I
I
used to attack Yahoo on Monday differed
from the data in the Tuesday assault on
eBay.
"The attacks were just completely dif-
ferent between those two days," Mr.
Brumley said. "The people who did Yahoo
are different than the people who did
eBay and CNN."
Network service providers said that the
recent assaults included two types of
attacks, further suggesting that more tnan
one party may have been involved. Both
are what are known as denial of service
attacks because they prevent the targeted
site from serving its customers.
In the first, known as a SYN flood,
attackers hack into-and install software
on-a large number of computers, then
use those machines to bombard the vic-
tim site with requests to start an e<orn-
meree session. The large number of
requests overwhelms the victim's servers,
preventing customers from gaining access
to the site.
To prevent any tracing of these
requests, the vandals employ a practice
called spoofing, which alters the initiating
address.
The second type, known as a smurf
attack, again involves the use of compro-
mised machines, but it also employs a
large third-party network of computers to
"amplify" the data used in the attack and
greatly increases the effectiveness of the
assault It is believed that Stanford's net-
work of computers may have been used in
this way in the attack on Yahoo.
Security experts say it is simple to con-
figure networks so they cannot be used in
a smurf attack, yet many sites do not
know to take these steps.
Computer security experts noted that
the large numbers of computers used to
initiate the attacks this week made tracing
those attacks very difficuit.
At this point, there's been so much
traffic thrown at these people that it's
pretty hard to do a trace: said Joel de la
Garza of the Kroll-O'Gara Information
Security Group, a risk mitigation company.
Moreover, companies whose comput-
ers are hijacked and then used as plat-
forms for an assault often have no idea of
the probiem, even as the assault is going
on, computer security experts said.
Vandals can activate the assault from a
remote location, and to a company or an
individual whose computer is being used;
the only impact may appear to be a slow-
down in the activity of the network.
Victim companies and security experts
said today that in some cases the attacks
seemed more complicated than originally
thought-reinforcing how difficult they
are to prevent.
Excite@Home, for example, said it
sought to take precautionary measures in
light of the earlier attacks but was still
unable to keep its Web site from being
crippled for at least h a ~ an hour.
"To the best of our knowledge, a site
cannot take preventative measures
against ~ h e attacks without the help of
others," said Kelly Distefano, an
Excite@Home spokeswoman. She said the
company would have needed more coop-
eration from the companies that provide
Excite network services.
Peter Neumann, principal scientist at
SRI International in Menlo P.rk, Calff., reit-
erated that the success of the attacks had
shown that Internet sites were not taking
adequate precautions to prevent them-
selves from being used for attacks.
Hit's time people woke up, H Mr.
"'eumann said. "People are racing to do
electronic commerce on the Net without
any understanding of the risks-and there
are much greater risks than we've seen
here."
32 CHAPTER 1 Distributed Computing, An Introduction
1.5 Basics of Operating Systems
Distributed computing involves programs running on multiple computers. Let's
look at some of the concepts involved with the execution of programs in mod-
ern-day computers.
Computer Programs and Processes
A software program is an artifact constructed by a software developer using
some form of programming language. Typically, the language is a high-level one
that requires a compiler or an interpreter to translate it into machine language.
When a program is "run," or executed, on a computer, it is represented as a
process. On modern computers, a process consists of an executing program, its
current values, state information, and the resources used by the operating sys-
tem to manage the execution of the program. In other words, a process is a
dynamic entity that exists only when a program is run.
Figure 1.4 illustrates the state transitions during the lifetime of a process. A
process enters a ready state when a program is at the start of its execution, when
it is placed in a queue by the operating system, along with other programs that
are to be executed. When system resources (such as the CPU) are available for
its actual execution, the process is dispatched, at which point it enters the run-
ning state. It continues to execute until the process must wait for the occurrence
of an event (such as the completion of some input/output operation), at which
time it enters a blocked state. Once the anticipated event occurs, the process will
be placed on the execution queue and await its turn to execute once again. The
process repeats the ready-running-blocked cycle for as many times as necessary
until the execution of the process is completed, at which time the process is said
to be terminated.
In this book, we will use Java programs, or fragments of them, as code examples.
There are three types of Java programs: applications (Figure 1.5), applets (Figure
1.6), and servlets (Figure 1.7). Regardless of which type of program you are writ-
ing, each one is written as a Java class. AJava application program has a main
start
Queued
ready
terminated
Exit
running
Event completion
blocked
Waiting
for event
Figure 1.4 Asimplified state transition diagram of a process.
1.5 Basics of Operating Systems 33
method, and it is run as an independent (stand-alone) process. On the other
hand, an applet does not have a main method, and it is run using a browser or
the appletviewer. Aservlet is similar to an applet in that it does not have a main
method, and it is run in the context of a Web server. We will have occasion to
see examples of all three types of programs and program fragments in this book,
with applications being the form of programs most frequently employed.
AJava program is compiled into bytecode, a universal object code. When run,
bytecode is translated by the Java Virtual Machine GYM) to the machine code
native to the computer, following the state transitions that we have studied ear-
A standalone Java application is run on a local machine.
Computer
-
~ .
lava object
Java Virtual Machine
/ ~ ***.*.* *** * ***** *
.. A sample of a simple Java applic&tion .
.. H. Liu 1/8/02
*.****.*.* * * * *.*.*~ *******.**/
import java.ie.;
class MyProgram{
public static void main(Stringl J ergs)
throws IOException{
BufferedReader keyboard - new
BuferedReader(new InputStreamReader(Syst8m.in;
String theName;
System.out.println(MWhat is your name?-);
theName - keyboard.readLine( );
System.out.print(-Hello - + theName);
System.out.println(- - welcome to CSC369.\n-):
II end main
Ilend class
Figure 1.5 Astand-alone lava application (top) and the code that activates it (bottom).
34 CHAPTER 1 Distributed Computing, An Introduction
An appJet is an object downloaded (transferred) from a remote machine
and then run on a local machine.
..
,
I ~ .
An applet
I ~
-
.
.. : I ~

-
0
A sample of a simple applet .
H. Liu 1/8/02
import java.applet.Applet;
import java.awt.*;
public class MyApplet extends Applet{
public void paint(Graphics g){
setBackground(Color.blue)i
Font Claude = new Pont(-Arial-, Font.BOLD, 40);
q.setFont(Claude);
q.setColor(Color.yellow)i
q.drawString(*Hello Worldl*, 100, 100);
II end paint
fiend class
<1-- A web page which, when browsed, will run>
<1-- the MyApplet applet>
<1-- M. Liu 1/8/02>
<title>SampleApplet</title>
<hr>
<applet code=-MyApplet.class width-SOD heightE500>
</applet>
<hr>
<a h r e f - ~ Hello.java->The source.</a>
Java object
Java Virtual
Machine
Figure 1.6 An applet (top) and the Web page (bottom) that activates it.
1.S Basics of Operating Systems 3S
Aservlet is an object that runs on a remote machine and interacts
with a local process using a request-response protocol.
Aservlet
Request
I ~ = ~ ~ : ; : : : ' ; : : ; : : ; : : ~ = = ~ ---+- Aprocess
Response
, ***** **
* A sample of a simple Java aervlet .
M. Liu 1/8/02
import java.lo.-;
import java. text.:
import java.util.-;
import javax.servlet.*i
import javax.servlet.http.*;
public class HyServlet extends HttpServlet {
public void doGet (HttpServletRequest request,
BttpServletResponse response)
throws ServletException, IOException (
PrintWriter out;
String title ~ -HyServlet Output-;
Ilset content type and other response header
Ilfiel.da first
response.setContentTypeC-text/html-);
Ilthen write the data of the response
out - response.getWriter():
out.println("<HTHL><HEAD><TITLE>");
out.println(title):
out.println("</TITLE></HEAD><BODY>");
out.println("<Bl>" + title + "<lSi>"):
out.printlnC"<P>Hello Worldl"):
out.printlnc"</BODY></HTKL>");
out.close( );
} Ilend doGet
/lend class
Figure 1.7 Aservlet (top) and the code that activates it (bottom).
36 CHAPTER 1 Distributed Computing, An Introduction
Iier. Because the bytecode is an intermediate code that is the same regardless of
machine types and is translated to the specific machine code at run time, Java
programs are therefore said to be platform-independent, meaning that the
same program can be run on any machine type that supports the ]VM.
In this book, it is assumed that you have knowledge of basic]ava programming,
to the extent that you can compile and execute a stand-alone application or
applet. A stand-alone program is one that executes on its own, without
exchanging messages with another program.
Concurrent Programming
Distributed computing ;nvolves concurrent programming, which is program-
ming that involves the simultaneous execution of processes. In the following
paragraphs we look at three kinds of concurrent programming.
Concurrent processes executed on multiple computers. Much of the mate-
rial in this book deals with separate processes running concurrently on sepa-
rate, independent computers interconnected via a network. The processes
interact with each other by exchanging data over the network, but their exe-
cution is otherwise completely independent. When you access a Web page
using a browser, a process of the browser program, running on your machine,
interacts with a process running on the Web server machine.
Concurrent programming involVing multiple machines requires program-
ming support; that is, the software for the participating program must be
written to contain logic to support the interaction between processes. How
this logic can be expressed in the programs is a main theme of this book.
Concurrent processes executed on a single conlputer. Modern computers
are supported by multitasking operating systems, which allow multiple tasks,
or processes, to be executed concurrently. The concurrency may be real or vir-
tual. True concurrent multitasking on a single computer is feasible only if the
computer has multiple CPUs, so that each CPU can execute a separate
process. On a computer that has only one CPU, timesharing (see Figure 1.8),
or time-slicing, is used to allow processes to take turns being executed, creat-
ing the illusion that they are being executed in parallel.
Processes
PI
P2
P3
P4
Time
Timesharing of a resource
Figure 1.8 Timesharing on a computer.
1.5 Basics of Operating Systems 37
Since multitasking is a functionality of the operating system, no pro-
gramming is needed for this type of concurrent programming. No special
software logic needs to be contained in a program to initiate multitasking.
Concurrent programming in a process. In addition to concurrent pro-
gramming in separate processes, it is often necessary for a single program to
initiate tasks that are to be executed concurrently. For example, it may be
necessary for a program to perform other tasks while waiting indefinitely for
user input in one user interface window. It may also be desirable for a pro-
gram to execute tasks in parallel, for performance reasons. Concurrent pro-
gramming within a process is performed using two types of facilities
provided by the operating system.
Parent and Child Processes
At run time, a process may spawn subordinate processes, or child processes.
Through real or virtual multitasking, the original process, called the parent
process, continues to run simultaneously with the child processes (see Figure
1.9). Achild process is a complete process, consisting of an executing program,
its own current values, and state information, some of which is inherited from
the parent process. A parent process can be notified when a child process has
terminated.
A parent process may spawn child processes. A process may spawn child threads.
Parent process
Child processes
Child thread 1
Aprocess
Child thread 2
(0) (b)
Figure 1.9 Concurrent processing within a process.
Threads
In lieu of child processes, a process may spawn threads, also known as light-
weight processes. Threads carry a minimum of state information, but other-
wise behave the same as processes. Since they incur less overhead, threads are
preferred over child processes.
38 CHAPTER 1 Distributed Computing, An Introduction
The spawning and coordination of child threads requires programming support.
The software for the program must be written to contain logic to support the
spawning of the threads and to coordinate, or synchronize, the execution of the
family of threads spawned by the parent thread.
The concurrent execution of threads may result in a race condition. A race con-
dition occurs when a series of commands in a program are executed in parallel,
in an arbitrarily interleaved fashion, yielding nondeterministic execution out-
come. Figure 1.10 illustrates such a situation. Suppose counter is a variable shared
among two concurrent threads. Execution sequence 1, in which the instructions
of the two processes are executed serially, will result in the counter value being
incremented to 2. On the other hand, in execution sequence 2, in which the two
sets of instructions are interleaved, the counter will only be incremented to 1.
Race conditions can be avoided if mutual exclusion is proVided to a code seg-
mentto ensure that the commands in the segment can only be executed by one
thread at a time. Such a code segment is called a critical region. For our exam-
ple, the critical region comprises the code where the counter variable is accessed
and incremented.
Programming using threads is called multi-threaded programming, or
threaded programming for short. A multi-threaded program that is written to
guard against race conditions is said to be thread-safe. The development of a
complex thread-safe program reqUires advanced programming skills.
Time
fetch value in counter and load into a register I
increment value in register
store value in register to counter
fetch value in counter and load into a register
increment value in register
store value in register to counter
This execution results in the
value 2 in the counter.
fetch value in counter and load into a register
fetch value in counter and load into a register
1 in_c_'e_rn_en_t_v_a_lu_e_i_n_
r
..c e9oc
i
_'t_e_'----
1 in_c_'e_rn_en_t_v_a_lu_e_i_n_'_eg ..i_'I_e_' _
1 '_to_r_e_v_al_u_e_in_'e..o9c-is_te_,_t_o_c_o_u_n_te_r _
1__....;s;.;to;;,';.;e..v..;a;.;lu;.;e..i.. ..t;.;o..c;.;o;.;u.. n;.;te;.;' _
This execution results in the
value 1 in the counter.
Instruction executed in concurrent process or thread 1
Instruction executed in concurrent process or thread 2
Figure 1.10 A race condition ,esulting from unsynchronized concurrent processes or threads.
1.5 Basics of Operating Systems 39
Fortunately, in this book we will seldom have to use threads explicitly, as
threaded programming is nften provided behind the scenes by the toolkits that
support network applications. .
Java Threads
The Java Virtual Machine enables an application to have multiple threads of
execution running concurrently. When a Java Virtual Machine starts up, there
is usually a single thread (although in some systems a program may start with
more than one thread) that typically calls the method named main of some des-
ignated class, such as the class of an application that you wrote. Additional
threads can be spawned from an active thread, and each thread will run inde-
pendently and in parallel with other threads until it terminates.
To support threading in a program, Java provides a class named Thread as well
as an interface named Runnable interface.
From within a Java program, there are two ways to create a new thread of exe-
cution:
1. Declare a class to be a subclass of Thread. This subclass should override the
run method of class Thread. When an instance of the subclass is allocated
and started, the code in the run method is executed concurrently with the
main thread.
2. Declare a class that implements the Runnable interface. That class implements
the TUn method of the interface. When an instance of the class is allocated
and started, the code in the TUn method is executed concurrently with the
main thread.
Figure 1.11 iilustrates the use of the first means of creating a new thread of exe-
cution, while Figure 1.12 iilustrates the use of the second way.
pUblic claas RunTbreds
{
public static void main (String{J Args)
{
SomeThread pI z new SomeThread(l);
pLstart( );
SomeThread p2 new SomeThread(2);
p2.staxt() ;
SomeThread p3 = new SomeThread(3);
p3.start()i
}
}/I end c l a ~ B RunThrd.
public class SomeThread extends Thread {
lnt myID;
SomeThread(int id)
this.myID "" idi
public void rune) {
int ii
for (1 - 1; i < 11; i++)
system.out.println
("Thread"+ myID + ": .. + i) i
}
fiend elass SomeThread
Figure 1.11 Sample application that spawns three threads using a subclass of the
Thread class.
40 CHAPTER 1 Distributed Computing, An Introduction
public class RunTbread.2
{
public static void main (Strinq(J ergs)
{
Thread pi - new Thread(new
SomeThread2 ( 1) ) ;
pl. start() ;
Thread p2 - new Thread(new
SomeThread2(2;
p2.8tart();
Thread pJ - new Thread(new
SomeThread2(3;
p3.8tart() ;
}
} fiend class RunTbre.d2
class sc.eftread2 implements RUDable {
int myID;
8a.eTbrd2(int id) (
this .myID - idi
}
public vd'id ruo() {
int i;
for (1 - 1; i < 11; i++)
System.aut.println
+ ": .. + i);
}
Ilend class sa.eTbread2
Figure 1.12 Sample application that spawns three threads using an implementation of the
Runnable interface.
In java, the most straightforward way to guard against race conditions is by
using synchronized static methods. A static method with the reserved word
synchronized appearing in its signature can be executed by only one thread at a
time. Hence the code in a synchronized static method is guaranteed to be mutu-
ally exclusive. For the example shown in Figure 1.10, the code for incrementing
the counter variable should be enclosed in a synchronized static method so that
the increments to the counter can only be made by one thread at a time. Ajava
code sample iJIustraiing the USe of threads and a synchronized static method
can be found in Exercise 2(d.) at the end of this chapter.
In subsequent chapters, we will use the terms process and thread frequently. If
you are not familiar with threading, there are some exercises at the end of this
chapter that allow you to practice threading using java programming.
1.6 Network Basics
Having looked at some key concepts in operating systems that are relevant to
distributed computing, next we will do the same with network basics.
Protocols
In the context of communications, a protocol is a set of rules that must be
observed by participants. In a face-to-face meeting, human beings instinctively
follow an unspoken protocol based on eye contact, body language, and gestures.
This protocol stipulates that only one person speaks at a time while the others
listen. In a phone conversation, one party initiates the call, and then, after the
call is answered, the parties at the two ends take turns speaking, using pauses or
1.6 Network Basics 41
questions to signify when it is the other party's turn to talk.
In communications involving computers, protocols must be formally defined
and precisely implemented. For each protocol, there must be rules that specify
the following:
How is the data exchange encoded?
How are events (sending, receiving) synchronized (ordered) so that the par-
ticipants can send and receive in a coordinated manner?
The concept of protocols will become more concrete when we study a number
of protocols in the rest of this book.
It should be emphasized that a protocol is a set of rules. The specification of a
protocol does not dictate how the rules are to be implemented. For example,
Hypertext Transfer Protocol (HTTP) specifies the rules that must be observed
between a Web browser process and a Web server process. Any Web server pro-
gram written in conformance to these rules satisfies the protocol, regardless of
what programming language or syntax is employed. Therefore, you should
understand that a protocol (such as HTTP) is distinct from its implementations
(such as the varieties of Web browsers, including Netscape and Internet
Explorer).
As an analogy, the rules for a sport, say basketball, are specified by some author-
ity, say the National Basketball Association (NBA), but it is up to each individ-
ual team and then each player to execute or implement the game while
observing those rules.
Network Architecture
In the textbooks for data networks, the functionalities of a network are fre-
quently presented using a network architecture (Figure 1.13). The classic net-
work architecture, called the Open System Interconnect (051) architecture,
divides the complex functionalities of a network into seven layers. All or part of
these functionalities must be present on a computer that participates in data
communication and hence also in distributed computing. If you are interested
in the specifics of the 051 model, you should be able to find them in textbooks
on networks such as ITanenbaum, 351. For the purposes of this book, a simpli-
fied architecture that IS appropriate for the Internet will be presented.
The network architecture for the Internet is illustrated in Figure 1.1-1, where
there are four layers: physical, Internet, transport, and application. The physi-
cal layer provides the functionalities for the transmission of signals, represent-
ing a stream of data, from one computer to another. The Internet layer allows
a packet of data to be addressed to a remote computer and delivered to that
computer. The transport layer proVides the functionalities for data packets to
be delivered to a specific process running on a remote computer. Finally, the
application layer allows messages to be exchanged between programs in sup-
port of an application such as the World Wide Web.
The syntax of a pro-
gramming language is
the set of language
rules, including spelling
and grammar, of the
language.
OSI stands for Open
System Interconnect,
the name given to a
model of network archi-
tecture promoted by an
organization called the
International
Organization for
Standardization (ISO).
42 CHAPTER 1 Distributed Computing, An Introduction
Application layer - --- - ---..
________ 4_ ......
Application layer~ . ,
!
Presentation layer Presentation layer
Session layer Session layer
Transport layer Transport layer
Network layer Network layer
Data link layer Data link layer
.. Physical layer

Physical layer

Figure 1.13 The OSI seven-layer network architecture.


.. Application layer ..... - - - - -
Transport layer
Internet layer
'4---- --
- Application layer
Transport layer
Internet layer
Physical layer ~ .. o----

Physical layer .....
Figure 1.14 The Internet network architecture.
The division of the layers is conceptual: the implementation of these function-
ali ties need not be clearly divided as such in the hardware and software that
implement the architecture. The conceptuai division of the architecture into
layers serves at least two useful purposes. First, it enables protocols to be speci-
fied systematically; that is, using a network architecture, these protocols can be
specified layer by layer, addressing the functionalities required at each layer.
1.6 Network Basics 43
Secondly. the layered architecture allows the details of the network's function-
alities to be abstracted, or hidden. When writing an application, it is helpful
when one does not have to be concerned witli the details of data communica-
tion but can instead concentrate on the application protocol at hand. Alayered
architecture allows a program to be written as if data can be exchanged directly
(see the dashed lines in Figures 1.13 and 1.!4). In actuality and behind the
scenes, a message sent from one application must be processed by all the func-
tionalities in the lower layers of the network architecture (see the dotted lines).
Eventually, the stream of data signals representing the message is transmitted
over the physical link interconnecting the computers (see the solid lines). Upon
arriving at the receiving computer, the data signals are then processed by the
functionalities of the network architecture in the reverse order, until eventually
the data is reassembled into a message and delivered to the appropriate process.
Network Architecture Protocols
Let's now look at some of the specific protocols for the Internet architecture.
The protocol for the Internet layer is named, aptly enough, the Internet
Protocol. This protocol uses a particular naming scheme, which we will soon
study, for identifying computers on the network and for routing the data. At the
transport layer, there are two Widely used protocois: The Transmission Control
Protocol (TCP) provides connection-oriented communication, while the User
Datagram Protocol (UOP) supports .connection!ess communication. (A discus-
sion of connection-oriented versus connectionless communication will be
introduced in the next section and then discussed further in Chapter 2.) Finally,
at the application layer, protocols such as File Transfer Protocol (FTP), Simple
Network Mail Protocol (SNMP), and Hypertext Transmission Protocol (HTTP)
are specified for network applications. The well-known Transmission Control
Protocol/Internet Protocol (TCP/IP) is a set' of protocols encompassing the
Internet and transport layers of this architecture; these protocols are universally
employed for data communication over the Internet. An Internet application
therefore must be run on a computer that implements this portion of the
Internet architecture, colloquially termed the TCP/lP stack.
Readers who are interested in the protocols at the lower layers may want to con-
sult textbooks such as [Stallings, !2; Tanenbaum, 13]. This book is devoted to
the study of protocols at the application layer. We will start by looking at some
of the popular application protocols, such as those just mentioned in the pre-
vious paragraph. We will then go on to study how such applications are sup-
ported using distributed computing.
Connection-Oriented versus Connectionless Communication
Although connection-oriented and connectionless communication are more
properly a topic for data networks discussion, we will have occasion to make a
distinction between the two in OUf discussions.
44 CHAPTER 1 Distributed Computing, An Introduction
A data network trans
mits data; a voice net-
work transmits voice.
Modern networks trans-
mit both data and voice.
In connection-oriented communication, a connection-which may be physical
(Le., tangible, provided using hardware such as cables, modems, and receivers)
or logical (Le., abstract or virtual, using software that emulates a connection)-
is established between two parties, the caller and the callee. Such is the case
when you (the caller) dial a number to make a phone call to a friend (the caliee).
Once a connection is established, data (voice, in the case of a phone cali) can be
sent repeatedly over the connection continuously unlilthe session is over, such
as when you hang up the phone at the end of a conversation, at which point
the connection is severed. Note that in this mode of communication there is no
need to address an individual data packet explicitly while a connection is in use.
As the name implies, connectionless communication involves no connection.
Instead, data is sent a packet at a time. Each packet must be explicitly addressed
by the sender to the receiver. An example of connectionless communication is
when you correspond with a friend using rounds of email messages or letters.
Each email or letter you send, containing a message, must be addressed to your
friend. In reply, your friend sends an email or letter addressed to you. The
exchange continues until the correspondence, or session, is over.
On a data network, connectionless communication is simpler to provide, since
there is no need to maintain separate connections. Yet the lack of a connection
can result in data packets being lost during delivery or being delivered out of
order. For example, if you send multiple emails or letters to your friend in suc-
cession, each one containing part of a message, it is entirely possible for your
friend to receive the emails or letters in a scrambled order, since each email/let-
ter is delivered independently.
On the other hand, connection-oriented communication can ensure that data
packets are delivered safely and in order along an established connection, at the
cost of additional processing overhead. This is another example of trade-offs.
Figure 1.15 graphically illustrates the difference between these two forms of
communication. In exercise 3 at the end of this chapter, you will be guided
through a simplified analysis of the trade-offs between the two forms.
At any layer of a network architecture, communication can be carried out using
a connection-oriented facility or a connectionless facility. At the transport layer
of the TCP/IP suite, the Vser Datagram Protocol (VDP) is a connectionless pro-
tocol, while the Transmission Control Protocol (TCP) is a connection-oriented
protocol. Afacility or protocol that lIses VDP to transmit data is said to be con-
nectionless at the transport layer, while one that uses TCP is said to be connec-
tion-oriented at the same layer. Note that it is possible for a communication
facility to be connection-oriented at one layer but connectionless at another.
For example, a Web application uses HTTP, a connection-oriented protocol, at
the application layer, but actual data transmitted to and from the application
may use VDP at the transport layer.
Table 1.1 compares the two modes of communication.
1.6 Network Basics 45
ftiiIo-
~ .
...--' ~
Connection-oriented communication
-

...
--
:
Connectionless communication
Adata packet
Figure 1.15 Connection-oriented versus connectionless communication.
Table 1.1 Comparisons of Connection-Oriented and Connectionless Interprocess
Communication (IPC).
Connection-Oriented Connectionless
Addressing Specified at connection Addressing is specified with
time; there is no need each operation.
to re-specify with each
subsequent operation
(send or receive).
Connection overhead There is overhead Not applicable.
for establishing a
connection.
Addressing overhead There is no addressing Overhead is inrurred with
overhead with each each operation.
individual operation.
Data delivery order The connection abstraction The lack of a connection
allows the IPe mechanism makes it difficult for the
to maintain the order of IPe facility to maintain
delivering data packets. delivery order.
(colltilllled 011 Ilexl page)
46 CHAPTER 1 Distributed Computing, An Introduction
Table 1.1 Comparisons of Connection-Oriented and Connectionless Interprocess
Communication (IPC). (continued)
Connection-Oriented ConnectionJess
Protocols This mode of communica- This mode of communica-
tion is appropriate for tion is appropriate for
protocols that require protocols that exchange a
exchange of a large stream small amount of data in a
of data and/or a large limited number of rounds of
number of rounds of exchange.
exchange.
An Internet host is a
computer that imple.
ments the Internet pro-
tocol architecture and
hence is capable of par
ticipating in Internet
communications.
A router is a computer
that specializes in for-
warding data between
networks. On the
Internet, a router imple-
ments the functionalities
of the Internet l a y e ~
Network Resources
Throughout this book you will often encounter the term network resources. By
network resources we are referring to resources that are available to the partici-
pants of a distributed computing community. For example, on the Internet the
network resources include hardware such as computers (including Internet
hosts and routers) and equipment (printers, facsimile machines, cameras, etc.),
and software such as processes, email mailboxes, files, and Web documents. An
important class of network resources is network services, such as the World
Wide Web (WWW) and file transfer service, which are provided by specific
processes running on computers.
Although the idea may seem simple, one of the key challenges in distributed
computing is the unique identification of resources that are available on the
network. In the next section, we will look at how resource identification is
accomplished on the Internet.
Host Identification and Internet Protocol Addresses
Physically, the Internet is a gigantic mesh of network links and computers.
Conceptually (see Figure 1.16), the main arteries of the Internet are a set of high-
bandwidth network links that constitute the "backbone" of the network.
Connected to the backbone are individual networks, each of which has a unique
identifier. Computers with TCP/IP support, called Internet hosts, are linked to
individual networks. Through this system of "information highways," data can
be transmitted from a host HI on network Nt to another host Hz on network N
z
.
To transfer data from within a program, it must be possible to uniquely identify
the process that is to receive the data, similar to addressing the recipient' of a let-
ter delivered by the postal service.
As you recall, a process is a run-time representation of a program when the pro-
gram is executed on a computer. Further, recall that on the Internet a computer,
or a host, is linked to a network. In order to identify a process, it is therefore
necessary to name the network, the host linked to that network, and then the
particular process running on the host.
1.6 Network Basics 47
Q An Internet host
C) Networks
o The Internet backbone
Figure 1.16 The Internet topology.
In the Internet architecture, host identification is part of the Internet protocol
(lP), which, as you recall, is the protocol at the Internet layer of the TCP/IP
suite. The discussion that follows refers to the host identification scheme spec-
ified in version 4 of IP, or IPv4. Although the scheme has been modified in IP
version 6 (IPv6) to accommodate more Internet addresses, the principles of the
scheme are unchanged in the two versions, and IPv4 is chosen here for its rela-
tive simplicity. In the context of this book, the distinctions between the two
versions are not significant.
In IPv4, each host on the Internet is identified by a unique 32-bit string. Given
a length of 32 b.its, the total number of addresses allowable is 2
32
. Put another
way, the address space of IPv4 accommodates 2
32
(4,294,967,296 or over 4 bil-
lion) addresses in total.
Each IP address must identify both the network on which a host resides and
then the particular host on that network. The IPv4 addressing scheme does so
as follows:
The address space is divided into five classes, A through E. As illustrated in
Figure 1.17, each class has a unique prefix. Class A starts with a bi.t 0, Class B
starts with a bit sequence of 10, Class C with 110, and so forth. The remaining
bits in each address are used for identifying the network and the host On a par-
ticular network. Thus a Class A address has 31 bits for network-host identifica-
tion, a Class Baddress 30 bits, and so forth. This means that a total of 2
31
(about
2 billion) Class A addresses are available, while a maximum of 2
30
(about I bil-
lion) Class B a d d r e s s e ~ are available. The maximum number of addresses in
Class C, D, or Ecan be calculated similarly. It should be noted that within each
class a small number of addresses (such as all Os and all Is) are reserved for spe-
cial purposes.
48 CHAPTER 1 Distributed Computing, An Introduction
Byte 0 Byte 1 Byte 2 Byte 3
Class Aaddress
Class Baddress
Class Caddress
Multicast address
Reserved address
0
j
1 0
I
1 1 0
-
I
1 1 1 0 Multicast Group
1 1 1 1 Reserved
Network address
o Host portion
Figure 1.17 The IPv4 address scheme.
You may wonder why it is necessary to have different classes of addresses...This
has to do with the number of computers that each individual network can
accommodate. Consider a Class Aaddress (see Figure 1.17): the 7 bits immedi-
ately following the prefix 0 are allotted for network identification, with the rest
of the 32-8 = 24 bits devoted to the identification of hosts within a network.
Therefore, each class A network can support 2
24
(roughly 16 million) hosts,
although there can be no more than 2', or 128, such networks. Using the same
analysis, you can see that each of the 2
14
(16,384) class Bnetwork addresses can
accommodate up to 2
16
(65,536) hosts. likewise, there are far more Class C net-
works than there are Class Bnetworks, but each Class C network can support far
fewer hosts.
As has already been mentioned, we will seldom have occasion to identify IP
hosts using the 32bit address string. On the rare occasions when we do use a
numerical network address, we most likely will use the so-called dotted decimal
notation instead. The dotted decimal notation of an IP address uses a decimal
value for each of the 4 bytes in the IP address.
For example, suppose the dotted-decimal notation for a particular Internet
address is 129.65.24.50. The 32bit binary expansion of the notation is as follows:

01000001
00011000
00110010
Since the leading bit sequence is 10, the address is a Class Baddress. Within the
class, the network portion is identified by the remaining bits in the first 2 bytes,
that is, 00000101000001, and the host portion consists of the values in the last
2 bytes> or OOOllOOOOOllOOI0. For convenience, the binary prefix for class
identification is often included as part of the network portion of the address, so
we would say that this particular address is at network 129.65 and at host
address 24.50 on that network.
1.6 Network Basics 49
Here is another example. Given the address 224.0.0.1. one can expand it as follows:
224.0.0.1
111:11\
0 o o o o o 1
The binary prefix 1110 signifies that this is a class 0, or multicast, address. Data
packets sent to this address should therefore be dellvered to the multicast group
Ooo<lOOCl()()()()()()()()Q()OOO)()()(KlOOl.
An IP network address is assigned by an authority, known as the IDtemet
Assigned Numbers Authority (lANA) [community-ml.org, 2Sj to an organiza-
tion such as a university or an Internet Service Provider (ISP). (Note: The assign-
ment of the authonty is dynamic. See http://www.wia.org/pub/iana.htmlfor a
history of the evolution of this authority.) Within each network, the assignment
of the host portion is internal to the organization. Typically, an organization
makes use of this portion of the address to subdivide its network into a hierar-
chy of subnets, with a unique host number assigned to each computer attached
to a subnet. For instance, the administrator of class B network 129.6S may
choose to designate byte 2 (that is, the leftmost 8 bits of the host portion) in the
address as a subnet identifier. Under this subnet scheme, the IP address
129.6S.32.3 identifies a host of ID 3 on a subnet of ID 32 on this particular net-
work.
Since the 1990s, the demand for IP addresses has skyrocketed, to the point that
the address space has been exhausted. The static addressing scheme we have just
described has sino; been augmented with numerous changes in response to
ever-increasing demand for addresses, including' the dynamic addressing
scheme that is popular with Internet Service Providers (ISPs) such as America
Online (AOL). Using dynamic addressing, an ISP or large organization can
extend the address space for a given IP network by pooling the addresses. For
example, a static class Bnetwork address may accommodate up to 2
16
or 6S,S36
static hosts. By pooling the approximately 6S thousand addresses and allocating
each to an active session on an as-needed basis, it is possible to support millions
of IP hosts, assuming that no more than 6S thousand are active at the same
time. For this reason, when you access the Internet through an ISp, the IP
address of your computer may vary from one logon session to the next.
Most of us have problems memorizing a 32-bit string, even with the aid of the
dotted decimal notation. Hence, a symbolic name for identifying a host is
preferable. That is why the Domain Name System (DNS) was adopted by the
Internet community. The acronym DNS also expands to Domain Name Service,
which refers to the service provided by a Domain Name System. Every time you
use email or browse a Web page, you identify an Internet host using a domain
name based on the DNS protocol.
50 CHAPTER 1 Distributed Computing, An Introduction
Every domain name contains two or more components separated by dots. In an
address such as acme.com, the last component, com in this case, is called the
top-level domain. To the left of the dot in that name, acme in this case, is what
is called the second-level domain. It is also possible to have subdomains, such
as marketing.acme.com. Domain names are not case-sensitive; that is, there is
no distinction between the uppercase and the lowercase of the same character
when a name is spelled.
Currently, the top-level domains are classified as shown in Table 1.2 [Brain, 15).
Table 1.2 Top-level Domain Names
.com For commercial entities, which anyone, anywhere in the world, can
register.
.net Originally designated for organizations directly involved in Internet
operations. This domain is increasingly being used by businesses when
the desired name under .com is already registered by another organiza.
tion. Today anyone can register a name in the .net domain.
.org For miscellaneous organizations, including nonprofits.
.edu For four-year accredited institutions of higher learning.
.gov For U.S. federal government entities.
.mil For the U.S. military.
Country For individual countries based on the International Standards
codes Organization; for example, .ca for Canada, and .IP for Japan. See
[Connolly, 181 if you are interested in a list of the country codes.
The second-level domain combined with the first-level domain (e.g.,
calpoly.edu) typically, but not necessarily, maps to the network portion of an IP
address while the rest of the domain name (e.g., www.csc) serves to identify the
subnet, if any, and the host name. See Figure LIB for a pictorial depiction.
Each domain name is mapped to a corresponding IP address, although the map-
ping may not be permanent. For example, the domain name ebay.com currently
maps to the IP address 216.32.120.133. The mapping of a domain name to its
current corresponding IP address, and vice versa, can be performed using a net-
work service known as DNS naming resolution. Exercise 4 shows you a way to
experiment with this service.
Finally, the domain name local/lOst can be used to refer to the computer on
which the process is run. The name is always mapped to the IP address 127.0.0.1
and simply addresses "this computer."
Once a host is located using either an IP address or domain name, we can then
identify individual resources on that host. In the following paragraphs we will
1.6 Network Basics 51
(Root domain)
.au ... .ea ... .us ... .ZW
~
Country code
.com .gov .edu .mil .net .org
ucsb.edu calpoly.edu
A \ ~ ~
CS . eee
Figure 1.18 Domain name hierarchy.
esc .... ee english ... wireless
look at three examples of such schemes for identifying a process, an email recip-
ient, and Web documents, respectively.
Identifying Processes with Protocol Ports
Specifying the correct domain name or its corresponding IP address allows us to
locate a computer or host on the Internet. But in network applications, data
needs to be delivered to a specific process running on a computer. Thus we need
a naming scheme to allow us to uniquely identify such a process. There are any
number of possible schemes to do so. For example, one possibility is to make
use of a unique process identifier (PID) assigned to the process by the operat-
ing system (see exercise 4). On the Internet, the protocol for process identifica-
52 CHAPTER 1 Distributed Computing, An Introduction
tion involves the use of a logical entity known as a protocol rort, or a port for
short. Recall that the transport layer in the Internet architecture is responsible
for dispatching data to processes, and that t",o well-known protocols are in use
at this layer: TCP and UDP. Each of these protocols uses a separate set of ports
on each host for this purpose. A process that wishes to exchange data with
another process using either TCP or UDP must be assigned one of the ports. To
send data to a process currently associated with port p on host H, an application
must address the data to (H, p) in the code. In Ipv4, there are 2
16
ports (port 0
to port 65,535) on each host, under either TCP or UDP. For example, when
browsing a Web site, you are typically making use of the service of a process
running on TCP port 80 on the host (such as www.calpoly.edu) you specified.
In the TCP and UDP protocols, protocol numbers 0 through 1023 (2
10
) are
reserved for well-known services. Known as well-known ports, these numbers
are assigned by the Internet Assigned N u m ~ r s Authority (lANA) [isoc.org, 38],
and on some systems these numbers can only be used by system processes or by
programs executed by privileged users. Each popular network service, such as
tel net, FTP, HTTP, or 5MTP, is assigned one of these port numbers (23, 21, 80,
25, respectively) where the process providing the service can be expected to be
reached. We will have many occasions to specify purt numbers in our coding
examples.
Email Addresses
An email address takes the form of username@DomainName. For example,
mliu@Csc.calpoly.edu identifies the author of this book. When you send an
email identifying that email address as the recipient, a mailer program on the IP
host with the specified domain name delivers the email to the mailbox of the
specified user on that system, in this case the author of this book.
URLs
Users of Web browsers are familiar with Uniform Resource Locators (URLs).
When you enter a name string such as http://www.csc.calpoly.edu in the
browser to visit a particular Web site, you are specifying a URL.
A URL is a naming scheme under the more general scheme known as Uniform
Resource Identifiers (URls). URJs are short strings that identify resources on the
Web, including documents, images, downloadable files, services, and electronic
mailboxes. The URJ scheme provides a uniform way of addressing these resources
under a variety of naming schemes used in individual application protocols such
as HTTP, FTP, and Internet mail. By design, the URJ scheme is an extensible
scheme so that it can support more protocol naming schemes over time.
URL is an informal term associated with popular URI schemes for protocols such
as HTTP, FTP, and mail to.
1.6 Network Basics 53
The Uniform Resource Name (URN) is a scheme specified by RFC2141 and
related documents, intended to serve as persistent, location-independent,
resource identifiers. AURN provides persistent names within a namespace, thus
allowing a permanent object to be mirrored over several known sites; if a site is
unavailable, the object could be found/resolved at another site. Several propos-
als for URNs eXist, but none of them has been widely adopted yet [aboutdo-
mains.com, 16].
Although informal, the URL is by far the best known of these terms. AURL pro-
vides a nonpersistent (that is, not necessarily permanent) means to uniquely
identify an object within a namespace. Anamespace, in the context of a nam-
ing system, refers to the set of names that the system provides. In its most gen-
eral form, the format of a URL is
<protocol>/ / <user>:<password>@<host-ID>:<port-number>/<directory path>
where
<protocol> is the exact but case-insensitive name of the application-layer
protocol you wish to use to access the resource; for example, HITP if you are
attempting to access a Web browser;
<user>:<password> is for access authorization, if required by the protocol;
<host-ID> is the domain name or dotted-decimal IP address of the host that
provides the service allowing you to access the protocol; for example,
www.calpoly.edu;
<port-number> is the transport-layer protocol port for the process that pro-
vides the service on the remote host; for example, 80 (by default) for HITP
or Web servers;
<directory path> specifies where in the file system of the remote host the
resource can be located; for example, -mliulCsc102/index.html.
When you are entering a URL in a browser, you may skip the specifications of the
protocol (in which case HITP is assumed), the user:password (not used in HITP),
the port-number (80 by default), and the directory path (the root of the docu-
ment directory hierarchy is assumed). For example, the URL www.csc.
calpoly.edu entered to Netscape specifies the home page of the California
Polytechnic State University at San Luis Obispo, to be fetched from the host
with the domain name www.csc.calpoly.edu. running on port 80.
A shortened form of a URL, termed a relative URL, can be used at times.
During a session when a document (say http://www.csc.calpoly.edu/index.
html) is accessed, you can use a relative URL to name another file in the same
directory, to be fetched via the same Web server. For example, if another file
exists in that same directory calied courses.html, then the URL courses.html
can be named in that document in lieu of the full URL, http://www.csc.
calpoly.edu/courses.html.
54 CHAPTER 1 Distributed Computing, An Introduction
Unicode is a standard
for representing charac-
ters. According to the
Unicode Home Page,
"Unicode p r o ~ i d e s a
unique [numerical repre-
sentation] for every
character, no matter
what the platform, no
matter what the pro-
gram, no matter what
the language"
[unicode.org, 29].
Extensible Name Service
Extensible Name Service (XNS) is an Internet naming service managed by the
XNS Public Trust Organization (XNSORG), an independent, open-forum organ-
ization. The service supports a naming scheme that allows a single, universal
address to be used by a user to perform "communications of all types--email,
phone, fax, Web pages, instant messaging, even postal mail. ... As a naming
and addressing service, XNS operates at a higher level than DNS. DNS is
designed to resolve a name into the address of an Internet host computer. XNS
is designed to resolve a universal address into any type of address on any type
of communications network. You could say that XNS is to DNS what DNS is to
a phone number (and, in fact, XNS uses DNS to resolve the Internet address of
an XNS agency)" [omg.org, 27]. An XNS is a character string. There are three
types of XNS names: personal names, business names, and general names, each
of which starts with a unique leading character (=, @, and +, respectively) and
each of which can contain up to 64 Unicode characters.
Name Resolution
Whenever a symbolic name is used to identify a resource, the name must be
translated to the corresponding physical address in order to locate the resource.
We have already seen that a domain name such as
someComputer.someDivision.someCompany.com
for an Internet host must be translated to the numerical address, say
129.65.123.7, of that particular computer. The process of the translation is
called name resolution, or more simply, name lookup.
To perform name resolution, a database (also called a directory or a registry)
must exist containing the mappings between symbolic names and physical
names. If the namespace of a naming scheme is of a limited size, then it is pos-
sible to perform name resolution manually. In the case of the DNS or XNS, a
manual process is out of the question; instead, a network service has to be pro-
vided to support online name resolution.
For the DNS, the name lookup service is provided by machines that are called
DNS servers. A central authority maintains the name database and sees to it
that the database is distributed throughout the Internet to the DNS servers.
When a domain name is specified-whether entered into a browser or coded in
a program being executed-the name is submitted to the nearest DNS server for
resolution. If the nearest server does not have the mapping, that server forwards
the request to another DNS server. 'The propagation of the request continues
until the name is resolved, at which time the mapping is sent back to the
process that originated the request.
In later chapters of this book we will have many occasions to work with nam-
ing schemes and their supporting facilities.
1.7 Software Engineering Basics 55
1.7 Software Engineering Basics
Software engineering is a discipline in computer science that covers the process
of developing applications. Although this book provides the technical back-
ground for building network applications, it is not intended to cover the process
of developing such applications. At the same time, some of the basic concepts
from the discipline of software engineering will be relevant to our discussions.
These concepts are introduced in this section.
Procedural versus Object-Oriented Programming
In building network applications, there are two main classes of programming
languages: procedural language and object-oriented language. (Although there
are other classes of languages, such as functional language, they are not widely
used in network applications.)
Procedural languages-the C language being the primary example-use proce-
dures to break down the complexity of the tasks of an application. For example,
an application may be coded using a procedure (also called a function,
although in some contexts the term procedure is used for a void function) to per-
form the input, another procedure to perform the computation, and a third pro-
cedure for generating the output.
Object-oriented languages. exemplified by Java, the language chosen for this
book, use objects to encapsulate the details. Each object simulates an object in
real life, carrying state data as well as behaviors. State data is represented as
instance data (in Java) or data members (in C++). Behaviors are represented as
methods.
The Unified Modeling Language
An important step in software engineering is the production of artifacts, or doc
uments, to record the conceptual design of the application being developed. For
readability, these documents should be written using a universal set of notations
and languages. The Unified Modeling Language (UML), developed by the
Object Management Group [omg.org, 27J. is such a facility. UML prOVides a
common set of language and notations "for specifying, visualizing, construct
ing, and documenting the artifacts of software systems" [omg.org, 271.
OMG-UML provides a rich set of tools for all aspects of software engineering.
the coverage of which belongs in software engineering courses. In this book we
will occasionally make use of one of the notations: UML class diagrams (and
only a subset of them), for documenting the relationships of some of the Java
classes that appear in our presentation. Figure I.l9 presents the subset of class
diagrams you will see in" this book.
56 CHAPTER 1 Distributed Computing, An Introduction
Aclass or interface is A class that implements a lalla interface
represented as follows: is represented as follows:
Class/Interface
Class Name
Name
Attributes Attributes
0
Interface Name
Operations
Operations
Class Bdepends Class Bimplements Class Binherits
on Class A. Interface A. from Class A.
Class A Interface A Class A
Attributes Attributes
Operations Operations Operations
A
.;.
i
,
, ,
, ,
, ,
, ,
, ,
Class B C1assB Class B
Attributes Attributes Attributes
Operations Operations Operations
Note: The style of the lines and the shape of arrowheads are significant.
Figure 1.19 Asubset of UML class diagrams.
The Architecture of Distributed Applications
The idea of using a multilayer architecture to organize the functionalities of a
data network can be applied to distributed applications. Figure 1.20 presents an
example of such an architecture.
Presentation
Application (Business) Logic
Figure 1.20 The architecture of distributed applications.
1.7 Software Engineering Basics 57
Using this architecture, the functionalities of a distributed application can be
classified in three layers:
The presentation layer provides the user interface. For example, if the appli-
cation is a shopping cart, this layer generates the set of Web pages that are
.iewable by a shopper using a browser.
The application logic layer provides the computation for the application.
This layer is also called the business logic layer for enterprise applications.
In a shopping cart application, this layer is responsible for such tasks as credit
verification and computing the dollar amounts of the orders, sales tax, and
delivery cost.
The service layer prOvides the underlying services needed to support the
functionalities of the top two layers. Services may include data access facili-
ties (such as a database management system), directory services for name
lookups (such as the Domain Name Service), and interprocess communica-
tion (which allows data to be exchanged among processes).
Of these layers, the service layer will be the focus of this book. The other two
layers belong in the realm of software engineering.
Toolkits, Frameworks, and (omponents
Toolkits, frameworks, and components are terms associated with software engi-
neering for enterprise systems (that is, large-scale commercial applications).
In the context of software development, a toolkit or framework is a collection
of classes, tools, and programming samples. As examples, the Java Development
Tookit ODK) is such a collection for developing Java programs, while Microsoft's
.NET framework is meant for building Web-based applications. It is assumed
that you are proficient with the JDK for devel0ping Java programs; other tool-
kits for distributed computing (for example, the Java Socket Toolkit) will be cov-
ered later in this book.
Component-based software development is an approach to building enterprise
software systems. Using this approach, software is developed and evolved by
assembling selected pre-engineered, pretested, and reusable software compo-
nents. This approach promotes software reuse and has the potential to signifi-
cantly reduce the cost and errors of development [Pour, 37J. The Enterprise Java
Bean (EJB) and Microsoft's Component Object Model (COM) are pl,atforms that
support component-based applications. Although these platforms are important
to enterprise distributed computing, their coverage is beyond the scope of this
book.
58 CHAPTER 1 Distributed Computing, An Introduction
Summary
In this introductory chapter, we have discussed the following topics:
What is meant by distributed computing and how it is related to or differ-
ent from terms such as distributed system and parallel computing.
The basic concepts of operating systems that are important to our study.
Such concepts include processes and threads.
Basic concepts in data communication that are relevant to this book. Such
topics include
Network architectures: the OSI model and the Internet model
Cono,,'Ction-oriented communication versus connectionless communication
Naming schemes for network resources, including
- The Domain Name System (DNS)
- The Extensible Name System (XNS)
- Protocol port numbers
- Uniform Resource Identifier (URI) and Uniform Resource Locator
(URL)
- Email addresses
Basic concepts in software engineering that are important to our study.
Such concepts include
Procedural programming compared to object-oriented programming
Class diagrams using the notations of the Unified Modeling Language
(UML)
The three-layered architecture of distributed applications consisting of (i)
the presentation layer, (ii) the application or business logic layer, and (iii)
the service layer
The terms toolkit, framework, and component in the context of software
engineering
Exercises
1. Distributed Computing
a. Consider distributed computing as defined in this chapter. For each of the
following activities, determine and explain whether it is an example of
distributed computing:
i. Using Excel on a stand-alone personal computer
ii. Web surfing
iii. Instant messaging
iv. Compiling and testing a Cobol program on a department machine that
has no network connection
v. Using the electronic mail on your department's computer to send a
message to yourself .
vi. Using apster.com to download music
b. In this exercise we will use a simplified mathematical model to analyze
failures in a distributed system. Explain your answers.
Suppose each computer in this question has a probability p of failing at
any time, p < 1.
i. If II computers are interconnected and the availability of each com-
puter is needed to maintain a service provided using distributed com-
puting involving these computers,
a. What is the probability p that the service will not be available at any
time, assuming that no other components in the distributed system
will fail? Express p as a mathematical function of Il and p.
b. Based on your answer for part a, what is the probability p when the
computing is not distributed at all, that is, for the case of Il = I?
c. Based on your answer for part a, use p =0.2 and Il =3 to compute
the probability p. How does that probability compare with the fail-
ure probability if the same computing is performed using mono-
lithic computing, that is, on one computer only?
ii. Now suppose the service provided using distributed computing
requires only one of the three computers, with the other two comput-
ers serving as backups (that is, each of the three computers, on its own,
is capable of providing the service). What is the probability that the
service will not be available at any time, assuming that no other com-
ponents in the distributed system will fail? How does the failure prob-
ability of this system compare with the failure probability if the same
computing is performed using monolithic computing, that is, on one
computer only?
Exercises 59
Napster.com is a digital
music service.
AudioGalaxy and KaZaA
offer similar services.
60 CHAPTER 1 Distributed Computing, An Introduction
e. Do research on either the Internet worm [Eichin and Rochlis, 21] or a virus
attack such as the I-Love-You virus [Zetter, 22) and summarize what each
one is and how it happened. Why are such occurrences significant in dis-
tributed computing? Can you think of some measures to avoid these prob-
lems?
d. Do research on fldistributed computing
ll
(Of, more accurately, collaborative
computing) projects such as seti@home [setiathome.ssI.berkeley.edu, 10)
and genome@home [genomeathome.stanford.edu, 23]. Choose one of
them. Write a report to (i) explain the objective of the project, (ii) explain
how the computing is performed in a distributed system, and (iii) explain
what you have to do to participate in the project.
e. Do research on the early days of the Internet (see sources
[vlmp.museophile.com, 1], [zakon.org, 2], [silkroad.com, 3J, or [Hafner and
Lyon, 4], for example) and write a short report on one of the key organiza-
tions and one of the prominent figures in the history of the Internet.
2. Concurrent Programming
a. Look up the online API specification for Java (java.sun.com, 20].
Choose the link for the Runnable interface and then the Thread class.
Browse each carefully, reading the specifications for the methods of each.
i. According to the specifications, which of the two, Runnable interface or
Thread class, is preferred if you only intend to implement the run
method? Why?
ii. What does the Thread class method sleep do? Write the Java state-
ment(s) that appears in the code for a thread to suspend the execution
of the thread for 5 seconds.
iii. What does the Thread class method activeCount do? What should the
method return in a program where three threads are spawned?
iv. The Thread class method stop is said to be deprecated. What is meant by
a deprecated method?
v. How many methods are there in the Runnable interface? Name each.
vi. How do you use the Rlmnable interface to create a thread? Explain.
b. Compile and run the Java class fiies shown in Figure 1.11 and provided in
the program sample folder. What is the outcome? Capture the output of
the run and write a paragraph to explain the output, paying special atten-
tion to the order of the lines of output.
e. Compile and run the Java class files shown in Figure 1.12. What is the out-
come? Capture the output of the run and write a paragraph to explain the
output, paying special attention to the order of the lines of output. Also,
how does the output compare with the output from part b (the second
part)?
d. Consider the following Java classes:
i. What do you expect the outcome to be when RunTilread3 is executed?
Compile and run it.
ii. Comment out the word syllcllronized in the heading of the method
update. Compile and run RUIIThread3 again. What is the outcome?
Explain.
public class RunThr.ad.3
(
public atatic void main (Strlnq(] Arg8)
(
lnt originalThreadCount - Thre.d.activecount( );
for (int i-Oj i+,+) (
Thread p new Thread{new SomeThread3{;
p.startC )i
System.out.printlnC-thread count- + Thread.activecount( ;
}
while (Thread.activecount{ ) > oriqinalThreadCount )(
II loop until all child threads have exited.
}
Count + SomeThread3.count)i
I
}/Iend class RunThreads3
clas8 &o.erhrdl implements Runnable {
static int.count-O;
SOrneThread3 ( )
super ( };
public void rune)
update( );
}
static public ayuchroai d void update ( }{
int mycount - count;
lnt second - (int)(Math.random( ) * 500.0);
try (
Thread.eleep(second);
}
catcb (Interr.uptedException e) {
}
myCount++;
count .. myCount;
System.out.println(-count--+count+
-; thread count.. - + Threed.activecount( ;
I
} Ilend class SomeThread3
Exercises 61
62 CHAPTER 1 Distributed Computing, An Introduction
3. Connection-Oriented versus Connectionless Communication
In this exercise we will use a simplified mathematical model to analyze the
trade-off between connectionoriented communication and connectionless
communication. Explain your answer.
On a certain network both forms of communication are proVided:
Using connection-oriented communication, it takes 50 seconds to estab-
lish a connection, after which a packet of up to 10 characters can be sent
in 1.0 seconds over the connection, in either direction.
Using connectionless communication, a packet of up to 10 characters can
be sent in 1.2 seconds (the sending of each packet takes slightly longer
than in the connection-oriented case, since each packet must find its way
to the receiver).
Suppose processes A and B exchange messages on this network. A initiates
the communication and sends to B a message of lOa characters, which are
partitioned into 10 packets. In reply, B sends a message of SO characters,
which are partitioned into S packets.
Assuming that there is no delay other than the time it takes for establish-
ing a connection (in the connection-oriented case) and for packet transmis-
sion:
a. How long does the session between A and B last, using connection-ori-
ented communication? Explain.
b. How long does the session between A and B last, using connectionless
communication? Explain.
c. How much data (in number of characters) must be exchanged between A
and B in order for the connection-oriented communication to yield a
shorter session than the connectionless communication? Explain.
4. Naming
a. What is the size of the address space (that is, the total number of addresses
allowable) in each of the five classes of the IPv4 addresses? Show your
computation.
b. Find out the II' network address assigned to your organization. What class
(A thorugh E) is it?
c. Find out the domain name of the Web server host of your organization.
What is its II' address?
d. A network program, Il5lookup, can be used to obtain DNS name-lookup
service. You can invoke this program in at least three ways:
On a.UNIX system, enter '15lookup at the system prompt.
On a Windows system, enter I/slookup at the prompt in a command
prompt window.
Browse to the site http://cc-www.uia.ac.be/ds/nsiookup.html.
Use the service to complete the follOWing table:
IP Address Domain Name
Exercises 63
127.0.0.1
ifLuio.no
ie.technion.ac.il
204.198.135.62
224.0.1.24
cse.cuhk.edu.hk
129.65.2.119
www.mit.edu
e. Complete the following table:
IP Address
18.181.0.31
Domain
;.me
Class of
Address (A-E)
Net 10 (in
dotted decimal
notation)
Host ID (in
dotted decimal
notation)
129.65.2.119
204.198.135.62
224.0.1.24
f. Using the country code top-level domains listed by the Internet Assigned
Numbers Authority liana.org, 19J to help, find out the domain name
country code for the following nations: ,
Armenia, Brazil, Canada, Cuba, Germany, Spain, France, Guatemala,
India, Mexico, Qatar, Singapore, Sweden, 1 Salvador, Turkey.
Identify the nation for each of the following country codes:
Td, tv, ZW, nz, ph, pk, eg, bt, ao.
g. Consider this URI: http://www.someSite.org:8081/foo/index.htm.
i. What is the protocol specified?
ii. What is the host name of the service?
iii. What is the port number of the process that provides the service?
iv. Where is the document located?
h. Look up the well-known port number assignments by browsing the page
hIIp://www.iana.org/assignments/port-numbers.
i. Which port number is assigned to each of these services: (i) FTP, (ii)
telnet, (iii) SMTP,and (iv) World Wide Web HTTP? Are these services
available using TCP, UDP, or both?
64 CHAPTER 1 Distributed Computing, An Introduction
ii. What services are assigned to ports 13 and 17, respectively?
iii. On a UNIX system, or from the command-prompt window of a
Windows system, one way that you can access a network service is by
issuing a command such as the following:
telnet<space><domain name or lP address of a system that you
know><space><port number assigned to the service>
For example, the command telnet foo.com 13 will access the service
provided by the process running on port 13 on the Internet host
telnet.foo.
Try using this method to access the services offered on port 13 on a
machine you know. Describe the outcome.
i. Instead of using the Internet scheme of using a protocol port number as
part of an address to deliver data to a process on a given host, consider an
alternative scheme where the process is located using a unique process ID
(PID), such as that assigned to each active process by the Unix operating
system. Note that a PID is assigned dynamically to a process when the
process is started, so that it is not possible to know ahead of a program's
execution what the ID will be at run time. The range of values for PLDs also
varies from system to system. What, if any, is the problem with such an
addressing scheme?
j. A naming scheme is said to allow location transparency [community-
ml.org, 25] if the scheme allows objects to be addressed without explicit
knowiedge of their physical location. For example, the U.S. phone num-
ber system is location transparent, since a caller does not need to know the
whereabouts of the callee when dialing up. The U.S. Postal Service address
system, on the other hand, does not allow location transparency, since
you must address the recipient with his/her physical address (excluding
post office box numbers, that is).
Consider each of the following naming schemes. For each, determine
whether it is location transparent. Justify your answer.
i. The Domain Name System (DNS).
ii. Uniform Resource Locator (URL)
iii. Uniform Resource Name (URN)
iv. Extensible Name Service (XNS)
5. UML Class Diagrams
a. Using the notations shown in Figure 1.19, draw the class diagram for the
classes shown in Figure 1.11.
b. Using the notations shown in Figure 1.19, draw the class diagram for the
classes shown in Figure 1.12.
References
1. The Virtual Museum of Computing, http://vlmp.mllseophile.com/computing.html
2. Hobbes' Internet Timeline-the definitive ARPAnet & Internet history,
llttp://wl.vw,zakotl.org/robert/intemet/time/ine/
3. The Silk Road Group, Ltd, A Brief History of Networking,
http://wwwsilkroad.com/llet-history.htm/
4. Katie Hafner and Matthew Lyon. Where Wizards Stay Up Late: The Origins of the
biUmer. ew York, NY: Simon &. Schuster, 1996.
5. Internet RFC/STD/FYI/BCP, Archives, http://www.faqs.org/rfcs/
6. Todd Campbell, "The first email message," PRETEXT MagaZine, 199B.
hnp://www.pretext.com/mar98I(eatures/slory2.hrm
7. http://www.sun,cvm/jini/overview/, September 2000.
8. webopedia, http://webopedia.internel.com
9. Alice E. Koniges. Industrial Strength Parallel Complltiflg. San Francisco, CA: Morgan
Kaufman Publishers, 2001.
10. SETI@home, the Search for Extraterrestrial Intelligence, http://setiathome.ssl.
berkeley.edu/
II. Java Resources,
12. William Stallings. Data and Computer Communications. Upper Saddle River, NJ:
Prentice Hall, 1999.
13. Andrew S. Tanenbaum. Computer Networks. Upper Saddle River, N): Prentice Hall,
1996.
14. Chuck Semeria, 3eom Corporation, Understanding II' Addressing: Everything You Ever
Wa,./led To Know, 1996, l1ttp://www.3com.com/olJ/er/pdfs/i/1(ra/corpi..fo/en_US/
50I 302.pdf
15. Marshall Brain, Question of the Day, http://www.howstuffworks.com/question549.htm.
HowStuffWorb
16. About Domains, about doma;m, http://www.aboutdomaills.comlNewslbasics.htm
17. The National Center for Supercomputing Applications (NCSA). A Beginner's Guide
to URLs, http://archive.nna.uiuc.edu/SDG/Experimental/demoweb/urJ-primer.htmf
18. Dan Connolly. Naming and Addressing: URIs, URLs, ..., http://www.w3.org/Addressing!
19. Internet Assigned Numbers Authority. Country Code Top-Level Domains,
http://www.ia.w.org/cctld/cctld.htm
20. Java 2 Platform v 1.4 API Specification, http://java.sun.com/j2se/I.4/tlocs/api/
imlex.html
21. Mark W. ichin and Jon A. Rochlis, Massachusetts Institute of Technology. With
Microscopes a"d Tweezers: An Anafysis of the 11ltemet Vims of 1988,
http://www.mit.edu/pef?pfe/eichin/virus/mai1l.htmf
22. Kim Zetter. PCWorld.com, uWhen Love Came To Town: A Virus Investigation,"
Monday, November 13, 2000.
23. The genome@home project homepage, http://gefJomeathome.stanford.edu/
References 65
66 CHAPTER 1 Distributed Computing, An Introduction
24. Internet Assigned Numbers Authority, hNp:/fwww.iana.org/
25. Reference Model for Open Distributed Processing (RMODP),
ISO/IEC IS 1074611TUT X.900, http://community.ml.org/RMODP/
26. XN5-frequently asked questions, http://www.xns.org/, XNSORG.
27. What Is OMG UML? http://www.omg.org/gettingstarted/whaUs_uml.htrn
28. UML Notation Guide-Version 1.1, http://www.in(ormatik./h./uebeck.de/-sWML!
UMLJ.l/
29. Unicode Home Page, http://www.unicode.org/
30. Michael Fischer, Nancy Lynch, and Michael S. Paterson. "Impossibility of distrib
uted consensus with one faulty process." Proceedings of tile 2nd ACM Symposium on
Prindples o(Database Systems, pages 1-7, 1983.
31. Pankaj Jalote. Fault Tolerance in Distributed Systems. Upper Saddle River, N]: Prentice
Hall, 1994.
32. Scott Oaks. lava Security. Sebastopol, CA: O'Reilly Press, 2001.
33. distributed.net: ode Zero, http://www.distributed.net/
34. Internet Security Alliance, http://www.isalliance.org/
35. Andrew Tanenbaum. Computer Networks. Upper Saddle River, NJ: Prentice Hal1,
1996.
36. hrtp://www.iana.orglassignments/port-lIllmbers
37. Gilda Pour. "Web-Based Architecture for Component-Based Api.:'''ation
Generators." The lrltematiortal MulffcolI(erence ill Computer Science, Las Vegas,
Nevada, 2002.
38. Internet Society (ISOq, All About the Internet: Historv of the Internet,
hrtp://www.isoc.org/i1Jtemet/liistory/

You might also like