Distributed 2


Wolaita Sodo University

School of Informatics

CS Department

Course Name: Introduction to Distributed Systems

Assignment-2

Name : Nebiyu Tekle

ID : cs/we/133/12

1. Explain naming in a distributed system.


Naming in a distributed system refers to the process of assigning
unique identifiers or names to various entities such as nodes,
resources, services, and objects within the system. Naming plays a
critical role in enabling communication, coordination, and
interaction between different components in a distributed
environment. Here are some key aspects of naming in distributed
systems:
· Addressing Entities: Naming provides a way to address and
identify entities in a distributed system. Each entity is assigned
a unique name or identifier that can be used to reference it
during communication or interaction. This allows components
to locate and access specific resources or services across the
network.

· Location Transparency: Naming helps achieve location
transparency by abstracting the physical location of entities
from the application logic. Clients can refer to entities by their
names without needing to know their exact network addresses
or locations. This decoupling of names from locations enables
flexibility in system design and enhances scalability.

· Name Resolution: Name resolution is the process of mapping
names to network addresses or references that can be used to
communicate with the corresponding entities. Distributed
systems often employ naming services or mechanisms to
resolve names to actual locations dynamically. This allows
clients to look up names and obtain the necessary information
to establish connections with remote entities.

· Dynamic Binding: Naming enables dynamic binding, where
entities can be associated with names at runtime and change
their bindings as needed. This flexibility allows for dynamic
reconfiguration, load balancing, and fault tolerance in
distributed systems. Entities can migrate or replicate across
nodes while retaining their names, facilitating seamless
communication and interaction.

· Hierarchical Naming: Hierarchical naming structures can
be used to organize entities in a distributed system into logical
groupings or namespaces. This helps manage the complexity of
naming and provides a systematic way to structure and access
resources. Hierarchical naming schemes can improve
scalability, organization, and navigation within the system.

· Naming Conventions: Establishing naming conventions and
standards is essential for ensuring consistency and clarity in a
distributed system. Consistent naming practices help
developers understand the relationships between entities,
promote interoperability, and facilitate maintenance and
troubleshooting activities.

2. Briefly explain name resolution and DNS in a distributed system.

Name resolution in a distributed system refers to the process of
mapping logical names or identifiers to physical network addresses
or references that can be used to establish communication with the
corresponding entities. In a distributed environment, where
resources and services may be located across multiple nodes or
systems, naming resolution plays a critical role in enabling clients to
locate and access remote entities.
DNS (Domain Name System) is a key component of naming
resolution in distributed systems. DNS is a hierarchical, distributed
database that translates domain names (e.g., www.example.com)
into IP addresses (e.g., 192.0.2.1) that can be used to route network
traffic to the appropriate servers hosting the requested resources.
DNS servers store and manage mappings between domain names
and IP addresses, allowing clients to resolve names to the
corresponding network locations.
When a client needs to communicate with a remote entity in a
distributed system, it typically starts by resolving the entity's name
to obtain its network address. The client sends a query to a DNS
server, which recursively resolves the name by querying other DNS
servers in the hierarchy until it finds the authoritative server
responsible for the specified domain. The authoritative server then
provides the client with the IP address associated with the requested
name, enabling the client to establish a connection with the remote
entity.
DNS in distributed systems provides a scalable and efficient
mechanism for resolving names to network addresses, facilitating
seamless communication and interaction between components
located across different nodes or networks. By leveraging DNS
services, distributed systems can achieve location transparency,
dynamic binding, and efficient name resolution, enhancing their
overall functionality and performance.
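The referral process described above can be sketched in miniature. The following toy resolver walks a hypothetical zone table from the root toward the authoritative server; all names and addresses here are invented for illustration and this is not how a real DNS library is used.

```python
# Toy sketch of DNS-style resolution over an invented zone hierarchy.
# Each "server" either answers with an address or refers the resolver
# to a more specific name server, as in the recursive lookup above.
ZONES = {
    ".":               {"com.": "ns.com."},               # root refers to com.
    "ns.com.":         {"example.com.": "ns.example.com."},
    "ns.example.com.": {"www.example.com.": "192.0.2.1"},  # authoritative
}

def resolve(name, server="."):
    """Follow referrals from the root until an address is found."""
    for suffix, value in ZONES[server].items():
        if name.endswith(suffix):
            # A value starting with a digit is the final IP address;
            # otherwise it names the next server to ask.
            if value[0].isdigit():
                return value
            return resolve(name, server=value)
    raise KeyError(name)

print(resolve("www.example.com."))  # 192.0.2.1
```

Real resolvers also cache answers and honor record lifetimes, which this sketch omits.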

3. Explain the layered protocols and data formats in each layer of the OSI reference model in DS.
The OSI (Open Systems Interconnection) reference model is a
conceptual framework that defines a standard way to understand
and organize the functions of a network protocol stack. The OSI
model consists of seven layers, each responsible for specific tasks
related to data communication in a network. In a distributed
system, the OSI model can help in understanding the different
layers involved in communication between distributed components.
Here's an overview of the layered protocols and data formats in each
layer of the OSI reference model in a distributed system:

1. Physical Layer (Layer 1)


· Protocols: At the physical layer, protocols define the electrical,
mechanical, and functional specifications for transmitting raw
data over a physical medium.

· Data Format: The data format at this layer consists of raw bits
and signals that represent the actual data being transmitted
over the network, such as voltage levels, modulation schemes,
and transmission rates.

2. Data Link Layer (Layer 2)


· Protocols: The data link layer is responsible for establishing,
maintaining, and terminating connections between nodes on a
local network.

· Data Format: The data format at this layer includes frames,
which encapsulate data packets with headers and trailers
containing control information like source and destination
addresses, error detection codes, and flow control mechanisms.

3. Network Layer (Layer 3)


· Protocols: The network layer is concerned with routing
packets between different networks and addressing schemes to
ensure data reaches its destination.

· Data Format: The data format at this layer includes packets,
which contain network addresses (e.g., IP addresses), routing
information, and other network-specific metadata required for
packet forwarding and delivery.

4. Transport Layer (Layer 4)


· Protocols: The transport layer provides end-to-end
communication services like reliable data delivery, error
recovery, and flow control.

· Data Format: The data format at this layer includes segments
or datagrams that encapsulate application data along with
transport-layer headers containing sequence numbers, port
numbers, and checksums.

5. Session Layer (Layer 5)


· Protocols: The session layer manages communication sessions
between applications, establishing, maintaining, and
terminating connections.

· Data Format: The session layer does not have a specific data
format but may include session identifiers or control
information to manage communication sessions.

6. Presentation Layer (Layer 6)


· Protocols: The presentation layer is responsible for data
translation, encryption, compression, and formatting to ensure
compatibility between different systems.

· Data Format: The data format at this layer includes formatted
data structures, encryption algorithms, and encoding schemes
for data representation and transformation.

7. Application Layer (Layer 7)


· Protocols: The application layer provides interfaces for
applications to access network services and protocols for
specific application-level functions.

· Data Format: The data format at this layer includes messages
or requests exchanged between applications, along with
application-specific data structures and protocols.
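The encapsulation that runs through the layers above can be illustrated with a few bytes: each layer wraps the payload it receives with its own header. The field names and sizes below are illustrative only and do not match any real protocol's wire format.

```python
# Sketch of OSI-style encapsulation: each layer prepends a header to
# the payload handed down from the layer above (sizes are invented).
import struct

app_data = b"GET /index.html"                        # application message
segment  = struct.pack("!HH", 80, 12345) + app_data  # transport: dst/src port
packet   = bytes([192, 0, 2, 1]) + segment           # network: destination IP
frame    = b"\xaa\xbb\xcc\xdd\xee\xff" + packet      # data link: dest. MAC

# The receiver unwraps in reverse: strip 6-byte MAC, 4-byte IP,
# 4-byte port header, recovering the original application data.
assert frame[6:][4:][4:] == app_data
```

On a real network each layer's header also carries checksums, lengths, and protocol identifiers; the point here is only the nesting.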

4. Explain Remote Procedure Call and Remote Method Invocation in DS.
Remote Procedure Call (RPC) and Remote Method Invocation
(RMI) are both mechanisms used in distributed systems to enable
communication between processes or objects running on different
machines.

1. Remote Procedure Call (RPC):


RPC is a protocol that allows a program on one machine to call a
procedure or function on another machine, as if it were a local
procedure call. It abstracts the complexities of network
communication and provides a simple and transparent way for
processes to communicate across a network.
In RPC, the client program makes a procedure call to a remote
server program, which executes the requested procedure and
returns the result to the client. The client and server can be running
on different machines connected over a network. The client program
does not need to know the details of how the remote procedure is
executed; it only needs to know the procedure's name and
parameters.
RPC provides a level of abstraction that makes it easier to develop
distributed applications. It hides the complexities of network
communication, serialization, and marshaling of data, allowing
developers to focus on the logic of the application rather than the
low-level details of communication.
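The stub-based structure described above can be sketched as follows. This is a toy model of the idea, not a real RPC framework: the "network" is simulated by a direct function call, the marshaling format is plain JSON, and the procedure registry is invented.

```python
# Toy sketch of RPC stubs: the client stub marshals the call into a
# message, the server stub unmarshals it, dispatches, and replies.
import json

def server_dispatch(request_bytes):
    """Server-side stub: unmarshal, look up the procedure, run, reply."""
    procedures = {"add": lambda a, b: a + b}   # hypothetical registry
    req = json.loads(request_bytes)
    result = procedures[req["proc"]](*req["args"])
    return json.dumps({"result": result}).encode()

def rpc_call(proc, *args):
    """Client-side stub: marshal the call, 'transmit', unmarshal reply."""
    request = json.dumps({"proc": proc, "args": list(args)}).encode()
    reply = server_dispatch(request)           # stands in for the network
    return json.loads(reply)["result"]

print(rpc_call("add", 2, 3))  # 5
```

The caller sees an ordinary function call; everything between the two stubs is what a real RPC runtime would hide.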

2. Remote Method Invocation (RMI):


RMI is a Java-specific implementation of RPC. It allows objects in a
Java Virtual Machine (JVM) to invoke methods on remote objects
running in other JVMs. RMI provides a way for distributed Java
programs to communicate and interact with each other.
RMI extends the concept of RPC by allowing objects to be passed as
parameters and return values in method invocations. This means
that not only can remote methods be called, but objects can also be
passed between the client and server.
RMI provides a transparent mechanism for remote method
invocation, where the client program can invoke methods on remote
objects as if they were local objects. Behind the scenes, RMI handles
the network communication, serialization, and deserialization of
objects, making it easier for developers to build distributed Java
applications.
5. Explain the following terms:

a. Message Oriented Communications
b. Stream Oriented Communication
c. Global Time or Clock
d. Synchronization in distributed systems

a. Message Oriented Communications:


Message Oriented Communications is a communication paradigm
used in distributed systems where communication between processes
or components is based on the exchange of messages. In this
approach, processes send messages to each other, and the
communication is typically asynchronous, meaning that the sender
does not wait for an immediate response from the receiver.
Message-oriented communication provides a flexible and decoupled
way of communication, as processes can send messages
independently of each other. Messages can contain data,
instructions, or requests, and they are typically sent over a network
using protocols such as TCP/IP or UDP.
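The asynchronous, decoupled style described above can be sketched with an in-process queue standing in for a network message channel; the message fields here are invented.

```python
# Sketch of message-oriented communication: the sender enqueues a
# message and continues; the receiver picks it up independently.
import queue
import threading

mailbox = queue.Queue()   # stands in for a network message channel
results = []

def receiver():
    msg = mailbox.get()   # blocks until a message arrives
    results.append(msg)
    mailbox.task_done()

t = threading.Thread(target=receiver)
t.start()
mailbox.put({"type": "request", "body": "ping"})  # sender does not wait
t.join()
print(results[0]["body"])  # ping
```

Real message-oriented middleware adds persistence, routing, and delivery guarantees on top of this basic pattern.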

b. Stream Oriented Communication:


Stream Oriented Communication is another communication
paradigm used in distributed systems, where communication
between processes is based on the concept of streams. In this
approach, data is transmitted as a continuous stream of bytes, and
the sender and receiver interact with the stream in a sequential
manner.
Stream-oriented communication is typically used for continuous
data transfer, such as audio or video streaming. It provides a reliable
and ordered delivery of data, ensuring that the data is received in
the same order as it was sent.
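The sequential nature of stream-oriented communication can be sketched with an in-memory byte stream standing in for a TCP connection; the chunk size and contents are invented.

```python
# Sketch of stream-oriented communication: the receiver reads the
# byte stream sequentially, in the order the bytes were sent.
import io

stream = io.BytesIO(b"frame1frame2frame3")  # stands in for a TCP stream
chunks = []
while chunk := stream.read(6):              # read 6 bytes at a time
    chunks.append(chunk)
print(chunks)  # [b'frame1', b'frame2', b'frame3']
```

Unlike a message queue, the stream has no built-in message boundaries; the receiver imposes its own framing on the ordered bytes.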

c. Global Time or Clock:


In a distributed system, where multiple processes or components are
running on different machines, it becomes challenging to establish a
common notion of time. Global Time or Clock refers to a
mechanism or algorithm used to synchronize the clocks of different
machines in a distributed system.
The concept of global time is crucial for various distributed
algorithms and protocols, such as distributed transactions, event
ordering, and distributed consensus. It allows processes to reason
about the order of events and coordinate their actions based on a
shared understanding of time.
There are different approaches to achieving global time
synchronization, such as using a centralized time server, distributed
algorithms like the Network Time Protocol (NTP), or logical clocks
like the Lamport logical clock or Vector clock.

d. Synchronization in Distributed Systems:


Synchronization in distributed systems refers to the coordination
and management of concurrent activities or processes running on
different machines to ensure consistency and correctness.
In a distributed system, multiple processes may be executing
concurrently and accessing shared resources or data.
Synchronization mechanisms are used to prevent conflicts and
ensure that processes coordinate their actions properly.
Some common synchronization techniques used in distributed
systems include:
1. Mutual Exclusion: Ensuring that only one process can access a
shared resource or critical section at a time, preventing
conflicts and maintaining data integrity.
2. Locking: Using locks or semaphores to control access to shared
resources, allowing processes to acquire and release locks to
ensure exclusive access.
3. Distributed Transactions: Coordinating multiple operations
across different machines as part of a transaction, ensuring
atomicity, consistency, isolation, and durability (ACID
properties).
4. Consensus Algorithms: Achieving agreement among a group of
processes on a particular value or decision, even in the
presence of failures or network partitions.
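Technique 1 above (mutual exclusion) can be sketched with threads sharing one process; in a真 distributed setting the lock would be a distributed lock service, but the critical-section idea is the same.

```python
# Sketch of mutual exclusion: the lock ensures only one thread at a
# time executes the increment, so no updates to the counter are lost.
import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(10_000):
        with lock:            # critical section: one thread at a time
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Without the lock, the read-modify-write of `counter += 1` could interleave across threads and the final count would be unpredictable.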

6. Explain clock synchronization according to Cristian's and Berkeley's algorithms.
Clock synchronization is a crucial aspect of distributed systems,
where multiple processes or components are running on different
machines. Clock synchronization refers to the process of aligning the
clocks of different machines to a common notion of time.
There are different algorithms and techniques used for clock
synchronization in distributed systems, including Cristian's
Algorithm and Berkeley's Algorithm.
1. Cristian's Algorithm:
Cristian's Algorithm is a simple clock synchronization algorithm
that uses a time server to synchronize the clocks of different
machines in a distributed system. The algorithm works as follows:

· A client process sends a request to the time server, asking for
the current time.

· The time server responds with its current time.

· The client process calculates the time it took for the request to
reach the server and for the response to return.

· The client sets its clock to the server's time plus half the
round-trip time, compensating for the estimated delay of the reply.
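The adjustment step can be written out directly. Cristian's algorithm assumes the reply took roughly half the measured round-trip time; the timestamps below are invented example values in milliseconds.

```python
# Sketch of Cristian's clock adjustment (all times in milliseconds).
def cristian_adjust(t_sent, t_received, server_time):
    """Return the client's new clock value given the server's reply."""
    rtt = t_received - t_sent
    # Estimate that the reply was in flight for half the round trip.
    return server_time + rtt // 2

# Request sent at t=100_000, reply received at t=100_400,
# carrying the server's timestamp 107_100.
print(cristian_adjust(100_000, 100_400, 107_100))  # 107300
```

The accuracy of the estimate degrades when the outbound and return delays are asymmetric, which is the algorithm's main limitation.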

2. Berkeley's Algorithm:
Berkeley's Algorithm is a more sophisticated clock synchronization
algorithm that uses a time daemon to synchronize the clocks of
different machines in a distributed system. The algorithm works as
follows:

· The time daemon periodically polls the clocks of all the
machines in the system and calculates the average time.

· The time daemon sends the average time to all the machines in
the system.

· Each machine adjusts its clock by the difference between its
current time and the average time.
Berkeley's Algorithm accounts for variations in network delay and
clock drift by periodically recalculating the average time and
adjusting the clocks of all the machines in the system. It provides a
more accurate and robust clock synchronization mechanism than
Cristian's Algorithm.
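The daemon's averaging step can be sketched as follows. Note that the daemon sends each machine an offset to apply rather than an absolute time, so the correction is less sensitive to transmission delay; the clock readings below are invented.

```python
# Sketch of Berkeley's algorithm: average the polled clock values
# (including the daemon's own) and compute a per-machine offset.
def berkeley_offsets(clocks):
    """clocks: machine -> reported time; returns machine -> offset."""
    average = sum(clocks.values()) / len(clocks)
    return {machine: average - t for machine, t in clocks.items()}

readings = {"daemon": 300, "A": 310, "B": 290}  # example poll results
print(berkeley_offsets(readings))
# {'daemon': 0.0, 'A': -10.0, 'B': 10.0}
```

A fuller implementation would also discard outlier readings (machines whose clocks deviate far from the rest) before averaging.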

7. Explain logical clock synchronization using Lamport's algorithm and vector clocks in a distributed system.

Logical clock synchronization is a fundamental concept in
distributed systems, allowing processes to establish a partial
ordering of events and maintain causality among distributed events.
Two well-known algorithms for logical clock synchronization are
Lamport's Logical Clocks and Vector Clocks.

1. Lamport's Logical Clocks:


Lamport's Logical Clocks, proposed by Leslie Lamport, provide a
simple mechanism for establishing a partial ordering of events in a
distributed system. The key idea behind Lamport's Logical Clocks is
to assign a logical timestamp to each event, allowing processes to
compare the ordering of events across different machines.
The algorithm works as follows:

· Each process maintains a logical clock, initially set to 0.

· When an event occurs at a process, it timestamps the event
with its current logical clock value and increments the logical
clock.

· When a process sends a message, it includes its current logical
clock value in the message.
· Upon receiving a message, the receiving process sets its
logical clock to the maximum of its current value and the
timestamp in the received message, then increments it.
Lamport's Logical Clocks ensure that if event A happened before
event B according to the logical clocks, then event A causally
precedes event B. However, it does not capture the concurrency of
events that are causally independent.
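The rules above can be condensed into a small class. This is a minimal sketch for one process; process names `p` and `q` are invented for the example.

```python
# Minimal sketch of a Lamport logical clock.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the clock and return the timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Send event: tick and attach the timestamp to the message."""
        return self.tick()

    def receive(self, msg_time):
        """Receive event: merge with the sender's timestamp, then tick."""
        self.time = max(self.time, msg_time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
t = p.send()          # p's clock becomes 1; message carries timestamp 1
q.tick()              # q has a local event; q's clock becomes 1
print(q.receive(t))   # max(1, 1) + 1 = 2
```

The receive rule is what guarantees that a message's receipt is always timestamped after its send.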

2. Vector Clocks:
Vector Clocks extend the concept of logical clocks to capture the
concurrency of events and provide a more precise ordering of events
in a distributed system. Each process maintains a vector of logical
clock values, with each element of the vector corresponding to a
different process in the system.
The algorithm works as follows:

· Each process maintains a vector clock initialized with zeros for
all processes.

· When an event occurs at a process, it increments its own entry
in the vector clock.

· When a process sends a message, it includes its current vector
clock in the message.

· Upon receiving a message, the receiving process updates its
vector clock by taking the maximum of each element in its
vector clock and the corresponding element in the received
vector clock, then increments its own entry in the vector clock.
Vector Clocks provide a more precise ordering of events by
capturing the concurrency of events that are causally independent.
They allow processes to determine the relationship between events,
including causality and concurrency, in a distributed system.
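The vector-clock rules can likewise be sketched for a fixed set of processes; the two-process setup below is an invented example.

```python
# Minimal sketch of vector clocks for n processes.
class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid            # this process's index
        self.clock = [0] * n      # one entry per process

    def tick(self):
        """Local or send event: increment own entry, return a copy."""
        self.clock[self.pid] += 1
        return list(self.clock)

    def receive(self, other):
        """Element-wise maximum with the received clock, then tick."""
        self.clock = [max(a, b) for a, b in zip(self.clock, other)]
        self.clock[self.pid] += 1
        return list(self.clock)

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
msg = p0.tick()         # p0 sends: its clock is now [1, 0]
print(p1.receive(msg))  # [1, 1]
```

Comparing two vectors element-wise then tells the processes whether one event happened before the other (every entry less than or equal, at least one strictly less) or whether the events were concurrent (neither dominates).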
