Distributed Shared Memory


Distributed Shared Memory

Memory Basics
• Physical memory: Also referred to as physical storage or real storage, physical memory is the total amount of memory installed in the computer. For example, if the computer has two 64 MB memory modules installed, it has a total of 128 MB of physical memory.

• Global memory: Computer storage that can be used by a number of processors connected together in a multiprocessor system.

• Virtual memory: A portion of the hard disk used as if it were memory. It provides a set of memory addresses and stores instructions or data.
• Shared memory: Global physical memory equally accessible to all processors
• Programming ease and portability
• Increased contention and limited scalability

• Distributed memory: Multiple independent processing nodes connected by a general interconnection network
• Scalable, but requires message passing
• Programmer manages data distribution and communication
• All systems providing a shared-memory abstraction on a distributed memory system belong to the DSM category

• A DSM system hides the remote communication mechanism from the programmer

• Existing shared-memory applications can be modified relatively easily and executed efficiently

• Scalability and cost are similar to those of the underlying distributed system


Types of Shared Memory Architectures
On-Chip Memory
Bus-Based Multiprocessors in Shared Memory
Ring-Based Multiprocessors in Shared Memory
• Ring-Based Multiprocessors: Memnet is a ring-based multiprocessor. In Memnet, a single address space is divided into a private part and a shared part.

• The private part is divided up into regions so that each machine has a piece for its stacks and other unshared data and code. The shared part is common to all machines. The machines in a ring-based multiprocessor can be much more loosely coupled.

• All the machines in Memnet are connected together in a modified token passing
ring. The ring consists of 20 parallel wires, which together allow 16 data bits and 4
control bits to be sent every 100 nsec, for a data rate of 160 Mbps.
Switched Multiprocessors in Shared Memory

• Switched Multiprocessors: Build the system as multiple clusters and connect the clusters using an intercluster bus. As long as most CPUs communicate primarily within their own cluster, there will be relatively little intercluster traffic.

• If one intercluster bus proves to be inadequate, add a second intercluster bus, or arrange the clusters in a tree or grid.

• If still more bandwidth is needed, collect a bus, tree, or grid of clusters together into a supercluster, and break the system into multiple superclusters.

• The superclusters can themselves be connected by a bus, tree, or grid (the original figure shows a system with three levels of buses).
Multiprocessor Architecture

• A global, common memory exists
• The memory blocks may be cached in local memories
• Connection is fast and tight (common bus, switch, etc.)
• Main goal is performance, based on the locality principle
Distributed Shared Memory Architecture

• No global memory
• The local memories form a global, virtual memory
• Connection is relatively slow and loose (LAN, MAN …)
• Scalability
• Transparency of
• information sharing
• communication
• Cluster based architecture
• Why DSM?
• Direct information sharing programming
paradigm (transparency)
• Multilevel memory access (locality)
• Wealth of existing programs (portability)
• Large physical memory
• Scalable multiprocessor system
DSM
• Distributed shared memory (DSM) tries to combine the advantages of two
classes of systems: shared memory systems, having a single global physical
memory, equally accessible to all processors, and distributed memory
systems, that consist of multiple processing nodes communicating by
means of message passing.

• A DSM system provides a logical abstraction of shared memory which is built using a set of interconnected nodes having physically distributed memories.

• The DSM system, implemented in hardware and/or software, hides the remote communication mechanism from the application writer, so the ease of programming and the portability typical of shared memory systems, as well as the scalability and cost-effectiveness of distributed memory systems, are both inherited.
• Provide the usual programming model of shared memory in a generally loosely coupled, distributed environment.
• Shared Memory
• Easy to program
• Difficult to build
• Tight coupling in hardware

• Distributed Memory
• Scales well
• Difficult to program
• Message passing or RPC based
• Solution: DSM (Distributed Shared Memory)
• Logically shared memory
• Physically distributed local memories

• Distributed shared memory (DSM) is a form of memory architecture where the (physically separate) memories can be addressed as one (logically shared) address space. Here, the term shared does not mean that there is a single centralized memory, but shared essentially means that the address space is shared (the same physical address on two processors refers to the same location in memory).

• Page based
• Shared pages
• Demand paging between nodes

• Variable based
• Shared Variables

• Object based
• Shared encapsulation units
• Easier to optimize
Page-Based Distributed Shared Memory
• Basic Design: Distributed shared memory emulates the cache of a multiprocessor using the MMU (Memory Management Unit) and operating system software.

• In a DSM system, the address space is divided up into chunks, with the chunks being spread over all the processors in the system.

• When a processor references an address that is not local, a trap occurs, and the DSM
software fetches the chunk containing the address and restarts the faulting
instruction, which now completes successfully.

• Replication: One improvement is to replicate chunks that are read only, for example, program text, read-only constants, or other read-only data structures.
• (a) Chunks of address space distributed among four machines.
• (b) Situation after CPU 1 references chunk 10.
• (c) Situation if chunk 10 is read only and replication is used.
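A minimal sketch of how the trap-and-fetch mechanism described above can be emulated in user space on a POSIX system is shown below. The chunk size of 4096 bytes and the helper fetch_chunk(), which asks the current owner (or page manager) for the chunk's contents, are assumptions for illustration, not part of any particular DSM system.

```c
/* Sketch only: page-fault-driven chunk fetching for a page-based DSM,
 * assuming POSIX mprotect/SIGSEGV; fetch_chunk() is a hypothetical helper. */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define CHUNK_SIZE 4096              /* assumed chunk (page) size */
static char  *dsm_base;              /* start of the shared region  */
static size_t dsm_size;

/* Hypothetical: fills the page with the chunk's contents, via messages. */
extern void fetch_chunk(void *dest, uintptr_t chunk_index);

static void dsm_fault_handler(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)si->si_addr;
    if (addr < (uintptr_t)dsm_base || addr - (uintptr_t)dsm_base >= dsm_size) {
        signal(SIGSEGV, SIG_DFL);    /* not a DSM address: real fault */
        return;
    }
    uintptr_t chunk = (addr - (uintptr_t)dsm_base) / CHUNK_SIZE;
    void *page = dsm_base + chunk * CHUNK_SIZE;

    /* Make the page accessible locally, pull in its contents, then let the
     * faulting instruction restart and complete successfully. */
    mprotect(page, CHUNK_SIZE, PROT_READ | PROT_WRITE);
    fetch_chunk(page, chunk);
}

void dsm_init(size_t size) {
    dsm_size = size;
    /* Reserve the shared address space with no access rights, so every
     * first touch of a non-local chunk traps into dsm_fault_handler. */
    dsm_base = mmap(NULL, size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = dsm_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
}
```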
Granularity
• The problem of false sharing is that although the variables are unrelated, they appear by accident on the same page, so when a process uses one of them, it also gets the other.

• The larger the effective page size, the more often false sharing will occur, and
conversely, the smaller the effective page size, the less often it will occur.

• Clever compilers that understand the problem and place variables in the address
space accordingly can help reduce false sharing and improve performance
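As a hedged illustration of what such placement means in practice, the sketch below pads unrelated variables onto separate pages. The 4096-byte page size and the GCC/Clang-specific aligned attribute are assumptions; a real compiler or programmer would choose the platform's actual page size.

```c
/* Two unrelated counters updated by processes on different nodes.
 * If they share a page, every update bounces the page between nodes. */
struct shared_naive {
    long counter_a;   /* used mostly by node 1 */
    long counter_b;   /* used mostly by node 2 -- same page: false sharing */
};

/* Aligning each variable to its own page (4096 bytes assumed) removes the
 * accidental coupling, at the cost of wasted memory. */
struct shared_padded {
    long counter_a __attribute__((aligned(4096)));
    long counter_b __attribute__((aligned(4096)));
};
```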
• Finding the owner: how to find the owner of the page.

 The first solution is to do a broadcast, asking the owner of the specified page to respond.

 The second solution is to designate one process as the page manager. It is the job of the manager to keep track of who owns each page.

• When a process, P, wants to read a page it does not have or wants to write a page
it does not own, it sends a message to the page manager telling which operation it
wants to perform and on which page.

• The manager then sends back a message telling who the owner is. P now contacts
the owner to get the page and/or the ownership, as required.

• There are two approaches to locating the owner: a four-message protocol and a three-message protocol.
• Finding the copies: how all the copies are found when they must be invalidated. Two possibilities exist.

 One is to broadcast a message giving the page number and ask all processors holding the page to invalidate it. This approach works only if broadcast messages are totally reliable and can never be lost.
 The second possibility is to have the owner or page manager maintain a list, or copyset, telling which processors hold which pages.

• When a page must be invalidated, the old owner, new owner, or page manager
sends a message to each processor holding the page and waits for an
acknowledgement. When each message has been acknowledged, the invalidation
is complete.
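A minimal sketch of the copyset bookkeeping just described is shown below. The message primitives send_invalidate() and await_ack() are hypothetical stand-ins for the DSM system's transport layer, and the fixed node count is an assumption.

```c
#include <stdbool.h>

#define MAX_NODES 64

struct page_entry {
    int  owner;                   /* node currently owning the page */
    bool copyset[MAX_NODES];      /* which nodes hold a read copy   */
};

/* Hypothetical transport helpers. */
extern void send_invalidate(int node, int page);
extern void await_ack(int node);

/* Before a write is granted, every copy listed in the copyset is
 * invalidated; the invalidation is complete only when every message
 * has been acknowledged. */
void invalidate_copies(struct page_entry *pt, int page, int new_owner) {
    for (int n = 0; n < MAX_NODES; n++) {
        if (pt[page].copyset[n] && n != new_owner) {
            send_invalidate(n, page);
            await_ack(n);
            pt[page].copyset[n] = false;
        }
    }
    pt[page].owner = new_owner;
    pt[page].copyset[new_owner] = true;
}
```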
• Page Replacement-In a DSM system, it can happen that a page is needed but there is no
free page frame in memory to hold it. When this situation occurs, a page must be evicted
from memory to make room for the needed page.
• Two sub problems immediately arise: which page to evict and where to put it.

• Different approaches exist:

 Use the least recently used (LRU) algorithm.
 A replicated page that another process owns is always a prime candidate to evict, because it is known that another copy exists.
 If no replicated pages are suitable candidates, a non-replicated page must be chosen.

Two possibilities exist for non-replicated pages:

 The first is to write it to disk, if present.
 The other is to hand it off to another processor.
• Synchronization- In a DSM system, processes often need to synchronize their actions. E.g.
mutual exclusion, in which only one process at a time may execute a certain part of the code.

• In a multiprocessor, the TEST-AND-SET-LOCK (TSL) instruction is often used to implement mutual exclusion. A variable is set to 0 when no process is in the critical section and to 1 when one process is.

• If one process, A, is inside the critical region and another process, B, (on a different machine)
wants to enter it, B will sit in a tight loop testing the variable, waiting for it to go to zero.

• A better approach in DSM is to use a synchronization manager (or managers) that accepts messages asking to enter and leave critical regions, lock and unlock variables, and so on, sending back replies when the work is done.

• When a region cannot be entered or a variable cannot be locked, no reply is sent back
immediately, causing the sender to block. When the region becomes available or the variable
can be locked, a message is sent back.
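For reference, the TSL idea can be expressed with a compiler builtin as in the sketch below. It uses GCC/Clang's __atomic_test_and_set as an assumed stand-in for the hardware instruction, and it is meant only to illustrate the spin lock: on a DSM system this busy waiting is exactly what generates heavy page traffic and motivates the synchronization manager.

```c
#include <stdbool.h>

/* 0 = no process in the critical section, 1 = one process is inside. */
static volatile bool lock_var = false;

void enter_critical(void) {
    /* Atomically set the variable to 1 and test its previous value;
     * spin while another process already holds the lock. */
    while (__atomic_test_and_set(&lock_var, __ATOMIC_ACQUIRE))
        ;  /* busy wait -- on DSM this thrashes the page holding lock_var */
}

void leave_critical(void) {
    __atomic_clear(&lock_var, __ATOMIC_RELEASE);  /* back to 0 */
}
```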
Shared-Variable Distributed Shared Memory
• Munin- Munin is a DSM system that is fundamentally based on software objects.
The goal of the Munin project is to take existing multiprocessor programs, make
minor changes to them, and have them run efficiently on multicomputer systems
using a form of DSM.

• Synchronization for mutual exclusion is handled in a special way. Lock variables may
be declared, and library procedures are provided for locking and unlocking them.
Barriers, condition variables, and other synchronization variables are also supported.

• Release Consistency: Writes to shared variables must occur inside critical regions; reads can occur inside or outside. While a process is active inside a critical region, the system gives no guarantees about the consistency of shared variables, but when a critical region is exited, the shared variables modified since the last release are brought up to date on all machines.
• Multiple Protocols-Munin also uses other techniques for improving performance.
• Each shared variable is declared as one of four categories:

 Read-only: Read-only variables are easiest. When a reference to a read-only variable causes a page fault, Munin looks up the variable in the variable directory, finds out who owns it, and asks the owner for a copy of the required page.

 Migratory: Migratory shared variables use the acquire/release protocol. They are used inside critical regions and must be protected by synchronization variables. The idea is that these variables migrate from machine to machine as critical regions are entered and exited.

 Write-shared: A write-shared variable is used when the programmer has indicated that it is safe for two or more processes to write on it at the same time, for example, an array in which different processes can concurrently access different subarrays.
• Directories-Munin uses directories to locate pages containing shared variables.
When a fault occurs on a reference to a shared variable, Munin hashes the virtual
address that caused the fault to find the variable's entry in the shared variable
directory.

• Synchronization: Munin maintains a second directory for synchronization variables. When a process wants to acquire a lock, it first checks to see if it owns the lock itself.

• If it does and the lock is free, the request is granted. If the lock is not local, it is
located using the synchronization directory, which keeps track of the probable
owner. If the lock is free, it is granted. If it is not free, the requester is added to the
tail of the queue.
Comparison of IPC paradigms
Advantages & Disadvantages of DSM

Advantages of DSM:-
• Simpler abstraction - programmer does not have to worry about data movement

• Easier portability - sequential programs can run directly on DSM systems

• Possibly better performance:

• Locality of data - data is moved in large blocks, which helps programs with good locality of reference, with on-demand data movement

• Larger memory space - no need to do paging on disk

• Flexible communication - the sender and receiver need not coexist; processes can join and leave the DSM system without affecting the others

• Process migration simplified - one process can easily be moved to a different machine since they all share the address space
Disadvantages of DSM:-
• Programmers need to understand consistency models to write correct programs

• DSM implementations use asynchronous message passing, and hence cannot be more efficient than message-passing implementations

• By yielding control to the DSM manager software, programmers cannot use their own message-passing solutions.

• Synchronous Messages
• Synchronous messaging involves a client that waits for the server to respond to a message. Messages are able to flow in both directions. Essentially, synchronous messaging is two-way communication: the sender sends a message to the receiver, the receiver receives the message and replies to the sender, and the sender will not send another message until it gets the reply.
• Asynchronous Messages
• Asynchronous messaging involves a client that does not wait for a message from the server. An event is used to trigger a message from a server, so even if the client is down, the messaging will complete successfully. Asynchronous messaging is one-way communication: the flow of communication is in one direction only.
Design Issues In DSM
5 key design issues when accessing data in
the DSM address space
• DSM Implementation

• Consistency (DSM memory coherence): Legal ordering of memory references issued by a processor, as observed by other processors

• Thrashing

• Replication/Migration

• Responsibility of DSM management


Algorithms for implementing DSM
• Issues
- How to keep track of the location of remote data
- How to minimize communication overhead when accessing remote data
- How to access remote data concurrently at several nodes
1. The Central Server Algorithm
- Central server maintains all shared data
• Read request: returns data item
• Write request: updates data and returns acknowledgement message
- Implementation
• A timeout is used to resend a request if acknowledgment fails
• Associated sequence numbers can be used to detect duplicate write requests
• If an application’s request to access shared data fails repeatedly, a failure condition is sent to the
application
- Issues: performance and reliability
- Possible solutions
• Partition shared data between several servers
• Use a mapping function to distribute/locate data
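A hedged sketch of the central server's request handling, including the duplicate-write detection by sequence number mentioned above, is shown below. The request structure, array sizes, and send_reply() helper are illustrative assumptions, not a specific system's API.

```c
#define DATA_ITEMS  1024
#define MAX_CLIENTS 64

enum op { OP_READ, OP_WRITE };

struct request {
    enum op op;
    int     client;      /* requesting node            */
    int     seq;         /* per-client sequence number */
    int     item;        /* shared data item index     */
    long    value;       /* value for writes           */
};

static long shared_data[DATA_ITEMS];
static int  last_seq[MAX_CLIENTS];   /* highest write sequence applied per client */

/* Hypothetical transport helper: returns a value (reads) or an ack (writes). */
extern void send_reply(int client, int seq, long value);

void central_server_handle(const struct request *r) {
    if (r->op == OP_READ) {
        send_reply(r->client, r->seq, shared_data[r->item]);
        return;
    }
    /* Duplicate write (client timed out and resent): ack but do not reapply. */
    if (r->seq <= last_seq[r->client]) {
        send_reply(r->client, r->seq, 0);
        return;
    }
    shared_data[r->item] = r->value;
    last_seq[r->client]  = r->seq;
    send_reply(r->client, r->seq, 0);   /* acknowledgement */
}
```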
Algorithms for implementing DSM
2. The Migration Algorithm
- Operation
• Ship (migrate) entire data object (page, block) containing data item to requesting location
• Allow only one node to access a shared data at a time

- Advantages
• Takes advantage of the locality of reference
• DSM can be integrated with VM ( virtual memory) at each node
- Make DSM page multiple of VM page size
- A locally held shared memory can be mapped into the VM page address space
- If page not local, fault-handler migrates page and removes it from address space at remote
node
- To locate a remote data object:
• Use a location server
• Maintain hints at each node
• Broadcast query
- Issues
• Only one node can access a data object at a time
• Thrashing can occur: to minimize it, set minimum time data object resides at a node
Algorithms for implementing DSM
3. The Read-Replication Algorithm
– Replicates data objects to multiple nodes
– DSM keeps track of location of data objects
– Multiple nodes can have read access or one node write access (multiple
readers-one writer protocol)
– After a write, all copies are invalidated or updated
– DSM has to keep track of locations of all copies of data objects.
Examples of implementations:
• IVY: owner node of data object knows all nodes that have copies
• PLUS: distributed linked-list tracks all nodes that have copies
– Advantage
• The read-replication can lead to substantial performance improvements if
the ratio of reads to writes is large
Algorithms for implementing DSM
4. The Full–Replication Algorithm
- Extension of read-replication algorithm: multiple nodes can read
and multiple nodes can write (multiple-readers, multiple-writers
protocol)
- Issue: consistency of data for multiple writers
- Solution: use of gap-free sequencer
• All writes sent to sequencer
• Sequencer assigns sequence number and sends write request
to all sites that have copies
• Each node performs writes according to sequence numbers
• A gap in sequence numbers indicates a missing write request:
node asks for retransmission of missing write requests
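A minimal sketch of the receiving side of the gap-free sequencer protocol is shown below; it assumes sequence numbers start at 1, and apply_write() and request_retransmission() are hypothetical helpers supplied by the DSM runtime.

```c
/* Each node applies globally sequenced writes strictly in order.
 * A gap in sequence numbers means a write was lost and must be resent. */
struct seq_write {
    long seqno;   /* assigned by the sequencer */
    int  item;
    long value;
};

static long next_expected = 1;   /* assumes the sequencer numbers from 1 */

/* Hypothetical helpers supplied by the DSM runtime. */
extern void apply_write(int item, long value);
extern void request_retransmission(long from_seqno, long to_seqno);

void on_sequenced_write(const struct seq_write *w) {
    if (w->seqno > next_expected) {
        /* Gap detected: ask for the missing writes before going on. */
        request_retransmission(next_expected, w->seqno - 1);
        return;   /* this write will be re-delivered in order later */
    }
    if (w->seqno < next_expected)
        return;   /* duplicate, already applied */

    apply_write(w->item, w->value);
    next_expected++;
}
```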
DSM Memory coherence
• DSM are based on
- Replicated shared data objects
- Concurrent access of data objects at many nodes

• Coherent memory: when the value returned by a read operation is the expected value (e.g., the value of the most recent write)

• A mechanism that controls/synchronizes accesses is needed to maintain memory coherence
DSM Memory Coherence algorithms

• Maintain coherence among replicas

• Single reader/single writer algorithms
• Prohibit replication; e.g., the central server algorithm
• One unique server handles all requests from other nodes to shared data
• Only one copy of a data item can exist at one time
• Improvement: distribute responsibility for parts of the shared address space and statically distribute data
• Performance is very low
• Does not use the parallel potential of multiple reads or writes

• Multiple reader/single writer algorithms
• Reduce the cost of read operations, because read is the most used pattern in parallel applications
• Only one host can update a copy
• A write will invalidate the other replicated copies, which increases the cost of a write operation
• Multiple reader/Multiple writer algorithms
• Allows replication of data blocks with both read and write
• Cache coherency is difficult to maintain. Updates must be distributed to all other
copies on remote sites
• Write update protocol
• High coherence traffic
Memory consistency models
• Consistency requirements vary from application to application. A
consistency model basically refers to the degree of consistency that
has to be maintained for the shared – memory data for the
memory to work correctly for a certain set of applications.
or
• It is defined as a set of rules that applications must obey if they
want DSM system to provide the degree of consistency guaranteed
by the consistency model.

or

• The consistency model defines the ordering of writes and reads to different memory locations
Consistency models
• Refers to how recent the shared memory updates are
visible to all the other processes running on different
machines
Strict consistency
• A shared-memory system is said to support the strict consistency model if the value
returned by a read operation on a memory address is always the same as the value
written by the most recent write operation to that address, irrespective of the
locations of the processes performing the read and write operations. That is, all
writes instantaneously become visible to all processes."
   
Sequential consistency
• A shared-memory system is said to support the sequential consistency model if all
processes see the same order of all memory access operations on the shared
memory. The exact order in which the memory access operations are interleaved
does not matter. ... If one process sees one of the orderings of ... three operations
and another process sees a different one, the memory is not a sequentially
consistent memory."
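A small worked example (not from the original slides) of what sequential consistency allows and forbids, written as two concurrent processes:

```c
/* Shared variables, both initially 0. */
int x = 0, y = 0;
int r1, r2;

void process_1(void) {
    x = 1;       /* W(x)1 */
    r1 = y;      /* R(y)  */
}

void process_2(void) {
    y = 1;       /* W(y)1 */
    r2 = x;      /* R(x)  */
}

/* Under sequential consistency every execution is equivalent to some
 * interleaving of the two programs, e.g.
 *   W(x)1 R(y)=0 W(y)1 R(x)=1   ->  r1 = 0, r2 = 1
 *   W(x)1 W(y)1 R(y)=1 R(x)=1   ->  r1 = 1, r2 = 1
 * The outcome r1 = 0 and r2 = 0 is impossible, because no single
 * interleaving places both reads before both writes.  Weaker models
 * (and real hardware with store buffers) can produce r1 = r2 = 0. */
```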
Causal consistency
• "The causal consistency model ... relaxes the requirement of the
sequential model for better concurrency. Unlike the sequential consistency
model, in the causal consistency model, all processes see only those
memory reference operations in the same (correct) order that are
potentially causally related. Memory reference operations that are not
potentially causally related may be seen by different processes in different
orders."
PRAM /FIFO consistency
• (Pipelined RAM), which means: Writes done by a single process are received
by all other processes in the order in which they were issued, but writes
from different processes may be seen in a different order by different
processes.
Processor consistency
• Order in which the memory operations are seen by two processors need
not be identical, but the order of writes issued by each processor must be
preserved
• PRAM +coherency on the same data item - all processes agree on the
order of write operations to the same data item.
• In Example 1 (a figure from the original slides), the simple system follows processor consistency, as all the writes by each processor are seen by the other processors in the order in which they occurred, and the transactions are coherent. Example 2 is NOT processor consistent, as the writes by P1 and P3 are seen out of order by P2 and P4, respectively.
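• Since the examples above refer to figures that are not included here, a small substitute illustration: suppose P1 writes x = 1 while P2 writes x = 2 (the same location). PRAM/FIFO consistency only constrains the order of writes issued by a single process, so P3 may see x = 1 before x = 2 while P4 sees x = 2 before x = 1. Processor consistency adds coherence on each data item: all processes must agree on a single order for the two writes to x, although writes to different locations by different processes may still be seen in different orders.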
Weak consistency model
• Accesses to synchronization variables are sequentially consistent
• No access to a synchronization variable is issued by a processor before all
previous read/write data accesses have been performed (i.e., synch waits
until all ongoing accesses complete)
• No read/write data access is issued by a processor before a previous
access to a synchronization variable has been performed (i.e., all new
accesses must wait until synch is performed)
• Use a special variable called the synchronization variable
Release consistency
• The synchronization access (synch(S)) in the weak consistency model can
be refined as a pair of acquire(S) and release(S) accesses. Shared variables
in the critical section are made consistent when the release operation is
performed.
• (i.e., S “locks” access to shared variables it protects, and release is not
completed until all accesses to them are also completed).
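A hedged sketch of the acquire/release pattern follows; acquire(), release(), and the propagation of modified shared variables at release time are placeholders for whatever the DSM runtime actually provides.

```c
/* Shared counter protected by lock 0 under release consistency. */
extern void acquire(int lock);   /* hypothetical DSM runtime call */
extern void release(int lock);   /* propagates writes made since acquire */

int shared_counter;              /* lives in the DSM address space */

void add_to_counter(int delta) {
    acquire(0);                  /* enter the critical region for lock 0 */
    /* Between acquire and release the system gives other nodes no
     * consistency guarantees about shared_counter... */
    shared_counter += delta;
    release(0);                  /* ...but at release the modified value is
                                    brought up to date on the other machines. */
}
```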
Entry /Scope consistency

• Acquire and Release are applied to general variables.


• Each variable has an implicit synchronization variable that may be
acquired to prevent concurrent access to it.
Summary of Consistency Models
Consistency models not using synchronization operations:

Strict: Absolute time ordering of all shared accesses matters.
Linearizability: All processes must see all shared accesses in the same order. Accesses are furthermore ordered according to a (nonunique) global timestamp.
Sequential: All processes see all shared accesses in the same order. Accesses are not ordered in time.
Causal: All processes see causally-related shared accesses in the same order.
FIFO: All processes see writes from each other in the order they were used. Writes from different processes may not always be seen in that order.

Models with synchronization operations:

Weak: Shared data can be counted on to be consistent only after a synchronization is done.
Release: Shared data are made consistent when a critical region is exited.
Entry: Shared data pertaining to a critical region are made consistent when a critical region is entered.
Comparison of consistency models

• Models differ by difficulty of implementation, ease of use, and performance

• Strict consistency — most restrictive, but hard to implement

• Sequential consistency — widely used, intuitive semantics, not much extra burden on the programmer, but does not allow much concurrency

• Causal and PRAM consistency — allow more concurrency, but have non-intuitive semantics, and put more of a burden on the programmer to avoid doing things that require more consistency

• Weak and release consistency — intuitive semantics, but put extra burden on the programmer
Thrashing
• Thrashing is said to occur when the system spends a large amount of time
transferring shared data blocks from one node to another, compared to the
time spent doing the useful work of executing application processes. It is a
serious performance problem with DSM systems that allow data blocks to
migrate from one node to another.

Thrashing may occur in the following situations :

• 1. When interleaved data accesses made by processes on two or more nodes cause a data block to move back and forth from one node to another in quick succession (a ping-pong effect)

• 2. When blocks with read only permissions are repeatedly invalidated soon
after they are replicated.
Techniques to reduce thrashing
• 1. Providing application-controlled locks: Locking data to prevent other nodes from accessing it for a short period of time can reduce thrashing. An application-controlled lock can be associated with each data block to implement this method.

• 2. Nailing a block to a node for a minimum amount of time: Another method to reduce thrashing is to disallow a block from being taken away from a node until a minimum amount of time t elapses after its allocation to that node. The time t can either be fixed statically or be tuned dynamically on the basis of access patterns (see the sketch after this list).

• 3. Tailoring coherence semantics to the use of shared data, e.g., object-based sharing
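A minimal sketch of technique 2 (nailing a block to a node for a minimum time t) is given below; the clock source, the one-second residence time, and the migration hook are assumptions for illustration.

```c
#include <time.h>
#include <stdbool.h>

#define MIN_RESIDENCE_SECS 1   /* the time t; fixed statically here,
                                  could also be tuned dynamically   */

struct block_state {
    time_t arrived_at;          /* when the block was allocated to this node */
};

/* Called when a remote node asks to take the block away. */
bool may_migrate(const struct block_state *b) {
    return time(NULL) - b->arrived_at >= MIN_RESIDENCE_SECS;
}
/* If may_migrate() returns false, the request is queued or retried later,
 * which damps the ping-pong effect described above. */
```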


Replication v/s migration
• How to keep track of the location of remote data

• How to overcome the communication delays and high overhead associated with execution of communication protocols

• How to make shared data concurrently accessible at several nodes to improve system performance
Replication v/s migration

• Non-replicated, non-migrating pages
• All requests for the page have to be sent to the owner of the page
• Easy to enforce sequential consistency — the owner orders all access requests
• No concurrency

• Non-replicated, migrating pages
• All requests for the page have to be sent to the owner of the page
• Each time a remote page is accessed, it migrates to the processor that accessed it
• Easy to enforce sequential consistency — only processes on that processor can access the page
• No concurrency
Replication v/s migration
• Replicated, migrating pages
• All requests for the page have to be sent to the owner of the page
• Each time a remote page is accessed, it is copied to the processor that accessed it
• Multiple read operations can be done concurrently
• Hard to enforce sequential consistency — must invalidate (most common approach) or update other copies of the page during a write operation

• Replicated, non-migrating pages
• Replicated at fixed locations
• All requests to the page have to be sent to one of the owners of the page
• Hard to enforce sequential consistency — must update other copies of the page during a write operation
Responsibility of DSM Management
• Algorithms for data location and consistency management:

– Centralized manager algorithm


– Broadcast algorithm
– Fixed Distributed manager algorithm
– Dynamic distributed manager algorithm
Centralized Manager algorithm
Broadcast algorithm

Fixed Distributed manager algorithm
Dynamic distributed manager algorithm
Distributed File System(DFS)
• Distributed file systems support the sharing of information in the form of files and hardware resources.

• Distributed file system design has two parts:

 File service interface (operations on files: read, write, append)
 Directory service interface (mkdir, rmdir, file creation and deletion)

• The File Service Interface

 File properties
 File service model

• File Properties
 Byte sequence vs. data structure: A file can be structured as a sequence of records. The operating system either maintains the file as a B-tree or other suitable data structure, or uses hash tables to locate records quickly.

 Attributes (owner, creation/modification date, size, permissions): A file can have attributes, which are pieces of information about the file; typical attributes are the owner, size, creation date, and access permissions.
 Immutable vs. mutable files: Once a file has been created, it cannot be changed. Such a file is said to be immutable.

• Having files be immutable makes it much easier to support file caching and replication
because it eliminates all the problems associated with having to update all copies of a file
whenever it changes.

 Protection via capabilities vs. protection via access control lists: Protection in distributed systems uses two approaches: capabilities and access control lists.

• With capabilities, each user has a kind of ticket, called a capability, for each object to which it has access.
• The capability specifies which kinds of accesses are permitted (e.g., reading is allowed but
writing is not).

• All access control list schemes associate with each file a list of users who may access the file
and how.
• The UNIX scheme, with bits for controlling reading, writing, and executing each file separately
for the owner, the owner's group, and everyone else is a simplified access control list.
• The File Service Model
• (a) Upload/download model- In the upload/download model, the file service
provides only two major operations: read file and write file. The read file operation
transfers an entire file from one of the file servers to the requesting client. The
write file operation transfers an entire file the other way, from client to server

• Advantage
 Simple

• Problems
 Wasteful: what if client needs small piece?
 Problematic: what if client doesn’t have enough space?
 Consistency: what if others need to modify the same file?
• (b) Remote access model- In this model, the file service provides a large number
of operations for opening and closing files, reading and writing parts of files,
moving around within files, examining and changing file attributes, and so on.

• Advantages:
 Client gets only what’s needed
 Server can manage coherent view of file system

• Problem:
 Possible server and network congestion
 Same data may be requested repeatedly
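The contrast between the two models can be summarized by the operations each interface exposes. The signatures below are illustrative assumptions, not any particular file system's API.

```c
#include <stddef.h>
#include <sys/types.h>

/* (a) Upload/download model: whole files move between client and server. */
int download_file(const char *name, void *buf, size_t bufsize);  /* read file  */
int upload_file(const char *name, const void *buf, size_t len);  /* write file */

/* (b) Remote access model: the file stays at the server; the client issues
 * many small operations against it. */
int  remote_open (const char *name, int flags);
long remote_read (int handle, void *buf, size_t len, off_t offset);
long remote_write(int handle, const void *buf, size_t len, off_t offset);
int  remote_close(int handle);
```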
File Service Model
Directory Server Interface in DFS Design
DFS Implementation
• File usage
• System Structure
• Caching
• Replication
• Sun's Network File System
File usage

• Most files are < 10 Kbytes
 Feasible to transfer entire files (simpler)
 Still have to support long files
 Most files have short lifetimes (keep them local)
 Few files are shared
System structure
• Client-server system is stateless if: Servers maintain
no state information about clients

• Client-server system is stateful if: servers hold information about the client
System structure
• Stateful Server
 Server maintains client-specific state
 Shorter requests
 Better performance in processing requests
 Cache coherence is possible
 Server can know who’s accessing what
 File locking is possible

• Stateless Server
 Server maintains no information on client access
 Each request must identify file and offsets
 Server can crash and recover (No state to lose)
 Client can crash and recover (No open/close needed)
 No server space used for state of the client
 No limits on number of open files
 No problems if a client crashes
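One way to see the difference is in what a read request must carry. The structures below are illustrative assumptions rather than any real protocol's message format.

```c
#include <stddef.h>
#include <sys/types.h>

/* Stateless server (NFS-style): every request is self-contained, so it
 * must name the file and the offset explicitly. */
struct stateless_read {
    char   file_handle[32];  /* identifies the file on every call     */
    off_t  offset;           /* client remembers its own position     */
    size_t count;
};

/* Stateful server: the server remembers the open file and the current
 * position, so the request only needs the descriptor it handed out. */
struct stateful_read {
    int    fd;               /* refers to per-client state on the server */
    size_t count;
};
```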
Best known examples?
• The UNIX NFS file system is stateless.
Bill Joy: “Once they replaced my file server during the evening
while my machine was idle. The next day I resumed work
right where I had left off, and didn’t even notice the change!”

• Database systems are usually stateful:
The client reads the database of available seats on a plane; the information stays valid during the transaction.
Caching in DFS Implementation

• Caching location: the place where the cached data is stored, whether it is the server's main memory or disk, or the client's main memory or disk.

• Four different cache locations:
• Server's disk
• Server's buffer cache
• Client's buffer cache
• Client's disk
Cache Update Policy
• Write-through

• Delayed-write

• Write-on-close (variation of delayed-write)


Cache Update Policy

• Write-through – all writes are propagated to stable storage immediately
– Reliable, but poor performance

• Delayed-write – the modification is written to the cache and then written through to the server later

• Write-on-close – the modification is written back to the server when the file is closed
– Reduces intermediate read and write traffic while the file is open
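A hedged sketch of the three policies as a single client-side write path is shown below; flush_to_server() is a placeholder for the client-to-server write RPC, and the delayed-write timer is only described in a comment.

```c
#include <stdbool.h>
#include <stddef.h>

enum update_policy { WRITE_THROUGH, DELAYED_WRITE, WRITE_ON_CLOSE };

struct cached_file {
    enum update_policy policy;
    bool dirty;               /* cached copy differs from the server's */
};

/* Placeholder for the client-to-server write RPC. */
extern void flush_to_server(struct cached_file *f);

void cache_write(struct cached_file *f, const void *data, size_t len) {
    (void)data; (void)len;    /* update of the local cache omitted */
    f->dirty = true;
    if (f->policy == WRITE_THROUGH)
        flush_to_server(f);   /* propagate immediately: reliable, slower */
    /* DELAYED_WRITE: a background timer calls flush_to_server() later. */
}

void cache_close(struct cached_file *f) {
    if (f->dirty && f->policy != WRITE_THROUGH)
        flush_to_server(f);   /* write-on-close (and final delayed flush) */
}
```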
Replication in DFS Implementation
• Replication transparency: multiple copies of selected files are maintained, with each copy on a separate file server. The reasons for offering such a service are:

 To increase reliability by having independent backups of each file. If one server goes down, or is even lost permanently, no data are lost. For many applications, this property is extremely desirable.

 To split the workload over multiple servers. As the system grows in size, having all the files on one server can become a performance bottleneck. By having files replicated on two or more servers, the least heavily loaded one can be used.
• Three ways of replication

 Explicit file replication: In this approach the programmer controls the entire process. When a process creates a file, it does so on one specific server. Then it can make additional copies on other servers, if desired. If the directory server permits multiple copies of a file, the network addresses of all copies can be associated with the file name, so that when the name is looked up, all copies will be found.

 Lazy replication: In this approach only one copy of each file is created, on some server. Later, the server itself makes replicas on other servers automatically, without the programmer's knowledge.

 Group communication: In this approach all write system calls are simultaneously transmitted to all the servers, so extra copies are made at the same time the original is made.
Network File System in DFS Implementation
• NFS Architecture-NFS is to allow an arbitrary collection of clients and servers to
share a common file system.

• NFS Protocols-A protocol is a set of requests sent by clients to servers, along with
the corresponding replies sent by the servers back to the clients.

 The NFS protocol handles mounting.
 NFS needs a separate, additional mechanism to handle locking.
 NFS uses the UNIX protection mechanism, with the rwx bits for the owner, group, and others.
 All the keys used for authentication, as well as other information, are maintained by the NIS (Network Information Service). The NIS was formerly known as the yellow pages. Its function is to store (key, value) pairs; when a key is provided, it returns the corresponding value.
NFS Architecture
• The basic NFS architecture for UNIX systems.
File System Model
Operation v3 v4 Description
Create Yes No Create a regular file
Create No Yes Create a nonregular file
Link Yes Yes Create a hard link to a file
Symlink Yes No Create a symbolic link to a file
Mkdir Yes No Create a subdirectory in a given directory
Mknod Yes No Create a special file
Rename Yes Yes Change the name of a file
Rmdir Yes No Remove an empty subdirectory from a directory
Open No Yes Open a file
Close No Yes Close a file
Lookup Yes Yes Look up a file by means of a file name
Readdir Yes Yes Read the entries in a directory
Readlink Yes Yes Read the path name stored in a symbolic link
Getattr Yes Yes Read the attribute values for a file
Setattr Yes Yes Set one or more attribute values for a file
Read Yes Yes Read the data contained in a file
Write Yes Yes Write data to a file

• list of file system operations supported by NFS.


Communication

a) Reading data from a file in NFS version 3.


b) Reading data using a compound procedure in version 4.
Naming (1)
• Mounting (part of) a remote file system in NFS.
Naming (2)
• Mounting nested directories from multiple servers in
NFS.
File Attributes (1)
Attribute Description
TYPE The type of the file (regular, directory, symbolic link)
SIZE The length of the file in bytes
CHANGE Indicator for a client to see if and/or when the file has changed
FSID Server-unique identifier of the file's file system

• Some general mandatory file attributes in NFS.


File Attributes (2)
Attribute Description
ACL an access control list associated with the file
FILEHANDLE The server-provided file handle of this file
FILEID A file-system unique identifier for this file
FS_LOCATIONS Locations in the network where this file system may be found
OWNER The character-string name of the file's owner
TIME_ACCESS Time when the file data were last accessed
TIME_MODIFY Time when the file data were last modified
TIME_CREATE Time when the file was created

• Some general recommended file attributes.


Semantics of File Sharing (1)
a) On a single processor, when a
read follows a write, the value
returned by the read is the value
just written.
b) In a distributed system with
caching, obsolete values may be
returned.
Semantics of File Sharing (2)
Method Comment
UNIX semantics Every operation on a file is instantly visible to all processes
Session semantics No changes are visible to other processes until the file is closed
Immutable files No updates are possible; simplifies sharing and replication
Transaction All changes occur atomically

• Four ways of dealing with the shared files in a distributed system.


File Locking in NFS
Operation Description
Lock Creates a lock for a range of bytes
Lockt Test whether a conflicting lock has been granted
Locku Remove a lock from a range of bytes
Renew Renew the lease on a specified lock

• NFS version 4 operations related to file locking.


Client Caching (1)
• Client-side caching in NFS.
Client Caching (2)
• Using the NFS version 4 callback mechanism to recall file
delegation.
Access Control
Operation Description
Read_data Permission to read the data contained in a file
Write_data Permission to modify a file's data
Append_data Permission to append data to a file
Execute Permission to execute a file
List_directory Permission to list the contents of a directory
Add_file Permission to add a new file to a directory
Add_subdirectory Permission to create a subdirectory in a directory
Delete Permission to delete a file
Delete_child Permission to delete a file or directory within a directory
Read_acl Permission to read the ACL
Write_acl Permission to write the ACL
Read_attributes The ability to read the other basic attributes of a file
Write_attributes Permission to change the other basic attributes of a file
Read_named_attrs Permission to read the named attributes of a file
Write_named_attrs Permission to write the named attributes of a file
Write_owner Permission to change the owner
Synchronize Permission to access a file locally at the server with synchronous reads and writes

• The classification of operations recognized by NFS with respect to access control.


NAMING IN DISTRIBUTED SYSTEMS
Why is naming important?

• Names are used to
– Share resources
– Uniquely identify entities
– Refer to locations, and so on

• Name resolution allows a process to access the named entity


Naming in Distributed Systems
• The naming facility of a distributed operating system enables users and programs to assign
character-string names to objects and subsequently use these names to refer to those
objects.

• The locating facility, which is an integral part of the naming facility, maps an object's name to the object's location in a distributed system.

• The naming and locating facilities jointly form a naming system that provides the users with
an abstraction of an object that hides the details of how and where an object is actually
located in the network.

• It provides a further level of abstraction when dealing with object replicas. Given an object
name, it returns a set of the locations of the object's replicas.

• The naming system plays a very important role in achieving the goal of
 location transparency,
 facilitating transparent migration and replication of objects
 object sharing
DESIRABLE FEATURES
OF A GOOD NAMING SYSTEM
• 1. Location transparency. An object's name should be independent of the physical
connectivity or topology of the system, or the current location of the object.

• 2. Location independency. The name of an object need not be changed when the object's
location changes. Furthermore, a user should be able to access an object by its same name
irrespective of the node from where he or she accesses it (user migration).

• 3. Scalability. Distributed systems vary in size, ranging from one with a few nodes to one with many nodes; a change in the system scale should not require any change in the naming or locating mechanisms.

• 4. Uniform naming convention. A good naming system should use the same naming
convention for all types of objects in the system.

• 5. Multiple user-defined names for the same object. A naming system must provide the flexibility to assign multiple user-defined names to the same object. In this case, it should be possible for a user to change or delete his or her name for the object without affecting those of other users.
• 6. Group naming. A naming system should allow many different objects to be identified by the same name. Such a facility is useful to support a broadcast facility or to group objects for conferencing or other applications.

• 7. Meaningful names. Meaningful names typically indicate something about the contents or function of their referents, are easily transmitted between users, and are easy to remember and use.

• 8. Performance. The most important performance measurement of a naming
system is the amount of time needed to map an object's name to its attributes,
such as its location.

• 9. Fault tolerance. A naming system should be capable of tolerating, to some extent, faults that occur due to the failure of a node or a communication link in a distributed system network.
• 11. Locating the nearest replica. When a naming system supports the use of multiple copies of the same object, it is important that the object-locating mechanism always supply the location of the nearest replica of the desired object, because the efficiency of the object-accessing operation suffers if this point is not taken into consideration.
Human-Oriented and System-Oriented Names

• A human-oriented name is generally a character string that is meaningful to its users. For example, /user/sinha/project-file-1 is a human-oriented name. Human-oriented names are defined by their users. They are also known as high-level names because they can be easily remembered by their users.

• System-oriented names are needed to be used efficiently by the system. These names generally are bit patterns of fixed size that can be easily manipulated and stored by machines. They are automatically generated by the system.
UNIT-2 Assignment Questions
• What is DSM? Explain about different types of architectures.
• Explain design and implementation issues of DSM.
• What is thrashing? Explain the methods to solve thrashing problem in
DSM.
• Explain naming in distributed system. What is flat naming and structured
naming? Explain object location mechanism.
• Explain the implementation of DFS.

• Write short notes on:-


1. Granularity
2. Consistency models with example
3. Stateful and Stateless servers
4. Distributed file system
