Distributed Shared Memory (DSM)
Distributed Operating Systems
1
Distributed shared memory
The DSM paradigm provides processes with a shared address space.
Primitives for shared memory:
1. Read(address)
2. Write(address, data)
The shared memory paradigm gives the system an illusion of physically shared memory.
DSM refers to the shared memory paradigm applied to loosely coupled distributed memory systems.
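As a toy sketch (the class and method names here are illustrative, not from any real DSM implementation), the two primitives can be modeled over a dictionary-backed address space:

```python
# Minimal model of the two DSM primitives over a dictionary-backed
# address space; a real DSM would fault in remote blocks on a miss.
class DSM:
    def __init__(self):
        self.memory = {}              # address -> data (the shared space)

    def read(self, address):
        return self.memory.get(address)

    def write(self, address, data):
        self.memory[address] = data

dsm = DSM()
dsm.write(0x10, "hello")
print(dsm.read(0x10))                 # -> hello
```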
2
Cont….
Shared memory exists only virtually.
Similar in concept to virtual memory.
DSM is also known as DSVM (distributed shared virtual memory).
DSM provides a virtual address space shared among processes on loosely coupled processors.
DSM is basically an abstraction that integrates the local memory of different machines into a single logical entity shared by cooperating processes.
3
Distributed shared memory
4
DSM Architecture
Each node of the system consists of one or more CPUs and a memory unit.
Nodes are connected by a high-speed communication network.
A simple message passing system allows nodes to exchange information.
The main memory of individual nodes is used to cache pieces of the shared memory space.
This reduces network latency.
5
Cont….
A memory-mapping manager routine maps local memory onto the shared virtual memory space.
The shared memory space is partitioned into blocks.
Data caching is used in DSM systems to reduce network latency.
The basic unit of caching is a memory block.
6
Cont….
If the required data is not available in local memory, a network block fault is generated.
On a network block fault:
The missing block is migrated from the remote node to the faulting process's node, and the OS maps it into the application's address space.
Data blocks keep migrating from one node to another on demand, but no communication is visible to the user processes.
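The fault-and-migrate sequence described above can be sketched as follows (block size, node layout, and all names are assumptions for illustration):

```python
# Sketch of on-demand block migration on a network block fault.
BLOCK_SIZE = 4

class Node:
    def __init__(self, name):
        self.name = name
        self.local_blocks = {}        # block_id -> list of words

    def access(self, address, nodes):
        block_id = address // BLOCK_SIZE
        if block_id not in self.local_blocks:        # network block fault
            owner = next(n for n in nodes if block_id in n.local_blocks)
            # migrate: remove from owner, map into this node's space
            self.local_blocks[block_id] = owner.local_blocks.pop(block_id)
        return self.local_blocks[block_id][address % BLOCK_SIZE]

a, b = Node("A"), Node("B")
b.local_blocks[2] = ["w8", "w9", "w10", "w11"]       # block 2 lives on B
print(a.access(9, [a, b]))    # fault on A: block 2 migrates -> w9
```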
7
Design and implementation issues
1. Granularity
2. Structure of Shared memory
3. Memory coherence and access synchronization
4. Data location and access
5. Replacement strategy
6. Thrashing
7. Heterogeneity
8
Granularity
Refers to the block size of the DSM.
It is the unit of sharing and the unit of data transfer across the network when a network block fault occurs.
Possible units are a few words, a page, or a few pages.
9
Structure of Shared memory
Structure refers to the layout of the shared data in
memory.
Dependent on the type of applications that the DSM
system is intended to support.
10
Memory Coherence & Access Synchronization
Replicated and shared data items may
simultaneously be available in the main memories of
a number of nodes.
Memory coherence problem
Deals with the consistency of a piece of shared data lying
in the main memories of two or more nodes.
11
Data location and access
To share data in a DSM, it should be possible to locate
and retrieve the data accessed by a user process.
12
Replacement strategy
If the local memory of a node is full, a cache miss at that node implies not only a fetch of the accessed data block from a remote node but also a replacement:
an existing data block must be evicted to make room for the new data block.
13
Thrashing
Data blocks migrate between nodes on demand.
Therefore, if two nodes compete for write access to a single data item, the corresponding data block may be transferred back and forth repeatedly.
14
Heterogeneity
A DSM system built for a homogeneous environment need not address the heterogeneity issue.
However, if the underlying system environment is heterogeneous, the DSM system must be designed to handle heterogeneity so that it functions properly with machines having different architectures.
15
Granularity
The most visible parameter in the design of a DSM system is the block size.
Sending a large packet of data is not much more expensive than sending a small one, which favors larger transfer units.
16
Cont…
Factors influencing block size selection:
1. Paging overhead
2. Directory size
3. Thrashing
4. False sharing
17
Paging overhead
A process is likely to access a large region of its
shared address space in a small amount of time.
The paging overhead is less for large block size as
compared to the paging overhead for small block size.
18
Directory size
The larger the block size, the smaller the
directory.
Result: reduced directory management overhead
for larger block size.
19
Thrashing
The problem of thrashing may occur:
when data items in the same data block are updated by multiple nodes at the same time.
It is possible with any block size, but more likely with larger block sizes.
20
False sharing
Occurs when two different processes access two unrelated variables that reside in the same data block.
21
Cont…
The larger the block size, the higher the probability of false sharing.
False sharing of a block may lead to a thrashing problem.
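A small simulation (illustrative, with assumed addresses and block sizes) shows how a larger block turns unrelated writes into repeated block transfers:

```python
# Sketch: why larger blocks raise the chance of false sharing.
# Two processes update unrelated variables; if both variables map to
# the same block, every alternate write forces a block transfer.
def transfers(block_size, addr_x, addr_y, writes):
    owner = None                      # (process, block) of the last write
    moved = 0
    for i in range(writes):
        proc, addr = (("P1", addr_x) if i % 2 == 0 else ("P2", addr_y))
        block = addr // block_size
        # block must migrate if the other process wrote it last
        if owner is not None and owner[1] == block and owner[0] != proc:
            moved += 1
        owner = (proc, block)
    return moved

# x at address 0, y at address 100, alternating writes
print(transfers(64, 0, 100, 10))     # different blocks: 0 transfers
print(transfers(256, 0, 100, 10))    # same block: transfers on every switch
```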
22
Cont…
Using page size as block size:
The relative advantages and disadvantages of small and large block sizes make it difficult for a DSM designer to decide on a proper block size.
On an Intel Core 2 Duo 64-bit system, the page size is 4 KB, which is typical for almost every desktop PC architecture.
Linux kernel 2.6.x versions also support large pages (4 MB).
23
Advantages of using page size as
block size
Existing page-fault schemes can be reused to trigger a DSM page fault.
Allows access-right control to be integrated with the memory management hardware.
Page size does not impose undue communication overhead at the time of a network page fault.
Page size is a suitable data-entity unit with respect to memory contention.
24
Structure of Shared-Memory Space
Structure defines the abstract view of the shared
memory space for application programmers.
The structure and granularity of a DSM system are
closely related.
25
Cont…
Three approaches:
1. No structuring
2. Structuring by data type
3. Structuring as a database
26
No structuring
The shared memory space is simply a linear array of words.
Advantages:
Convenient to choose any suitable page size as the unit of sharing.
A fixed grain size may be used for all applications.
Simple and easy to design such a DSM system.
27
Structuring by data type
The shared memory space is structured as a collection of objects in the source language.
The granularity in such a DSM system is an object.
The DSM system uses a variable grain size to match the size of the object/variable being accessed by the application.
This complicates the design and implementation.
28
Structuring as a database
Structures the shared memory like a database.
The shared memory space is ordered as an associative memory called a tuple space.
To perform an update, old data items in the DSM are replaced by new data items.
Processes select tuples by specifying the number of their fields and their values or types.
Access to shared data is non-transparent.
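A minimal sketch of such a tuple space, loosely in the style of Linda's out/in primitives (the method names and matching rules here are simplified assumptions):

```python
# Sketch of a Linda-style tuple space (illustrative, not a real API).
class TupleSpace:
    def __init__(self):
        self.tuples = []

    def out(self, tup):                       # insert a tuple
        self.tuples.append(tup)

    def _match(self, tup, pattern):
        # a pattern field that is a type matches by type; a value matches
        # by equality
        return len(tup) == len(pattern) and all(
            isinstance(f, p) if isinstance(p, type) else f == p
            for f, p in zip(tup, pattern))

    def inp(self, *pattern):                  # withdraw a matching tuple
        for t in self.tuples:
            if self._match(t, pattern):
                self.tuples.remove(t)
                return t
        return None

ts = TupleSpace()
ts.out(("count", 42))
# update = withdraw the old tuple, insert the replacement
old = ts.inp("count", int)
ts.out(("count", old[1] + 1))
print(ts.inp("count", int))                   # -> ('count', 43)
```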
29
Consistency Models
Consistency requirements vary from application to application.
A consistency model refers to the degree of consistency that has to be maintained for the shared memory data.
If a system supports a stronger consistency model, any weaker consistency model is automatically supported, but the converse is not true.
30
Types of Consistency Models
1. Strict Consistency model
2. Sequential Consistency model
3. Causal consistency model
4. Pipelined Random Access Memory consistency
model (PRAM)
5. Processor Consistency model
6. Weak consistency model
7. Release consistency model
31
Strict Consistency Model
The strongest form with most stringent consistency
requirement.
Value returned by a read operation on a memory
address is always the same as the value written by
the most recent write operation to that address.
All writes instantaneously become visible to all
processes.
Implementation requires the existence of an absolute global time so that the clocks of all nodes can be synchronized, which is practically impossible.
32
Sequential Consistency Model
Proposed by Lamport in 1979.
All processes see the same order of all memory access
operations on the shared memory.
The exact order in which the access operations are interleaved does not matter.
33
Cont…
Example :
Operations performed in order
1. Read(r1)
2. write(w1)
3. Read(r2)
34
Cont…
The only acceptable ordering for a strictly consistent memory is (r1, w1, r2).
For sequential consistency model
Any of the orderings (r1,w1,r2), (r1,r2,w1), (w1,r1,r2),
(w1,r2,r1), (r2,r1,w1), (r2,w1,r1), is correct
if all processes see the same ordering.
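The six orderings listed above are simply all permutations of the three operations, which a few lines of Python confirm; sequential consistency accepts any one of them as long as every process observes the same one:

```python
# Enumerate all possible orderings of the three operations; only one of
# them (r1, w1, r2) would also satisfy strict consistency.
from itertools import permutations

ops = ["r1", "w1", "r2"]
orderings = list(permutations(ops))
print(len(orderings))                     # -> 6
print(("r1", "w1", "r2") in orderings)    # -> True
```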
35
Cont…
The consistency requirement of the sequential
consistency model is weaker than that of the strict
consistency model.
A sequentially consistent memory provides one-copy/single-copy semantics.
It is acceptable to most applications.
36
Causal Consistency Model
Proposed by Hutto and Ahamad in 1990.
All processes see in the same (correct) order only those memory reference operations that are potentially causally related.
Other operations may be seen by different processes in different orders.
Requires constructing and maintaining a dependency graph for memory access operations.
37
Pipelined Random Access Memory
(PRAM) Consistency model
Proposed by Lipton and Sandberg in 1988.
Provides weaker consistency semantics.
Ensures that all write operations performed by a single process are seen by all other processes in the order in which they were performed, as if all the write operations of a single process were in a pipeline.
38
PRAM
P1 – W11 & W12
P2 – W21 & W22
P3 can see these as ((W11,W12), (W21,W22))
P4 can see these as ((W21,W22), (W11,W12))
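This rule is easy to check mechanically: an observed order is PRAM-consistent if it preserves each writer's own program order. A small sketch (names assumed):

```python
# Sketch of the PRAM rule: each observer may interleave writers
# differently, but must keep every single writer's own order.
def pram_ok(observed, per_writer):
    # per_writer: writer -> its writes in program order
    for writer, writes in per_writer.items():
        seen = [w for w in observed if w in writes]
        if seen != writes:
            return False
    return True

writers = {"P1": ["W11", "W12"], "P2": ["W21", "W22"]}
print(pram_ok(["W11", "W12", "W21", "W22"], writers))  # P3's view -> True
print(pram_ok(["W21", "W22", "W11", "W12"], writers))  # P4's view -> True
print(pram_ok(["W12", "W11", "W21", "W22"], writers))  # breaks P1 -> False
```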
39
Cont…
Advantages:
Simple and easy to implement.
Good performance
Limitation:
All processes may not agree on the same order of memory
reference operations.
40
Processor Consistency model
Proposed by Goodman in 1989.
Very similar to the PRAM model, with the additional restriction of memory coherence.
Memory coherence:
For any memory location, all processes agree on the same order
of all write operation to that location.
Processor consistency ensures that all write operations
performed on the same location are seen by all
processes in the same order.
41
Weak consistency model
Proposed by Dubois in 1988.
Many applications do not require that the change made by every write operation be shown to other processes immediately.
Isolated accesses to shared variables are rare.
42
Cont…
Idea of weak consistency:
Better performance can be achieved if consistency
is enforced on a group of memory reference
operations rather than on individual memory
reference operations.
Uses a special variable called a
synchronization variable to synchronize
memory.
43
Cont…
Requirements:
1. All accesses to synchronization variables must obey
sequential consistency semantics.
2. All previous write operations must be completed
everywhere before an access to a synchronization
variable is allowed.
3. All previous accesses to synchronization variables must
be completed before access to a non-synchronization
variable is allowed.
Trade-off: better performance at the cost of putting an extra burden on the programmers.
44
Release consistency model
Enhancement of weak consistency model.
Use of two synchronization variables:
Acquire
Release
45
Cont…
Acquire
Used to tell the system that a process is entering a critical section.
Results in propagating changes made by other nodes to the process's node.
46
Cont…
Release
Used to tell the system that a process has just exited a critical section.
Results in propagating changes made by the process to other nodes.
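The acquire/release pattern can be sketched as follows, with a plain dictionary standing in for the other replicas (all names are illustrative):

```python
# Sketch of release consistency: writes inside a critical section are
# buffered locally and propagated to the other replicas only at release.
class Replica:
    def __init__(self, shared):
        self.view = dict(shared)      # this node's copy of shared memory
        self.pending = {}

    def acquire(self, master):
        self.view.update(master)      # pull changes made by other nodes

    def write(self, key, value):
        self.pending[key] = value     # buffer until release
        self.view[key] = value

    def release(self, master):
        master.update(self.pending)   # push buffered changes
        self.pending = {}

master = {"x": 0}                     # stand-in for the other replicas
node = Replica(master)
node.acquire(master)
node.write("x", 1)
node.release(master)
print(master["x"])                    # -> 1
```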
47
Cont…
A variation of release consistency is lazy
release consistency proposed by Keleher in
1992.
Modifications are not sent to other nodes at the
time of release but only on demand.
Better performance.
48
Implementing sequential consistency
model
The protocol for implementing sequential consistency depends on whether the DSM system allows replication and/or migration of shared memory data blocks.
49
Cont…
Strategies:
1. Nonreplicated, Nonmigrating blocks (NRNMB)
2. Nonreplicated, Migrating blocks (NRMB)
3. Replicated, migrating blocks (RMB)
4. Replicated, Nonmigrating blocks (RNMB)
50
NRNMB
Simplest strategy.
Each block of the shared memory has a single
copy whose location is always fixed.
51
Cont…
Enforcing sequential consistency is trivial.
Method is simple and easy to implement.
Drawbacks:
Serializing data access creates a bottleneck.
Parallelism is not possible.
53
Cont…
Data locating in the NRNMB strategy:
There is a single copy of each block in the entire system.
The location of a block never changes.
Hence, a simple mapping function suffices to map a block onto a node.
54
NRMB
Each block of the shared memory has a single copy in the entire system.
Migration is allowed.
The owner node of a block changes as soon as the block migrates to a new node.
Only the processes executing on one node can read or write a given data item at any time.
55
Cont…
Advantages:
No communication costs are incurred when a process accesses data currently held locally.
Exploits data-access locality.
Drawbacks:
Prone to thrashing problem.
No parallelism.
57
Data locating in the NRMB strategy
There is a single copy of each block, but the location of a block keeps changing dynamically.
58
Cont…
The following methods are used:
1. Broadcasting
2. Centralized server algorithm
3. Fixed distributed server algorithm
4. Dynamic distributed server algorithm
59
Broadcasting
Each node maintains an owned block table
that contains an entry for each block for
which the node is the current owner.
60
Cont….
On a fault,
Fault handler of the faulting node broadcasts a
read/write request on the network.
Disadvantage:
Not scalable.
62
Centralized server algorithm
63
Cont…
A centralized server maintains a block table that contains the location information for all blocks in the shared memory space.
Drawbacks:
A centralized server serializes location
queries, reducing parallelism.
The failure of the centralized server will cause
the DSM system to stop functioning.
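A sketch of the centralized server's block table for the NRMB strategy (names and the migrate-on-access policy are assumptions for illustration):

```python
# Sketch of the centralized-server scheme for NRMB: one server maps
# every block to its current owner and answers fault requests.
class CentralServer:
    def __init__(self, placement):
        self.block_table = dict(placement)   # block_id -> owner node

    def locate(self, block_id, requesting_node):
        owner = self.block_table[block_id]
        # block migrates to the requester; server updates its table
        self.block_table[block_id] = requesting_node
        return owner

server = CentralServer({7: "N1", 8: "N2"})
print(server.locate(7, "N3"))    # -> N1 (fetch block 7 from N1)
print(server.locate(7, "N2"))    # -> N3 (ownership moved on first fault)
```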
64
Fixed distributed server algorithm
65
Cont..
A direct extension of the centralized-server scheme.
There is a block manager on each of several nodes.
Each block manager is given a predetermined subset of data blocks to manage.
On a fault, a mapping function is applied to the address of the accessed block to find its block manager.
66
Dynamic distributed server algorithm
67
Cont…
Does not use any block manager; instead, ownership information for all blocks is tracked at each node.
Each node has a block table.
It contains the (probable) ownership information for all blocks.
Use of a probable owner.
When a fault occurs, the faulting node extracts the probable-owner information from its block table.
68
RMB - Replicated, migrating blocks
Used to increase parallelism.
Read operations are carried out in parallel at multiple
nodes by accessing the local copy of the data.
69
Cont…
Replication tends to increase the cost of write operations, because of the problem of maintaining the consistency of all copies.
However, if the read/write ratio is large, the extra expense of writes may be more than offset by the lower average cost of reads.
70
Cont…
Protocols to ensure sequential consistency:
1. Write-invalidate
2. Write-update
71
Write-invalidate
All copies of a piece of data except one are invalidated
before a write can be performed on it.
After invalidation of a block, only the node that performs
the write operation on the block holds the modified
version of the block.
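A sketch of the invalidation step, with copy sets tracked per block (all names are illustrative):

```python
# Sketch of write-invalidate: before a write, every other copy of the
# block is invalidated, leaving the writer with the only valid copy.
def write_block(block_id, writer, copy_sets, data, value):
    for node in copy_sets[block_id] - {writer}:
        print(f"invalidate block {block_id} at {node}")
    copy_sets[block_id] = {writer}       # writer now holds the only copy
    data[block_id] = value

copy_sets = {3: {"N1", "N2", "N3"}}
data = {}
write_block(3, "N1", copy_sets, data, "new")
print(copy_sets[3])                      # -> {'N1'}
```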
72
Write-update
A write operation is carried out by updating all copies of the data on which the write is performed.
On a write fault, the fault handler copies the accessed block from one of the nodes currently holding it to its own node.
The write operation completes only after all the copies of the block have been successfully updated.
74
Cont…
A global sequencer assigns a sequence number to each modification and multicasts it to all the nodes where a replica of the data block to be modified is located.
The write operations are processed at each node in sequence-number order.
If verification of the sequence number fails, the node requests the sequencer to retransmit the missing modification.
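The sequence-number mechanism can be sketched with a small replica that buffers out-of-order modifications until the gap is filled (names assumed; retransmission requests are omitted):

```python
# Sketch of the global sequencer used by write-update: modifications are
# numbered, multicast, and applied everywhere in sequence-number order.
import heapq

class SequencedReplica:
    def __init__(self):
        self.next_seq, self.buffer, self.log = 1, [], []

    def deliver(self, seq, mod):
        heapq.heappush(self.buffer, (seq, mod))
        # apply only in-order modifications; hold back across any gap
        while self.buffer and self.buffer[0][0] == self.next_seq:
            self.log.append(heapq.heappop(self.buffer)[1])
            self.next_seq += 1

r = SequencedReplica()
r.deliver(2, "x=2")          # arrives early: buffered, gap detected
r.deliver(1, "x=1")          # fills the gap; both now apply in order
print(r.log)                 # -> ['x=1', 'x=2']
```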
76
Cont…
The write-update approach is very expensive.
In the write-invalidate approach, updates are propagated only when data is read, and several updates can take place before communication is necessary.
A status tag is associated with each block.
It indicates whether the block is valid / shared / read-only / writable.
78
Cont…
Read request:
If there is a local block containing the data and if it
is valid, the request is satisfied by accessing the
local copy of data.
Otherwise, the fault handler of the requesting node
generates a read fault.
79
Cont…
Write request:
If there is a local block containing the data and if it
is valid and writable, the request is immediately
satisfied by accessing the local copy of the data.
Otherwise, the fault handler of the requesting node generates a write fault, obtains a valid copy of the block, and changes its status to writable.
80
Data Locating in the RMB strategy
The write-invalidate protocol requires:
1. Locating the owner of a block.
2. Keeping track of the nodes that currently have a valid copy of the block.
81
Cont…
Following algorithms may be used:
1. Broadcasting
2. Centralized-server algorithm
3. Fixed distributed-server algorithm
4. Dynamic distributed-server algorithm
82
Cont… : Broadcasting
83
Cont…
Each node has an owned block table.
It contains an entry for each block for which the node is the owner.
A copy-set field in each entry contains a list of nodes that currently have a valid copy of the corresponding block.
When a read fault occurs, the faulting node sends a broadcast read request on the network.
When a write fault occurs, the node sends a broadcast write request on the network.
84
Centralized-server algorithm
85
Cont…
Similar to the centralized-server algorithm of the NRMB
strategy.
Each entry of the block table, managed by the
centralized server, has:
an owner-node field.
Indicates the current owner node of the block.
a copy-set field
Contains a list of nodes having a valid copy of the block.
86
Cont…
When a read/write fault occurs, the node sends a read/write fault request for the accessed block to the centralized server.
87
Fixed distributed-server algorithm
88
Cont…
The role of the centralized server is distributed over several distributed servers.
There is a block manager on each of several nodes.
When a fault occurs, the mapping function is used to find the location of the block manager for the accessed block.
89
Dynamic distributed-server algorithm
90
Cont…
Each node has a block table that contains an entry for all blocks in the shared memory space.
When a fault occurs, the fault handler of the faulting node extracts the probable-owner node information and sends a request for the block to that node.
On receiving the block and copy-set information, the node sends an invalidation request to all nodes in the copy-set.
91
Cont…
To reduce the length of the chain of nodes to be traversed to reach the true owner of a block, the probable-owner field of a block in a node is updated:
1. Whenever the node receives an invalidation request.
2. Whenever the node relinquishes ownership, that is, on a write fault.
3. Whenever the node forwards a fault request.
92
Cont…
In the first two cases, the probable-owner field is changed to the new owner of the block.
In the third case, the probable-owner field is changed to the original faulting node.
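Following the probable-owner chain, and shortening it once the true owner is found, can be sketched as (node names assumed):

```python
# Sketch of following the probable-owner chain to the true owner, then
# collapsing the chain so later faults take a shorter path.
def find_owner(node, probable_owner):
    path = []
    while probable_owner[node] != node:      # true owner points to itself
        path.append(node)
        node = probable_owner[node]
    for hop in path:                         # path compression
        probable_owner[hop] = node
    return node

# N1 thinks N2 owns the block, N2 thinks N3; N3 is the true owner
probable_owner = {"N1": "N2", "N2": "N3", "N3": "N3"}
print(find_owner("N1", probable_owner))      # -> N3
print(probable_owner["N1"])                  # -> N3 (chain shortened)
```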
93
Replicated, Non-migrating Block
A shared memory block may be replicated at multiple nodes of the system, but the location of each replica is fixed.
All replicas of a block are kept consistent by updating all of them on a write access.
Sequential consistency is ensured by using a global sequencer to sequence the write operations of all nodes.
94
Data locating in the RNMB strategy
The strategy has the following characteristics:
1. The replica locations of a block never change.
2. All replicas of a data block are kept consistent.
3. Only a read request can be sent directly to one of the nodes having a replica; all write requests have to be sent to the sequencer.
95
96
Cont….
The sequencer assigns the next sequence number to the requested modification.
It multicasts the modification with this sequence number to all the nodes listed in the replica-set field.
97
Munin : A Release Consistent DSM
System
Pg 259 – Pg 262
98
Replacement strategy
A DSM system allows shared memory blocks to be dynamically migrated/replicated.
Issues:
Which block should be replaced to make space for a newly required block?
Where should the replaced block be placed?
99
Which block to replace?
Classification of replacement algorithms:
1. Usage based versus non-usage based
2. Fixed space versus variable space
100
Cont…
Usage-based algorithms
Keep track of the usage history of a cache line and use this information to make replacement decisions.
Example: Least Recently Used (LRU).
Suitable for DSM due to the data-access locality feature.
101
Cont…
Non-usage-based algorithms
Do not take the usage record of cache lines into account when making replacement decisions.
Examples: First In First Out (FIFO) and Rand (random or pseudorandom).
102
Cont…
Fixed-space algorithms
Assume that the cache size is fixed.
Variable-space algorithms
Assume that the cache size can be changed dynamically depending on the need.
Here a fetch does not imply a replacement, and a swap-out can take place without a corresponding fetch.
Variable-space algorithms are not suitable for a DSM system.
103
Cont…
Classification of each memory block of a
node:
Unused
a free memory block that is not currently being
used.
Nil
a block that has been invalidated.
104
Cont…
Read-only
a block for which the node has only read access
right.
Read-owned
a block for which the node has only read access right but is also the owner of the block.
Writable
a block for which the node has write access
permission.
105
Cont…
Replacement priorities:
1. Both unused and nil blocks have the highest replacement priority.
2. Read-only blocks have the next replacement priority.
3. Read-owned and writable blocks for which replica(s) exist on some other node(s) have the next replacement priority.
4. Read-owned and writable blocks for which only this node has a copy have the lowest replacement priority.
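Victim selection under these priorities can be sketched as (the status labels are illustrative encodings of the classes above):

```python
# Sketch of choosing a replacement victim by the priority order above
# (lower number = replaced first).
PRIORITY = {"unused": 0, "nil": 0, "read-only": 1,
            "read-owned-replicated": 2, "writable-replicated": 2,
            "read-owned-sole": 3, "writable-sole": 3}

def pick_victim(blocks):
    # blocks: block_id -> status string
    return min(blocks, key=lambda b: PRIORITY[blocks[b]])

blocks = {"B1": "writable-sole", "B2": "read-only", "B3": "nil"}
print(pick_victim(blocks))     # -> B3 (nil has the highest priority)
```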
106
Where to place a replaced block
Approaches for storing a useful replaced block:
1. Using secondary store:
The block is simply transferred onto a local disk.
It does not waste any memory space.
2. Using the memory space of other nodes:
Requires each node to maintain a table of the free memory space in all other nodes.
107
Thrashing
Occurs when the system spends a large amount of time
transferring shared data blocks from one node to
another.
Degrades system performance considerably.
108
Cont…
Situations that lead to thrashing:
1. When interleaved data accesses are made by processes on two or more nodes.
2. When blocks with read-only permission are repeatedly invalidated soon after they are replicated.
109
Cont…
Methods for solving the thrashing problem:
1. Providing application-controlled locks.
Locking data to prevent other nodes from accessing it for a short period of time can reduce thrashing.
110
Cont…
2. Nailing a block to a node for a minimum amount of time.
Disallows a block from being taken away from a node until a minimum amount of time t elapses after its allocation to that node.
Drawback: it is difficult to choose an appropriate value for the time t.
111
Cont…
3. Tailoring the coherence algorithm to the shared-data usage patterns.
Thrashing can also be minimized by using different coherence protocols for shared data having different characteristics. For example, the coherence protocol used in Munin for write-shared variables avoids the false sharing problem, which ultimately results in the avoidance of thrashing.
112
Reading
Pg 266 – Pg 270
113
Advantage of DSM
1. Simpler abstraction
The shared memory programming paradigm shields application programmers from many low-level concerns of the underlying distributed system.
Advantage: a simpler abstraction for application programmers of loosely coupled distributed-memory machines.
114
Advantage of DSM
2. Better portability of distributed application programs
The access protocol is consistent with the way sequential applications access data.
This allows a more natural transition from sequential to distributed applications.
Such ported applications may, however, perform worse than ones written to use message passing directly.
115
Cont…
3. Better performance of some application
Some application using DSM can outperform their
message passing counterparts due to:
Locality of data
On-demand data movement
Larger memory space
116
Cont…
Locality of data
The computation model of DSM is to make data more accessible by moving it around.
This results in a reduced overall communication cost for such applications.
117
Cont…
On-demand data movement
The computation model of DSM also facilitates on-demand movement of data as it is accessed.
The time needed for the data-exchange phase is often dictated by the throughput of existing communication bottlenecks.
The on-demand data movement facility provided by DSM eliminates the data-exchange phase.
118
Cont…
4. Flexible communication environment
The message passing paradigm requires recipient identification and the coexistence of the sender and receiver processes.
With DSM, the sender process need not specify the identity of the receiver processes of the data.
119
Cont…
5. Ease of process migration
Migration of a process is ordinarily tedious and time-consuming.
The computation model of DSM provides the facility of on-demand migration of data between processors, which eases process migration.
120