Shared Memory Architecture
Shared memory systems form a major category of multiprocessors.
All processors share a global memory.
Communication between tasks running on different processors is performed through
writing to and reading from the global memory.
All interprocessor coordination and synchronization is also accomplished via the global
memory.
A shared memory computer system consists of a set of independent processors, a set of
memory modules, and an interconnection network.
Two main problems arise in designing a shared memory system:

Performance degradation due to contention. Performance degradation can occur when multiple processors attempt to access the shared memory simultaneously. A typical design uses caches to reduce this contention.

Coherence problems. Having multiple copies of data spread throughout the caches can lead to a coherence problem. The copies in the caches are coherent if they all hold the same value. However, if one of the processors writes a new value over one of the copies, that copy becomes inconsistent because it no longer equals the other copies.
CLASSIFICATION OF SHARED MEMORY SYSTEMS
The simplest shared memory system consists of one memory module (M) that can be accessed
from two processors (P1 and P2).
Requests arrive at the memory module through its two ports. An arbitration unit within the
memory module passes requests through to a memory controller. If the memory module is not
busy and a single request arrives, then the arbitration unit passes that request to the memory
controller and the request is satisfied.
The module is placed in the busy state while a request is being serviced. If a new request
arrives while the memory is busy servicing a previous request, the memory module sends a
wait signal, through the memory controller, to the processor making the new request.
In response, the requesting processor may hold its request on the line until the memory
becomes free or it may repeat its request some time later.
If the arbitration unit receives two requests, it selects one of them and passes it to the memory
controller. Again, the denied request can be either held to be served next or it may be repeated
some time later.
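The arbitration behavior described above can be sketched in a few lines of Python. This is a toy model, not any real hardware interface; the names `MemoryModule`, `submit`, and `complete` are invented for illustration.

```python
import random

class MemoryModule:
    """Toy model of a dual-ported memory module with an arbitration unit."""

    def __init__(self):
        self.busy = False

    def submit(self, requests):
        """Arbitrate among simultaneous requests arriving at the ports.

        Returns (granted, waiting): the request passed to the memory
        controller, and the requests that received a wait signal.
        """
        if self.busy or not requests:
            return None, list(requests)    # wait signal to every requester
        granted = random.choice(requests)  # arbitration unit selects one
        waiting = [r for r in requests if r is not granted]
        self.busy = True                   # busy while servicing the request
        return granted, waiting

    def complete(self):
        self.busy = False                  # request satisfied; module free

m = MemoryModule()
granted, waiting = m.submit(["P1 read X", "P2 write Y"])
# exactly one request is granted; the other must hold or retry later
```

A denied requester can either hold its request until the module frees up or resubmit it later, exactly as in the text.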
Based on the interconnection network used, shared memory systems can be classified into the following categories:
Uniform Memory Access (UMA)
Nonuniform Memory Access (NUMA)
Cache-Only Memory Architecture (COMA)
Uniform Memory Access (UMA)
A shared memory is accessible by all processors through an interconnection network in the
same way a single processor accesses its memory.
All processors have equal access time to any memory location.
The interconnection network used in the UMA can be a single bus, multiple buses, or a
crossbar switch.
Because access to the shared memory is balanced, these systems are also called SMP (symmetric multiprocessor) systems.
Each processor has an equal opportunity to read from and write to memory, with equal access speed.
A typical bus-structured SMP computer attempts to reduce contention for the bus by
fetching instructions and data from each processor's individual cache as much as possible. In
the extreme, bus contention can drop to zero once the caches have been loaded from
global memory, because it is then possible for all instructions and data to be contained
entirely within the caches. This memory organization is the most popular among shared
memory systems.
Nonuniform Memory Access (NUMA)
In NUMA systems, each processor has part of the shared memory attached to it. The memory has a single
address space, so any processor can access any memory location directly using its real
address. However, the access time to a module depends on its distance from the processor,
which results in nonuniform memory access times.
A number of architectures are used to interconnect processors to memory modules in a
NUMA. Among these are the tree and the hierarchical bus networks.
Cache-Only Memory Architecture (COMA)
Similar to the NUMA, each processor has part of the shared memory in the COMA. However,
in this case the shared memory consists of cache memory.
A COMA system requires that data be migrated to the processor requesting it. There is no
memory hierarchy and the address space is made of all the caches. There is a cache directory
(D) that helps in remote cache access.
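The migration-on-request behavior can be sketched as follows. This is a minimal illustration, not a real COMA implementation; the directory is modeled as a plain mapping from a block address to the cache currently holding it.

```python
def coma_read(proc, addr, caches, directory):
    """COMA sketch: the address space is the union of all caches.

    The directory locates the block; data migrates to the requester,
    which becomes the new holder. Names here are illustrative.
    """
    owner = directory[addr]
    if owner != proc:
        caches[proc][addr] = caches[owner].pop(addr)  # migrate the block
        directory[addr] = proc                        # update the directory
    return caches[proc][addr]

caches = {0: {"X": 42}, 1: {}}   # processor 0's cache holds block X
directory = {"X": 0}
coma_read(1, "X", caches, directory)   # X migrates from cache 0 to cache 1
```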
BUS-BASED SYMMETRIC MULTIPROCESSORS
A typical bus-based design uses caches to solve the bus contention problem.
High-speed caches, connected to each processor on one side and to the bus on the other,
allow local copies of instructions and data to be supplied at the highest possible rate.
If the local processor finds all of its instructions and data in the local cache, we say the hit rate
is 100%.
The miss rate of a cache is the fraction of the references that cannot be satisfied by the cache,
and so must be copied from the global memory, across the bus, into the cache, and then passed
on to the local processor.
One of the goals of the cache is to maintain a high hit rate (equivalently, a low miss rate) under
high processor loads. A high hit rate means the processors make less use of the bus.
BASIC CACHE COHERENCY METHODS
Multiple copies of data, spread throughout the caches, lead to a coherence problem among
the caches. The copies in the caches are coherent if they all equal the same value. However, if
one of the processors writes over the value of one of the copies, then the copy becomes
inconsistent because it no longer equals the value of the other copies. If data are allowed to
become inconsistent (incoherent), incorrect results will be propagated through the system,
leading to incorrect final results. Cache coherence algorithms are needed to maintain a level
of consistency throughout the parallel system.
• Cache–Memory Coherence.
• Cache–Cache Coherence.
• Shared Memory System Coherence
Cache–Memory Coherence.
In a single cache system, coherence between memory and the cache is maintained using
one of two policies:
(1) write-through,
(2) write-back.
When a task running on a processor P requests the data at memory location X, for
example, the contents of X are copied to the cache, from which they are passed on to P. When P
updates the value of X in the cache, the copy in memory must also be updated in
order to maintain consistency.
In write-through, the memory is updated every time the cache is updated.
In write-back, the memory is updated only when the block in the cache is being replaced.
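The two policies can be contrasted in a short sketch. `SingleCache`, `write`, and `evict` are made-up names for illustration, not any real cache API.

```python
class SingleCache:
    """One-block-at-a-time cache contrasting write-through and write-back."""

    def __init__(self, memory, policy="write-through"):
        self.memory = memory   # dict: address -> value (the global memory)
        self.policy = policy
        self.block = {}        # cached copies
        self.dirty = set()     # blocks modified but not yet written back

    def read(self, addr):
        if addr not in self.block:        # miss: copy X from memory
            self.block[addr] = self.memory[addr]
        return self.block[addr]

    def write(self, addr, value):
        self.block[addr] = value
        if self.policy == "write-through":
            self.memory[addr] = value     # memory updated on every write
        else:                             # write-back
            self.dirty.add(addr)          # memory updated only on replacement

    def evict(self, addr):
        if addr in self.dirty:            # write-back flushes the dirty block
            self.memory[addr] = self.block[addr]
            self.dirty.discard(addr)
        self.block.pop(addr, None)
```

Under write-through the memory is never stale; under write-back it can be stale until the block is replaced, as the text describes.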
Cache–Cache Coherence
In a multiprocessing system, when a task running on processor P requests the data at global
memory location X, for example, the contents of X are copied to processor P's local cache, from which
they are passed on to P. Now suppose processor Q also accesses X. What happens if Q wants to write
a new value over the old value of X? There are two fundamental cache coherence policies:
(1) write-invalidate,
(2) write-update.
Write-invalidate maintains consistency by allowing reads from local caches until a write occurs. When
any processor updates the value of X through a write, all other copies are invalidated and a dirty bit
is set for X. For example, when processor Q writes a new value into its cache, it invalidates all other
copies of X and sets the dirty bit for X. Q can then continue to change X without further notifications
to other caches, because Q holds the only valid copy of X. However, when processor P later wants to read X,
it must wait until X is updated and the dirty bit is cleared.
Write-update maintains consistency by immediately updating all copies in all caches. Dirty
bits are set during each write operation; after all copies have been updated, all dirty bits are
cleared.
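The two policies can be contrasted with a small snooping sketch. The class and method names are invented for illustration, and for simplicity the sketch also writes memory through on every write; a real protocol would combine the invalidate/update choice with a write-through or write-back memory policy as discussed below.

```python
class Bus:
    """Snooping bus connecting a set of caches (illustrative model)."""
    def __init__(self):
        self.caches = []

class SnoopingCache:
    def __init__(self, bus, policy):
        self.bus, self.policy, self.data = bus, policy, {}
        bus.caches.append(self)

    def read(self, addr, memory):
        if addr not in self.data:          # miss: fetch from global memory
            self.data[addr] = memory[addr]
        return self.data[addr]

    def write(self, addr, value, memory):
        self.data[addr] = value
        for other in self.bus.caches:      # other caches snoop the write
            if other is self or addr not in other.data:
                continue
            if self.policy == "write-invalidate":
                del other.data[addr]       # other copies are invalidated
            else:                          # write-update
                other.data[addr] = value   # all copies updated immediately
        memory[addr] = value               # simplification: write through
```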
Shared Memory System Coherence
• If we permit write-update with write-through directly on a global memory location X,
the bus starts to become busy and ultimately all processors sit idle while
waiting for writes to complete.
• In write-update with write-back, only the copies in the caches are updated; memory is
written later, when the block is replaced.
Write-Invalidate and Write-Through
In this simple protocol the memory is always consistent with the most recently
updated cache copy. Multiple processors can read block copies from main
memory safely until one processor updates its copy. At this time, all cache copies
are invalidated and the memory is updated to remain consistent.
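This protocol can be sketched in a few lines: the write goes through to memory, and every other cached copy is invalidated in the same step, so memory always matches the most recent write. The function name `write` and the dict-based caches are purely illustrative.

```python
# Write-invalidate with write-through: a minimal sketch, not a real protocol
# implementation. Caches and memory are modeled as plain dicts.

def write(writer, caches, memory, addr, value):
    memory[addr] = value         # write-through: memory stays consistent
    for c in caches:
        if c is not writer:
            c.pop(addr, None)    # all other cache copies are invalidated
    writer[addr] = value         # writer keeps the only cached copy

memory = {"X": 0}
p, q = {"X": 0}, {"X": 0}        # both caches hold a copy of block X
write(p, [p, q], memory, "X", 7)
# memory and p now hold 7; q's copy is gone (invalid)
```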
Write-Invalidate and Write-Back
In this protocol a valid block can be owned by memory and shared among multiple caches, which
then contain only shared (read-only) copies of the block.
Multiple processors can safely read these blocks from their caches until one processor updates
its copy. At this time, the writer becomes the only owner of the valid block and all other copies
are invalidated.
Write-Update and Partial Write-Through
In this protocol an update to one cache is written to memory at the same time it is broadcast to
other caches sharing the updated block. These caches snoop on the bus and perform updates
to their local copies. There is also a special bus line, which is asserted to indicate that at least
one other cache is sharing the block.
Write-Update and Write-Back
This protocol is similar to the previous one except that instead of writing through to the
memory whenever a shared block is updated, memory updates are done only when the
block is being replaced.
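The deferred memory write can be sketched as follows. The function names and the dirty-set representation are illustrative assumptions, not a standard protocol API.

```python
def write_update(writer_value_addr_sharers):
    raise NotImplementedError  # placeholder; see the functions below

def broadcast_write(sharers, addr, value, dirty):
    """Write-update with write-back: the new value is broadcast to every
    cache sharing the block, but the memory write is deferred (sketch)."""
    for cache in sharers:
        cache[addr] = value      # all cached copies updated immediately
    dirty.add(addr)              # memory will be updated on replacement

def replace(cache, addr, memory, dirty):
    if addr in dirty:
        memory[addr] = cache[addr]   # write back only at replacement time
        dirty.discard(addr)
    cache.pop(addr, None)

memory = {"X": 0}
c1, c2 = {"X": 0}, {"X": 0}
dirty = set()
broadcast_write([c1, c2], "X", 3, dirty)   # caches see 3; memory still 0
replace(c1, "X", memory, dirty)            # now memory is updated to 3
```

Compared with the partial write-through protocol above, this variant saves bus traffic to memory at the cost of memory being temporarily stale.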