Cache Coherence and Synchronization - Tutorialspoint
In this chapter, we will discuss the cache coherence protocols used to cope with the multicache
inconsistency problem.
As multiple processors operate in parallel and independently, multiple caches may hold different
copies of the same memory block; this creates the cache coherence problem. Cache coherence
schemes avoid this problem by maintaining a uniform state for each cached block of data.
Let X be an element of shared data which has been referenced by two processors, P1 and P2. In the
beginning, all three copies of X (one in each cache and one in shared memory) are consistent. If
processor P1 writes a new value X1 into its cache, then under a write-through policy the same copy
is immediately written into the shared memory, but the copy in P2's cache becomes stale. When a
write-back policy is used, the main memory is updated only when the modified data in the cache is
replaced or invalidated, so the main memory itself is also temporarily inconsistent.
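The timing difference between the two policies can be illustrated with a minimal sketch. This is not any real hardware interface; the `CacheLine` class and its methods are invented for illustration, modeling a single cached block and the moment at which main memory gets updated.

```python
# Illustrative sketch: one cache line under write-through vs. write-back,
# showing when the shared main memory (a plain dict here) is updated.

class CacheLine:
    def __init__(self, policy, memory, addr):
        self.policy = policy          # "write-through" or "write-back"
        self.memory = memory          # shared main memory (a dict)
        self.addr = addr
        self.value = memory[addr]     # load the initial, consistent copy
        self.dirty = False

    def write(self, value):
        self.value = value
        if self.policy == "write-through":
            self.memory[self.addr] = value   # memory updated immediately
        else:
            self.dirty = True                # memory updated only on eviction

    def evict(self):
        if self.dirty:
            self.memory[self.addr] = self.value  # deferred write-back
            self.dirty = False

memory = {"X": 0}
wt = CacheLine("write-through", memory, "X")
wt.write(1)
print(memory["X"])   # 1: memory is kept consistent on every write

memory = {"X": 0}
wb = CacheLine("write-back", memory, "X")
wb.write(1)
print(memory["X"])   # still 0: memory stays stale until the line is evicted
wb.evict()
print(memory["X"])   # 1: the write reaches memory only now
```

Note that even under write-through, only the memory copy is kept fresh; other caches holding X are not informed, which is exactly the inconsistency the protocols below address.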
Snoopy protocols achieve data consistency between the cache memory and the shared memory
through a bus-based memory system. Write-invalidate and write-update policies are used for
maintaining cache consistency.
https://www.tutorialspoint.com/parallel_computer_architecture/parallel_computer_architecture_cache_coherence_synchronizatio… 1/7
08/12/2019 Cache Coherence and Synchronization - Tutorialspoint
In this case, we have three processors P1, P2, and P3, each having a consistent copy of data element ‘X’ in
its local cache memory and in the shared memory (Figure-a). Processor P1 writes X1 into its cache
memory using the write-invalidate protocol, so all other copies are invalidated via the bus; they are
marked ‘I’ (Figure-b). Invalidated copies are stale and must not be used. The write-update
protocol instead updates all the cache copies via the bus. With a write-back cache, the memory
copy is also updated (Figure-c).
The following events and actions occur on the execution of memory-access and invalidation commands −
Read-miss − When a processor wants to read a block that is not in its cache, a read-miss
occurs, which initiates a bus-read operation. If no dirty copy exists, the main memory, which
holds a consistent copy, supplies the block to the requesting cache. If a dirty copy exists
in a remote cache, that cache inhibits the main memory and supplies the copy itself. In
both cases, the cache copy enters the valid state after a read-miss.
Write-hit − If the copy is in the dirty or reserved state, the write is done locally and the new state
is dirty. If the copy is in the valid state, a write-invalidate command is broadcast to all the caches,
invalidating their copies; the shared memory is written through, and the resulting state after
this first write is reserved.
Write-miss − If the block to be written is not present in the local cache, the copy must come
either from the main memory or from a remote cache holding a dirty block. This is done
by sending a read-invalidate command, which invalidates all other cache copies. The local
copy is then updated and enters the dirty state.
Read-hit − A read-hit is always serviced in the local cache memory without causing a state
transition or using the snoopy bus for invalidation.
Block replacement − When a copy is dirty, it must be written back to the main memory by the
block-replacement method. When the copy is in the valid, reserved, or invalid state, no
write-back is needed on replacement.
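The events above can be sketched as a small state machine. This is an illustrative simulation, not hardware pseudocode: classes, one-word blocks, and a dict-based bus are all invented here, and the downgrade of a reserved copy to valid when a remote read is observed is a detail of the classic write-once protocol that the text does not spell out.

```python
# Minimal sketch of the four-state snoopy write-invalidate behaviour
# described above (invalid/valid/reserved/dirty), with one-word blocks.

INVALID, VALID, RESERVED, DIRTY = "I", "V", "R", "D"

class Cache:
    def __init__(self, bus):
        self.state = {}      # addr -> coherence state
        self.data = {}       # addr -> cached value
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if self.state.get(addr, INVALID) == INVALID:      # read-miss:
            self.data[addr] = self.bus.bus_read(addr, self)
            self.state[addr] = VALID                      # enter valid state
        return self.data[addr]                            # read-hit: no change

    def write(self, addr, value):
        st = self.state.get(addr, INVALID)
        if st in (DIRTY, RESERVED):          # write-hit on dirty/reserved:
            self.data[addr] = value          # done locally, new state dirty
            self.state[addr] = DIRTY
        elif st == VALID:                    # first write-hit: invalidate
            self.bus.invalidate(addr, self)  # other copies, write through,
            self.bus.memory[addr] = value    # and become reserved
            self.data[addr] = value
            self.state[addr] = RESERVED
        else:                                # write-miss: read-invalidate,
            self.bus.bus_read(addr, self)    # then update local copy
            self.bus.invalidate(addr, self)  # with dirty state
            self.data[addr] = value
            self.state[addr] = DIRTY

class Bus:
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def bus_read(self, addr, requester):
        value = self.memory[addr]
        for c in self.caches:
            if c is requester:
                continue
            st = c.state.get(addr)
            if st == DIRTY:                  # a dirty copy, not memory,
                self.memory[addr] = c.data[addr]   # supplies the block
                value = c.data[addr]
                c.state[addr] = VALID
            elif st == RESERVED:             # exclusivity lost on remote read
                c.state[addr] = VALID
        return value

    def invalidate(self, addr, requester):
        for c in self.caches:
            if c is not requester and c.state.get(addr) not in (None, INVALID):
                c.state[addr] = INVALID

bus = Bus({"X": 0})
p1, p2 = Cache(bus), Cache(bus)
p1.read("X"); p2.read("X")            # both copies valid
p1.write("X", 1)                      # first write: P1 reserved, P2 invalid
print(p1.state["X"], p2.state["X"])   # R I
print(p2.read("X"))                   # P2 re-fetches the fresh value: 1
```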
Directory-Based Protocols
When a multistage network is used to build a large multiprocessor with hundreds of processors, the
snoopy cache protocols must be modified to suit the network capabilities. Since broadcasting is very
expensive in a multistage network, the consistency commands are sent only to those caches
that keep a copy of the block. This is the reason for the development of directory-based protocols for
network-connected multiprocessors.
In a directory-based protocol system, data to be shared are placed in a common directory that
maintains coherence among the caches. Here, the directory acts as a filter through which a processor
asks permission to load an entry from the primary memory into its cache memory. If an entry is changed,
the directory either updates it or invalidates the other caches holding that entry.
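A toy full-map directory can make this concrete. The class and method names below are invented for illustration; the point is that the directory records which caches hold each block, so invalidations go only to those caches instead of being broadcast.

```python
# Illustrative sketch of a directory-based protocol: per block, the
# directory tracks the set of caches holding a copy (a full-map scheme).

class Directory:
    def __init__(self, memory):
        self.memory = memory
        self.sharers = {}                 # block addr -> set of caches

    def load(self, addr, cache):
        # A cache asks permission to load a block; the directory records it.
        self.sharers.setdefault(addr, set()).add(cache)
        return self.memory[addr]

    def write(self, addr, value, writer):
        # On a write, invalidate only the caches listed for this block.
        for cache in self.sharers.get(addr, set()) - {writer}:
            cache.invalidate(addr)
        self.sharers[addr] = {writer}
        self.memory[addr] = value

class Cache:
    def __init__(self, directory):
        self.directory = directory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:                       # miss: ask directory
            self.lines[addr] = self.directory.load(addr, self)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.directory.write(addr, value, self)

    def invalidate(self, addr):
        self.lines.pop(addr, None)       # drop the stale copy

d = Directory({"X": 0})
p1, p2 = Cache(d), Cache(d)
p1.read("X"); p2.read("X")    # directory records both as sharers of X
p1.write("X", 1)              # only P2, a recorded sharer, is invalidated
print(p2.read("X"))           # miss -> reload through the directory: 1
```

A real directory also tracks per-block states (e.g. shared vs. exclusive) and memory-module placement; this sketch keeps only the filtering idea.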
Maintaining cache coherency is a problem in a multiprocessor system when the processors contain local
cache memory; data inconsistency between different caches easily arises in such a system.
The major concern areas are −
Sharing of writable data
Process migration
I/O activity
Sharing of writable data
When two processors (P1 and P2) have the same data element (X) in their local caches and one
processor (P1) writes to it, then, because P1's cache is write-through, the main memory is also
updated. But when P2 subsequently tries to read X, it obtains a stale value, because the copy in
its own cache was never updated.
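This scenario can be sketched in a few lines; the dicts standing in for the caches and main memory are purely illustrative:

```python
# Sketch of the sharing-of-writable-data problem: write-through keeps main
# memory fresh, but nothing updates the other processor's cached copy.

memory = {"X": 0}
p1_cache = {"X": memory["X"]}     # both processors cache X initially
p2_cache = {"X": memory["X"]}

# P1 writes X; write-through updates main memory immediately...
p1_cache["X"] = 1
memory["X"] = 1

# ...but nothing tells P2, so its next read hits the outdated copy.
print(p2_cache["X"])   # 0 (stale), while memory["X"] is already 1
```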
Process migration
In the first stage, the cache of P1 has data element X, whereas P2 has nothing. A process on
P2 first writes to X and then migrates to P1. Now the process starts reading data element X, but since
P1 holds an outdated copy, the process reads stale data. Similarly, a process on P1 writes to the data
element X and then migrates to P2. After migration, the process on P2 starts reading the data element X
but finds an outdated version of X in the main memory.
I/O activity
As illustrated in the figure, an I/O device is added to the bus in a two-processor multiprocessor
architecture. In the beginning, both caches contain the data element X. When the I/O device
receives a new value of X, it stores it directly in the main memory, so when either P1
or P2 (say P1) tries to read X, it gets an outdated copy. If P1 then writes to X and the
I/O device next tries to transmit X, it again gets an outdated copy.
Uniform Memory Access (UMA) architecture means the shared memory is the same for all processors
in the system. Popular classes of UMA machines, which are commonly used for (file-) servers, are the
so-called Symmetric Multiprocessors (SMPs). In an SMP, all system resources like memory, disks,
other I/O devices, etc. are accessible by the processors in a uniform manner.
COMA tends to be more flexible than CC-NUMA because COMA transparently supports the
migration and replication of data without OS involvement.
However, COMA machines are expensive and complex to build, because they need non-standard
memory-management hardware and the coherency protocol is harder to implement.
Remote accesses in COMA are often slower than those in CC-NUMA, since the tree network
must be traversed to find the data.