Cache Coherence: Write-Invalidate Snooping Protocol For Write-Back
06/12/2016 09:27 AM
Cache
A cache is memory with very short access time used for storage of frequently used instructions or data (webster.com). Modern desktops have at least three caches: the TLB (translation lookaside buffer), the I-Cache (instruction cache), and the D-Cache (data cache). A cache hit occurs when the requested data is found in the cache.
Presentation: "Cache & SpinLocks", Udi & Haim

Writing
There are two basic approaches to writing: write-through and write-back.
Cache coherence
The coherence of caches is obtained if every read of a memory location returns the most recently written value of that location.

Cache coherence mechanisms
Cache coherence protocol
A coherence protocol acts when a write operation is observed to a location that a cache has a copy of. There are two basic implementations:
- Write-update: when a local cache block is updated, the new data block is broadcast to all caches containing a copy of the block, updating them.
- Write-invalidate: all remote copies are invalidated when a local cache block is updated.

Coherence protocol example: Write-invalidate Snooping Protocol For Write-through. Writes invalidate all other caches.
Cache coherence: Write-invalidate Snooping Protocol For Write-back (MESI)
Each cache line is in one of the four following states:
- Modified: the cache line is present only in the current cache, and is dirty; it has been modified from the value in main memory. The cache is required to write the data back to main memory at some time in the future, before permitting any other read of the (no longer valid) main memory state. The write-back changes the line to the Exclusive state.
- Exclusive: the cache line is present only in the current cache, but is clean; it matches main memory. It may be changed to the Shared state at any time, in response to a read request. Alternatively, it may be changed to the Modified state when writing to it.
- Shared: indicates that this cache line may be stored in other caches of the machine and is clean; it matches main memory. The line may be discarded (changed to the Invalid state) at any time.
- Invalid: indicates that this cache line is invalid (unused).
To summarize, MESI is an extension of the MSI algorithm: it distinguishes a line that exists only in the local cache from one that may also exist in other caches, so modifying an Exclusive line needs no bus transaction.
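The four states and their triggers can be sketched as a small state machine for one line in one cache. This is a toy model for illustration; the function and event names are ours, not a real coherence simulator:

```c
#include <assert.h>

/* Toy MESI model: the state of a single line in a single cache, driven by
 * local accesses and by bus events snooped from other caches. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;

/* Local processor reads the line. */
mesi_state on_local_read(mesi_state s, int other_caches_have_copy) {
    if (s == INVALID)                 /* read miss: fill from memory/bus */
        return other_caches_have_copy ? SHARED : EXCLUSIVE;
    return s;                         /* M, E, S: read hit, no change */
}

/* Local processor writes the line; from INVALID or SHARED this sends an
 * invalidate on the bus, from EXCLUSIVE it upgrades silently (the MESI
 * optimization over MSI). */
mesi_state on_local_write(mesi_state s) {
    (void)s;
    return MODIFIED;
}

/* Another cache's read is snooped: a MODIFIED line must be written back
 * first; here we model it dropping to SHARED, since the remote reader now
 * also holds a copy. An EXCLUSIVE line simply becomes SHARED. */
mesi_state on_remote_read(mesi_state s) {
    return (s == MODIFIED || s == EXCLUSIVE) ? SHARED : s;
}

/* Another cache's write (invalidate) is snooped: our copy is now stale. */
mesi_state on_remote_write(mesi_state s) {
    (void)s;
    return INVALID;
}
```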
What is done by the OS and ...
Caching and spin lock

spin_lock:
    mov eax, 1
    xchg eax, [locked]
    test eax, eax
    jnz spin_lock
    ret

spin_unlock:
    mov eax, 0
    xchg eax, [locked]
    ret
Caching and spin lock (improved)

spin_lock:
    mov eax, [locked]
    test eax, eax
    jnz spin_lock
    mov eax, 1
    xchg eax, [locked]
    test eax, eax
    jnz spin_lock
    ret

spin_unlock:
    mov eax, 0
    xchg eax, [locked]
    ret

This version first spins with a plain read (test-and-test-and-set), so waiting CPUs spin in their own cache and only attempt the atomic xchg when the lock looks free.
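The test-and-test-and-set loop in the assembly above can be sketched in C with C11 atomics. The names ttas_lock and ttas_unlock are ours, not a kernel API:

```c
#include <assert.h>
#include <stdatomic.h>

/* Test-and-test-and-set: spin on a plain load, and only attempt the
 * atomic exchange (the xchg) when the lock looks free. */
typedef struct { atomic_int locked; } ttas_t;

void ttas_lock(ttas_t *l) {
    for (;;) {
        /* spin on a plain read while the lock is held */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        /* lock looks free: try the atomic test-and-set */
        if (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0)
            return;
    }
}

void ttas_unlock(ttas_t *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The inner plain load lets waiters spin in their own cache in a shared state; only the atomic exchange generates invalidation traffic.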
Caching and ticket lock

struct spinlock_t { int current_ticket; int next_ticket; }

void spin_lock(spinlock_t *lock) {
    t = atomic_inc(lock->next_ticket);
    while (t != lock->current_ticket)
        ;
}

void spin_unlock(spinlock_t *lock) {
    lock->current_ticket++;
}
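The pseudocode above can be made runnable with C11 atomics; this is a sketch (the kernel's real ticket lock differs in detail), with names of our choosing:

```c
#include <assert.h>
#include <stdatomic.h>

/* Ticket lock: each acquirer takes a ticket; tickets are served in FIFO
 * order, so the lock is fair. */
typedef struct {
    atomic_int next_ticket;     /* next ticket to hand out */
    atomic_int current_ticket;  /* ticket currently being served */
} ticket_lock_t;

void ticket_lock(ticket_lock_t *l) {
    int t = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                      memory_order_relaxed);
    while (atomic_load_explicit(&l->current_ticket,
                                memory_order_acquire) != t)
        ;  /* spin until our ticket comes up */
}

void ticket_unlock(ticket_lock_t *l) {
    /* only the holder advances current_ticket */
    atomic_fetch_add_explicit(&l->current_ticket, 1, memory_order_release);
}
```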
Interrupt

Interrupt Cont
unsigned int irq: the interrupt number being ...
Top & Bottom Half Cont
Two different mechanisms that may ...
Tasklets are guaranteed to run on the same CPU as the function that first schedules them, so an interrupt handler can be sure that a tasklet does not begin executing before the handler has completed.
Interrupts are enabled while the tasklet is running, so locking between the tasklet and the interrupt handler may still be required. Tasklets may be scheduled to run multiple times, but tasklet scheduling is not cumulative: the tasklet runs only once, even if it is requested repeatedly before it is launched.
A tasklet never runs in parallel with itself, since it runs only once, but tasklets can run in parallel with other tasklets on SMP systems, so locking between tasklets is required.

Tasklet Example
void short_do_tasklet(unsigned long);
Workqueue
Higher latency than tasklets, but workqueue functions are allowed to sleep. Invoke a ...
Locks
Improper use of locks can lead to:
- Race condition: uncontrolled access to shared data.
- Starvation: a process is perpetually denied necessary resources; without those resources, the program can never finish its task.
- Deadlock: a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does.
Race Condition
Example for race condition:

Lock;
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
    if (!dptr->data[s_pos])
        goto __cleanup;
}
UnLock;

Leads to memory leak !!!
Lock;
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
    if (!dptr->data[s_pos])
        return -1;
}
UnLock;

Leads to deadlock !!! The error path returns with the lock still held; use goto __cleanup so the lock is released on that path too.
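For contrast, here is a userspace sketch of the same check-then-allocate pattern done safely: the check happens under the lock, and every path releases the lock. The names slot_get, slots, and QUANTUM are ours, and malloc/pthread_mutex stand in for kmalloc and the kernel lock:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

enum { QUANTUM = 4000, SLOTS = 8 };

static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;
static void *slots[SLOTS];

/* Return the buffer for slot s_pos, allocating it on first use.
 * NULL means allocation failed; the lock is released on every path. */
void *slot_get(int s_pos) {
    void *p;
    pthread_mutex_lock(&slot_lock);
    if (!slots[s_pos])
        slots[s_pos] = malloc(QUANTUM);   /* may fail and stay NULL */
    p = slots[s_pos];
    pthread_mutex_unlock(&slot_lock);     /* no early return holds the lock */
    return p;
}
```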
Hardware resources are, by their nature, shared, and software resources also ...
Spinlock Cont
If the kernel control path finds the lock locked, it spins until the lock is released.
The holder may lose the processor because: the driver calls a function which puts the process to sleep (e.g. copy_from_user); kernel preemption kicks in and a higher-priority process pushes the driver code aside.
The lock will then not be freed in the near future. In the best case, another thread trying to acquire the lock will spin for a long time; in the worst case, deadlock can occur.
Therefore code holding a spinlock must be atomic and cannot go to sleep (and sometimes cannot even handle interrupts). Preemption is disabled on a processor which holds a spinlock.
What if an interrupt routine executes on the same processor as the code that took out the lock originally, and tries to take the lock? Deadlock !!!
Spinlock API
Initialize lock APIs: spinlock_t lock ...

Spinlock API Cont
Unlock APIs: void spin_unlock(spinlock_t *lock) ...
Semaphore down (pseudocode):
1. acquire spinlock
2. if (count > 0)
   i. count--
3. else
   i. insert calling task at the tail of wait_list
   ii. set wakeup flag to 0
   iii. repeat: release spinlock; put task to sleep; acquire spinlock; if (wakeup flag == 1) exit repeat section
4. release spinlock
Semaphore up (pseudocode):
1. acquire spinlock
2. if (wait_list is empty)
   i. count++
3. else
   i. node = get wait_list ...
Semaphore APIs
Create a semaphore with initialized counter: ...
Lock (the sleep can be interrupted by a signal): int down_interruptible(struct semaphore *sem)
Try lock (never sleeps): int down_trylock(struct semaphore *sem)
Unlock semaphore: void up(struct semaphore *sem)
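The down/up steps above can be sketched in userspace with C11 atomics. All names here are ours; the trylock failure path stands in for sleeping on wait_list, which cannot be modeled without a scheduler:

```c
#include <assert.h>
#include <stdatomic.h>

/* Counting semaphore built on an internal spinlock, following the
 * pseudocode: the count is only touched while the spinlock is held. */
typedef struct {
    atomic_flag lock;   /* the internal spinlock */
    int count;          /* protected by lock */
} ksem_t;

static void ksem_spin_lock(ksem_t *s) {
    while (atomic_flag_test_and_set(&s->lock))
        ;
}
static void ksem_spin_unlock(ksem_t *s) {
    atomic_flag_clear(&s->lock);
}

/* Like down_trylock: 0 on success, -1 if the semaphore was unavailable. */
int ksem_down_trylock(ksem_t *s) {
    int ok;
    ksem_spin_lock(s);
    ok = (s->count > 0);
    if (ok)
        s->count--;          /* step 2.i of the down pseudocode */
    ksem_spin_unlock(s);
    return ok ? 0 : -1;
}

/* Like up, with no waiters modeled: just increment the count. */
void ksem_up(ksem_t *s) {
    ksem_spin_lock(s);
    s->count++;              /* step 2.i of the up pseudocode */
    ksem_spin_unlock(s);
}
```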
RW Semaphore
Many tasks break down into two distinct types of work: readers and writers. An RW semaphore allows multiple concurrent readers.
Cons: because many readers may hold the lock at once, a writer may continue waiting for the lock while new reader threads are able to acquire it; this is write starvation.
RW Semaphore API
RW Semaphore struct:

struct rw_semaphore {
    long count;
    spinlock_t wait_lock;
    struct list_head wait_list;
    struct task_struct *owner;
};

Initialize RW semaphore: ...
Mutex
Only one task can hold the mutex at a time (binary semaphore). Only the owner of the mutex can unlock the mutex. Recursive locking is not permitted. Improvement: try to spin for acquisition when we find that there are no pending waiters and the lock owner is currently running on a (different) CPU, since it is likely to release the lock soon.
Seqlocks
Pros: readers are never blocked, even while a writer is active; free access for readers.
Cons: a reader may sometimes be forced to read the same data several times until it gets a valid copy. Seqlocks generally cannot be used to protect data structures involving pointers, because the reader may be following a pointer that is invalid while the writer is changing the data structure.

Seqlocks API
Initialize seqlocks: seqlock_t lock ...
Seqlocks API Cont
Obtaining write access: void ...
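Since the kernel API names are truncated above, here is a minimal userspace sketch of the reader-retry idea (type and function names are ours; the kernel API is write_seqlock / read_seqbegin / read_seqretry). The writer makes the sequence odd while writing; a reader retries if it saw an odd sequence, or if the sequence changed during its read:

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_uint seq;    /* odd value = write in progress */
    int data;           /* the protected value */
} useqlock_t;

void useq_write(useqlock_t *s, int v) {
    atomic_fetch_add(&s->seq, 1);   /* seq becomes odd: writer active */
    s->data = v;
    atomic_fetch_add(&s->seq, 1);   /* seq becomes even: write done */
}

int useq_read(useqlock_t *s) {
    unsigned start;
    int v;
    do {
        start = atomic_load(&s->seq);
        v = s->data;
        /* retry if a write was in progress or one completed meanwhile */
    } while ((start & 1u) || atomic_load(&s->seq) != start);
    return v;
}
```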
123
Seqlocks
124
125
126
structure Copies data from the old one, Replaces the pointer that is
seen by the read code At this point from reader perspective, the
change is complete. any code entering the critical section sees the
new version of the data
127
pointer (reader might have reference for the pointer) Since all code
holding references to this data structure must (by the rules) be
atomic, we know that once every processor on the system has been
scheduled at least once, all references must be gone. RCU sets aside
a callback that waits until all processors have scheduled; that callback
is then run to perform the cleanup work
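The copy-then-publish step can be sketched for a single pointer as below. Names (struct config, update_a, read_a) are ours, and the grace-period machinery is omitted: the old copy is freed immediately here, which a real RCU must defer as described above:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct config { int a, b; };

static _Atomic(struct config *) cfg_ptr;   /* the published pointer */

/* Writer: copy the old struct, modify the copy, publish the new pointer. */
void update_a(int new_a) {
    struct config *old = atomic_load(&cfg_ptr);
    struct config *fresh = malloc(sizeof *fresh);
    memcpy(fresh, old, sizeof *fresh);     /* copy from the old one */
    fresh->a = new_a;                      /* update the copy */
    atomic_store(&cfg_ptr, fresh);         /* publish: readers now see it */
    free(old);  /* real RCU would defer this until a grace period passes */
}

/* Reader: one atomic load gives a consistent snapshot pointer. */
int read_a(void) {
    return atomic_load(&cfg_ptr)->a;
}
```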
... return (oldValue == LOCKED); }
The release operation increments the dequeue ticket. This permits the next waiting thread to enter the critical section.
Linux Scalability
... operations only. BUT: still generates lots of traffic and contention.
... coherence traffic on every successful lock access.
The simple act of spinning for a lock clearly is not going to be good for performance. Cache contention would appear to be less of an issue: a CPU spinning on a lock will cache its contents in a shared mode, and no cache bouncing should occur until the CPU owning the lock releases it (releasing the lock, and its acquisition by another CPU, requires writing to the lock, and that requires exclusive cache access).

Kernel code will acquire a lock to work with (and, usually, modify) a structure's contents. Often, changing a field within the protected structure will require access to the same cache line that holds the structure's spinlock. Case 1: the lock is uncontended; that access is not a problem, since the CPU owning the lock probably owns the cache line as well. Case 2: the lock is contended; there will be one or more other CPUs constantly querying its value, obtaining shared access to that same cache line and depriving the lock holder of the exclusive access it needs. A subsequent modification of data within the affected cache line will thus incur a cache miss. So CPUs querying a contended lock can slow the lock owner considerably, even though that owner is not accessing the lock directly.
Spin Lock With Backoff
Rather than spinning tightly and hammering the lock, each thread waits between attempts, increasing the delay each time it finds the lock still held.
Spin Lock With Backoff: Cons
Too much looping will cause the lock to sit idle before the owner of the next ticket notices that its turn has come; that, too, will hurt performance. All threads spin on the same shared location, causing cache-coherence traffic on every successful lock access.
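The backoff idea can be sketched as a test-and-set lock whose waiters double a bounded delay after each failed attempt. All names here are ours; a real implementation would use a pause/cpu_relax instruction instead of a counted loop:

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct { atomic_flag f; } bo_lock_t;

/* Crude busy delay; stands in for pause/cpu_relax-based waiting. */
static void delay_loop(unsigned n) {
    for (volatile unsigned i = 0; i < n; i++)
        ;
}

void bo_lock(bo_lock_t *l) {
    unsigned backoff = 1;
    while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire)) {
        delay_loop(backoff);
        if (backoff < 1024)     /* cap the delay so the lock never sits */
            backoff <<= 1;      /* idle too long; double it per failure */
    }
}

void bo_unlock(bo_lock_t *l) {
    atomic_flag_clear_explicit(&l->f, memory_order_release);
}
```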
Array Lock
Acquire: int arr_lock_lock(struct arr_lock ...
Array Lock: Cons
Distinct array elements can share a cache line, so waiters still bounce that line between them. How to solve it: pad array elements so that distinct elements are mapped to distinct cache lines.
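A padded array lock can be sketched as follows. Struct and function names are ours (this is the Anderson-style array lock idea, not the slide's exact code); each waiter spins on its own slot, and each slot is padded to a full cache line:

```c
#include <assert.h>
#include <stdatomic.h>

#define NSLOTS 4        /* must be >= number of contending CPUs */
#define CACHELINE 64

struct arr_lock {
    struct {
        atomic_int must_wait;
        char pad[CACHELINE - sizeof(atomic_int)]; /* avoid false sharing */
    } slot[NSLOTS];
    atomic_uint next;   /* next slot to hand out */
};

void arr_lock_init(struct arr_lock *l) {
    for (int i = 0; i < NSLOTS; i++)
        atomic_store(&l->slot[i].must_wait, i != 0); /* slot 0 starts free */
    atomic_store(&l->next, 0);
}

/* Acquire: take a slot, spin on it alone; returns the slot for release. */
int arr_lock_lock(struct arr_lock *l) {
    int me = atomic_fetch_add(&l->next, 1) % NSLOTS;
    while (atomic_load_explicit(&l->slot[me].must_wait, memory_order_acquire))
        ;
    return me;
}

/* Release: re-arm my slot and free the successor's slot. */
void arr_lock_unlock(struct arr_lock *l, int me) {
    atomic_store(&l->slot[me].must_wait, 1);
    atomic_store_explicit(&l->slot[(me + 1) % NSLOTS].must_wait, 0,
                          memory_order_release);
}
```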
Lock acquire:
    ticket = fetch-and-increment(nr_requests)       // get ticket
    while (ticket != nr_releases[(ticket+1) % 2])   // wait for my turn
Lock release:
    nr_releases[(ticket % 2)] += 2                  // increment by 2
MCS Lock: List Based Queue Lock
Goals: reduce bus traffic ...
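An MCS lock can be sketched with C11 atomics as below (node and field names are ours). Each waiter spins on a flag in its own queue node, so there is no shared spin location and bus traffic is reduced:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_int locked;               /* 1 while this waiter must spin */
};
typedef _Atomic(struct mcs_node *) mcs_lock_t;  /* tail of the queue */

void mcs_lock(mcs_lock_t *lock, struct mcs_node *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, 1);
    /* enqueue ourselves at the tail */
    struct mcs_node *prev = atomic_exchange(lock, me);
    if (prev == NULL)
        return;                      /* queue was empty: we own the lock */
    atomic_store(&prev->next, me);   /* link in, then spin on OUR flag */
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;
}

void mcs_unlock(mcs_lock_t *lock, struct mcs_node *me) {
    struct mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* no visible successor: try to swing the tail back to empty */
        struct mcs_node *expect = me;
        if (atomic_compare_exchange_strong(lock, &expect, NULL))
            return;
        /* a successor is enqueueing; wait for it to link itself */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store_explicit(&succ->locked, 0, memory_order_release);
}
```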
Exclusive Cache Line
The technique given below is used to ...
Lookaside Caches and Memory Pool
For allocating many objects of the same size, over and over, in the kernel. API:

kmem_cache_t *kmem_cache_create(const char *name, size_t size, size_t offset,
    unsigned long flags,
    void (*constructor)(void *, kmem_cache_t *, unsigned long flags),
    void (*destructor)(void *, kmem_cache_t *, unsigned long flags));
void *kmem_cache_alloc(kmem_cache_t *cache, int flags);
void kmem_cache_free(kmem_cache_t *cache, const void *obj);
int kmem_cache_destroy(kmem_cache_t *cache);

flags = SLAB_HWCACHE_ALIGN requires each data object to be aligned to a cache line.

Memory pool:
int mempool_resize(mempool_t *pool, int new_min_nr, int gfp_mask);
void mempool_destroy(mempool_t *pool);
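The lookaside-cache idea, a free list of fixed-size objects kept for reuse, can be sketched in userspace as below. The obj_cache_* names are ours, and malloc stands in for the kernel page allocator; this only mirrors the alloc/free call pattern of kmem_cache_alloc and kmem_cache_free:

```c
#include <assert.h>
#include <stdlib.h>

/* Free list of fixed-size objects, threaded through the freed objects
 * themselves: a freed object's first bytes hold the next-pointer. */
struct obj_cache {
    size_t size;          /* object size; at least a pointer wide */
    void *free_list;      /* head of the singly linked free list */
};

struct obj_cache *obj_cache_create(size_t size) {
    struct obj_cache *c = malloc(sizeof *c);
    c->size = size < sizeof(void *) ? sizeof(void *) : size;
    c->free_list = NULL;
    return c;
}

void *obj_cache_alloc(struct obj_cache *c) {
    if (c->free_list) {                  /* reuse a cached object */
        void *obj = c->free_list;
        c->free_list = *(void **)obj;
        return obj;
    }
    return malloc(c->size);              /* cache empty: fresh allocation */
}

void obj_cache_free(struct obj_cache *c, void *obj) {
    *(void **)obj = c->free_list;        /* push onto the free list */
    c->free_list = obj;
}
```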
Testing
ioctl controls the following: creation of a spinlock (ticket lock, array lock, or MCS lock); acquire spinlock (the one which was created); release spinlock (the one which was created).
Testing Application - User
The test application will generate on each run a different type of spinlock. The test application will run several forks/threads (one for each CPU core). Each thread will run on a separate (unique) core (sched_setaffinity).

Testing Application - User Cont
Pseudo Code: fopen ...
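The per-core pinning step mentioned above can be sketched with sched_setaffinity (Linux-specific); the helper name pin_to_cpu is ours:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Bind the calling thread to a single CPU. */
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means the calling thread; returns 0 on success */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

In the test application, each fork/thread would call this with its own core index before entering the lock benchmark loop.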
Testing Application - User Cont
Inside thread: ...
Thank you