
Commit 77e4424

Author: Ingo Molnar (committed)

Merge branch 'linus' into x86/kprobes

2 parents d54191b + 4515889   commit 77e4424

File tree

1,728 files changed: 60,983 additions, 186,945 deletions


Documentation/IRQ-affinity.txt

Lines changed: 28 additions & 9 deletions
@@ -1,17 +1,26 @@
+ChangeLog:
+	Started by Ingo Molnar <mingo@redhat.com>
+	Update by Max Krasnyansky <maxk@qualcomm.com>
 
-SMP IRQ affinity, started by Ingo Molnar <mingo@redhat.com>
-
+SMP IRQ affinity
 
 /proc/irq/IRQ#/smp_affinity specifies which target CPUs are permitted
 for a given IRQ source. It's a bitmask of allowed CPUs. It's not allowed
 to turn off all CPUs, and if an IRQ controller does not support IRQ
 affinity then the value will not change from the default 0xffffffff.
 
+/proc/irq/default_smp_affinity specifies default affinity mask that applies
+to all non-active IRQs. Once IRQ is allocated/activated its affinity bitmask
+will be set to the default mask. It can then be changed as described above.
+Default mask is 0xffffffff.
+
 Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
-the IRQ to CPU4-7 (this is an 8-CPU SMP box):
+it to CPU4-7 (this is an 8-CPU SMP box):
 
+[root@moon 44]# cd /proc/irq/44
 [root@moon 44]# cat smp_affinity
 ffffffff
+
 [root@moon 44]# echo 0f > smp_affinity
 [root@moon 44]# cat smp_affinity
 0000000f
@@ -21,17 +30,27 @@ PING hell (195.4.7.3): 56 data bytes
 --- hell ping statistics ---
 6029 packets transmitted, 6027 packets received, 0% packet loss
 round-trip min/avg/max = 0.1/0.1/0.4 ms
-[root@moon 44]# cat /proc/interrupts | grep 44:
-44: 0 1785 1785 1783 1783 1
-1 0 IO-APIC-level eth1
+[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
+     CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
+44:  1068   1785   1785   1783      0      0      0      0   IO-APIC-level  eth1
+
+As can be seen from the line above IRQ44 was delivered only to the first four
+processors (0-3).
+Now let's restrict that IRQ to CPU(4-7).
+
 [root@moon 44]# echo f0 > smp_affinity
+[root@moon 44]# cat smp_affinity
+000000f0
 [root@moon 44]# ping -f h
 PING hell (195.4.7.3): 56 data bytes
 ..
 --- hell ping statistics ---
 2779 packets transmitted, 2777 packets received, 0% packet loss
 round-trip min/avg/max = 0.1/0.5/585.4 ms
-[root@moon 44]# cat /proc/interrupts | grep 44:
-44: 1068 1785 1785 1784 1784 1069 1070 1069 IO-APIC-level eth1
-[root@moon 44]#
+[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
+     CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
+44:  1068   1785   1785   1783   1784   1069   1070   1069   IO-APIC-level  eth1
+
+This time around IRQ44 was delivered only to the last four processors.
+i.e. counters for the CPU0-3 did not change.
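For illustration only, not part of the patch above: a minimal userspace C sketch of the same interface, writing hex CPU masks to the procfs files described in the hunks. The mask values and IRQ number 44 follow the example transcript; error handling is deliberately minimal and root privileges are required.

    #include <stdio.h>

    /* Write a hex CPU bitmask string to a procfs affinity file. */
    static int write_mask(const char *path, const char *hexmask)
    {
            FILE *f = fopen(path, "w");

            if (!f)
                    return -1;
            fprintf(f, "%s\n", hexmask);
            return fclose(f);
    }

    int main(void)
    {
            /* Newly activated IRQs will start out restricted to CPU0-3 ... */
            write_mask("/proc/irq/default_smp_affinity", "0f");
            /* ... and IRQ44 (eth1 in the example) is moved to CPU4-7. */
            write_mask("/proc/irq/44/smp_affinity", "f0");
            return 0;
    }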

Documentation/RCU/NMI-RCU.txt

Lines changed: 3 additions & 0 deletions
@@ -93,6 +93,9 @@ Since NMI handlers disable preemption, synchronize_sched() is guaranteed
 not to return until all ongoing NMI handlers exit. It is therefore safe
 to free up the handler's data as soon as synchronize_sched() returns.
 
+Important note: for this to work, the architecture in question must
+invoke irq_enter() and irq_exit() on NMI entry and exit, respectively.
+
 
 Answer to Quick Quiz
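A schematic sketch of the added requirement, purely illustrative: the entry-point and dispatch names below are hypothetical and do not come from any architecture's actual code. The point is only that the arch-level NMI path brackets the handler with irq_enter()/irq_exit() so that synchronize_sched() can account for the in-flight NMI.

    #include <linux/hardirq.h>
    #include <asm/ptrace.h>

    /* Hypothetical arch-level NMI entry point. */
    void arch_handle_nmi(struct pt_regs *regs)
    {
            irq_enter();
            dispatch_dynamic_nmi_handler(regs);  /* hypothetical dispatch to the registered handler */
            irq_exit();
    }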

Documentation/RCU/RTFP.txt

Lines changed: 108 additions & 0 deletions
@@ -52,6 +52,10 @@ of each iteration. Unfortunately, chaotic relaxation requires highly
 structured data, such as the matrices used in scientific programs, and
 is thus inapplicable to most data structures in operating-system kernels.
 
+In 1992, Henry (now Alexia) Massalin completed a dissertation advising
+parallel programmers to defer processing when feasible to simplify
+synchronization. RCU makes extremely heavy use of this advice.
+
 In 1993, Jacobson [Jacobson93] verbally described what is perhaps the
 simplest deferred-free technique: simply waiting a fixed amount of time
 before freeing blocks awaiting deferred free. Jacobson did not describe
@@ -138,6 +142,13 @@ blocking in read-side critical sections appeared [PaulEMcKenney2006c],
 Robert Olsson described an RCU-protected trie-hash combination
 [RobertOlsson2006a].
 
+2007 saw the journal version of the award-winning RCU paper from 2006
+[ThomasEHart2007a], as well as a paper demonstrating use of Promela
+and Spin to mechanically verify an optimization to Oleg Nesterov's
+QRCU [PaulEMcKenney2007QRCUspin], a design document describing
+preemptible RCU [PaulEMcKenney2007PreemptibleRCU], and the three-part
+LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally,
+PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI].
 
 Bibtex Entries
 
@@ -202,6 +213,20 @@ Bibtex Entries
 ,Year="1991"
 }
 
+@phdthesis{HMassalinPhD
+,author="H. Massalin"
+,title="Synthesis: An Efficient Implementation of Fundamental Operating
+System Services"
+,school="Columbia University"
+,address="New York, NY"
+,year="1992"
+,annotation="
+Mondo optimizing compiler.
+Wait-free stuff.
+Good advice: defer work to avoid synchronization.
+"
+}
+
 @unpublished{Jacobson93
 ,author="Van Jacobson"
 ,title="Avoid Read-Side Locking Via Delayed Free"
@@ -635,3 +660,86 @@ Revised:
 "
 }
 
+@unpublished{PaulEMcKenney2007PreemptibleRCU
+,Author="Paul E. McKenney"
+,Title="The design of preemptible read-copy-update"
+,month="October"
+,day="8"
+,year="2007"
+,note="Available:
+\url{http://lwn.net/Articles/253651/}
+[Viewed October 25, 2007]"
+,annotation="
+LWN article describing the design of preemptible RCU.
+"
+}
+
+########################################################################
+#
+# "What is RCU?" LWN series.
+#
+
+@unpublished{PaulEMcKenney2007WhatIsRCUFundamentally
+,Author="Paul E. McKenney and Jonathan Walpole"
+,Title="What is {RCU}, Fundamentally?"
+,month="December"
+,day="17"
+,year="2007"
+,note="Available:
+\url{http://lwn.net/Articles/262464/}
+[Viewed December 27, 2007]"
+,annotation="
+Lays out the three basic components of RCU: (1) publish-subscribe,
+(2) wait for pre-existing readers to complete, and (3) maintain
+multiple versions.
+"
+}
+
+@unpublished{PaulEMcKenney2008WhatIsRCUUsage
+,Author="Paul E. McKenney"
+,Title="What is {RCU}? Part 2: Usage"
+,month="January"
+,day="4"
+,year="2008"
+,note="Available:
+\url{http://lwn.net/Articles/263130/}
+[Viewed January 4, 2008]"
+,annotation="
+Lays out six uses of RCU:
+1. RCU is a Reader-Writer Lock Replacement
+2. RCU is a Restricted Reference-Counting Mechanism
+3. RCU is a Bulk Reference-Counting Mechanism
+4. RCU is a Poor Man's Garbage Collector
+5. RCU is a Way of Providing Existence Guarantees
+6. RCU is a Way of Waiting for Things to Finish
+"
+}
+
+@unpublished{PaulEMcKenney2008WhatIsRCUAPI
+,Author="Paul E. McKenney"
+,Title="{RCU} part 3: the {RCU} {API}"
+,month="January"
+,day="17"
+,year="2008"
+,note="Available:
+\url{http://lwn.net/Articles/264090/}
+[Viewed January 10, 2008]"
+,annotation="
+Gives an overview of the Linux-kernel RCU API and a brief annotated RCU
+bibliography.
+"
+}
+
+@article{DinakarGuniguntala2008IBMSysJ
+,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole"
+,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}"
+,Year="2008"
+,Month="April"
+,journal="IBM Systems Journal"
+,volume="47"
+,number="2"
+,pages="@@-@@"
+,annotation="
+RCU, realtime RCU, sleepable RCU, performance.
+"
+}

Documentation/RCU/checklist.txt

Lines changed: 60 additions & 29 deletions
@@ -13,10 +13,13 @@ over a rather long period of time, but improvements are always welcome!
 detailed performance measurements show that RCU is nonetheless
 the right tool for the job.
 
-The other exception would be where performance is not an issue,
-and RCU provides a simpler implementation. An example of this
-situation is the dynamic NMI code in the Linux 2.6 kernel,
-at least on architectures where NMIs are rare.
+Another exception is where performance is not an issue, and RCU
+provides a simpler implementation. An example of this situation
+is the dynamic NMI code in the Linux 2.6 kernel, at least on
+architectures where NMIs are rare.
+
+Yet another exception is where the low real-time latency of RCU's
+read-side primitives is critically important.
 
 1. Does the update code have proper mutual exclusion?
 
@@ -39,9 +42,10 @@ over a rather long period of time, but improvements are always welcome!
 
 2. Do the RCU read-side critical sections make proper use of
 rcu_read_lock() and friends? These primitives are needed
-to suppress preemption (or bottom halves, in the case of
-rcu_read_lock_bh()) in the read-side critical sections,
-and are also an excellent aid to readability.
+to prevent grace periods from ending prematurely, which
+could result in data being unceremoniously freed out from
+under your read-side code, which can greatly increase the
+actuarial risk of your kernel.
 
 As a rough rule of thumb, any dereference of an RCU-protected
 pointer must be covered by rcu_read_lock() or rcu_read_lock_bh()
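As a minimal reader-side sketch of the rewritten rule in item 2 (illustrative only; struct foo and gbl_foo are hypothetical, not from the patch):

    #include <linux/rcupdate.h>

    struct foo {
            int a;
    };
    struct foo *gbl_foo;            /* assigned elsewhere via rcu_assign_pointer() */

    int read_foo_a(void)
    {
            int a;

            rcu_read_lock();                        /* holds off the grace period */
            a = rcu_dereference(gbl_foo)->a;        /* cannot be freed out from under us */
            rcu_read_unlock();                      /* old versions may now be reclaimed */
            return a;
    }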
@@ -54,15 +58,30 @@ over a rather long period of time, but improvements are always welcome!
 be running while updates are in progress. There are a number
 of ways to handle this concurrency, depending on the situation:
 
-a. Make updates appear atomic to readers. For example,
+a. Use the RCU variants of the list and hlist update
+primitives to add, remove, and replace elements on an
+RCU-protected list. Alternatively, use the RCU-protected
+trees that have been added to the Linux kernel.
+
+This is almost always the best approach.
+
+b. Proceed as in (a) above, but also maintain per-element
+locks (that are acquired by both readers and writers)
+that guard per-element state. Of course, fields that
+the readers refrain from accessing can be guarded by the
+update-side lock.
+
+This works quite well, also.
+
+c. Make updates appear atomic to readers. For example,
 pointer updates to properly aligned fields will appear
 atomic, as will individual atomic primitives. Operations
 performed under a lock and sequences of multiple atomic
 primitives will -not- appear to be atomic.
 
-This is almost always the best approach.
+This can work, but is starting to get a bit tricky.
 
-b. Carefully order the updates and the reads so that
+d. Carefully order the updates and the reads so that
 readers see valid data at all phases of the update.
 This is often more difficult than it sounds, especially
 given modern CPUs' tendency to reorder memory references.
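A short sketch of approach (a) above, illustrative only (struct foo, foo_list, and foo_lock are hypothetical): updates use the _rcu() list primitives under an update-side lock and defer the free until a grace period has elapsed.

    #include <linux/list.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>

    struct foo {
            struct list_head list;
            int key;
    };
    static LIST_HEAD(foo_list);
    static DEFINE_SPINLOCK(foo_lock);

    void foo_del(struct foo *p)
    {
            spin_lock(&foo_lock);           /* update-side mutual exclusion */
            list_del_rcu(&p->list);         /* readers may still be traversing p */
            spin_unlock(&foo_lock);
            synchronize_rcu();              /* wait for pre-existing readers */
            kfree(p);                       /* no reader can now hold a reference */
    }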
@@ -123,18 +142,22 @@ over a rather long period of time, but improvements are always welcome!
 when publicizing a pointer to a structure that can
 be traversed by an RCU read-side critical section.
 
-5. If call_rcu(), or a related primitive such as call_rcu_bh(),
-is used, the callback function must be written to be called
-from softirq context. In particular, it cannot block.
+5. If call_rcu(), or a related primitive such as call_rcu_bh() or
+call_rcu_sched(), is used, the callback function must be
+written to be called from softirq context. In particular,
+it cannot block.
 
 6. Since synchronize_rcu() can block, it cannot be called from
-any sort of irq context.
+any sort of irq context. Ditto for synchronize_sched() and
+synchronize_srcu().
 
 7. If the updater uses call_rcu(), then the corresponding readers
 must use rcu_read_lock() and rcu_read_unlock(). If the updater
 uses call_rcu_bh(), then the corresponding readers must use
-rcu_read_lock_bh() and rcu_read_unlock_bh(). Mixing things up
-will result in confusion and broken kernels.
+rcu_read_lock_bh() and rcu_read_unlock_bh(). If the updater
+uses call_rcu_sched(), then the corresponding readers must
+disable preemption. Mixing things up will result in confusion
+and broken kernels.
 
 One exception to this rule: rcu_read_lock() and rcu_read_unlock()
 may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
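A sketch of item 5, illustrative only (struct foo and its rcu field are hypothetical): the callback runs from softirq context, so it does only non-blocking work.

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {
            struct rcu_head rcu;
            int key;
    };

    static void foo_reclaim(struct rcu_head *head)
    {
            kfree(container_of(head, struct foo, rcu));     /* kfree() never blocks */
    }

    void foo_del_deferred(struct foo *p)
    {
            /* p has already been unlinked under the update-side lock (not shown) */
            call_rcu(&p->rcu, foo_reclaim);
    }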
@@ -143,9 +166,9 @@ over a rather long period of time, but improvements are always welcome!
 such cases is a must, of course! And the jury is still out on
 whether the increased speed is worth it.
 
-8. Although synchronize_rcu() is a bit slower than is call_rcu(),
-it usually results in simpler code. So, unless update
-performance is critically important or the updaters cannot block,
+8. Although synchronize_rcu() is slower than is call_rcu(), it
+usually results in simpler code. So, unless update performance
+is critically important or the updaters cannot block,
 synchronize_rcu() should be used in preference to call_rcu().
 
 An especially important property of the synchronize_rcu()
@@ -187,23 +210,23 @@ over a rather long period of time, but improvements are always welcome!
 number of updates per grace period.
 
 9. All RCU list-traversal primitives, which include
-list_for_each_rcu(), list_for_each_entry_rcu(),
+rcu_dereference(), list_for_each_rcu(), list_for_each_entry_rcu(),
 list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
-must be within an RCU read-side critical section. RCU
+must be either within an RCU read-side critical section or
+must be protected by appropriate update-side locks. RCU
 read-side critical sections are delimited by rcu_read_lock()
 and rcu_read_unlock(), or by similar primitives such as
 rcu_read_lock_bh() and rcu_read_unlock_bh().
 
-Use of the _rcu() list-traversal primitives outside of an
-RCU read-side critical section causes no harm other than
-a slight performance degradation on Alpha CPUs. It can
-also be quite helpful in reducing code bloat when common
-code is shared between readers and updaters.
+The reason that it is permissible to use RCU list-traversal
+primitives when the update-side lock is held is that doing so
+can be quite helpful in reducing code bloat when common code is
+shared between readers and updaters.
 
 10. Conversely, if you are in an RCU read-side critical section,
-you -must- use the "_rcu()" variants of the list macros.
-Failing to do so will break Alpha and confuse people reading
-your code.
+and you don't hold the appropriate update-side lock, you -must-
+use the "_rcu()" variants of the list macros. Failing to do so
+will break Alpha and confuse people reading your code.
 
 11. Note that synchronize_rcu() -only- guarantees to wait until
 all currently executing rcu_read_lock()-protected RCU read-side
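A sketch of items 9 and 10, reusing the hypothetical struct foo and foo_list from the earlier sketch: the _rcu() traversal sits inside an RCU read-side critical section; the same loop would also be legal with the update-side foo_lock held instead.

    int foo_key_present(int key)
    {
            struct foo *p;
            int found = 0;

            rcu_read_lock();
            list_for_each_entry_rcu(p, &foo_list, list) {
                    if (p->key == key) {
                            found = 1;
                            break;
                    }
            }
            rcu_read_unlock();              /* p must not be used past this point */
            return found;
    }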
@@ -230,6 +253,14 @@ over a rather long period of time, but improvements are always welcome!
 must use whatever locking or other synchronization is required
 to safely access and/or modify that data structure.
 
+RCU callbacks are -usually- executed on the same CPU that executed
+the corresponding call_rcu(), call_rcu_bh(), or call_rcu_sched(),
+but are by -no- means guaranteed to be. For example, if a given
+CPU goes offline while having an RCU callback pending, then that
+RCU callback will execute on some surviving CPU. (If this was
+not the case, a self-spawning RCU callback would prevent the
+victim CPU from ever going offline.)
+
 14. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu())
 may only be invoked from process context. Unlike other forms of
 RCU, it -is- permissible to block in an SRCU read-side critical
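A sketch of item 14, illustrative only (my_srcu is hypothetical and must be set up with init_srcu_struct()): SRCU readers may block, and each srcu_struct has its own independent grace periods.

    #include <linux/srcu.h>

    static struct srcu_struct my_srcu;      /* init_srcu_struct(&my_srcu) at init time */

    void my_srcu_reader(void)
    {
            int idx;

            idx = srcu_read_lock(&my_srcu);
            /* Unlike rcu_read_lock() sections, blocking is permitted here. */
            srcu_read_unlock(&my_srcu, idx);
    }

    void my_srcu_updater(void)
    {
            /* Unlink the element under the update-side lock (not shown), then: */
            synchronize_srcu(&my_srcu);     /* waits only for my_srcu readers */
    }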
