@@ -23,8 +23,6 @@ <h3>Introduction</h3>
 The <tt>rcu_segcblist</tt> Structure</a>
 <li><a href="#The rcu_data Structure">
 The <tt>rcu_data</tt> Structure</a>
-<li><a href="#The rcu_dynticks Structure">
-The <tt>rcu_dynticks</tt> Structure</a>
 <li><a href="#The rcu_head Structure">
 The <tt>rcu_head</tt> Structure</a>
 <li><a href="#RCU-Specific Fields in the task_struct Structure">
@@ -127,9 +125,11 @@ <h3><a name="Data-Structure Relationships">Data-Structure Relationships</a></h3>
 </p><p>RCU currently permits up to a four-level tree, which on a 64-bit system
 accommodates up to 4,194,304 CPUs, though only a mere 524,288 CPUs for
 32-bit systems.
-On the other hand, you can set <tt>CONFIG_RCU_FANOUT</tt> to be
-as small as 2 if you wish, which would permit only 16 CPUs, which
-is useful for testing.
+On the other hand, you can set both <tt>CONFIG_RCU_FANOUT</tt> and
+<tt>CONFIG_RCU_FANOUT_LEAF</tt> to be as small as 2, which would result
+in a 16-CPU test using a 4-level tree.
+This can be useful for testing large-system capabilities on small test
+machines.
 
 </p><p>This multi-level combining tree allows us to get most of the
 performance and scalability
@@ -154,44 +154,9 @@ <h3><a name="Data-Structure Relationships">Data-Structure Relationships</a></h3>
 keeping lock contention under control at all tree levels regardless
 of the level of loading on the system.
 
-</p><p>The Linux kernel actually supports multiple flavors of RCU
-running concurrently, so RCU builds separate data structures for each
-flavor.
-For example, for <tt>CONFIG_TREE_RCU=y</tt> kernels, RCU provides
-rcu_sched and rcu_bh, as shown below:
-
-</p><p><img src="BigTreeClassicRCUBH.svg" alt="BigTreeClassicRCUBH.svg" width="33%">
-
-</p><p>Energy efficiency is increasingly important, and for that
-reason the Linux kernel provides <tt>CONFIG_NO_HZ_IDLE</tt>, which
-turns off the scheduling-clock interrupts on idle CPUs, which in
-turn allows those CPUs to attain deeper sleep states and to consume
-less energy.
-CPUs whose scheduling-clock interrupts have been turned off are
-said to be in <i>dyntick-idle mode</i>.
-RCU must handle dyntick-idle CPUs specially
-because RCU would otherwise wake up each CPU on every grace period,
-which would defeat the whole purpose of <tt>CONFIG_NO_HZ_IDLE</tt>.
-RCU uses the <tt>rcu_dynticks</tt> structure to track
-which CPUs are in dyntick idle mode, as shown below:
-
-</p><p><img src="BigTreeClassicRCUBHdyntick.svg" alt="BigTreeClassicRCUBHdyntick.svg" width="33%">
-
-</p><p>However, if a CPU is in dyntick-idle mode, it is in that mode
-for all flavors of RCU.
-Therefore, a single <tt>rcu_dynticks</tt> structure is allocated per
-CPU, and all of a given CPU's <tt>rcu_data</tt> structures share
-that <tt>rcu_dynticks</tt>, as shown in the figure.
-
-</p><p>Kernels built with <tt>CONFIG_PREEMPT_RCU</tt> support
-rcu_preempt in addition to rcu_sched and rcu_bh, as shown below:
-
-</p><p><img src="BigTreePreemptRCUBHdyntick.svg" alt="BigTreePreemptRCUBHdyntick.svg" width="35%">
-
 </p><p>RCU updaters wait for normal grace periods by registering
 RCU callbacks, either directly via <tt>call_rcu()</tt> and
 friends (namely <tt>call_rcu_bh()</tt> and <tt>call_rcu_sched()</tt>),
-there being a separate interface per flavor of RCU)
 or indirectly via <tt>synchronize_rcu()</tt> and friends.
 RCU callbacks are represented by <tt>rcu_head</tt> structures,
 which are queued on <tt>rcu_data</tt> structures while they are
@@ -214,9 +179,6 @@ <h3><a name="Data-Structure Relationships">Data-Structure Relationships</a></h3>
 <li>Each <tt>rcu_node</tt> structure has a spinlock.
 <li>The fields in <tt>rcu_data</tt> are private to the corresponding
 CPU, although a few can be read and written by other CPUs.
-<li>Similarly, the fields in <tt>rcu_dynticks</tt> are private
-to the corresponding CPU, although a few can be read by
-other CPUs.
 </ol>
 
 <p>It is important to note that different data structures can have
@@ -272,11 +234,6 @@ <h3><a name="Data-Structure Relationships">Data-Structure Relationships</a></h3>
 access to this information from the corresponding CPU.
 Finally, this structure records past dyntick-idle state
 for the corresponding CPU and also tracks statistics.
-<li><tt>rcu_dynticks</tt>:
-This per-CPU structure tracks the current dyntick-idle
-state for the corresponding CPU.
-Unlike the other three structures, the <tt>rcu_dynticks</tt>
-structure is not replicated per RCU flavor.
 <li><tt>rcu_head</tt>:
 This structure represents RCU callbacks, and is the
 only structure allocated and managed by RCU users.
@@ -287,14 +244,14 @@ <h3><a name="Data-Structure Relationships">Data-Structure Relationships</a></h3>
 <p>If all you wanted from this article was a general notion of how
 RCU's data structures are related, you are done.
 Otherwise, each of the following sections gives more details on
-the <tt>rcu_state</tt>, <tt>rcu_node</tt>, <tt>rcu_data</tt>,
-and <tt>rcu_dynticks</tt> data structures.
+the <tt>rcu_state</tt>, <tt>rcu_node</tt>, and <tt>rcu_data</tt> data
+structures.
 
 <h3><a name="The rcu_state Structure">
 The <tt>rcu_state</tt> Structure</a></h3>
 
 <p>The <tt>rcu_state</tt> structure is the base structure that
-represents a flavor of RCU.
+represents the state of RCU in the system.
 This structure forms the interconnection between the
 <tt>rcu_node</tt> and <tt>rcu_data</tt> structures,
 tracks grace periods, contains the lock used to
@@ -389,7 +346,7 @@ <h5>Grace-Period Tracking</h5>
 The bottom two bits are the state of the current grace period,
 which can be zero for not yet started or one for in progress.
 In other words, if the bottom two bits of <tt>->gp_seq</tt> are
-zero, the corresponding flavor of RCU is idle.
+zero, then RCU is idle.
 Any other value in the bottom two bits indicates that something is broken.
 This field is protected by the root <tt>rcu_node</tt> structure's
 <tt>->lock</tt> field.
@@ -419,10 +376,10 @@ <h5>Miscellaneous</h5>
 grace period in jiffies.
 It is protected by the root <tt>rcu_node</tt>'s <tt>->lock</tt>.
 
-<p>The <tt>->name</tt> field points to the name of the RCU flavor
-(for example, “rcu_sched”), and is constant.
-The <tt>->abbr</tt> field contains a one-character abbreviation,
-for example, “s” for RCU-sched.
+<p>The <tt>->name</tt> and <tt>->abbr</tt> fields distinguish
+between preemptible RCU (“rcu_preempt” and “p”)
+and non-preemptible RCU (“rcu_sched” and “s”).
+These fields are used for diagnostic and tracing purposes.
 
 <h3><a name="The rcu_node Structure">
 The <tt>rcu_node</tt> Structure</a></h3>
@@ -971,25 +928,31 @@ <h3><a name="The rcu_segcblist Structure">
 pointer.
 The reason for this is that all the ready-to-invoke callbacks
 (that is, those in the <tt>RCU_DONE_TAIL</tt> segment) are extracted
-all at once at callback-invocation time.
+all at once at callback-invocation time (<tt>rcu_do_batch</tt>), due
+to which <tt>->head</tt> may be set to NULL if there are no not-done
+callbacks remaining in the <tt>rcu_segcblist</tt>.
 If callback invocation must be postponed, for example, because a
 high-priority process just woke up on this CPU, then the remaining
-callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment.
-Either way, the <tt>->len</tt> and <tt>->len_lazy</tt> counts
-are adjusted after the corresponding callbacks have been invoked, and so
-again it is the <tt>->len</tt> count that accurately reflects whether
-or not there are callbacks associated with this <tt>rcu_segcblist</tt>
-structure.
+callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment and
+<tt>->head</tt> once again points to the start of the segment.
+In short, the head field can briefly be <tt>NULL</tt> even though the
+CPU has callbacks present the entire time.
+Therefore, it is not appropriate to test the <tt>->head</tt> pointer
+for <tt>NULL</tt>.
+
+<p>In contrast, the <tt>->len</tt> and <tt>->len_lazy</tt> counts
+are adjusted only after the corresponding callbacks have been invoked.
+This means that the <tt>->len</tt> count is zero only if
+the <tt>rcu_segcblist</tt> structure really is devoid of callbacks.
 Of course, off-CPU sampling of the <tt>->len</tt> count requires
-the use of appropriate synchronization, for example, memory barriers.
+careful use of appropriate synchronization, for example, memory barriers.
 This synchronization can be a bit subtle, particularly in the case
 of <tt>rcu_barrier()</tt>.
 
 <h3><a name="The rcu_data Structure">
 The <tt>rcu_data</tt> Structure</a></h3>
 
-<p>The <tt>rcu_data</tt> maintains the per-CPU state for the
-corresponding flavor of RCU.
+<p>The <tt>rcu_data</tt> maintains the per-CPU state for the RCU subsystem.
 The fields in this structure may be accessed only from the corresponding
 CPU (and from tracing) unless otherwise stated.
 This structure is the
@@ -1015,30 +978,19 @@ <h5>Connection to Other Data Structures</h5>
 
 <pre>
 1   int cpu;
-2   struct rcu_state *rsp;
-3   struct rcu_node *mynode;
-4   struct rcu_dynticks *dynticks;
-5   unsigned long grpmask;
-6   bool beenonline;
+2   struct rcu_node *mynode;
+3   unsigned long grpmask;
+4   bool beenonline;
 </pre>
 
 <p>The <tt>->cpu</tt> field contains the number of the
-corresponding CPU, the <tt>->rsp</tt> pointer references
-the corresponding <tt>rcu_state</tt> structure (and is most frequently
-used to locate the name of the corresponding flavor of RCU for tracing),
-and the <tt>->mynode</tt> field references the corresponding
-<tt>rcu_node</tt> structure.
+corresponding CPU and the <tt>->mynode</tt> field references the
+corresponding <tt>rcu_node</tt> structure.
 The <tt>->mynode</tt> is used to propagate quiescent states
 up the combining tree.
-<p>The <tt>->dynticks</tt> pointer references the
-<tt>rcu_dynticks</tt> structure corresponding to this
-CPU.
-Recall that a single per-CPU instance of the <tt>rcu_dynticks</tt>
-structure is shared among all flavors of RCU.
-These first four fields are constant and therefore require not
-synchronization.
+These two fields are constant and therefore do not require synchronization.
 
-</p><p>The <tt>->grpmask</tt> field indicates the bit in
+<p>The <tt>->grpmask</tt> field indicates the bit in
 the <tt>->mynode->qsmask</tt> corresponding to this
 <tt>rcu_data</tt> structure, and is also used when propagating
 quiescent states.
@@ -1057,12 +1009,12 @@ <h5>Quiescent-State and Grace-Period Tracking</h5>
 3   bool cpu_no_qs;
 4   bool core_needs_qs;
 5   bool gpwrap;
-6   unsigned long rcu_qs_ctr_snap;
 </pre>
 
-<p>The <tt>->gp_seq</tt> and <tt>->gp_seq_needed</tt>
-fields are the counterparts of the fields of the same name
-in the <tt>rcu_state</tt> and <tt>rcu_node</tt> structures.
+<p>The <tt>->gp_seq</tt> field is the counterpart of the field of the same
+name in the <tt>rcu_state</tt> and <tt>rcu_node</tt> structures.
+The <tt>->gp_seq_needed</tt> field is the counterpart of the field of the same
+name in the <tt>rcu_node</tt> structure.
 They may each lag up to one behind their <tt>rcu_node</tt>
 counterparts, but in <tt>CONFIG_NO_HZ_IDLE</tt> and
 <tt>CONFIG_NO_HZ_FULL</tt> kernels can lag
@@ -1103,10 +1055,6 @@ <h5>Quiescent-State and Grace-Period Tracking</h5>
 <tt>gp_seq</tt> counter is in danger of overflow, which
 will cause the CPU to disregard the values of its counters on
 its next exit from idle.
-Finally, the <tt>rcu_qs_ctr_snap</tt> field is used to detect
-cases where a given operation has resulted in a quiescent state
-for all flavors of RCU, for example, <tt>cond_resched()</tt>
-when RCU has indicated a need for quiescent states.
 
 <h5>RCU Callback Handling</h5>
 
@@ -1179,26 +1127,22 @@ <h5>Dyntick-Idle Handling</h5>
 count the number of times this CPU is determined to be in
 dyntick-idle state, and is used for tracing and debugging purposes.
 
-<h3><a name="The rcu_dynticks Structure">
-The <tt>rcu_dynticks</tt> Structure</a></h3>
-
-<p>The <tt>rcu_dynticks</tt> maintains the per-CPU dyntick-idle state
-for the corresponding CPU.
-Unlike the other structures, <tt>rcu_dynticks</tt> is not
-replicated over the different flavors of RCU.
-The fields in this structure may be accessed only from the corresponding
-CPU (and from tracing) unless otherwise stated.
-Its fields are as follows:
+<p>
+This portion of the <tt>rcu_data</tt> structure is declared as follows:
 
 <pre>
 1   long dynticks_nesting;
 2   long dynticks_nmi_nesting;
 3   atomic_t dynticks;
 4   bool rcu_need_heavy_qs;
-5   unsigned long rcu_qs_ctr;
-6   bool rcu_urgent_qs;
+5   bool rcu_urgent_qs;
 </pre>
 
+<p>These fields in the <tt>rcu_data</tt> structure maintain the per-CPU
+dyntick-idle state for the corresponding CPU.
+The fields may be accessed only from the corresponding CPU (and from tracing)
+unless otherwise stated.
+
 <p>The <tt>->dynticks_nesting</tt> field counts the
 nesting depth of process execution, so that in normal circumstances
 this counter has value zero or one.
@@ -1240,19 +1184,12 @@ <h3><a name="The rcu_dynticks Structure">
 This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
 code, which provide a momentary idle sojourn in response.
 
-</p><p>The <tt>->rcu_qs_ctr</tt> field is used to record
-quiescent states from <tt>cond_resched()</tt>.
-Because <tt>cond_resched()</tt> can execute quite frequently, this
-must be quite lightweight, as in a non-atomic increment of this
-per-CPU field.
-
 </p><p>Finally, the <tt>->rcu_urgent_qs</tt> field is used to record
-the fact that the RCU core code would really like to see a quiescent
-state from the corresponding CPU, with the various other fields indicating
-just how badly RCU wants this quiescent state.
-This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
-code, which, if nothing else, non-atomically increment <tt>->rcu_qs_ctr</tt>
-in response.
+the fact that the RCU core code would really like to see a quiescent state from
+the corresponding CPU, with the various other fields indicating just how badly
+RCU wants this quiescent state.
+This flag is checked by RCU's context-switch path
+(<tt>rcu_note_context_switch</tt>) and the <tt>cond_resched()</tt> code.
 
 <table>
 <tr><th></th></tr>
@@ -1425,11 +1362,11 @@ <h3><a name="Accessor Functions">
 <h3><a name="Summary">
 Summary</a></h3>
 
-So each flavor of RCU is represented by an <tt>rcu_state</tt> structure,
+So the state of RCU is represented by an <tt>rcu_state</tt> structure,
 which contains a combining tree of <tt>rcu_node</tt> and
 <tt>rcu_data</tt> structures.
 Finally, in <tt>CONFIG_NO_HZ_IDLE</tt> kernels, each CPU's dyntick-idle
-state is tracked by an <tt>rcu_dynticks</tt> structure.
+state is tracked by dynticks-related fields in the <tt>rcu_data</tt> structure.
 
 If you made it this far, you are well prepared to read the code
 walkthroughs in the other articles in this series.