Commit 890f242

Merge tag 'rcu.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:

 - Documentation updates.

   This is the first in a series from an ongoing review of the RCU
   documentation. "Why are people thinking -that- about RCU? Oh.
   Because that is an entirely reasonable interpretation of its
   documentation."

 - Miscellaneous fixes.

 - Improved memory allocation and heuristics.

 - Improve rcu_nocbs diagnostic output.

 - Add full-sized polled RCU grace period state values.

   These are the same size as an rcu_head structure, which is double
   that of the traditional unsigned long state values that may still be
   obtained from get_state_synchronize_rcu(). The added size avoids
   missing overlapping grace periods. The benefit is that call_rcu()
   can be replaced by polling, which can be attractive in situations
   where RCU-protected data is aged out of memory.

   Early in the series, the size of this state value is three unsigned
   longs. Later in the series, the fastpaths in synchronize_rcu() and
   synchronize_rcu_expedited() are reworked to permit the full state to
   be represented by only two unsigned longs. This reworking slows
   these two functions down in SMP kernels running either on single-CPU
   systems or on systems with all but one CPU offlined, but this should
   not be a significant problem. And if it somehow becomes a problem in
   some as-yet-unforeseen situations, three-value state values can be
   provided for only those situations.

   Finally, a pair of functions named same_state_synchronize_rcu() and
   same_state_synchronize_rcu_full() allow grace-period state values to
   be compared for equality. This permits users to maintain lists of
   data structures having the same state value, removing the need for
   per-data-structure grace-period state values, thus decreasing memory
   footprint.

 - Polled SRCU grace-period updates, including adding tests to
   rcutorture and reducing the incidence of Tiny SRCU
   grace-period-state counter wrap.

 - Improve Tasks RCU diagnostics and quiescent-state detection.

* tag 'rcu.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (55 commits)
  rcutorture: Use the barrier operation specified by cur_ops
  rcu-tasks: Make RCU Tasks Trace check for userspace execution
  rcu-tasks: Ensure RCU Tasks Trace loops have quiescent states
  rcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE()
  srcu: Make Tiny SRCU use full-sized grace-period counters
  srcu: Make Tiny SRCU poll_state_synchronize_srcu() more precise
  srcu: Add GP and maximum requested GP to Tiny SRCU rcutorture output
  rcutorture: Make "srcud" option also test polled grace-period API
  rcutorture: Limit read-side polling-API testing
  rcu: Add functions to compare grace-period state values
  rcutorture: Expand rcu_torture_write_types() first "if" statement
  rcutorture: Use 1-suffixed variable in rcu_torture_write_types() check
  rcu: Make synchronize_rcu() fastpath update only boot-CPU counters
  rcutorture: Adjust rcu_poll_need_2gp() for rcu_gp_oldstate field removal
  rcu: Remove ->rgos_polled field from rcu_gp_oldstate structure
  rcu: Make synchronize_rcu_expedited() fast path update .expedited_sequence
  rcu: Remove expedited grace-period fast-path forward-progress helper
  rcu: Make synchronize_rcu() fast path update ->gp_seq counters
  rcu-tasks: Remove grace-period fast-path rcu-tasks helper
  rcu: Set rcu_data structures' initial ->gpwrap value to true
  ...
2 parents b8fb65e + 5c0ec49 commit 890f242

18 files changed, 813 insertions(+), 186 deletions(-)

Documentation/RCU/checklist.rst

Lines changed: 12 additions & 3 deletions
@@ -66,8 +66,13 @@ over a rather long period of time, but improvements are always welcome!
 As a rough rule of thumb, any dereference of an RCU-protected
 pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
 rcu_read_lock_sched(), or by the appropriate update-side lock.
-Disabling of preemption can serve as rcu_read_lock_sched(), but
-is less readable and prevents lockdep from detecting locking issues.
+Explicit disabling of preemption (preempt_disable(), for example)
+can serve as rcu_read_lock_sched(), but is less readable and
+prevents lockdep from detecting locking issues.
+
+Please note that you *cannot* rely on code known to be built
+only in non-preemptible kernels. Such code can and will break,
+especially in kernels built with CONFIG_PREEMPT_COUNT=y.

 Letting RCU-protected pointers "leak" out of an RCU read-side
 critical section is every bit as bad as letting them leak out
@@ -185,6 +190,9 @@ over a rather long period of time, but improvements are always welcome!

 5. If call_rcu() or call_srcu() is used, the callback function will
 be called from softirq context. In particular, it cannot block.
+If you need the callback to block, run that code in a workqueue
+handler scheduled from the callback. The queue_rcu_work()
+function does this for you in the case of call_rcu().

 6. Since synchronize_rcu() can block, it cannot be called
 from any sort of irq context. The same rule applies
@@ -297,7 +305,8 @@ over a rather long period of time, but improvements are always welcome!
 the machine.

 d. Periodically invoke synchronize_rcu(), permitting a limited
-number of updates per grace period.
+number of updates per grace period. Better yet, periodically
+invoke rcu_barrier() to wait for all outstanding callbacks.

 The same cautions apply to call_srcu() and kfree_rcu().
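The checklist change to item 5 says that a callback must not block and should hand blocking work to a workqueue handler. The following is a minimal userspace sketch of that hand-off pattern. It is not kernel code: every `toy_` name is hypothetical, the "softirq context" is modeled by a simple flag, and the work queue is a plain linked list. In the kernel, queue_rcu_work() is the helper the documentation points to.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; none of these names are kernel APIs. */
struct toy_work {
	void (*func)(struct toy_work *w);
	struct toy_work *next;
};

static struct toy_work *toy_workq_head;	/* pending deferred work (LIFO) */
static int toy_in_callback;		/* nonzero while a "callback" runs */
static int toy_blocking_calls;		/* blocking operations performed */

/* Pretend to block: only legal outside of callback context. */
static void toy_might_block(void)
{
	assert(!toy_in_callback);
	toy_blocking_calls++;
}

static void toy_queue_work(struct toy_work *w)
{
	w->next = toy_workq_head;
	toy_workq_head = w;
}

/* Work handler: modeled process context, so blocking is allowed. */
static void toy_cleanup_work(struct toy_work *w)
{
	(void)w;
	toy_might_block();
}

/* Grace-period callback: must not block, so it only queues the work. */
static void toy_gp_callback(struct toy_work *w)
{
	w->func = toy_cleanup_work;
	toy_queue_work(w);
}

/* Invoke the callback in the modeled non-blocking context. */
static void toy_invoke_callback(struct toy_work *w)
{
	toy_in_callback = 1;
	toy_gp_callback(w);
	toy_in_callback = 0;
}

/* Drain the queue outside callback context; returns handlers run. */
static int toy_run_workqueue(void)
{
	int n = 0;

	while (toy_workq_head) {
		struct toy_work *w = toy_workq_head;

		toy_workq_head = w->next;
		w->func(w);
		n++;
	}
	return n;
}
```

The point of the split is that the callback itself stays short and non-blocking, while anything that may sleep runs later from the queue.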

Documentation/RCU/rcu_dereference.rst

Lines changed: 10 additions & 4 deletions
@@ -128,10 +128,16 @@ Follow these rules to keep your RCU code working properly:
 This sort of comparison occurs frequently when scanning
 RCU-protected circular linked lists.

-Note that if checks for being within an RCU read-side
-critical section are not required and the pointer is never
-dereferenced, rcu_access_pointer() should be used in place
-of rcu_dereference().
+Note that if the pointer comparison is done outside
+of an RCU read-side critical section, and the pointer
+is never dereferenced, rcu_access_pointer() should be
+used in place of rcu_dereference(). In most cases,
+it is best to avoid accidental dereferences by testing
+the rcu_access_pointer() return value directly, without
+assigning it to a variable.
+
+Within an RCU read-side critical section, there is little
+reason to use rcu_access_pointer().

 - The comparison is against a pointer that references memory
 that was initialized "a long time ago." The reason
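The added text recommends testing the rcu_access_pointer() return value directly rather than storing it in a variable. The sketch below illustrates that style in userspace C11. It is an assumption-laden model: `toy_access_pointer()` is a hypothetical stand-in built on a relaxed atomic load, not the kernel macro, and `toy_cfg` and the other `toy_` names are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_cfg {
	int value;
};

/* Hypothetical shared pointer, published by an updater. */
static _Atomic(struct toy_cfg *) toy_global_cfg;

/*
 * Hypothetical stand-in for rcu_access_pointer(): fetch the pointer
 * value for comparison only, with no ordering guarantees.
 */
#define toy_access_pointer(p) \
	atomic_load_explicit(&(p), memory_order_relaxed)

/*
 * Preferred style: test the returned value directly. No local variable
 * holds the pointer, so a later edit cannot dereference it by accident.
 */
static bool toy_cfg_is_installed(void)
{
	return toy_access_pointer(toy_global_cfg) != NULL;
}

/* Updater side of the model: publish or clear the pointer. */
static void toy_install_cfg(struct toy_cfg *c)
{
	atomic_store_explicit(&toy_global_cfg, c, memory_order_release);
}
```

Because the pointer never lands in a named local, there is nothing for a later, inattentive change to dereference.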

Documentation/RCU/whatisRCU.rst

Lines changed: 30 additions & 17 deletions
@@ -6,13 +6,15 @@ What is RCU? -- "Read, Copy, Update"
 Please note that the "What is RCU?" LWN series is an excellent place
 to start learning about RCU:

-| 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
-| 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
-| 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
-| 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
-| 2010 Big API Table http://lwn.net/Articles/419086/
-| 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
-| 2014 Big API Table http://lwn.net/Articles/609973/
+| 1. What is RCU, Fundamentally? https://lwn.net/Articles/262464/
+| 2. What is RCU? Part 2: Usage https://lwn.net/Articles/263130/
+| 3. RCU part 3: the RCU API https://lwn.net/Articles/264090/
+| 4. The RCU API, 2010 Edition https://lwn.net/Articles/418853/
+| 2010 Big API Table https://lwn.net/Articles/419086/
+| 5. The RCU API, 2014 Edition https://lwn.net/Articles/609904/
+| 2014 Big API Table https://lwn.net/Articles/609973/
+| 6. The RCU API, 2019 Edition https://lwn.net/Articles/777036/
+| 2019 Big API Table https://lwn.net/Articles/777165/


 What is RCU?
@@ -915,13 +917,18 @@ which an RCU reference is held include:
 The understanding that RCU provides a reference that only prevents a
 change of type is particularly visible with objects allocated from a
 slab cache marked ``SLAB_TYPESAFE_BY_RCU``. RCU operations may yield a
-reference to an object from such a cache that has been concurrently
-freed and the memory reallocated to a completely different object,
-though of the same type. In this case RCU doesn't even protect the
-identity of the object from changing, only its type. So the object
-found may not be the one expected, but it will be one where it is safe
-to take a reference or spinlock and then confirm that the identity
-matches the expectations.
+reference to an object from such a cache that has been concurrently freed
+and the memory reallocated to a completely different object, though of
+the same type. In this case RCU doesn't even protect the identity of the
+object from changing, only its type. So the object found may not be the
+one expected, but it will be one where it is safe to take a reference
+(and then potentially acquiring a spinlock), allowing subsequent code
+to check whether the identity matches expectations. It is tempting
+to simply acquire the spinlock without first taking the reference, but
+unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
+initialized after each and every call to kmem_cache_alloc(), which renders
+reference-free spinlock acquisition completely unsafe. Therefore, when
+using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.

 With traditional reference counting -- such as that implemented by the
 kref library in Linux -- there is typically code that runs when the last
@@ -1057,14 +1064,20 @@ SRCU: Initialization/cleanup::
 init_srcu_struct
 cleanup_srcu_struct

-All: lockdep-checked RCU-protected pointer access::
+All: lockdep-checked RCU utility APIs::

-rcu_access_pointer
-rcu_dereference_raw
 RCU_LOCKDEP_WARN
 rcu_sleep_check
 RCU_NONIDLE

+All: Unchecked RCU-protected pointer access::
+
+rcu_dereference_raw
+
+All: Unchecked RCU-protected pointer access with dereferencing prohibited::
+
+rcu_access_pointer
+
 See the comment headers in the source code (or the docbook generated
 from them) for more information.
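The ``SLAB_TYPESAFE_BY_RCU`` paragraph in this file's diff says to take a reference first and only then check the object's identity. Below is a minimal userspace sketch of that ordering using C11 atomics. It is a model under stated assumptions, not the kernel's implementation: `toy_obj`, `toy_get_unless_zero()`, `toy_put()`, and `toy_lookup()` are hypothetical names, a zero reference count stands for "freed or being recycled", and the read-side critical section that would surround the lookup in real code is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_obj {
	atomic_int refcnt;	/* 0 models a freed or recycled slot */
	atomic_int key;		/* identity; may change on reuse */
};

/* Take a reference unless the count has already dropped to zero. */
static bool toy_get_unless_zero(struct toy_obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void toy_put(struct toy_obj *o)
{
	atomic_fetch_sub(&o->refcnt, 1);
}

/*
 * Lookup in the recommended order: reference first, identity second.
 * Returns the object with a reference held, or NULL.
 */
static struct toy_obj *toy_lookup(struct toy_obj *candidate, int want_key)
{
	if (!toy_get_unless_zero(candidate))
		return NULL;		/* slot was free */
	if (atomic_load(&candidate->key) != want_key) {
		toy_put(candidate);	/* slot now holds another object */
		return NULL;
	}
	return candidate;
}
```

Only after the reference is held is it meaningful to check the key, since a recycled slot may carry a different identity.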

include/linux/rcupdate.h

Lines changed: 37 additions & 5 deletions
@@ -42,7 +42,31 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
 void rcu_barrier_tasks(void);
 void rcu_barrier_tasks_rude(void);
 void synchronize_rcu(void);
+
+struct rcu_gp_oldstate;
 unsigned long get_completed_synchronize_rcu(void);
+void get_completed_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp);
+
+// Maximum number of unsigned long values corresponding to
+// not-yet-completed RCU grace periods.
+#define NUM_ACTIVE_RCU_POLL_OLDSTATE 2
+
+/**
+ * same_state_synchronize_rcu - Are two old-state values identical?
+ * @oldstate1: First old-state value.
+ * @oldstate2: Second old-state value.
+ *
+ * The two old-state values must have been obtained from either
+ * get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or
+ * get_completed_synchronize_rcu(). Returns @true if the two values are
+ * identical and @false otherwise. This allows structures whose lifetimes
+ * are tracked by old-state values to push these values to a list header,
+ * allowing those structures to be slightly smaller.
+ */
+static inline bool same_state_synchronize_rcu(unsigned long oldstate1, unsigned long oldstate2)
+{
+	return oldstate1 == oldstate2;
+}

 #ifdef CONFIG_PREEMPT_RCU

@@ -496,13 +520,21 @@ do { \
  * against NULL. Although rcu_access_pointer() may also be used in cases
  * where update-side locks prevent the value of the pointer from changing,
  * you should instead use rcu_dereference_protected() for this use case.
+ * Within an RCU read-side critical section, there is little reason to
+ * use rcu_access_pointer().
+ *
+ * It is usually best to test the rcu_access_pointer() return value
+ * directly in order to avoid accidental dereferences being introduced
+ * by later inattentive changes. In other words, assigning the
+ * rcu_access_pointer() return value to a local variable results in an
+ * accident waiting to happen.
  *
  * It is also permissible to use rcu_access_pointer() when read-side
- * access to the pointer was removed at least one grace period ago, as
- * is the case in the context of the RCU callback that is freeing up
- * the data, or after a synchronize_rcu() returns. This can be useful
- * when tearing down multi-linked structures after a grace period
- * has elapsed.
+ * access to the pointer was removed at least one grace period ago, as is
+ * the case in the context of the RCU callback that is freeing up the data,
+ * or after a synchronize_rcu() returns. This can be useful when tearing
+ * down multi-linked structures after a grace period has elapsed. However,
+ * rcu_dereference_protected() is normally preferred for this use case.
  */
 #define rcu_access_pointer(p) __rcu_access_pointer((p), __UNIQUE_ID(rcu), __rcu)
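The same_state_synchronize_rcu() kernel-doc in this file's diff mentions pushing old-state values "to a list header" so that individual structures need not store one each. The sketch below shows one way that could look, as a userspace model. Only the body of same_state_synchronize_rcu() is taken from the diff; `toy_item`, `toy_batch`, and `toy_batch_add()` are hypothetical names, and the cookie values used with them are arbitrary numbers rather than real grace-period states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same comparison as the inline function added by the diff. */
static inline bool same_state_synchronize_rcu(unsigned long oldstate1,
					      unsigned long oldstate2)
{
	return oldstate1 == oldstate2;
}

/* Hypothetical tracked structure: note that it carries no cookie. */
struct toy_item {
	struct toy_item *next;
};

/* Hypothetical list header holding one cookie for every item on it. */
struct toy_batch {
	unsigned long oldstate;
	struct toy_item *head;
	int count;
};

/*
 * Add an item to the batch if its cookie matches the batch's cookie
 * (or the batch is empty). Returns false when the caller would need
 * to start a new batch for this cookie.
 */
static bool toy_batch_add(struct toy_batch *b, struct toy_item *it,
			  unsigned long cookie)
{
	if (b->count != 0 && !same_state_synchronize_rcu(b->oldstate, cookie))
		return false;
	b->oldstate = cookie;
	it->next = b->head;
	b->head = it;
	b->count++;
	return true;
}
```

With this layout the per-item cost is a single pointer, and one poll of the batch's cookie covers every item on the list.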

include/linux/rcutiny.h

Lines changed: 50 additions & 0 deletions
@@ -14,25 +14,75 @@

 #include <asm/param.h> /* for HZ */

+struct rcu_gp_oldstate {
+	unsigned long rgos_norm;
+};
+
+// Maximum number of rcu_gp_oldstate values corresponding to
+// not-yet-completed RCU grace periods.
+#define NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE 2
+
+/*
+ * Are the two oldstate values the same? See the Tree RCU version for
+ * docbook header.
+ */
+static inline bool same_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp1,
+						   struct rcu_gp_oldstate *rgosp2)
+{
+	return rgosp1->rgos_norm == rgosp2->rgos_norm;
+}
+
 unsigned long get_state_synchronize_rcu(void);
+
+static inline void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
+{
+	rgosp->rgos_norm = get_state_synchronize_rcu();
+}
+
 unsigned long start_poll_synchronize_rcu(void);
+
+static inline void start_poll_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
+{
+	rgosp->rgos_norm = start_poll_synchronize_rcu();
+}
+
 bool poll_state_synchronize_rcu(unsigned long oldstate);

+static inline bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
+{
+	return poll_state_synchronize_rcu(rgosp->rgos_norm);
+}
+
 static inline void cond_synchronize_rcu(unsigned long oldstate)
 {
 	might_sleep();
 }

+static inline void cond_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
+{
+	cond_synchronize_rcu(rgosp->rgos_norm);
+}
+
 static inline unsigned long start_poll_synchronize_rcu_expedited(void)
 {
 	return start_poll_synchronize_rcu();
 }

+static inline void start_poll_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp)
+{
+	rgosp->rgos_norm = start_poll_synchronize_rcu_expedited();
+}
+
 static inline void cond_synchronize_rcu_expedited(unsigned long oldstate)
 {
 	cond_synchronize_rcu(oldstate);
 }

+static inline void cond_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp)
+{
+	cond_synchronize_rcu_expedited(rgosp->rgos_norm);
+}
+
 extern void rcu_barrier(void);

 static inline void synchronize_rcu_expedited(void)
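The `_full` wrappers in this header follow a snapshot, poll, and conditional-wait pattern. The sketch below models that call flow in userspace with a single counter. It is a deliberately simplified assumption, not how Tiny RCU tracks grace periods: all `toy_` names are hypothetical, a grace period is modeled as one counter increment, and counter wrap and the "already completed" special value are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-field state, shaped like the Tiny RCU structure. */
struct toy_gp_oldstate {
	unsigned long rgos_norm;
};

static unsigned long toy_gp_seq;	/* bumped per modeled grace period */
static int toy_gp_waits;		/* times the caller had to wait */

/* Snapshot the current state without starting anything. */
static void toy_get_state_full(struct toy_gp_oldstate *rgosp)
{
	rgosp->rgos_norm = toy_gp_seq;
}

/* Has a modeled grace period elapsed since the snapshot? */
static bool toy_poll_state_full(const struct toy_gp_oldstate *rgosp)
{
	return toy_gp_seq != rgosp->rgos_norm;
}

/* Model of waiting for a full grace period. */
static void toy_synchronize(void)
{
	toy_gp_seq++;
	toy_gp_waits++;
}

/* Wait only if the snapshot's grace period has not yet elapsed. */
static void toy_cond_synchronize_full(const struct toy_gp_oldstate *rgosp)
{
	if (!toy_poll_state_full(rgosp))
		toy_synchronize();
}
```

In this model the second conditional wait on the same snapshot is free, which is the benefit the polled interface aims for.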

include/linux/rcutree.h

Lines changed: 40 additions & 0 deletions
@@ -40,12 +40,52 @@ bool rcu_eqs_special_set(int cpu);
 void rcu_momentary_dyntick_idle(void);
 void kfree_rcu_scheduler_running(void);
 bool rcu_gp_might_be_stalled(void);
+
+struct rcu_gp_oldstate {
+	unsigned long rgos_norm;
+	unsigned long rgos_exp;
+};
+
+// Maximum number of rcu_gp_oldstate values corresponding to
+// not-yet-completed RCU grace periods.
+#define NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE 4
+
+/**
+ * same_state_synchronize_rcu_full - Are two old-state values identical?
+ * @rgosp1: First old-state value.
+ * @rgosp2: Second old-state value.
+ *
+ * The two old-state values must have been obtained from either
+ * get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(),
+ * or get_completed_synchronize_rcu_full(). Returns @true if the two
+ * values are identical and @false otherwise. This allows structures
+ * whose lifetimes are tracked by old-state values to push these values
+ * to a list header, allowing those structures to be slightly smaller.
+ *
+ * Note that equality is judged on a bitwise basis, so that an
+ * @rcu_gp_oldstate structure with an already-completed state in one field
+ * will compare not-equal to a structure with an already-completed state
+ * in the other field. After all, the @rcu_gp_oldstate structure is opaque
+ * so how did such a situation come to pass in the first place?
+ */
+static inline bool same_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp1,
+						   struct rcu_gp_oldstate *rgosp2)
+{
+	return rgosp1->rgos_norm == rgosp2->rgos_norm && rgosp1->rgos_exp == rgosp2->rgos_exp;
+}
+
 unsigned long start_poll_synchronize_rcu_expedited(void);
+void start_poll_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp);
 void cond_synchronize_rcu_expedited(unsigned long oldstate);
+void cond_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp);
 unsigned long get_state_synchronize_rcu(void);
+void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp);
 unsigned long start_poll_synchronize_rcu(void);
+void start_poll_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp);
 bool poll_state_synchronize_rcu(unsigned long oldstate);
+bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp);
 void cond_synchronize_rcu(unsigned long oldstate);
+void cond_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp);

 bool rcu_is_idle_cpu(int cpu);
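The kernel-doc for the Tree RCU same_state_synchronize_rcu_full() notes that equality is judged field by field. The userspace sketch below copies the two-field structure and the comparison from the diff and adds a small helper to exercise it. The helper `toy_count_same_as_first()` is hypothetical, and any field values passed to it are arbitrary numbers, since real values are opaque and come only from the kernel's own accessors.

```c
#include <assert.h>
#include <stdbool.h>

/* Two-field layout as shown in the rcutree.h diff. */
struct rcu_gp_oldstate {
	unsigned long rgos_norm;
	unsigned long rgos_exp;
};

/* Same comparison as the inline function added by the diff. */
static inline bool same_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp1,
						   struct rcu_gp_oldstate *rgosp2)
{
	return rgosp1->rgos_norm == rgosp2->rgos_norm &&
	       rgosp1->rgos_exp == rgosp2->rgos_exp;
}

/*
 * Hypothetical helper: count how many entries (including the first)
 * carry exactly the same state as v[0].
 */
static int toy_count_same_as_first(struct rcu_gp_oldstate *v, int n)
{
	int i;
	int same = 0;

	for (i = 0; i < n; i++)
		if (same_state_synchronize_rcu_full(&v[0], &v[i]))
			same++;
	return same;
}
```

Two states that agree in only one field compare as different, matching the "bitwise basis" wording in the comment.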

include/linux/srcutiny.h

Lines changed: 6 additions & 4 deletions
@@ -15,10 +15,10 @@

 struct srcu_struct {
 	short srcu_lock_nesting[2];	/* srcu_read_lock() nesting depth. */
-	unsigned short srcu_idx;	/* Current reader array element in bit 0x2. */
-	unsigned short srcu_idx_max;	/* Furthest future srcu_idx request. */
 	u8 srcu_gp_running;		/* GP workqueue running? */
 	u8 srcu_gp_waiting;		/* GP waiting for readers? */
+	unsigned long srcu_idx;		/* Current reader array element in bit 0x2. */
+	unsigned long srcu_idx_max;	/* Furthest future srcu_idx request. */
 	struct swait_queue_head srcu_wq;
 					/* Last srcu_read_unlock() wakes GP. */
 	struct rcu_head *srcu_cb_head;	/* Pending callbacks: Head. */
@@ -82,10 +82,12 @@ static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
 	int idx;

 	idx = ((data_race(READ_ONCE(ssp->srcu_idx)) + 1) & 0x2) >> 1;
-	pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd)\n",
+	pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd) gp: %lu->%lu\n",
 		 tt, tf, idx,
 		 data_race(READ_ONCE(ssp->srcu_lock_nesting[!idx])),
-		 data_race(READ_ONCE(ssp->srcu_lock_nesting[idx])));
+		 data_race(READ_ONCE(ssp->srcu_lock_nesting[idx])),
+		 data_race(READ_ONCE(ssp->srcu_idx)),
+		 data_race(READ_ONCE(ssp->srcu_idx_max)));
 }

 #endif
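Two details of this file's diff can be checked with plain arithmetic: how the reader-array index is derived from the counter, and why widening the counters from unsigned short to unsigned long makes wrap less frequent. The sketch below is a userspace illustration. The index expression is the one from srcu_torture_stats_print(); the `toy_` function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Same arithmetic as the idx computation in srcu_torture_stats_print(). */
static int toy_idx_from_counter(unsigned long srcu_idx)
{
	return (int)(((srcu_idx + 1) & 0x2) >> 1);
}

/* Increment a counter of the old width; wraps to 0 after USHRT_MAX. */
static unsigned short toy_bump_short(unsigned short v)
{
	return (unsigned short)(v + 1);
}

/* Increment a counter of the new width. */
static unsigned long toy_bump_long(unsigned long v)
{
	return v + 1;
}
```

The index alternates in pairs of counter values (0, 1, 1, 0, 0, 1, ...), and a counter value that wraps the short type back to zero is still far from wrapping as an unsigned long.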
