Commit 078838d

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes from Ingo Molnar:
 "The main changes in this cycle were:

  - changes permitting use of call_rcu() and friends very early in
    boot, for example, before rcu_init() is invoked.

  - add in-kernel API to enable and disable expediting of normal RCU
    grace periods.

  - improve RCU's handling of (hotplug-) outgoing CPUs.

  - NO_HZ_FULL_SYSIDLE fixes.

  - tiny-RCU updates to make it more tiny.

  - documentation updates.

  - miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
  cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well
  cpu: Defer smpboot kthread unparking until CPU known to scheduler
  rcu: Associate quiescent-state reports with grace period
  rcu: Yet another fix for preemption and CPU hotplug
  rcu: Add diagnostics to grace-period cleanup
  rcutorture: Default to grace-period-initialization delays
  rcu: Handle outgoing CPUs on exit from idle loop
  cpu: Make CPU-offline idle-loop transition point more precise
  rcu: Eliminate ->onoff_mutex from rcu_node structure
  rcu: Process offlining and onlining only at grace-period start
  rcu: Move rcu_report_unblock_qs_rnp() to common code
  rcu: Rework preemptible expedited bitmask handling
  rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs
  rcutorture: Enable slow grace-period initializations
  rcu: Provide diagnostic option to slow down grace-period initialization
  rcu: Detect stalls caused by failure to propagate up rcu_node tree
  rcu: Eliminate empty HOTPLUG_CPU ifdef
  rcu: Simplify sync_rcu_preempt_exp_init()
  rcu: Put all orphan-callback-related code under same comment
  rcu: Consolidate offline-CPU callback initialization
  ...
2 parents: eeee78c + 590ee7d

File tree

31 files changed: +986 -440 lines changed

Documentation/atomic_ops.txt

Lines changed: 23 additions & 22 deletions
@@ -201,11 +201,11 @@ These routines add 1 and subtract 1, respectively, from the given
 atomic_t and return the new counter value after the operation is
 performed.
 
-Unlike the above routines, it is required that explicit memory
-barriers are performed before and after the operation. It must be
-done such that all memory operations before and after the atomic
-operation calls are strongly ordered with respect to the atomic
-operation itself.
+Unlike the above routines, it is required that these primitives
+include explicit memory barriers that are performed before and after
+the operation. It must be done such that all memory operations before
+and after the atomic operation calls are strongly ordered with respect
+to the atomic operation itself.
 
 For example, it should behave as if a smp_mb() call existed both
 before and after the atomic operation.
@@ -233,21 +233,21 @@ These two routines increment and decrement by 1, respectively, the
 given atomic counter. They return a boolean indicating whether the
 resulting counter value was zero or not.
 
-It requires explicit memory barrier semantics around the operation as
-above.
+Again, these primitives provide explicit memory barrier semantics around
+the atomic operation.
 
 	int atomic_sub_and_test(int i, atomic_t *v);
 
 This is identical to atomic_dec_and_test() except that an explicit
-decrement is given instead of the implicit "1". It requires explicit
-memory barrier semantics around the operation.
+decrement is given instead of the implicit "1". This primitive must
+provide explicit memory barrier semantics around the operation.
 
 	int atomic_add_negative(int i, atomic_t *v);
 
-The given increment is added to the given atomic counter value. A
-boolean is return which indicates whether the resulting counter value
-is negative. It requires explicit memory barrier semantics around the
-operation.
+The given increment is added to the given atomic counter value. A boolean
+is return which indicates whether the resulting counter value is negative.
+This primitive must provide explicit memory barrier semantics around
+the operation.
 
 Then:
 
@@ -257,7 +257,7 @@ This performs an atomic exchange operation on the atomic variable v, setting
 the given new value. It returns the old value that the atomic variable v had
 just before the operation.
 
-atomic_xchg requires explicit memory barriers around the operation.
+atomic_xchg must provide explicit memory barriers around the operation.
 
 	int atomic_cmpxchg(atomic_t *v, int old, int new);
 
@@ -266,7 +266,7 @@ with the given old and new values. Like all atomic_xxx operations,
 atomic_cmpxchg will only satisfy its atomicity semantics as long as all
 other accesses of *v are performed through atomic_xxx operations.
 
-atomic_cmpxchg requires explicit memory barriers around the operation.
+atomic_cmpxchg must provide explicit memory barriers around the operation.
 
 The semantics for atomic_cmpxchg are the same as those defined for 'cas'
 below.
@@ -279,8 +279,8 @@ If the atomic value v is not equal to u, this function adds a to v, and
 returns non zero. If v is equal to u then it returns zero. This is done as
 an atomic operation.
 
-atomic_add_unless requires explicit memory barriers around the operation
-unless it fails (returns 0).
+atomic_add_unless must provide explicit memory barriers around the
+operation unless it fails (returns 0).
 
 atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)
 
@@ -460,9 +460,9 @@ the return value into an int. There are other places where things
 like this occur as well.
 
 These routines, like the atomic_t counter operations returning values,
-require explicit memory barrier semantics around their execution. All
-memory operations before the atomic bit operation call must be made
-visible globally before the atomic bit operation is made visible.
+must provide explicit memory barrier semantics around their execution.
+All memory operations before the atomic bit operation call must be
+made visible globally before the atomic bit operation is made visible.
 Likewise, the atomic bit operation must be visible globally before any
 subsequent memory operation is made visible. For example:
 
@@ -536,8 +536,9 @@ except that two underscores are prefixed to the interface name.
 These non-atomic variants also do not require any special memory
 barrier semantics.
 
-The routines xchg() and cmpxchg() need the same exact memory barriers
-as the atomic and bit operations returning values.
+The routines xchg() and cmpxchg() must provide the same exact
+memory-barrier semantics as the atomic and bit operations returning
+values.
 
 Spinlocks and rwlocks have memory barrier expectations as well.
 The rule to follow is simple:
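
As a hedged illustration of the "implied smp_mb() before and after" rule
documented above (a minimal sketch, not part of this commit; my_obj and
my_obj_put() are hypothetical names), the implied full barriers around
atomic_dec_and_test() are what make the classic refcount-release pattern
safe:

	#include <linux/atomic.h>
	#include <linux/slab.h>

	struct my_obj {
		atomic_t refcount;
		int payload;
	};

	static void my_obj_put(struct my_obj *obj)
	{
		/*
		 * Behaves as if smp_mb() were issued before and after:
		 * all of this thread's prior stores to *obj are globally
		 * visible before the decrement, so whichever thread sees
		 * the count reach zero can safely free the object.
		 */
		if (atomic_dec_and_test(&obj->refcount))
			kfree(obj);
	}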

Documentation/kernel-parameters.txt

Lines changed: 15 additions & 5 deletions
@@ -2969,6 +2969,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Set maximum number of finished RCU callbacks to
 			process in one batch.
 
+	rcutree.gp_init_delay=	[KNL]
+			Set the number of jiffies to delay each step of
+			RCU grace-period initialization. This only has
+			effect when CONFIG_RCU_TORTURE_TEST_SLOW_INIT is
+			set.
+
 	rcutree.rcu_fanout_leaf= [KNL]
 			Increase the number of CPUs assigned to each
 			leaf rcu_node structure. Useful for very large
@@ -2992,11 +2998,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			value is one, and maximum value is HZ.
 
 	rcutree.kthread_prio= 	 [KNL,BOOT]
-			Set the SCHED_FIFO priority of the RCU
-			per-CPU kthreads (rcuc/N). This value is also
-			used for the priority of the RCU boost threads
-			(rcub/N). Valid values are 1-99 and the default
-			is 1 (the least-favored priority).
+			Set the SCHED_FIFO priority of the RCU per-CPU
+			kthreads (rcuc/N). This value is also used for
+			the priority of the RCU boost threads (rcub/N)
+			and for the RCU grace-period kthreads (rcu_bh,
+			rcu_preempt, and rcu_sched). If RCU_BOOST is
+			set, valid values are 1-99 and the default is 1
+			(the least-favored priority). Otherwise, when
+			RCU_BOOST is not set, valid values are 0-99 and
+			the default is zero (non-realtime operation).
 
 	rcutree.rcu_nocb_leader_stride= [KNL]
 			Set the number of NOCB kthread groups, which
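
As a usage illustration (hypothetical values, not taken from this commit),
a kernel boot command line combining the two parameters documented above
might look like:

	rcutree.gp_init_delay=3 rcutree.kthread_prio=2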

Documentation/kernel-per-CPU-kthreads.txt

Lines changed: 21 additions & 13 deletions
@@ -190,33 +190,37 @@ To reduce its OS jitter, do any of the following:
 		on each CPU, including cs_dbs_timer() and od_dbs_timer().
 		WARNING:  Please check your CPU specifications to
 		make sure that this is safe on your particular system.
-	d.	It is not possible to entirely get rid of OS jitter
-		from vmstat_update() on CONFIG_SMP=y systems, but you
-		can decrease its frequency by writing a large value
-		to /proc/sys/vm/stat_interval.  The default value is
-		HZ, for an interval of one second.  Of course, larger
-		values will make your virtual-memory statistics update
-		more slowly.  Of course, you can also run your workload
-		at a real-time priority, thus preempting vmstat_update(),
+	d.	As of v3.18, Christoph Lameter's on-demand vmstat workers
+		commit prevents OS jitter due to vmstat_update() on
+		CONFIG_SMP=y systems.  Before v3.18, is not possible
+		to entirely get rid of the OS jitter, but you can
+		decrease its frequency by writing a large value to
+		/proc/sys/vm/stat_interval.  The default value is HZ,
+		for an interval of one second.  Of course, larger values
+		will make your virtual-memory statistics update more
+		slowly.  Of course, you can also run your workload at
+		a real-time priority, thus preempting vmstat_update(),
 		but if your workload is CPU-bound, this is a bad idea.
 		However, there is an RFC patch from Christoph Lameter
 		(based on an earlier one from Gilad Ben-Yossef) that
 		reduces or even eliminates vmstat overhead for some
 		workloads at https://lkml.org/lkml/2013/9/4/379.
-	e.	If running on high-end powerpc servers, build with
+	e.	Boot with "elevator=noop" to avoid workqueue use by
+		the block layer.
+	f.	If running on high-end powerpc servers, build with
 		CONFIG_PPC_RTAS_DAEMON=n.  This prevents the RTAS
 		daemon from running on each CPU every second or so.
 		(This will require editing Kconfig files and will defeat
 		this platform's RAS functionality.)  This avoids jitter
 		due to the rtas_event_scan() function.
 		WARNING:  Please check your CPU specifications to
 		make sure that this is safe on your particular system.
-	f.	If running on Cell Processor, build your kernel with
+	g.	If running on Cell Processor, build your kernel with
 		CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
 		spu_gov_work().
 		WARNING:  Please check your CPU specifications to
 		make sure that this is safe on your particular system.
-	g.	If running on PowerMAC, build your kernel with
+	h.	If running on PowerMAC, build your kernel with
 		CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
 		avoiding OS jitter from rackmeter_do_timer().
 
@@ -258,8 +262,12 @@ Purpose: Detect software lockups on each CPU.
 To reduce its OS jitter, do at least one of the following:
 1.	Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
 	kthreads from being created in the first place.
-2.	Echo a zero to /proc/sys/kernel/watchdog to disable the
+2.	Boot with "nosoftlockup=0", which will also prevent these kthreads
+	from being created.  Other related watchdog and softlockup boot
+	parameters may be found in Documentation/kernel-parameters.txt
+	and Documentation/watchdog/watchdog-parameters.txt.
+3.	Echo a zero to /proc/sys/kernel/watchdog to disable the
 	watchdog timer.
-3.	Echo a large number of /proc/sys/kernel/watchdog_thresh in
+4.	Echo a large number of /proc/sys/kernel/watchdog_thresh in
 	order to reduce the frequency of OS jitter due to the watchdog
 	timer down to a level that is acceptable for your workload.
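
For example (illustrative values only, not from this commit), a
jitter-sensitive setup might disable the watchdog entirely, or stretch
its period, using the knobs named in items 3 and 4 above:

	echo 0 > /proc/sys/kernel/watchdog
	echo 60 > /proc/sys/kernel/watchdog_thresh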

Documentation/memory-barriers.txt

Lines changed: 29 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -592,9 +592,9 @@ See also the subsection on "Cache Coherency" for a more thorough example.
592592
CONTROL DEPENDENCIES
593593
--------------------
594594

595-
A control dependency requires a full read memory barrier, not simply a data
596-
dependency barrier to make it work correctly. Consider the following bit of
597-
code:
595+
A load-load control dependency requires a full read memory barrier, not
596+
simply a data dependency barrier to make it work correctly. Consider the
597+
following bit of code:
598598

599599
q = ACCESS_ONCE(a);
600600
if (q) {
@@ -615,14 +615,15 @@ case what's actually required is:
615615
}
616616

617617
However, stores are not speculated. This means that ordering -is- provided
618-
in the following example:
618+
for load-store control dependencies, as in the following example:
619619

620620
q = ACCESS_ONCE(a);
621621
if (q) {
622622
ACCESS_ONCE(b) = p;
623623
}
624624

625-
Please note that ACCESS_ONCE() is not optional! Without the
625+
Control dependencies pair normally with other types of barriers.
626+
That said, please note that ACCESS_ONCE() is not optional! Without the
626627
ACCESS_ONCE(), might combine the load from 'a' with other loads from
627628
'a', and the store to 'b' with other stores to 'b', with possible highly
628629
counterintuitive effects on ordering.
@@ -813,6 +814,8 @@ In summary:
813814
barrier() can help to preserve your control dependency. Please
814815
see the Compiler Barrier section for more information.
815816

817+
(*) Control dependencies pair normally with other types of barriers.
818+
816819
(*) Control dependencies do -not- provide transitivity. If you
817820
need transitivity, use smp_mb().
818821

@@ -823,14 +826,14 @@ SMP BARRIER PAIRING
823826
When dealing with CPU-CPU interactions, certain types of memory barrier should
824827
always be paired. A lack of appropriate pairing is almost certainly an error.
825828

826-
General barriers pair with each other, though they also pair with
827-
most other types of barriers, albeit without transitivity. An acquire
828-
barrier pairs with a release barrier, but both may also pair with other
829-
barriers, including of course general barriers. A write barrier pairs
830-
with a data dependency barrier, an acquire barrier, a release barrier,
831-
a read barrier, or a general barrier. Similarly a read barrier or a
832-
data dependency barrier pairs with a write barrier, an acquire barrier,
833-
a release barrier, or a general barrier:
829+
General barriers pair with each other, though they also pair with most
830+
other types of barriers, albeit without transitivity. An acquire barrier
831+
pairs with a release barrier, but both may also pair with other barriers,
832+
including of course general barriers. A write barrier pairs with a data
833+
dependency barrier, a control dependency, an acquire barrier, a release
834+
barrier, a read barrier, or a general barrier. Similarly a read barrier,
835+
control dependency, or a data dependency barrier pairs with a write
836+
barrier, an acquire barrier, a release barrier, or a general barrier:
834837

835838
CPU 1 CPU 2
836839
=============== ===============
@@ -850,6 +853,19 @@ Or:
850853
<data dependency barrier>
851854
y = *x;
852855

856+
Or even:
857+
858+
CPU 1 CPU 2
859+
=============== ===============================
860+
r1 = ACCESS_ONCE(y);
861+
<general barrier>
862+
ACCESS_ONCE(y) = 1; if (r2 = ACCESS_ONCE(x)) {
863+
<implicit control dependency>
864+
ACCESS_ONCE(y) = 1;
865+
}
866+
867+
assert(r1 == 0 || r2 == 0);
868+
853869
Basically, the read barrier always has to be there, even though it can be of
854870
the "weaker" type.
855871
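
To make the new control-dependency pairing concrete, here is a minimal
C-style sketch (an illustration only, not part of this commit); note
that the writer's second access is spelled here as a store to x, so that
the reader's conditional actually tests something the writer published:

	int x, y, r1, r2;

	void cpu1(void)			/* general-barrier side */
	{
		r1 = ACCESS_ONCE(y);
		smp_mb();		/* <general barrier> */
		ACCESS_ONCE(x) = 1;
	}

	void cpu2(void)			/* control-dependency side */
	{
		if ((r2 = ACCESS_ONCE(x))) {
			/* implicit control dependency orders this store */
			ACCESS_ONCE(y) = 1;
		}
	}

	/*
	 * After both CPUs run: assert(r1 == 0 || r2 == 0). If cpu2 saw
	 * cpu1's store to x, then cpu2's store to y is ordered after that
	 * load, hence too late for cpu1's earlier load of y to observe it.
	 */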

Documentation/timers/NO_HZ.txt

Lines changed: 3 additions & 7 deletions
@@ -158,13 +158,9 @@ not come for free:
 	to the need to inform kernel subsystems (such as RCU) about
 	the change in mode.
 
-3.	POSIX CPU timers on adaptive-tick CPUs may miss their deadlines
-	(perhaps indefinitely) because they currently rely on
-	scheduling-tick interrupts.  This will likely be fixed in
-	one of two ways: (1) Prevent CPUs with POSIX CPU timers from
-	entering adaptive-tick mode, or (2) Use hrtimers or other
-	adaptive-ticks-immune mechanism to cause the POSIX CPU timer to
-	fire properly.
+3.	POSIX CPU timers prevent CPUs from entering adaptive-tick mode.
+	Real-time applications needing to take actions based on CPU time
+	consumption need to use other means of doing so.
 
 4.	If there are more perf events pending than the hardware can
 	accommodate, they are normally round-robined so as to collect

arch/blackfin/mach-common/smp.c

Lines changed: 2 additions & 4 deletions
@@ -413,16 +413,14 @@ int __cpu_disable(void)
 	return 0;
 }
 
-static DECLARE_COMPLETION(cpu_killed);
-
 int __cpu_die(unsigned int cpu)
 {
-	return wait_for_completion_timeout(&cpu_killed, 5000);
+	return cpu_wait_death(cpu, 5);
 }
 
 void cpu_die(void)
 {
-	complete(&cpu_killed);
+	(void)cpu_report_death();
 
 	atomic_dec(&init_mm.mm_users);
 	atomic_dec(&init_mm.mm_count);

arch/metag/kernel/smp.c

Lines changed: 2 additions & 3 deletions
@@ -261,7 +261,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-static DECLARE_COMPLETION(cpu_killed);
 
 /*
  * __cpu_disable runs on the processor to be shutdown.
@@ -299,7 +298,7 @@ int __cpu_disable(void)
  */
 void __cpu_die(unsigned int cpu)
 {
-	if (!wait_for_completion_timeout(&cpu_killed, msecs_to_jiffies(1)))
+	if (!cpu_wait_death(cpu, 1))
 		pr_err("CPU%u: unable to kill\n", cpu);
 }
 
@@ -314,7 +313,7 @@ void cpu_die(void)
 	local_irq_disable();
 	idle_task_exit();
 
-	complete(&cpu_killed);
+	(void)cpu_report_death();
 
 	asm ("XOR TXENABLE, D0Re0,D0Re0\n");
 }
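
Both conversions above follow the same pattern: each arch's private
DECLARE_COMPLETION(cpu_killed) handshake is replaced by the new generic
helpers, where a surviving CPU waits in cpu_wait_death(cpu, seconds) and
the dying CPU announces itself with cpu_report_death(). A minimal hedged
sketch of that pattern for a hypothetical architecture (the declaring
header is an assumption, not verified against this tree):

	#include <linux/cpu.h>		/* assumed home of cpu_wait_death() */
	#include <linux/printk.h>

	/* Runs on a surviving CPU: wait up to 5 seconds for @cpu to die. */
	void __cpu_die(unsigned int cpu)
	{
		if (!cpu_wait_death(cpu, 5))
			pr_err("CPU%u: unable to kill\n", cpu);
	}

	/* Runs on the dying CPU just before it halts. */
	void cpu_die(void)
	{
		(void)cpu_report_death();	/* ignore timeout status */
		/* ...arch-specific low-power halt loop follows... */
	}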

arch/x86/include/asm/cpu.h

Lines changed: 0 additions & 2 deletions
@@ -34,8 +34,6 @@ extern int _debug_hotplug_cpu(int cpu, int action);
 #endif
 #endif
 
-DECLARE_PER_CPU(int, cpu_state);
-
 int mwait_usable(const struct cpuinfo_x86 *);
 
 #endif /* _ASM_X86_CPU_H */

arch/x86/include/asm/smp.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -150,13 +150,13 @@ static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
150150
}
151151

152152
void cpu_disable_common(void);
153-
void cpu_die_common(unsigned int cpu);
154153
void native_smp_prepare_boot_cpu(void);
155154
void native_smp_prepare_cpus(unsigned int max_cpus);
156155
void native_smp_cpus_done(unsigned int max_cpus);
157156
void common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
158157
int native_cpu_up(unsigned int cpunum, struct task_struct *tidle);
159158
int native_cpu_disable(void);
159+
int common_cpu_die(unsigned int cpu);
160160
void native_cpu_die(unsigned int cpu);
161161
void native_play_dead(void);
162162
void play_dead_common(void);
