Commit 6b5f04b

Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "cgroup changes for v4.6-rc1. No userland visible behavior changes in
  this pull request. I'll send out a separate pull request for the
  addition of cgroup namespace support.

   - The biggest change is the revamping of cgroup core task migration
     and controller handling logic. There are quite a few places where
     controllers and tasks are manipulated. Previously, many of those
     places implemented custom operations for each specific use case
     assuming specific starting conditions. While this worked, it makes
     the code fragile and difficult to follow.

     The bulk of this pull request restructures these operations so
     that most related operations are performed through common helpers
     which implement recursive (subtrees are always processed
     consistently) and idempotent (they make cgroup hierarchy converge
     to the target state rather than performing operations assuming
     specific starting conditions). This makes the code a lot easier to
     understand, verify and extend.

   - Implicit controller support is added. This is primarily for using
     perf_event on the v2 hierarchy so that perf can match cgroup v2
     path without requiring the user to do anything special. The
     kernel portion of perf_event changes is acked but userland changes
     are still pending review.

   - cgroup_no_v1= boot parameter added to ease testing cgroup v2 in
     certain environments.

   - There is a regression introduced during v4.4 devel cycle where
     attempts to migrate zombie tasks can mess up internal object
     management. This was fixed earlier this week and included in this
     pull request w/ stable cc'd.

   - Misc non-critical fixes and improvements"

* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (44 commits)
  cgroup: avoid false positive gcc-6 warning
  cgroup: ignore css_sets associated with dead cgroups during migration
  Documentation: cgroup v2: Trivial heading correction.
  cgroup: implement cgroup_subsys->implicit_on_dfl
  cgroup: use css_set->mg_dst_cgrp for the migration target cgroup
  cgroup: make cgroup[_taskset]_migrate() take cgroup_root instead of cgroup
  cgroup: move migration destination verification out of cgroup_migrate_prepare_dst()
  cgroup: fix incorrect destination cgroup in cgroup_update_dfl_csses()
  cgroup: Trivial correction to reflect controller.
  cgroup: remove stale item in cgroup-v1 document INDEX file.
  cgroup: update css iteration in cgroup_update_dfl_csses()
  cgroup: allocate 2x cgrp_cset_links when setting up a new root
  cgroup: make cgroup_calc_subtree_ss_mask() take @this_ss_mask
  cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends
  cgroup: use cgroup_apply_enable_control() in cgroup creation path
  cgroup: combine cgroup_mutex locking and offline css draining
  cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write()
  cgroup: introduce cgroup_{save|propagate|restore}_control()
  cgroup: make cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() recursive
  cgroup: factor out cgroup_apply_control_enable() from cgroup_subtree_control_write()
  ...
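
The "recursive and idempotent" property described in the pull message can be illustrated with a small user-space sketch. This is not the kernel's implementation: `struct toy_cgroup`, `toy_apply_control()` and the rule that a child receives `target & subtree_control` are hypothetical simplifications chosen only to show a helper that converges a whole subtree to a target state regardless of where it starts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CHILDREN 4

/* Toy stand-in for a cgroup node; NOT the kernel's struct cgroup. */
struct toy_cgroup {
	uint16_t subtree_control;	/* controllers passed on to children */
	uint16_t enabled;		/* controllers currently enabled here */
	struct toy_cgroup *children[TOY_MAX_CHILDREN];
	size_t nr_children;
};

/*
 * Bring @cgrp and every descendant to the state implied by @target.
 * Returns the number of nodes whose state had to change, so a second
 * call with the same arguments returns 0 (idempotent).
 */
static int toy_apply_control(struct toy_cgroup *cgrp, uint16_t target)
{
	int changed = 0;
	size_t i;

	assert(cgrp != NULL);

	/* Converge to the target instead of assuming a starting state. */
	if (cgrp->enabled != target) {
		cgrp->enabled = target;
		changed++;
	}

	/* Recurse so the whole subtree is always processed consistently. */
	for (i = 0; i < cgrp->nr_children; i++)
		changed += toy_apply_control(cgrp->children[i],
					     target & cgrp->subtree_control);
	return changed;
}
```

Because the helper only compares the current state with the target, it can be re-run after a partial failure or on a half-configured subtree and will touch just the nodes that still differ.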
2 parents fcab86a + cfe02a8 commit 6b5f04b

10 files changed, +738 -504 lines changed

Documentation/cgroup-v1/00-INDEX

Lines changed: 0 additions & 2 deletions
@@ -24,5 +24,3 @@ net_prio.txt
 	- Network priority cgroups details and usages.
 pids.txt
 	- Process number cgroups details and usages.
-unified-hierarchy.txt
-	- Description the new/next cgroup interface.

Documentation/cgroup-v2.txt

Lines changed: 7 additions & 1 deletion
@@ -132,6 +132,12 @@ strongly discouraged for production use. It is recommended to decide
 the hierarchies and controller associations before starting using the
 controllers after system boot.
 
+During transition to v2, system management software might still
+automount the v1 cgroup filesystem and so hijack all controllers
+during boot, before manual intervention is possible. To make testing
+and experimenting easier, the kernel parameter cgroup_no_v1= allows
+disabling controllers in v1 and make them always available in v2.
+
 
 2-2. Organizing Processes
 
@@ -915,7 +921,7 @@ PAGE_SIZE multiple when read back.
 	limit, anonymous meomry of the cgroup will not be swapped out.
 
 
-5-2-2. General Usage
+5-2-2. Usage Guidelines
 
 "memory.high" is the main mechanism to control memory usage.
 Over-committing on high limit (sum of high limits > available memory)

Documentation/kernel-parameters.txt

Lines changed: 5 additions & 0 deletions
@@ -614,6 +614,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			cut the overhead, others just disable the usage. So
 			only cgroup_disable=memory is actually worthy}
 
+	cgroup_no_v1=	[KNL] Disable one, multiple, all cgroup controllers in v1
+			Format: { controller[,controller...] | "all" }
+			Like cgroup_disable, but only applies to cgroup v1;
+			the blacklisted controllers remain available in cgroup2.
+
 	cgroup.memory=	[KNL] Pass options to the cgroup memory controller.
 			Format: <string>
 			nosocket -- Disable socket memory accounting.
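
The `cgroup_no_v1=` format documented above, `{ controller[,controller...] | "all" }`, maps naturally onto a bitmask. The following user-space sketch shows one way such a string could be turned into a mask. It is an illustration only: the controller table, `toy_parse_no_v1()` and the choice to silently ignore unknown names are assumptions, not the kernel's parser.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical controller table; the real set depends on kernel config. */
static const char *const toy_controllers[] = { "cpu", "memory", "io", "pids" };
#define TOY_NR_CONTROLLERS \
	(sizeof(toy_controllers) / sizeof(toy_controllers[0]))

/*
 * Parse "controller[,controller...]" or "all" into a bitmask where
 * bit i corresponds to toy_controllers[i]. Unknown names are ignored
 * (an assumption made for this sketch).
 */
static uint16_t toy_parse_no_v1(const char *arg)
{
	uint16_t mask = 0;
	const char *p = arg;

	assert(arg != NULL);

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of this token */
		size_t i;

		if (len == 3 && strncmp(p, "all", 3) == 0)
			return (uint16_t)((1u << TOY_NR_CONTROLLERS) - 1);

		for (i = 0; i < TOY_NR_CONTROLLERS; i++) {
			if (strlen(toy_controllers[i]) == len &&
			    strncmp(p, toy_controllers[i], len) == 0)
				mask |= (uint16_t)(1u << i);
		}

		p += len;
		if (*p == ',')
			p++;	/* skip the separator */
	}
	return mask;
}
```

With this table, `toy_parse_no_v1("cpu,pids")` sets bits 0 and 3, and a v1 mount would then skip any controller whose bit is set while the v2 hierarchy keeps it available.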

include/linux/cgroup-defs.h

Lines changed: 32 additions & 14 deletions
@@ -45,6 +45,7 @@ enum {
 	CSS_NO_REF	= (1 << 0), /* no reference counting for this css */
 	CSS_ONLINE	= (1 << 1), /* between ->css_online() and ->css_offline() */
 	CSS_RELEASED	= (1 << 2), /* refcnt reached zero, released */
+	CSS_VISIBLE	= (1 << 3), /* css is visible to userland */
 };
 
 /* bits in struct cgroup flags field */
@@ -190,12 +191,13 @@ struct css_set {
 
 	/*
 	 * If this cset is acting as the source of migration the following
-	 * two fields are set. mg_src_cgrp is the source cgroup of the
-	 * on-going migration and mg_dst_cset is the destination cset the
-	 * target tasks on this cset should be migrated to. Protected by
-	 * cgroup_mutex.
+	 * two fields are set. mg_src_cgrp and mg_dst_cgrp are
+	 * respectively the source and destination cgroups of the on-going
+	 * migration. mg_dst_cset is the destination cset the target tasks
+	 * on this cset should be migrated to. Protected by cgroup_mutex.
 	 */
 	struct cgroup *mg_src_cgrp;
+	struct cgroup *mg_dst_cgrp;
 	struct css_set *mg_dst_cset;
 
 	/*
@@ -210,6 +212,9 @@ struct css_set {
 	/* all css_task_iters currently walking this cset */
 	struct list_head task_iters;
 
+	/* dead and being drained, ignore for migration */
+	bool dead;
+
 	/* For RCU-protected deletion */
 	struct rcu_head rcu_head;
 };
@@ -253,13 +258,14 @@ struct cgroup {
 	/*
 	 * The bitmask of subsystems enabled on the child cgroups.
 	 * ->subtree_control is the one configured through
-	 * "cgroup.subtree_control" while ->child_subsys_mask is the
-	 * effective one which may have more subsystems enabled.
-	 * Controller knobs are made available iff it's enabled in
-	 * ->subtree_control.
+	 * "cgroup.subtree_control" while ->child_ss_mask is the effective
+	 * one which may have more subsystems enabled. Controller knobs
+	 * are made available iff it's enabled in ->subtree_control.
 	 */
-	unsigned int subtree_control;
-	unsigned int child_subsys_mask;
+	u16 subtree_control;
+	u16 subtree_ss_mask;
+	u16 old_subtree_control;
+	u16 old_subtree_ss_mask;
 
 	/* Private pointers for each registered subsystem */
 	struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
@@ -434,7 +440,6 @@ struct cgroup_subsys {
 	void (*css_released)(struct cgroup_subsys_state *css);
 	void (*css_free)(struct cgroup_subsys_state *css);
 	void (*css_reset)(struct cgroup_subsys_state *css);
-	void (*css_e_css_changed)(struct cgroup_subsys_state *css);
 
 	int (*can_attach)(struct cgroup_taskset *tset);
 	void (*cancel_attach)(struct cgroup_taskset *tset);
@@ -446,7 +451,20 @@ struct cgroup_subsys {
 	void (*free)(struct task_struct *task);
 	void (*bind)(struct cgroup_subsys_state *root_css);
 
-	int early_init;
+	bool early_init:1;
+
+	/*
+	 * If %true, the controller, on the default hierarchy, doesn't show
+	 * up in "cgroup.controllers" or "cgroup.subtree_control", is
+	 * implicitly enabled on all cgroups on the default hierarchy, and
+	 * bypasses the "no internal process" constraint. This is for
+	 * utility type controllers which is transparent to userland.
+	 *
+	 * An implicit controller can be stolen from the default hierarchy
+	 * anytime and thus must be okay with offline csses from previous
+	 * hierarchies coexisting with csses for the current one.
+	 */
+	bool implicit_on_dfl:1;
 
 	/*
 	 * If %false, this subsystem is properly hierarchical -
@@ -460,8 +478,8 @@ struct cgroup_subsys {
 	 * cases. Eventually, all subsystems will be made properly
 	 * hierarchical and this will go away.
 	 */
-	bool broken_hierarchy;
-	bool warned_broken_hierarchy;
+	bool broken_hierarchy:1;
+	bool warned_broken_hierarchy:1;
 
 	/* the following two fields are initialized automtically during boot */
 	int id;

init/Kconfig

Lines changed: 2 additions & 2 deletions
@@ -1047,10 +1047,10 @@ config CGROUP_PIDS
 	  is fairly trivial to reach PID exhaustion before you reach even a
 	  conservative kmemcg limit. As a result, it is possible to grind a
 	  system to halt without being limited by other cgroup policies. The
-	  PIDs cgroup subsystem is designed to stop this from happening.
+	  PIDs controller is designed to stop this from happening.
 
 	  It should be noted that organisational operations (such as attaching
-	  to a cgroup hierarchy will *not* be blocked by the PIDs subsystem),
+	  to a cgroup hierarchy will *not* be blocked by the PIDs controller),
 	  since the PIDs limit only affects a process's ability to fork, not to
 	  attach to a cgroup.

kernel/Makefile

Lines changed: 1 addition & 2 deletions
@@ -14,8 +14,7 @@ obj-y = fork.o exec_domain.o panic.o \
 obj-$(CONFIG_MULTIUSER) += groups.o
 
 ifdef CONFIG_FUNCTION_TRACER
-# Do not trace debug files and internal ftrace files
-CFLAGS_REMOVE_cgroup-debug.o = $(CC_FLAGS_FTRACE)
+# Do not trace internal ftrace files
 CFLAGS_REMOVE_irq_work.o = $(CC_FLAGS_FTRACE)
 endif

