
Commit adf4bfc

Merge tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - cpuset now supports isolated cpus.partition type, which will enable
   dynamic CPU isolation

 - pids.peak added to remember the max number of pids used

 - holes in cgroup namespace plugged

 - internal cleanups

* tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (25 commits)
  cgroup: use strscpy() is more robust and safer
  iocost_monitor: reorder BlkgIterator
  cgroup: simplify code in cgroup_apply_control
  cgroup: Make cgroup_get_from_id() prettier
  cgroup/cpuset: remove unreachable code
  cgroup: Remove CFTYPE_PRESSURE
  cgroup: Improve cftype add/rm error handling
  kselftest/cgroup: Add cpuset v2 partition root state test
  cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst
  cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule
  cgroup/cpuset: Relocate a code block in validate_change()
  cgroup/cpuset: Show invalid partition reason string
  cgroup/cpuset: Add a new isolated cpus.partition type
  cgroup/cpuset: Relax constraints to partition & cpus changes
  cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective
  cgroup/cpuset: Miscellaneous cleanups & add helper functions
  cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset
  cgroup: add pids.peak interface for pids controller
  cgroup: Remove data-race around cgrp_dfl_visible
  cgroup: Fix build failure when CONFIG_SHRINKER_DEBUG
  ...
2 parents 8adc048 + 8619e94 commit adf4bfc

17 files changed: +1544 -444 lines changed


Documentation/admin-guide/cgroup-v2.rst

Lines changed: 87 additions & 69 deletions
@@ -2190,75 +2190,93 @@ Cpuset Interface Files
 
 It accepts only the following input values when written to.
 
-======== ================================
-"root"   a partition root
-"member" a non-root member of a partition
-======== ================================
-
-When set to be a partition root, the current cgroup is the
-root of a new partition or scheduling domain that comprises
-itself and all its descendants except those that are separate
-partition roots themselves and their descendants. The root
-cgroup is always a partition root.
-
-There are constraints on where a partition root can be set.
-It can only be set in a cgroup if all the following conditions
-are true.
-
-1) The "cpuset.cpus" is not empty and the list of CPUs are
-exclusive, i.e. they are not shared by any of its siblings.
-2) The parent cgroup is a partition root.
-3) The "cpuset.cpus" is also a proper subset of the parent's
-"cpuset.cpus.effective".
-4) There is no child cgroups with cpuset enabled. This is for
-eliminating corner cases that have to be handled if such a
-condition is allowed.
-
-Setting it to partition root will take the CPUs away from the
-effective CPUs of the parent cgroup. Once it is set, this
-file cannot be reverted back to "member" if there are any child
-cgroups with cpuset enabled.
-
-A parent partition cannot distribute all its CPUs to its
-child partitions. There must be at least one cpu left in the
-parent partition.
-
-Once becoming a partition root, changes to "cpuset.cpus" is
-generally allowed as long as the first condition above is true,
-the change will not take away all the CPUs from the parent
-partition and the new "cpuset.cpus" value is a superset of its
-children's "cpuset.cpus" values.
-
-Sometimes, external factors like changes to ancestors'
-"cpuset.cpus" or cpu hotplug can cause the state of the partition
-root to change. On read, the "cpuset.sched.partition" file
-can show the following values.
-
-============== ==============================
-"member"       Non-root member of a partition
-"root"         Partition root
-"root invalid" Invalid partition root
-============== ==============================
-
-It is a partition root if the first 2 partition root conditions
-above are true and at least one CPU from "cpuset.cpus" is
-granted by the parent cgroup.
-
-A partition root can become invalid if none of CPUs requested
-in "cpuset.cpus" can be granted by the parent cgroup or the
-parent cgroup is no longer a partition root itself. In this
-case, it is not a real partition even though the restriction
-of the first partition root condition above will still apply.
-The cpu affinity of all the tasks in the cgroup will then be
-associated with CPUs in the nearest ancestor partition.
-
-An invalid partition root can be transitioned back to a
-real partition root if at least one of the requested CPUs
-can now be granted by its parent. In this case, the cpu
-affinity of all the tasks in the formerly invalid partition
-will be associated to the CPUs of the newly formed partition.
-Changing the partition state of an invalid partition root to
-"member" is always allowed even if child cpusets are present.
+========== =====================================
+"member"   Non-root member of a partition
+"root"     Partition root
+"isolated" Partition root without load balancing
+========== =====================================
+
+The root cgroup is always a partition root and its state
+cannot be changed. All other non-root cgroups start out as
+"member".
+
+When set to "root", the current cgroup is the root of a new
+partition or scheduling domain that comprises itself and all
+its descendants except those that are separate partition roots
+themselves and their descendants.
+
+When set to "isolated", the CPUs in that partition root will
+be in an isolated state without any load balancing from the
+scheduler. Tasks placed in such a partition with multiple
+CPUs should be carefully distributed and bound to each of the
+individual CPUs for optimal performance.
+
+The value shown in "cpuset.cpus.effective" of a partition root
+is the CPUs that the partition root can dedicate to a potential
+new child partition root. The new child subtracts available
+CPUs from its parent "cpuset.cpus.effective".
+
+A partition root ("root" or "isolated") can be in one of the
+two possible states - valid or invalid. An invalid partition
+root is in a degraded state where some state information may
+be retained, but behaves more like a "member".
+
+All possible state transitions among "member", "root" and
+"isolated" are allowed.
+
+On read, the "cpuset.cpus.partition" file can show the following
+values.
+
+============================= =====================================
+"member"                      Non-root member of a partition
+"root"                        Partition root
+"isolated"                    Partition root without load balancing
+"root invalid (<reason>)"     Invalid partition root
+"isolated invalid (<reason>)" Invalid isolated partition root
+============================= =====================================
+
+In the case of an invalid partition root, a descriptive string on
+why the partition is invalid is included within parentheses.
+
+For a partition root to become valid, the following conditions
+must be met.
+
+1) The "cpuset.cpus" is exclusive with its siblings, i.e. they
+are not shared by any of its siblings (exclusivity rule).
+2) The parent cgroup is a valid partition root.
+3) The "cpuset.cpus" is not empty and must contain at least
+one of the CPUs from parent's "cpuset.cpus", i.e. they overlap.
+4) The "cpuset.cpus.effective" cannot be empty unless there is
+no task associated with this partition.
+
+External events like hotplug or changes to "cpuset.cpus" can
+cause a valid partition root to become invalid and vice versa.
+Note that a task cannot be moved to a cgroup with empty
+"cpuset.cpus.effective".
+
+For a valid partition root with the sibling cpu exclusivity
+rule enabled, changes made to "cpuset.cpus" that violate the
+exclusivity rule will invalidate the partition as well as its
+sibling partitions with conflicting cpuset.cpus values. So
+care must be taken in changing "cpuset.cpus".
+
+A valid non-root parent partition may distribute out all its CPUs
+to its child partitions when there is no task associated with it.
+
+Care must be taken to change a valid partition root to
+"member" as all its child partitions, if present, will become
+invalid causing disruption to tasks running in those child
+partitions. These inactivated partitions could be recovered if
+their parent is switched back to a partition root with a proper
+set of "cpuset.cpus".
+
+Poll and inotify events are triggered whenever the state of
+"cpuset.cpus.partition" changes. That includes changes caused
+by write to "cpuset.cpus.partition", cpu hotplug or other
+changes that modify the validity status of the partition.
+This will allow user space agents to monitor unexpected changes
+to "cpuset.cpus.partition" without the need to do continuous
+polling.
 
 
 Device controller
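
The updated documentation lists five forms that a read of "cpuset.cpus.partition" can return, two of which carry a reason in parentheses. A minimal userspace sketch of splitting such a value into type, validity and reason might look like the following. The names `partition_state` and `parse_partition_state` are invented for this illustration and are not part of the kernel or any library, and the sketch assumes the value may end in a newline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parsed form of one value read from "cpuset.cpus.partition" (hypothetical type). */
struct partition_state {
	char type[16];    /* "member", "root" or "isolated" */
	bool valid;       /* false for "<type> invalid (<reason>)" */
	char reason[128]; /* text inside the parentheses, "" when valid */
};

/* Returns 0 on success, -1 if the text matches none of the documented forms. */
int parse_partition_state(const char *text, struct partition_state *out)
{
	static const char marker[] = " invalid (";
	char line[192];
	const char *pos;
	size_t len;

	assert(text != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	/* Work on a copy and drop a trailing newline if there is one. */
	snprintf(line, sizeof(line), "%s", text);
	line[strcspn(line, "\n")] = '\0';

	pos = strstr(line, marker);
	if (!pos) {
		out->valid = true;
		snprintf(out->type, sizeof(out->type), "%s", line);
	} else {
		len = (size_t)(pos - line);
		if (len >= sizeof(out->type))
			return -1;
		memcpy(out->type, line, len); /* already NUL-terminated by memset */

		pos += strlen(marker);
		len = strlen(pos);
		if (len == 0 || pos[len - 1] != ')')
			return -1;
		snprintf(out->reason, sizeof(out->reason), "%.*s",
			 (int)(len - 1), pos);
	}

	if (strcmp(out->type, "member") != 0 && strcmp(out->type, "root") != 0 &&
	    strcmp(out->type, "isolated") != 0)
		return -1;
	/* The documented table has no invalid form for "member". */
	if (!out->valid && strcmp(out->type, "member") == 0)
		return -1;
	return 0;
}
```

A monitoring agent could call this on each value it reads after a poll or inotify wakeup and log `reason` whenever `valid` is false.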

block/blk-cgroup-fc-appid.c

Lines changed: 2 additions & 2 deletions
@@ -19,8 +19,8 @@ int blkcg_set_fc_appid(char *app_id, u64 cgrp_id, size_t app_id_len)
 		return -EINVAL;
 
 	cgrp = cgroup_get_from_id(cgrp_id);
-	if (!cgrp)
-		return -ENOENT;
+	if (IS_ERR(cgrp))
+		return PTR_ERR(cgrp);
 	css = cgroup_get_e_css(cgrp, &io_cgrp_subsys);
 	if (!css) {
 		ret = -ENOENT;
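
The change above switches the caller from a NULL check to the error-pointer convention, which implies that cgroup_get_from_id() now reports failure through an encoded errno rather than NULL. The sketch below shows why a plain `!ptr` test no longer catches the failure. It is a userspace model only: the `ERR_PTR`, `PTR_ERR` and `IS_ERR` helpers here are simplified stand-ins for the ones in include/linux/err.h, and `group`, `group_get_from_id` and `lookup_status` are hypothetical names used purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical object table standing in for a real id lookup. */
struct group { uint64_t id; };
static struct group groups[] = { { 1 }, { 2 } };

/* Returns the matching group, or an error pointer (never NULL) on failure. */
struct group *group_get_from_id(uint64_t id)
{
	for (size_t i = 0; i < sizeof(groups) / sizeof(groups[0]); i++)
		if (groups[i].id == id)
			return &groups[i];
	return ERR_PTR(-ENOENT);
}

/* Caller pattern: test with IS_ERR() and propagate the encoded errno. */
long lookup_status(uint64_t id)
{
	struct group *grp = group_get_from_id(id);

	if (IS_ERR(grp))
		return PTR_ERR(grp);
	return 0;
}

/* Small self-check of the points made above. */
void err_ptr_demo(void)
{
	assert(lookup_status(1) == 0);
	assert(lookup_status(99) == -ENOENT);
	/* A failed lookup is not NULL, so "if (!grp)" would miss it. */
	assert(group_get_from_id(99) != NULL);
}
```

Propagating `PTR_ERR()` instead of a hard-coded errno also lets the callee decide which error the caller sees.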

include/linux/cgroup-defs.h

Lines changed: 11 additions & 7 deletions
@@ -126,11 +126,11 @@ enum {
 	CFTYPE_NO_PREFIX = (1 << 3), /* (DON'T USE FOR NEW FILES) no subsys prefix */
 	CFTYPE_WORLD_WRITABLE = (1 << 4), /* (DON'T USE FOR NEW FILES) S_IWUGO */
 	CFTYPE_DEBUG = (1 << 5), /* create when cgroup_debug */
-	CFTYPE_PRESSURE = (1 << 6), /* only if pressure feature is enabled */
 
 	/* internal flags, do not use outside cgroup core proper */
 	__CFTYPE_ONLY_ON_DFL = (1 << 16), /* only on default hierarchy */
 	__CFTYPE_NOT_ON_DFL = (1 << 17), /* not on default hierarchy */
+	__CFTYPE_ADDED = (1 << 18),
 };
 
 /*
@@ -384,7 +384,7 @@ struct cgroup {
 	/*
 	 * The depth this cgroup is at. The root is at depth zero and each
 	 * step down the hierarchy increments the level. This along with
-	 * ancestor_ids[] can determine whether a given cgroup is a
+	 * ancestors[] can determine whether a given cgroup is a
 	 * descendant of another without traversing the hierarchy.
 	 */
 	int level;
@@ -504,8 +504,8 @@ struct cgroup {
 	/* Used to store internal freezer state */
 	struct cgroup_freezer_state freezer;
 
-	/* ids of the ancestors at each level including self */
-	u64 ancestor_ids[];
+	/* All ancestors including self */
+	struct cgroup *ancestors[];
 };
 
 /*
@@ -522,11 +522,15 @@ struct cgroup_root {
 	/* Unique id for this hierarchy. */
 	int hierarchy_id;
 
-	/* The root cgroup. Root is destroyed on its release. */
+	/*
+	 * The root cgroup. The containing cgroup_root will be destroyed on its
+	 * release. cgrp->ancestors[0] will be used overflowing into the
+	 * following field. cgrp_ancestor_storage must immediately follow.
+	 */
 	struct cgroup cgrp;
 
-	/* for cgrp->ancestor_ids[0] */
-	u64 cgrp_ancestor_id_storage;
+	/* must follow cgrp for cgrp->ancestors[0], see above */
+	struct cgroup *cgrp_ancestor_storage;
 
 	/* Number of cgroups in the hierarchy, used only for /proc/cgroups */
 	atomic_t nr_cgrps;

include/linux/cgroup.h

Lines changed: 3 additions & 10 deletions
@@ -575,7 +575,7 @@ static inline bool cgroup_is_descendant(struct cgroup *cgrp,
 {
 	if (cgrp->root != ancestor->root || cgrp->level < ancestor->level)
 		return false;
-	return cgrp->ancestor_ids[ancestor->level] == cgroup_id(ancestor);
+	return cgrp->ancestors[ancestor->level] == ancestor;
 }
 
 /**
@@ -592,11 +592,9 @@ static inline bool cgroup_is_descendant(struct cgroup *cgrp,
 static inline struct cgroup *cgroup_ancestor(struct cgroup *cgrp,
 					     int ancestor_level)
 {
-	if (cgrp->level < ancestor_level)
+	if (ancestor_level < 0 || ancestor_level > cgrp->level)
 		return NULL;
-	while (cgrp && cgrp->level > ancestor_level)
-		cgrp = cgroup_parent(cgrp);
-	return cgrp;
+	return cgrp->ancestors[ancestor_level];
 }
 
 /**
@@ -748,11 +746,6 @@ static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
 
 static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
 {}
-
-static inline struct cgroup *cgroup_get_from_id(u64 id)
-{
-	return NULL;
-}
 #endif /* !CONFIG_CGROUPS */
 
 #ifdef CONFIG_CGROUPS
758751
#ifdef CONFIG_CGROUPS

kernel/cgroup/cgroup-internal.h

Lines changed: 2 additions & 0 deletions
@@ -250,6 +250,8 @@ int cgroup_migrate(struct task_struct *leader, bool threadgroup,
 
 int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
 		       bool threadgroup);
+void cgroup_attach_lock(bool lock_threadgroup);
+void cgroup_attach_unlock(bool lock_threadgroup);
 struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
 					     bool *locked)
 	__acquires(&cgroup_threadgroup_rwsem);
__acquires(&cgroup_threadgroup_rwsem);

kernel/cgroup/cgroup-v1.c

Lines changed: 2 additions & 4 deletions
@@ -59,8 +59,7 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
 	int retval = 0;
 
 	mutex_lock(&cgroup_mutex);
-	cpus_read_lock();
-	percpu_down_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_lock(true);
 	for_each_root(root) {
 		struct cgroup *from_cgrp;
 
@@ -72,8 +71,7 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
 		if (retval)
 			break;
 	}
-	percpu_up_write(&cgroup_threadgroup_rwsem);
-	cpus_read_unlock();
+	cgroup_attach_unlock(true);
 	mutex_unlock(&cgroup_mutex);
 
 	return retval;
