
Commit c2931b7

cgroup: iterate cgroup_subsys_states directly
Currently, css_next_child() is implemented as finding the next child cgroup which has the css enabled. That used to be the only way to do it, because only cgroups participated in sibling lists and thus could be iterated. This works as long as what's required during iteration is not missing online csses; however, it turns out that there are use cases where offlined but not yet released csses need to be iterated. This is difficult to implement through cgroup iteration on the unified hierarchy, as there may be multiple dying csses for the same subsystem associated with a single cgroup.

After the recent changes, the cgroup self and regular csses behave identically in how they're linked to and unlinked from the sibling lists, including the assertion of CSS_RELEASED, so css_next_child() can simply switch to iterating csses directly. This both simplifies the logic and ensures that all visible non-released csses are included in the iteration, whether or not a subsystem has multiple dying csses.

As all other iterators depend on css_next_child() for sibling iteration, this changes the behavior of all css iterators. Add or update, for every iterator, the explanation of which css states are included in the traversal.

As css iteration could always contain offlined csses, this shouldn't break any of the current users, and new users which need to iterate both online and offline csses can make use of the new semantics.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
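The problem described in the first paragraph can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `model_cgroup`, `model_css`, `count_via_cgroups()`, `count_via_csses()` and `demo_counts()` are hypothetical names, only one subsystem is modeled, and there is no RCU or locking. A child cgroup whose css was replaced ends up with a dying css and a live css both linked under the parent css. Walking cgroups and looking up each one's current css finds only one of them, while walking the css sibling list finds both.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_css;

/* Hypothetical stand-in for struct cgroup, tracking a single subsystem. */
struct model_cgroup {
	struct list_node sibling;   /* link in the parent cgroup's children */
	struct list_node children;  /* child cgroups */
	struct model_css *cur_css;  /* the css currently enabled, if any */
};

/* Hypothetical stand-in for struct cgroup_subsys_state. */
struct model_css {
	struct list_node sibling;   /* link in the parent css's children */
	struct list_node children;  /* child csses, live or dying */
	struct model_cgroup *cgroup;
};

/* Old scheme: walk child cgroups and count each one's current css. */
static int count_via_cgroups(struct model_cgroup *parent)
{
	int n = 0;

	assert(parent != NULL);
	for (struct list_node *p = parent->children.next; p != &parent->children; p = p->next) {
		struct model_cgroup *cg = container_of(p, struct model_cgroup, sibling);

		if (cg->cur_css)
			n++;
	}
	return n;
}

/* New scheme: walk the css sibling list itself. */
static int count_via_csses(struct model_css *parent)
{
	int n = 0;

	assert(parent != NULL);
	for (struct list_node *p = parent->children.next; p != &parent->children; p = p->next)
		n++;
	return n;
}

/* One child cgroup whose css was replaced: a dying css and a live css
 * are both still linked under the parent css. */
static void demo_counts(int *via_cgroups, int *via_csses)
{
	static struct model_cgroup parent_cg, child_cg;
	static struct model_css parent_css, dying_css, live_css;

	list_init(&parent_cg.children);
	list_init(&child_cg.children);
	list_init(&parent_css.children);
	list_init(&dying_css.children);
	list_init(&live_css.children);

	list_add_tail(&child_cg.sibling, &parent_cg.children);
	parent_css.cgroup = &parent_cg;
	parent_cg.cur_css = &parent_css;

	dying_css.cgroup = &child_cg;
	live_css.cgroup = &child_cg;
	list_add_tail(&dying_css.sibling, &parent_css.children);
	list_add_tail(&live_css.sibling, &parent_css.children);
	child_cg.cur_css = &live_css;

	*via_cgroups = count_via_cgroups(&parent_cg);
	*via_csses = count_via_csses(&parent_css);
}
```

Calling `demo_counts()` reports one css through the cgroup walk and two through the css walk, which is the gap that iterating csses directly closes.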
1 parent de3f034 commit c2931b7


2 files changed: 63 additions & 43 deletions


include/linux/cgroup.h

Lines changed: 26 additions & 18 deletions
@@ -764,14 +764,14 @@ struct cgroup_subsys_state *css_from_id(int id, struct cgroup_subsys *ss);
  * @pos: the css * to use as the loop cursor
  * @parent: css whose children to walk
  *
- * Walk @parent's children. Must be called under rcu_read_lock(). A child
- * css which hasn't finished ->css_online() or already has finished
- * ->css_offline() may show up during traversal and it's each subsystem's
- * responsibility to verify that each @pos is alive.
+ * Walk @parent's children. Must be called under rcu_read_lock().
  *
- * If a subsystem synchronizes against the parent in its ->css_online() and
- * before starting iterating, a css which finished ->css_online() is
- * guaranteed to be visible in the future iterations.
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal. It's each subsystem's
+ * responsibility to synchronize against on/offlining.
  *
  * It is allowed to temporarily drop RCU read lock during iteration. The
  * caller is responsible for ensuring that @pos remains accessible until
@@ -794,17 +794,16 @@ css_rightmost_descendant(struct cgroup_subsys_state *pos);
  * @root: css whose descendants to walk
  *
  * Walk @root's descendants. @root is included in the iteration and the
- * first node to be visited. Must be called under rcu_read_lock(). A
- * descendant css which hasn't finished ->css_online() or already has
- * finished ->css_offline() may show up during traversal and it's each
- * subsystem's responsibility to verify that each @pos is alive.
+ * first node to be visited. Must be called under rcu_read_lock().
  *
- * If a subsystem synchronizes against the parent in its ->css_online() and
- * before starting iterating, and synchronizes against @pos on each
- * iteration, any descendant css which finished ->css_online() is
- * guaranteed to be visible in the future iterations.
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal. It's each subsystem's
+ * responsibility to synchronize against on/offlining.
  *
- * In other words, the following guarantees that a descendant can't escape
+ * For example, the following guarantees that a descendant can't escape
  * state updates of its ancestors.
  *
  * my_online(@css)
@@ -860,8 +859,17 @@ css_next_descendant_post(struct cgroup_subsys_state *pos,
  *
  * Similar to css_for_each_descendant_pre() but performs post-order
  * traversal instead. @root is included in the iteration and the last
- * node to be visited. Note that the walk visibility guarantee described
- * in pre-order walk doesn't apply the same to post-order walks.
+ * node to be visited.
+ *
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal. It's each subsystem's
+ * responsibility to synchronize against on/offlining.
+ *
+ * Note that the walk visibility guarantee example described in pre-order
+ * walk doesn't apply the same to post-order walks.
  */
 #define css_for_each_descendant_post(pos, css) \
 	for ((pos) = css_next_descendant_post(NULL, (css)); (pos); \
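The updated comments make filtering a subsystem's job: an iterator may hand back a css that has already gone through ->css_offline() but is not yet released. A minimal userspace sketch of that responsibility follows, assuming a subsystem that keeps its own online flag. `struct my_css`, `my_link_child()`, `my_css_online()`, `my_css_offline()` and `my_apply_weight()` are hypothetical, and the lock that real code would share between the on/offline callbacks and the iterating side is omitted because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-css state kept by a subsystem. */
struct my_css {
	struct my_css *parent;
	struct my_css *first_child, *next_sibling;
	bool online;   /* set by my_css_online(), cleared by my_css_offline() */
	int weight;
};

/* Link @child as the last child of @parent, the way a sibling list grows. */
static void my_link_child(struct my_css *parent, struct my_css *child)
{
	struct my_css **slot = &parent->first_child;

	while (*slot)
		slot = &(*slot)->next_sibling;
	*slot = child;
	child->parent = parent;
}

/*
 * Stand-ins for ->css_online() / ->css_offline(). Real code would run these
 * under a subsystem lock that the iterating side also holds; this
 * single-threaded model omits the lock.
 */
static void my_css_online(struct my_css *css)  { css->online = true; }
static void my_css_offline(struct my_css *css) { css->online = false; }

/*
 * Walk the children the way css_for_each_child() would. An offlined css
 * stays on the sibling list until it is released, so skip it here.
 */
static int my_apply_weight(struct my_css *parent, int weight)
{
	int updated = 0;

	assert(parent != NULL);
	for (struct my_css *pos = parent->first_child; pos; pos = pos->next_sibling) {
		if (!pos->online)
			continue;
		pos->weight = weight;
		updated++;
	}
	return updated;
}
```

With two children of which one has been offlined, `my_apply_weight()` updates only the online one; a subsystem that instead wants to reach dying csses would simply drop the check.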

kernel/cgroup.c

Lines changed: 37 additions & 25 deletions
@@ -3089,21 +3089,25 @@ static int cgroup_task_count(const struct cgroup *cgrp)

 /**
  * css_next_child - find the next child of a given css
- * @pos_css: the current position (%NULL to initiate traversal)
- * @parent_css: css whose children to walk
+ * @pos: the current position (%NULL to initiate traversal)
+ * @parent: css whose children to walk
  *
- * This function returns the next child of @parent_css and should be called
+ * This function returns the next child of @parent and should be called
  * under either cgroup_mutex or RCU read lock. The only requirement is
- * that @parent_css and @pos_css are accessible. The next sibling is
- * guaranteed to be returned regardless of their states.
+ * that @parent and @pos are accessible. The next sibling is guaranteed to
+ * be returned regardless of their states.
+ *
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal. It's each subsystem's
+ * responsibility to synchronize against on/offlining.
  */
-struct cgroup_subsys_state *
-css_next_child(struct cgroup_subsys_state *pos_css,
-	       struct cgroup_subsys_state *parent_css)
+struct cgroup_subsys_state *css_next_child(struct cgroup_subsys_state *pos,
+					   struct cgroup_subsys_state *parent)
 {
-	struct cgroup *pos = pos_css ? pos_css->cgroup : NULL;
-	struct cgroup *cgrp = parent_css->cgroup;
-	struct cgroup *next;
+	struct cgroup_subsys_state *next;

 	cgroup_assert_mutex_or_rcu_locked();

@@ -3128,27 +3132,21 @@ css_next_child(struct cgroup_subsys_state *pos_css,
 	 * races against release and the race window is very small.
 	 */
 	if (!pos) {
-		next = list_entry_rcu(cgrp->self.children.next, struct cgroup, self.sibling);
-	} else if (likely(!(pos->self.flags & CSS_RELEASED))) {
-		next = list_entry_rcu(pos->self.sibling.next, struct cgroup, self.sibling);
+		next = list_entry_rcu(parent->children.next, struct cgroup_subsys_state, sibling);
+	} else if (likely(!(pos->flags & CSS_RELEASED))) {
+		next = list_entry_rcu(pos->sibling.next, struct cgroup_subsys_state, sibling);
 	} else {
-		list_for_each_entry_rcu(next, &cgrp->self.children, self.sibling)
-			if (next->self.serial_nr > pos->self.serial_nr)
+		list_for_each_entry_rcu(next, &parent->children, sibling)
+			if (next->serial_nr > pos->serial_nr)
 				break;
 	}

 	/*
 	 * @next, if not pointing to the head, can be dereferenced and is
-	 * the next sibling; however, it might have @ss disabled. If so,
-	 * fast-forward to the next enabled one.
+	 * the next sibling.
 	 */
-	while (&next->self.sibling != &cgrp->self.children) {
-		struct cgroup_subsys_state *next_css = cgroup_css(next, parent_css->ss);
-
-		if (next_css)
-			return next_css;
-		next = list_entry_rcu(next->self.sibling.next, struct cgroup, self.sibling);
-	}
+	if (&next->sibling != &parent->children)
+		return next;
 	return NULL;
 }

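The new css_next_child() has three cases: start at the parent's first child, follow @pos's sibling link while @pos is not released, or, once @pos has been released and unlinked, rescan the parent's children for the first css whose serial_nr is larger than @pos's. A minimal userspace sketch of that logic, assuming no RCU and no concurrency: `MODEL_CSS_RELEASED`, `struct model_css`, `model_css_init()`, `model_link_child()`, `model_release()` and `model_next_child()` are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the CSS_RELEASED flag. */
#define MODEL_CSS_RELEASED (1u << 0)

struct list_node {
	struct list_node *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct cgroup_subsys_state. */
struct model_css {
	struct list_node sibling;     /* link in the parent's children */
	struct list_node children;    /* head of this css's child list */
	unsigned int flags;
	unsigned long long serial_nr; /* increases with creation order */
};

static void model_css_init(struct model_css *css, unsigned long long serial_nr)
{
	css->sibling.next = css->sibling.prev = &css->sibling;
	css->children.next = css->children.prev = &css->children;
	css->flags = 0;
	css->serial_nr = serial_nr;
}

/* Append @child to @parent's children, so the list stays in serial order. */
static void model_link_child(struct model_css *parent, struct model_css *child)
{
	struct list_node *head = &parent->children;

	child->sibling.prev = head->prev;
	child->sibling.next = head;
	head->prev->next = &child->sibling;
	head->prev = &child->sibling;
}

/* Unlink @css on release and mark it; css->sibling.next is left untouched. */
static void model_release(struct model_css *css)
{
	css->sibling.prev->next = css->sibling.next;
	css->sibling.next->prev = css->sibling.prev;
	css->flags |= MODEL_CSS_RELEASED;
}

/* Same three cases as the new css_next_child() in the hunk above, minus RCU. */
static struct model_css *model_next_child(struct model_css *pos,
					  struct model_css *parent)
{
	struct list_node *next;

	assert(parent != NULL);
	if (!pos) {
		next = parent->children.next;
	} else if (!(pos->flags & MODEL_CSS_RELEASED)) {
		next = pos->sibling.next;
	} else {
		/* @pos is off the list: resume at the first sibling created after it. */
		for (next = parent->children.next; next != &parent->children; next = next->next)
			if (container_of(next, struct model_css, sibling)->serial_nr > pos->serial_nr)
				break;
	}

	if (next != &parent->children)
		return container_of(next, struct model_css, sibling);
	return NULL;
}
```

With children created in serial order 1, 2, 3 and the second one released mid-walk, `model_next_child()` called on the released css still returns the third one instead of stopping early or restarting from the head.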
@@ -3165,6 +3163,13 @@ css_next_child(struct cgroup_subsys_state *pos_css,
  * doesn't require the whole traversal to be contained in a single critical
  * section. This function will return the correct next descendant as long
  * as both @pos and @root are accessible and @pos is a descendant of @root.
+ *
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal. It's each subsystem's
+ * responsibility to synchronize against on/offlining.
  */
 struct cgroup_subsys_state *
 css_next_descendant_pre(struct cgroup_subsys_state *pos,
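Because the descendant iterators rely on css_next_child() for sibling iteration, its new semantics carry over to pre-order walks: visit @root first, then descend to the first child if there is one, otherwise climb until some ancestor (below @root) has a next sibling. A userspace sketch of that stepping logic, under simplifying assumptions: `struct node`, `add_child()`, `next_child()`, `next_descendant_pre()` and `walk_pre()` are hypothetical, children are kept as a plain singly linked list, and RCU, locking and release handling are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified css: children kept as a singly linked list. */
struct node {
	int id;
	struct node *parent;
	struct node *first_child, *next_sibling;
};

static void add_child(struct node *parent, struct node *child)
{
	struct node **slot = &parent->first_child;

	while (*slot)
		slot = &(*slot)->next_sibling;
	*slot = child;
	child->parent = parent;
}

/* Stand-in for css_next_child(): the first child when @pos is NULL. */
static struct node *next_child(struct node *pos, struct node *parent)
{
	return pos ? pos->next_sibling : parent->first_child;
}

/* Pre-order step built only on next_child(). */
static struct node *next_descendant_pre(struct node *pos, struct node *root)
{
	struct node *next;

	if (!pos)
		return root;                 /* first call: visit @root */

	next = next_child(NULL, pos);
	if (next)
		return next;                 /* descend to the first child */

	while (pos != root) {            /* climb until a next sibling exists */
		next = next_child(pos, pos->parent);
		if (next)
			return next;
		pos = pos->parent;
	}
	return NULL;
}

/* Record the visit order into @out; returns the number of nodes visited. */
static int walk_pre(struct node *root, int *out)
{
	int n = 0;

	assert(root != NULL);
	for (struct node *pos = next_descendant_pre(NULL, root); pos;
	     pos = next_descendant_pre(pos, root))
		out[n++] = pos->id;
	return n;
}
```

For a root 0 with children 1 and 2, where 1 has children 3 and 4, `walk_pre()` records 0, 1, 3, 4, 2: every parent is visited before its descendants, which is what the pre-order visibility example relies on.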
@@ -3252,6 +3257,13 @@ css_leftmost_descendant(struct cgroup_subsys_state *pos)
  * section. This function will return the correct next descendant as long
  * as both @pos and @cgroup are accessible and @pos is a descendant of
  * @cgroup.
+ *
+ * If a subsystem synchronizes ->css_online() and the start of iteration, a
+ * css which finished ->css_online() is guaranteed to be visible in the
+ * future iterations and will stay visible until the last reference is put.
+ * A css which hasn't finished ->css_online() or already finished
+ * ->css_offline() may show up during traversal. It's each subsystem's
+ * responsibility to synchronize against on/offlining.
  */
 struct cgroup_subsys_state *
 css_next_descendant_post(struct cgroup_subsys_state *pos,
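The post-order walk uses the same sibling step but visits children before their parent: start at the leftmost descendant of @root, move to the leftmost descendant of the next sibling when there is one, otherwise go up to the parent, and finish on @root. A userspace sketch under the same simplifying assumptions as a pre-order model would use: `struct node`, `add_child()`, `next_child()`, `leftmost_descendant()`, `next_descendant_post()` and `walk_post()` are hypothetical, with no RCU, locking or release handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified css: children kept as a singly linked list. */
struct node {
	int id;
	struct node *parent;
	struct node *first_child, *next_sibling;
};

static void add_child(struct node *parent, struct node *child)
{
	struct node **slot = &parent->first_child;

	while (*slot)
		slot = &(*slot)->next_sibling;
	*slot = child;
	child->parent = parent;
}

/* Stand-in for css_next_child(): the first child when @pos is NULL. */
static struct node *next_child(struct node *pos, struct node *parent)
{
	return pos ? pos->next_sibling : parent->first_child;
}

/* Follow first children down as far as possible. */
static struct node *leftmost_descendant(struct node *pos)
{
	struct node *last;

	do {
		last = pos;
		pos = next_child(NULL, pos);
	} while (pos);
	return last;
}

/* Post-order step built only on next_child(). */
static struct node *next_descendant_post(struct node *pos, struct node *root)
{
	struct node *next;

	if (!pos)
		return leftmost_descendant(root);  /* first call: deepest leftmost node */
	if (pos == root)
		return NULL;                       /* @root is visited last */

	next = next_child(pos, pos->parent);
	if (next)
		return leftmost_descendant(next);  /* finish the sibling's subtree first */
	return pos->parent;                    /* all children done: visit the parent */
}

/* Record the visit order into @out; returns the number of nodes visited. */
static int walk_post(struct node *root, int *out)
{
	int n = 0;

	assert(root != NULL);
	for (struct node *pos = next_descendant_post(NULL, root); pos;
	     pos = next_descendant_post(pos, root))
		out[n++] = pos->id;
	return n;
}
```

On the same tree (root 0 with children 1 and 2, where 1 has children 3 and 4), `walk_post()` records 3, 4, 1, 2, 0. Since a descendant is reached before its ancestors, a state update pushed down from an ancestor cannot rely on the ancestor having been visited first, which is why the pre-order visibility example does not transfer to post-order walks.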
