Skip to content

Commit b60205c

Browse files
Peter ZijlstraIngo Molnar
authored andcommitted
sched/fair: Fix min_vruntime tracking
While going through enqueue/dequeue to review the movement of set_curr_task() I noticed that the (2nd) update_min_vruntime() call in dequeue_entity() is suspect. It turns out, its actually wrong because it will consider cfs_rq->curr, which could be the entry we just normalized. This mixes different vruntime forms and leads to fail. The purpose of the second update_min_vruntime() is to move min_vruntime forward if the entity we just removed is the one that was holding it back; _except_ for the DEQUEUE_SAVE case, because then we know its a temporary removal and it will come back. However, since we do put_prev_task() _after_ dequeue(), cfs_rq->curr will still be set (and per the above, can be tranformed into a different unit), so update_min_vruntime() should also consider curr->on_rq. This also fixes another corner case where the enqueue (which also does update_curr()->update_min_vruntime()) happens on the rq->lock break in schedule(), between dequeue and put_prev_task. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: 1e87623 ("sched: Fix ->min_vruntime calculation in dequeue_entity()") Signed-off-by: Ingo Molnar <mingo@kernel.org>
1 parent 9148a3a commit b60205c

File tree

1 file changed

+22
-7
lines changed

1 file changed

+22
-7
lines changed

kernel/sched/fair.c

Lines changed: 22 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -460,17 +460,23 @@ static inline int entity_before(struct sched_entity *a,
460460

461461
static void update_min_vruntime(struct cfs_rq *cfs_rq)
462462
{
463+
struct sched_entity *curr = cfs_rq->curr;
464+
463465
u64 vruntime = cfs_rq->min_vruntime;
464466

465-
if (cfs_rq->curr)
466-
vruntime = cfs_rq->curr->vruntime;
467+
if (curr) {
468+
if (curr->on_rq)
469+
vruntime = curr->vruntime;
470+
else
471+
curr = NULL;
472+
}
467473

468474
if (cfs_rq->rb_leftmost) {
469475
struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost,
470476
struct sched_entity,
471477
run_node);
472478

473-
if (!cfs_rq->curr)
479+
if (!curr)
474480
vruntime = se->vruntime;
475481
else
476482
vruntime = min_vruntime(vruntime, se->vruntime);
@@ -3478,18 +3484,27 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
34783484
account_entity_dequeue(cfs_rq, se);
34793485

34803486
/*
3481-
* Normalize the entity after updating the min_vruntime because the
3482-
* update can refer to the ->curr item and we need to reflect this
3483-
* movement in our normalized position.
3487+
* Normalize after update_curr(); which will also have moved
3488+
* min_vruntime if @se is the one holding it back. But before doing
3489+
* update_min_vruntime() again, which will discount @se's position and
3490+
* can move min_vruntime forward still more.
34843491
*/
34853492
if (!(flags & DEQUEUE_SLEEP))
34863493
se->vruntime -= cfs_rq->min_vruntime;
34873494

34883495
/* return excess runtime on last dequeue */
34893496
return_cfs_rq_runtime(cfs_rq);
34903497

3491-
update_min_vruntime(cfs_rq);
34923498
update_cfs_shares(cfs_rq);
3499+
3500+
/*
3501+
* Now advance min_vruntime if @se was the entity holding it back,
3502+
* except when: DEQUEUE_SAVE && !DEQUEUE_MOVE, in this case we'll be
3503+
* put back on, and if we advance min_vruntime, we'll be placed back
3504+
* further than we started -- ie. we'll be penalized.
3505+
*/
3506+
if ((flags & (DEQUEUE_SAVE | DEQUEUE_MOVE)) == DEQUEUE_SAVE)
3507+
update_min_vruntime(cfs_rq);
34933508
}
34943509

34953510
/*

0 commit comments

Comments
 (0)