
Commit d519329

derkling authored and Ingo Molnar committed
sched/fair: Update util_est only on util_avg updates
The estimated utilization of a task is currently updated every time the task is dequeued. However, to keep overheads under control, PELT signals are effectively updated at most once every 1ms. Thus, for really short running tasks, it can happen that their util_avg value has not been updated since their last enqueue. If such tasks are also frequently running tasks (e.g. the kind of workload generated by hackbench) it can also happen that their util_avg is updated only every few activations.

This means that updating util_est at every dequeue potentially introduces unnecessary overheads, and it is also conceptually wrong if the util_avg signal has never been updated during a task activation.

Let's introduce a throttling mechanism on the task's util_est updates to sync them with util_avg updates. To make the solution memory efficient, both in terms of space and load/store operations, we encode a synchronization flag into the LSB of util_est.enqueued. This makes util_est an even-values-only metric, which is still considered good enough for its purpose. The synchronization bit is (re)set by __update_load_avg_se() once the PELT signal of a task has been updated during its last activation.

Such a throttling mechanism allows us to keep util_est overheads in the wakeup hot path under control, thus making it suitable to enable also on systems running high-intensity workloads. Thus, this patch now switches on the utilization estimation scheduler feature by default.

Suggested-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@android.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20180309095245.11071-5-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
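To make the LSB-flag scheme above concrete, here is a minimal user-space sketch of the idea, not the kernel implementation (the authoritative code is in the diff below). The type and function names (struct util_est_demo, demo_enqueue(), demo_util_avg_updated(), demo_should_update_util_est()) are hypothetical stand-ins introduced only for illustration; UTIL_AVG_UNCHANGED mirrors the definition added by the patch.

#include <stdbool.h>
#include <stdio.h>

/* LSB of the stored utilization, set while util_avg has NOT been updated. */
#define UTIL_AVG_UNCHANGED 0x1

/* Hypothetical stand-in for the kernel's util_est state, for illustration only. */
struct util_est_demo {
	unsigned int enqueued;	/* utilization snapshot, LSB reserved for the sync flag */
};

/* At enqueue time: store the utilization snapshot with the sync flag set. */
static void demo_enqueue(struct util_est_demo *ue, unsigned int util)
{
	ue->enqueued = util | UTIL_AVG_UNCHANGED;
}

/* When PELT updates util_avg: clear the flag to mark the signal as fresh. */
static void demo_util_avg_updated(struct util_est_demo *ue)
{
	ue->enqueued &= ~UTIL_AVG_UNCHANGED;
}

/* At dequeue time: update util_est only if util_avg changed since enqueue. */
static bool demo_should_update_util_est(const struct util_est_demo *ue)
{
	return !(ue->enqueued & UTIL_AVG_UNCHANGED);
}

int main(void)
{
	struct util_est_demo ue;

	demo_enqueue(&ue, 128);		/* stores 128 | 1 == 129, flag set */
	printf("update? %d\n", demo_should_update_util_est(&ue));	/* 0: skip */

	demo_util_avg_updated(&ue);	/* PELT refreshed util_avg */
	printf("update? %d\n", demo_should_update_util_est(&ue));	/* 1: proceed */

	return 0;
}

Because the LSB is reserved for the flag, the stored metric can only distinguish even utilization values, which the changelog above considers good enough for its purpose.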
1 parent a07630b commit d519329

2 files changed: 39 additions, 5 deletions


kernel/sched/fair.c

Lines changed: 38 additions & 4 deletions
@@ -3242,6 +3242,32 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load, unsigned long runna
 	sa->util_avg = sa->util_sum / divider;
 }
 
+/*
+ * When a task is dequeued, its estimated utilization should not be update if
+ * its util_avg has not been updated at least once.
+ * This flag is used to synchronize util_avg updates with util_est updates.
+ * We map this information into the LSB bit of the utilization saved at
+ * dequeue time (i.e. util_est.dequeued).
+ */
+#define UTIL_AVG_UNCHANGED 0x1
+
+static inline void cfs_se_util_change(struct sched_avg *avg)
+{
+	unsigned int enqueued;
+
+	if (!sched_feat(UTIL_EST))
+		return;
+
+	/* Avoid store if the flag has been already set */
+	enqueued = avg->util_est.enqueued;
+	if (!(enqueued & UTIL_AVG_UNCHANGED))
+		return;
+
+	/* Reset flag to report util_avg has been updated */
+	enqueued &= ~UTIL_AVG_UNCHANGED;
+	WRITE_ONCE(avg->util_est.enqueued, enqueued);
+}
+
 /*
  * sched_entity:
  *
@@ -3293,6 +3319,7 @@ __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entit
 			cfs_rq->curr == se)) {
 
 		___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
+		cfs_se_util_change(&se->avg);
 		return 1;
 	}
 
@@ -3900,7 +3927,7 @@ static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
 
 	/* Update root cfs_rq's estimated utilization */
 	enqueued = cfs_rq->avg.util_est.enqueued;
-	enqueued += _task_util_est(p);
+	enqueued += (_task_util_est(p) | UTIL_AVG_UNCHANGED);
 	WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued);
 }
 
@@ -3936,7 +3963,7 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	if (cfs_rq->nr_running) {
 		ue.enqueued = cfs_rq->avg.util_est.enqueued;
 		ue.enqueued -= min_t(unsigned int, ue.enqueued,
-				     _task_util_est(p));
+				     (_task_util_est(p) | UTIL_AVG_UNCHANGED));
 	}
 	WRITE_ONCE(cfs_rq->avg.util_est.enqueued, ue.enqueued);
 
@@ -3947,12 +3974,19 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	if (!task_sleep)
 		return;
 
+	/*
+	 * If the PELT values haven't changed since enqueue time,
+	 * skip the util_est update.
+	 */
+	ue = p->se.avg.util_est;
+	if (ue.enqueued & UTIL_AVG_UNCHANGED)
+		return;
+
 	/*
 	 * Skip update of task's estimated utilization when its EWMA is
 	 * already ~1% close to its last activation value.
 	 */
-	ue = p->se.avg.util_est;
-	ue.enqueued = task_util(p);
+	ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
 	last_ewma_diff = ue.enqueued - ue.ewma;
 	if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
 		return;

kernel/sched/features.h

Lines changed: 1 addition & 1 deletion
@@ -89,4 +89,4 @@ SCHED_FEAT(WA_BIAS, true)
 /*
  * UtilEstimation. Use estimated CPU utilization.
  */
-SCHED_FEAT(UTIL_EST, false)
+SCHED_FEAT(UTIL_EST, true)
