
Commit f9be3e5

derkling authored and Ingo Molnar committed
sched/fair: Use util_est in LB and WU paths
When the scheduler looks at the CPU utilization, the current PELT value for that CPU is returned straight away. In certain scenarios this can have undesired side effects on task placement.

For example, since the task utilization is decayed at wakeup time, when a long-sleeping big task is enqueued it does not immediately add a significant contribution to the target CPU. As a result we generate a race condition where other tasks can be placed on the same CPU while it is still considered relatively empty.

To reduce this kind of race condition, this patch introduces the required support to integrate the usage of the CPU's estimated utilization in the wakeup path, via cpu_util_wake(), as well as in the load-balance path, via cpu_util(), which is used by update_sg_lb_stats().

The estimated utilization of a CPU is defined to be the maximum between its PELT utilization and the sum of the estimated utilization (at previous dequeue time) of all the tasks currently RUNNABLE on that CPU. This allows us to properly represent the spare capacity of a CPU which, for example, has just got a big task running after a long sleep period.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@android.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20180309095245.11071-3-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1 parent 7f65ea4 commit f9be3e5
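To make the definition above concrete before reading the diff, here is a minimal, self-contained C sketch of the max-then-clamp policy. This is not the kernel code: the struct, field names, and numbers are made-up stand-ins for cfs_rq.avg.util_avg, util_est.enqueued and capacity_orig_of().

#include <stdio.h>

/* Illustrative stand-ins for the kernel's cfs_rq.avg fields. */
struct cfs_rq_sketch {
	unsigned long util_avg;   /* PELT utilization of the CPU */
	unsigned long util_est;   /* sum of util_est of the tasks RUNNABLE on it */
};

/* Estimated utilization: max(PELT util, util_est), clamped to capacity_orig. */
static unsigned long cpu_util_sketch(const struct cfs_rq_sketch *cfs_rq,
				     unsigned long capacity_orig)
{
	unsigned long util = cfs_rq->util_avg;

	if (cfs_rq->util_est > util)
		util = cfs_rq->util_est;

	return util < capacity_orig ? util : capacity_orig;
}

int main(void)
{
	/* A big task just woke after a long sleep: its PELT contribution has
	 * decayed to 120, but its util_est recorded at last dequeue is 700. */
	struct cfs_rq_sketch rq = { .util_avg = 120, .util_est = 700 };

	printf("estimated util = %lu\n", cpu_util_sketch(&rq, 1024)); /* prints 700 */
	return 0;
}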


kernel/sched/fair.c

Lines changed: 70 additions & 14 deletions
@@ -6432,11 +6432,13 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	return target;
 }
 
-/*
- * cpu_util returns the amount of capacity of a CPU that is used by CFS
- * tasks. The unit of the return value must be the one of capacity so we can
- * compare the utilization with the capacity of the CPU that is available for
- * CFS task (ie cpu_capacity).
+/**
+ * Amount of capacity of a CPU that is (estimated to be) used by CFS tasks
+ * @cpu: the CPU to get the utilization of
+ *
+ * The unit of the return value must be the one of capacity so we can compare
+ * the utilization with the capacity of the CPU that is available for CFS task
+ * (ie cpu_capacity).
  *
  * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
  * recent utilization of currently non-runnable tasks on a CPU. It represents
@@ -6447,6 +6449,14 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
  * current capacity (capacity_curr <= capacity_orig) of the CPU because it is
  * the running time on this CPU scaled by capacity_curr.
  *
+ * The estimated utilization of a CPU is defined to be the maximum between its
+ * cfs_rq.avg.util_avg and the sum of the estimated utilization of the tasks
+ * currently RUNNABLE on that CPU.
+ * This allows to properly represent the expected utilization of a CPU which
+ * has just got a big task running since a long sleep period. At the same time
+ * however it preserves the benefits of the "blocked utilization" in
+ * describing the potential for other tasks waking up on the same CPU.
+ *
  * Nevertheless, cfs_rq.avg.util_avg can be higher than capacity_curr or even
  * higher than capacity_orig because of unfortunate rounding in
  * cfs.avg.util_avg or just after migrating tasks and new task wakeups until
@@ -6457,13 +6467,21 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
  * available capacity. We allow utilization to overshoot capacity_curr (but not
  * capacity_orig) as it useful for predicting the capacity required after task
  * migrations (scheduler-driven DVFS).
+ *
+ * Return: the (estimated) utilization for the specified CPU
  */
-static unsigned long cpu_util(int cpu)
+static inline unsigned long cpu_util(int cpu)
 {
-	unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
-	unsigned long capacity = capacity_orig_of(cpu);
+	struct cfs_rq *cfs_rq;
+	unsigned int util;
+
+	cfs_rq = &cpu_rq(cpu)->cfs;
+	util = READ_ONCE(cfs_rq->avg.util_avg);
+
+	if (sched_feat(UTIL_EST))
+		util = max(util, READ_ONCE(cfs_rq->avg.util_est.enqueued));
 
-	return (util >= capacity) ? capacity : util;
+	return min_t(unsigned long, util, capacity_orig_of(cpu));
 }
 
 /*
@@ -6472,16 +6490,54 @@ static unsigned long cpu_util(int cpu)
  */
 static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
 {
-	unsigned long util, capacity;
+	struct cfs_rq *cfs_rq;
+	unsigned int util;
 
 	/* Task has no contribution or is new */
-	if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
+	if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
 		return cpu_util(cpu);
 
-	capacity = capacity_orig_of(cpu);
-	util = max_t(long, cpu_rq(cpu)->cfs.avg.util_avg - task_util(p), 0);
+	cfs_rq = &cpu_rq(cpu)->cfs;
+	util = READ_ONCE(cfs_rq->avg.util_avg);
+
+	/* Discount task's blocked util from CPU's util */
+	util -= min_t(unsigned int, util, task_util(p));
 
-	return (util >= capacity) ? capacity : util;
+	/*
+	 * Covered cases:
+	 *
+	 * a) if *p is the only task sleeping on this CPU, then:
+	 *      cpu_util (== task_util) > util_est (== 0)
+	 *    and thus we return:
+	 *      cpu_util_wake = (cpu_util - task_util) = 0
+	 *
+	 * b) if other tasks are SLEEPING on this CPU, which is now exiting
+	 *    IDLE, then:
+	 *      cpu_util >= task_util
+	 *      cpu_util > util_est (== 0)
+	 *    and thus we discount *p's blocked utilization to return:
+	 *      cpu_util_wake = (cpu_util - task_util) >= 0
+	 *
+	 * c) if other tasks are RUNNABLE on that CPU and
+	 *      util_est > cpu_util
+	 *    then we use util_est since it returns a more restrictive
+	 *    estimation of the spare capacity on that CPU, by just
+	 *    considering the expected utilization of tasks already
+	 *    runnable on that CPU.
+	 *
+	 * Cases a) and b) are covered by the above code, while case c) is
+	 * covered by the following code when estimated utilization is
+	 * enabled.
+	 */
+	if (sched_feat(UTIL_EST))
+		util = max(util, READ_ONCE(cfs_rq->avg.util_est.enqueued));
+
+	/*
+	 * Utilization (estimated) can exceed the CPU capacity, thus let's
+	 * clamp to the maximum CPU capacity to ensure consistency with
+	 * the cpu_util call.
+	 */
+	return min_t(unsigned long, util, capacity_orig_of(cpu));
 }
 
 /*

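As a quick sanity check of the three covered cases documented in cpu_util_wake() above, here is a rough standalone sketch of the discount-then-max-then-clamp sequence. Again, this is not the kernel code; the parameters and numbers are illustrative stand-ins.

#include <stdio.h>

/* Wakeup-path estimate: discount the waking task's blocked contribution from
 * util_avg, then take the max with util_est and clamp to capacity_orig. */
static unsigned long cpu_util_wake_sketch(unsigned long util_avg,
					  unsigned long util_est,
					  unsigned long task_util,
					  unsigned long capacity_orig)
{
	unsigned long util = util_avg;

	util -= util < task_util ? util : task_util;	/* discount: cases a/b */

	if (util_est > util)				/* runnable estimates win: case c */
		util = util_est;

	return util < capacity_orig ? util : capacity_orig;
}

int main(void)
{
	/* a) *p is the only (sleeping) task on the CPU: result is 0.        */
	printf("a) %lu\n", cpu_util_wake_sketch(300, 0, 300, 1024));
	/* b) other tasks are sleeping there: only *p's share is discounted. */
	printf("b) %lu\n", cpu_util_wake_sketch(500, 0, 300, 1024));
	/* c) other tasks are RUNNABLE: their util_est (600) dominates.      */
	printf("c) %lu\n", cpu_util_wake_sketch(500, 600, 300, 1024));
	return 0;
}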