Commit 8e7fbcb

Peter Zijlstra authored and Ingo Molnar committed
sched: Remove stale power aware scheduling remnants and dysfunctional knobs
It's been broken forever (i.e. it's not scheduling in a power aware fashion), as reported by Suresh and others sending patches, and nobody cares enough to fix it properly ... so remove it to make space free for something better.

There are various problems with the code as it stands today, first and foremost the user interface, which is bound to topology levels and has multiple values per level. This results in a state explosion which the administrator or distro needs to master and almost nobody does.

Furthermore, large configuration state spaces aren't good: it means the thing doesn't just work right, because it's either under so many impossible-to-meet constraints, or, even if there's an achievable state, workloads have to be aware of it precisely and can never meet it for dynamic workloads.

So pushing this kind of decision to user-space was a bad idea even with a single knob - it's exponentially worse with knobs on every node of the topology.

There is a proposal to replace the user interface with a single 3 state knob:

 sched_balance_policy := { performance, power, auto }

where 'auto' would be the preferred default which looks at things like Battery/AC mode and possible cpufreq state or whatever the hw exposes to show us power use expectations - but there's been no progress on it in the past many months.

Aside from that, the actual implementation of the various knobs is known to be broken. There have been sporadic attempts at fixing things but these always stop short of reaching a mergeable state.

Therefore this wholesale removal with the hopes of spurring people who care to come forward once again and work on a coherent replacement.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1 parent fac536f commit 8e7fbcb
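
The commit message only names the proposed knob and its three states; no implementation exists in this commit. As a purely hypothetical sketch of how such a single knob could be parsed, assuming the enum and function names below (none of them come from the kernel), it might look like this:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: state names taken from the commit message, nothing else. */
enum sched_balance_policy {
	SCHED_BALANCE_PERFORMANCE,
	SCHED_BALANCE_POWER,
	SCHED_BALANCE_AUTO,
};

/*
 * Parse a policy name into one of the three proposed states.
 * Returns 0 and stores the policy on success, -1 for any other string.
 */
int parse_balance_policy(const char *name, enum sched_balance_policy *out)
{
	/* Order must match the enum above. */
	static const char *const names[] = { "performance", "power", "auto" };
	size_t i;

	assert(name != NULL && out != NULL);

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (strcmp(name, names[i]) == 0) {
			*out = (enum sched_balance_policy)i;
			return 0;
		}
	}
	return -1;
}
```

The point of the proposal is visible in the shape of the sketch: one value for the whole system, instead of one numeric level per topology level.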

11 files changed, with 5 additions and 498 deletions.

Documentation/ABI/testing/sysfs-devices-system-cpu

Lines changed: 0 additions & 25 deletions
@@ -9,31 +9,6 @@ Description:
 
 		/sys/devices/system/cpu/cpu#/
 
-What:		/sys/devices/system/cpu/sched_mc_power_savings
-		/sys/devices/system/cpu/sched_smt_power_savings
-Date:		June 2006
-Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
-Description:	Discover and adjust the kernel's multi-core scheduler support.
-
-		Possible values are:
-
-		0 - No power saving load balance (default value)
-		1 - Fill one thread/core/package first for long running threads
-		2 - Also bias task wakeups to semi-idle cpu package for power
-		    savings
-
-		sched_mc_power_savings is dependent upon SCHED_MC, which is
-		itself architecture dependent.
-
-		sched_smt_power_savings is dependent upon SCHED_SMT, which
-		is itself architecture dependent.
-
-		The two files are independent of each other. It is possible
-		that one file may be present without the other.
-
-		Introduced by git commit 5c45bf27.
-
-
 What:		/sys/devices/system/cpu/kernel_max
 		/sys/devices/system/cpu/offline
 		/sys/devices/system/cpu/online

Documentation/scheduler/sched-domains.txt

Lines changed: 0 additions & 4 deletions
@@ -61,10 +61,6 @@ The implementor should read comments in include/linux/sched.h:
 struct sched_domain fields, SD_FLAG_*, SD_*_INIT to get an idea of
 the specifics and what to tune.
 
-For SMT, the architecture must define CONFIG_SCHED_SMT and provide a
-cpumask_t cpu_sibling_map[NR_CPUS], where cpu_sibling_map[i] is the mask of
-all "i"'s siblings as well as "i" itself.
-
 Architectures may retain the regular override the default SD_*_INIT flags
 while using the generic domain builder in kernel/sched.c if they wish to
 retain the traditional SMT->SMP->NUMA topology (or some subset of that). This

arch/x86/kernel/smpboot.c

Lines changed: 1 addition & 2 deletions
@@ -429,8 +429,7 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 	 * For perf, we return last level cache shared map.
 	 * And for power savings, we return cpu_core_map
 	 */
-	if ((sched_mc_power_savings || sched_smt_power_savings) &&
-	    !(cpu_has(c, X86_FEATURE_AMD_DCM)))
+	if (!(cpu_has(c, X86_FEATURE_AMD_DCM)))
 		return cpu_core_mask(cpu);
 	else
 		return cpu_llc_shared_mask(cpu);
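
Read literally, this hunk changes which branch is taken when both knobs were at their default of 0. The following user-space model is only a sketch of the two conditions as they appear in the diff; the enum and the function names are made up for illustration, and the booleans stand in for the knobs and for `cpu_has(c, X86_FEATURE_AMD_DCM)`:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two masks the function can return. */
enum coregroup_choice { USE_CORE_MASK, USE_LLC_SHARED_MASK };

/* Condition as it stood before the commit. */
enum coregroup_choice choice_before(bool mc, bool smt, bool amd_dcm)
{
	if ((mc || smt) && !amd_dcm)
		return USE_CORE_MASK;
	return USE_LLC_SHARED_MASK;
}

/* Condition after the commit: the knobs no longer take part. */
enum coregroup_choice choice_after(bool amd_dcm)
{
	if (!amd_dcm)
		return USE_CORE_MASK;
	return USE_LLC_SHARED_MASK;
}

/* Spot checks of the two truth tables. */
void coregroup_selfcheck(void)
{
	/* Knobs off, no AMD_DCM: the old code took the else branch. */
	assert(choice_before(false, false, false) == USE_LLC_SHARED_MASK);
	/* Same CPU after the commit: the core mask is chosen. */
	assert(choice_after(false) == USE_CORE_MASK);
	/* AMD_DCM parts get the LLC shared mask in both versions. */
	assert(choice_before(true, true, true) == USE_LLC_SHARED_MASK);
	assert(choice_after(true) == USE_LLC_SHARED_MASK);
}
```

In other words, the new condition behaves like the old one with a knob enabled, not like the old default.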

drivers/base/cpu.c

Lines changed: 0 additions & 4 deletions
@@ -330,8 +330,4 @@ void __init cpu_dev_init(void)
 		panic("Failed to register CPU subsystem");
 
 	cpu_dev_register_generic();
-
-#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
-	sched_create_sysfs_power_savings_entries(cpu_subsys.dev_root);
-#endif
 }

include/linux/cpu.h

Lines changed: 0 additions & 2 deletions
@@ -36,8 +36,6 @@ extern void cpu_remove_dev_attr(struct device_attribute *attr);
 extern int cpu_add_dev_attr_group(struct attribute_group *attrs);
 extern void cpu_remove_dev_attr_group(struct attribute_group *attrs);
 
-extern int sched_create_sysfs_power_savings_entries(struct device *dev);
-
 #ifdef CONFIG_HOTPLUG_CPU
 extern void unregister_cpu(struct cpu *cpu);
 extern ssize_t arch_cpu_probe(const char *, size_t);

include/linux/sched.h

Lines changed: 0 additions & 47 deletions
@@ -855,61 +855,14 @@ enum cpu_idle_type {
 #define SD_WAKE_AFFINE 0x0020 /* Wake task to waking CPU */
 #define SD_PREFER_LOCAL 0x0040 /* Prefer to keep tasks local to this domain */
 #define SD_SHARE_CPUPOWER 0x0080 /* Domain members share cpu power */
-#define SD_POWERSAVINGS_BALANCE 0x0100 /* Balance for power savings */
 #define SD_SHARE_PKG_RESOURCES 0x0200 /* Domain members share cpu pkg resources */
 #define SD_SERIALIZE 0x0400 /* Only a single load balancing instance */
 #define SD_ASYM_PACKING 0x0800 /* Place busy groups earlier in the domain */
 #define SD_PREFER_SIBLING 0x1000 /* Prefer to place tasks in a sibling domain */
 #define SD_OVERLAP 0x2000 /* sched_domains of this level overlap */
 
-enum powersavings_balance_level {
-	POWERSAVINGS_BALANCE_NONE = 0, /* No power saving load balance */
-	POWERSAVINGS_BALANCE_BASIC, /* Fill one thread/core/package
-				     * first for long running threads
-				     */
-	POWERSAVINGS_BALANCE_WAKEUP, /* Also bias task wakeups to semi-idle
-				      * cpu package for power savings
-				      */
-	MAX_POWERSAVINGS_BALANCE_LEVELS
-};
-
-extern int sched_mc_power_savings, sched_smt_power_savings;
-
-static inline int sd_balance_for_mc_power(void)
-{
-	if (sched_smt_power_savings)
-		return SD_POWERSAVINGS_BALANCE;
-
-	if (!sched_mc_power_savings)
-		return SD_PREFER_SIBLING;
-
-	return 0;
-}
-
-static inline int sd_balance_for_package_power(void)
-{
-	if (sched_mc_power_savings | sched_smt_power_savings)
-		return SD_POWERSAVINGS_BALANCE;
-
-	return SD_PREFER_SIBLING;
-}
-
 extern int __weak arch_sd_sibiling_asym_packing(void);
 
-/*
- * Optimise SD flags for power savings:
- * SD_BALANCE_NEWIDLE helps aggressive task consolidation and power savings.
- * Keep default SD flags if sched_{smt,mc}_power_saving=0
- */
-
-static inline int sd_power_saving_flags(void)
-{
-	if (sched_mc_power_savings | sched_smt_power_savings)
-		return SD_BALANCE_NEWIDLE;
-
-	return 0;
-}
-
 struct sched_group_power {
 	atomic_t ref;
 	/*
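
To make the "state explosion" from the commit message concrete, the removed helpers can be modelled in user space. This is a sketch, not kernel code: the two flag values are copied from the hunk above, but the `model_*` names are invented here, and the knobs are passed as arguments instead of being read from the globals `sched_mc_power_savings` and `sched_smt_power_savings`.

```c
#include <assert.h>

/* Flag values copied from the include/linux/sched.h hunk. */
#define SD_POWERSAVINGS_BALANCE	0x0100
#define SD_PREFER_SIBLING	0x1000

/* Model of the removed sd_balance_for_mc_power(). */
int model_mc_flags(int mc_level, int smt_level)
{
	if (smt_level)
		return SD_POWERSAVINGS_BALANCE;

	if (!mc_level)
		return SD_PREFER_SIBLING;

	return 0;
}

/* Model of the removed sd_balance_for_package_power(). */
int model_package_flags(int mc_level, int smt_level)
{
	if (mc_level | smt_level)
		return SD_POWERSAVINGS_BALANCE;

	return SD_PREFER_SIBLING;
}

/* Spot checks: the same pair of knob values yields different MC and package flags. */
void sd_flags_selfcheck(void)
{
	/* Both knobs off: both levels prefer sibling domains. */
	assert(model_mc_flags(0, 0) == SD_PREFER_SIBLING);
	assert(model_package_flags(0, 0) == SD_PREFER_SIBLING);

	/* Only the MC knob on: MC level gets no flag, package level balances for power. */
	assert(model_mc_flags(1, 0) == 0);
	assert(model_package_flags(1, 0) == SD_POWERSAVINGS_BALANCE);

	/* SMT knob on: both levels balance for power, whatever the MC knob says. */
	assert(model_mc_flags(0, 2) == SD_POWERSAVINGS_BALANCE);
	assert(model_package_flags(0, 2) == SD_POWERSAVINGS_BALANCE);
}
```

Each knob accepted three levels, so there were nine combinations to reason about, and these helpers only distinguish zero from non-zero.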

include/linux/topology.h

Lines changed: 0 additions & 5 deletions
@@ -98,7 +98,6 @@ int arch_update_cpu_topology(void);
 				| 0*SD_BALANCE_WAKE \
 				| 1*SD_WAKE_AFFINE \
 				| 1*SD_SHARE_CPUPOWER \
-				| 0*SD_POWERSAVINGS_BALANCE \
 				| 1*SD_SHARE_PKG_RESOURCES \
 				| 0*SD_SERIALIZE \
 				| 0*SD_PREFER_SIBLING \
@@ -134,8 +133,6 @@ int arch_update_cpu_topology(void);
 				| 0*SD_SHARE_CPUPOWER \
 				| 1*SD_SHARE_PKG_RESOURCES \
 				| 0*SD_SERIALIZE \
-				| sd_balance_for_mc_power() \
-				| sd_power_saving_flags() \
 				, \
 	.last_balance = jiffies, \
 	.balance_interval = 1, \
@@ -167,8 +164,6 @@ int arch_update_cpu_topology(void);
 				| 0*SD_SHARE_CPUPOWER \
 				| 0*SD_SHARE_PKG_RESOURCES \
 				| 0*SD_SERIALIZE \
-				| sd_balance_for_package_power() \
-				| sd_power_saving_flags() \
 				, \
 	.last_balance = jiffies, \
 	.balance_interval = 1, \

kernel/sched/core.c

Lines changed: 0 additions & 94 deletions
@@ -5929,8 +5929,6 @@ static const struct cpumask *cpu_cpu_mask(int cpu)
 	return cpumask_of_node(cpu_to_node(cpu));
 }
 
-int sched_smt_power_savings = 0, sched_mc_power_savings = 0;
-
 struct sd_data {
 	struct sched_domain **__percpu sd;
 	struct sched_group **__percpu sg;
@@ -6322,7 +6320,6 @@ sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
 					| 0*SD_WAKE_AFFINE
 					| 0*SD_PREFER_LOCAL
 					| 0*SD_SHARE_CPUPOWER
-					| 0*SD_POWERSAVINGS_BALANCE
 					| 0*SD_SHARE_PKG_RESOURCES
 					| 1*SD_SERIALIZE
 					| 0*SD_PREFER_SIBLING
@@ -6819,97 +6816,6 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 	mutex_unlock(&sched_domains_mutex);
 }
 
-#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
-static void reinit_sched_domains(void)
-{
-	get_online_cpus();
-
-	/* Destroy domains first to force the rebuild */
-	partition_sched_domains(0, NULL, NULL);
-
-	rebuild_sched_domains();
-	put_online_cpus();
-}
-
-static ssize_t sched_power_savings_store(const char *buf, size_t count, int smt)
-{
-	unsigned int level = 0;
-
-	if (sscanf(buf, "%u", &level) != 1)
-		return -EINVAL;
-
-	/*
-	 * level is always be positive so don't check for
-	 * level < POWERSAVINGS_BALANCE_NONE which is 0
-	 * What happens on 0 or 1 byte write,
-	 * need to check for count as well?
-	 */
-
-	if (level >= MAX_POWERSAVINGS_BALANCE_LEVELS)
-		return -EINVAL;
-
-	if (smt)
-		sched_smt_power_savings = level;
-	else
-		sched_mc_power_savings = level;
-
-	reinit_sched_domains();
-
-	return count;
-}
-
-#ifdef CONFIG_SCHED_MC
-static ssize_t sched_mc_power_savings_show(struct device *dev,
-					    struct device_attribute *attr,
-					    char *buf)
-{
-	return sprintf(buf, "%u\n", sched_mc_power_savings);
-}
-static ssize_t sched_mc_power_savings_store(struct device *dev,
-					     struct device_attribute *attr,
-					     const char *buf, size_t count)
-{
-	return sched_power_savings_store(buf, count, 0);
-}
-static DEVICE_ATTR(sched_mc_power_savings, 0644,
-		   sched_mc_power_savings_show,
-		   sched_mc_power_savings_store);
-#endif
-
-#ifdef CONFIG_SCHED_SMT
-static ssize_t sched_smt_power_savings_show(struct device *dev,
-					     struct device_attribute *attr,
-					     char *buf)
-{
-	return sprintf(buf, "%u\n", sched_smt_power_savings);
-}
-static ssize_t sched_smt_power_savings_store(struct device *dev,
-					      struct device_attribute *attr,
-					      const char *buf, size_t count)
-{
-	return sched_power_savings_store(buf, count, 1);
-}
-static DEVICE_ATTR(sched_smt_power_savings, 0644,
-		   sched_smt_power_savings_show,
-		   sched_smt_power_savings_store);
-#endif
-
-int __init sched_create_sysfs_power_savings_entries(struct device *dev)
-{
-	int err = 0;
-
-#ifdef CONFIG_SCHED_SMT
-	if (smt_capable())
-		err = device_create_file(dev, &dev_attr_sched_smt_power_savings);
-#endif
-#ifdef CONFIG_SCHED_MC
-	if (!err && mc_capable())
-		err = device_create_file(dev, &dev_attr_sched_mc_power_savings);
-#endif
-	return err;
-}
-#endif /* CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
-
 /*
  * Update cpusets according to cpu_active mask. If cpusets are
  * disabled, cpuset_update_active_cpus() becomes a simple wrapper
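
The validation done by the removed `sched_power_savings_store()` can be exercised outside the kernel. The sketch below keeps the same two checks (parse with `"%u"`, reject anything at or above `MAX_POWERSAVINGS_BALANCE_LEVELS`) but is a user-space approximation: the function name is hypothetical, the limit is hard-coded to 3 because the removed enum lists three levels before it, and the domain rebuild is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* In the removed enum, NONE, BASIC and WAKEUP precede this sentinel, so it equals 3. */
#define MAX_POWERSAVINGS_BALANCE_LEVELS 3

/*
 * Hypothetical user-space model of the store path's input checks.
 * Returns 0 and writes *level on success, -EINVAL otherwise.
 */
int parse_power_savings_level(const char *buf, unsigned int *level)
{
	unsigned int value = 0;

	assert(buf != NULL && level != NULL);

	/* Same parse as the removed code: one unsigned decimal number. */
	if (sscanf(buf, "%u", &value) != 1)
		return -EINVAL;

	/* Unsigned, so only the upper bound needs checking. */
	if (value >= MAX_POWERSAVINGS_BALANCE_LEVELS)
		return -EINVAL;

	*level = value;
	return 0;
}
```

A caller would pass the text written to the sysfs file, for example `parse_power_savings_level("2\n", &level)`. In this user-space model an empty string is rejected, because `sscanf()` cannot convert anything from it.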
