Commit 08a2e45

Merge branches 'pm-cpuidle' and 'powercap'
* pm-cpuidle:
  ACPI / processor: Set P_LVL{2,3} idle state descriptions
  intel_idle: add support for Jacobsville
  cpuidle: dt: bail out if the idle-state DT node is not compatible
  cpuidle: use BIT() for idle state flags and remove CPUIDLE_DRIVER_FLAGS_MASK
  Documentation: driver-api: PM: Add cpuidle document
  cpuidle: New timer events oriented governor for tickless systems

* powercap:
  powercap/intel_rapl: add Ice Lake mobile
  powercap: intel_rapl: add support for Jacobsville
3 parents c3739c5 + 34a62cd + ba6f3ec commit 08a2e45

14 files changed: +860 -88 lines changed

Documentation/admin-guide/pm/cpuidle.rst

Lines changed: 96 additions & 8 deletions
@@ -155,14 +155,14 @@ governor uses that information depends on what algorithm is implemented by it
 and that is the primary reason for having more than one governor in the
 ``CPUIdle`` subsystem.
 
-There are two ``CPUIdle`` governors available, ``menu`` and ``ladder``. Which
-of them is used depends on the configuration of the kernel and in particular on
-whether or not the scheduler tick can be `stopped by the idle
-loop <idle-cpus-and-tick_>`_. It is possible to change the governor at run time
-if the ``cpuidle_sysfs_switch`` command line parameter has been passed to the
-kernel, but that is not safe in general, so it should not be done on production
-systems (that may change in the future, though). The name of the ``CPUIdle``
-governor currently used by the kernel can be read from the
+There are three ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_
+and ``ladder``. Which of them is used by default depends on the configuration
+of the kernel and in particular on whether or not the scheduler tick can be
+`stopped by the idle loop <idle-cpus-and-tick_>`_. It is possible to change the
+governor at run time if the ``cpuidle_sysfs_switch`` command line parameter has
+been passed to the kernel, but that is not safe in general, so it should not be
+done on production systems (that may change in the future, though). The name of
+the ``CPUIdle`` governor currently used by the kernel can be read from the
 :file:`current_governor_ro` (or :file:`current_governor` if
 ``cpuidle_sysfs_switch`` is present in the kernel command line) file under
 :file:`/sys/devices/system/cpu/cpuidle/` in ``sysfs``.
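
Note on the hunk above: the paragraph names two ``sysfs`` files that may hold the governor name. A minimal user-space sketch of reading it follows. The helper name ``read_current_governor`` and its ``base`` parameter are hypothetical and exist only for illustration; the only facts taken from the text are the directory and the two file names.

```python
from pathlib import Path

# Directory named in the documentation text.
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpuidle")


def read_current_governor(base=CPUIDLE_DIR):
    """Return the current CPUIdle governor name, or None if unavailable.

    Tries current_governor_ro first, then current_governor, since the text
    says which of the two exists depends on the kernel command line.
    """
    for name in ("current_governor_ro", "current_governor"):
        try:
            return (Path(base) / name).read_text().strip()
        except OSError:
            # File missing or unreadable on this system: try the next one.
            continue
    return None
```

Calling ``read_current_governor()`` with no arguments reads the real ``sysfs`` directory; it returns ``None`` on systems where neither file can be read.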
@@ -256,6 +256,8 @@ the ``menu`` governor by default and if it is not tickless, the default
 ``CPUIdle`` governor on it will be ``ladder``.
 
 
+.. _menu-gov:
+
 The ``menu`` Governor
 =====================
 
@@ -333,6 +335,92 @@ that time, the governor may need to select a shallower state with a suitable
 target residency.
 
 
+.. _teo-gov:
+
+The Timer Events Oriented (TEO) Governor
+========================================
+
+The timer events oriented (TEO) governor is an alternative ``CPUIdle`` governor
+for tickless systems. It follows the same basic strategy as the ``menu`` `one
+<menu-gov_>`_: it always tries to find the deepest idle state suitable for the
+given conditions. However, it applies a different approach to that problem.
+
+First, it does not use sleep length correction factors, but instead it attempts
+to correlate the observed idle duration values with the available idle states
+and use that information to pick up the idle state that is most likely to
+"match" the upcoming CPU idle interval. Second, it does not take the tasks
+that were running on the given CPU in the past and are waiting on some I/O
+operations to complete now at all (there is no guarantee that they will run on
+the same CPU when they become runnable again) and the pattern detection code in
+it avoids taking timer wakeups into account. It also only uses idle duration
+values less than the current time till the closest timer (with the scheduler
+tick excluded) for that purpose.
+
+Like in the ``menu`` governor `case <menu-gov_>`_, the first step is to obtain
+the *sleep length*, which is the time until the closest timer event with the
+assumption that the scheduler tick will be stopped (that also is the upper bound
+on the time until the next CPU wakeup). That value is then used to preselect an
+idle state on the basis of three metrics maintained for each idle state provided
+by the ``CPUIdle`` driver: ``hits``, ``misses`` and ``early_hits``.
+
+The ``hits`` and ``misses`` metrics measure the likelihood that a given idle
+state will "match" the observed (post-wakeup) idle duration if it "matches" the
+sleep length. They both are subject to decay (after a CPU wakeup) every time
+the target residency of the idle state corresponding to them is less than or
+equal to the sleep length and the target residency of the next idle state is
+greater than the sleep length (that is, when the idle state corresponding to
+them "matches" the sleep length). The ``hits`` metric is increased if the
+former condition is satisfied and the target residency of the given idle state
+is less than or equal to the observed idle duration and the target residency of
+the next idle state is greater than the observed idle duration at the same time
+(that is, it is increased when the given idle state "matches" both the sleep
+length and the observed idle duration). In turn, the ``misses`` metric is
+increased when the given idle state "matches" the sleep length only and the
+observed idle duration is too short for its target residency.
+
+The ``early_hits`` metric measures the likelihood that a given idle state will
+"match" the observed (post-wakeup) idle duration if it does not "match" the
+sleep length. It is subject to decay on every CPU wakeup and it is increased
+when the idle state corresponding to it "matches" the observed (post-wakeup)
+idle duration and the target residency of the next idle state is less than or
+equal to the sleep length (i.e. the idle state "matching" the sleep length is
+deeper than the given one).
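
Note on the two paragraphs above: the update rules for ``hits``, ``misses`` and ``early_hits`` can be sketched in a few lines of Python. This is an illustration of the prose only, not the kernel implementation. The function names, the dictionary layout, and the ``PULSE`` and ``DECAY_SHIFT`` values are assumptions of this sketch; the text specifies neither how much the metrics grow nor how fast they decay. Target residencies are assumed to be strictly increasing with state depth.

```python
PULSE = 1024      # assumed increment; the text does not give a value
DECAY_SHIFT = 3   # assumed decay rate: value -= value >> DECAY_SHIFT


def matching_state(residencies, duration):
    """Index of the state that "matches" duration, or -1 if none does.

    With increasing residencies this is the deepest state whose target
    residency is less than or equal to the duration.
    """
    idx = -1
    for i, residency in enumerate(residencies):
        if residency <= duration:
            idx = i
    return idx


def decay(value):
    return value - (value >> DECAY_SHIFT)


def update_metrics(residencies, metrics, sleep_length, idle_duration):
    """Apply the post-wakeup update described in the text (sketch)."""
    # The sleep length is an upper bound on the time until the next wakeup.
    idle_duration = min(idle_duration, sleep_length)
    idx_timer = matching_state(residencies, sleep_length)
    idx_hit = matching_state(residencies, idle_duration)

    # early_hits decays on every CPU wakeup, for every state.
    for m in metrics:
        m["early_hits"] = decay(m["early_hits"])

    if idx_timer < 0:
        return

    # hits and misses decay only for the state matching the sleep length.
    m = metrics[idx_timer]
    m["hits"] = decay(m["hits"])
    m["misses"] = decay(m["misses"])

    if idx_hit == idx_timer:
        m["hits"] += PULSE        # matches sleep length and idle duration
    else:
        m["misses"] += PULSE      # idle duration too short for this state
        if idx_hit >= 0:
            # A shallower state matched the observed idle duration.
            metrics[idx_hit]["early_hits"] += PULSE
```

Here ``metrics`` is a list with one ``{"hits": 0, "misses": 0, "early_hits": 0}`` dictionary per idle state, in the same order as ``residencies``.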
+
+The governor walks the list of idle states provided by the ``CPUIdle`` driver
+and finds the last (deepest) one with the target residency less than or equal
+to the sleep length. Then, the ``hits`` and ``misses`` metrics of that idle
+state are compared with each other and it is preselected if the ``hits`` one is
+greater (which means that that idle state is likely to "match" the observed idle
+duration after CPU wakeup). If the ``misses`` one is greater, the governor
+preselects the shallower idle state with the maximum ``early_hits`` metric
+(or if there are multiple shallower idle states with equal ``early_hits``
+metric which also is the maximum, the shallowest of them will be preselected).
+[If there is a wakeup latency constraint coming from the `PM QoS framework
+<cpu-pm-qos_>`_ which is hit before reaching the deepest idle state with the
+target residency within the sleep length, the deepest idle state with the exit
+latency within the constraint is preselected without consulting the ``hits``,
+``misses`` and ``early_hits`` metrics.]
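
Note on the paragraph above: the preselection step can be sketched as follows. The function name ``preselect`` and the data layout are hypothetical. Three points are assumptions of this sketch rather than statements from the text: state 0 is treated as always usable, a tie between ``hits`` and ``misses`` is handled like the ``misses``-greater case, and exit latencies are assumed not to decrease with state depth.

```python
def preselect(states, metrics, sleep_length, latency_limit=None):
    """Return the index of the preselected idle state (sketch).

    states  -- list of (target_residency, exit_latency), shallowest first
    metrics -- list of dicts with "hits", "misses" and "early_hits" keys
    """
    candidate = 0  # assumption: the shallowest state is always usable
    for i in range(1, len(states)):
        residency, exit_latency = states[i]
        if residency > sleep_length:
            break
        if latency_limit is not None and exit_latency > latency_limit:
            # Latency constraint hit first: the metrics are not consulted.
            return candidate
        candidate = i

    m = metrics[candidate]
    if m["hits"] > m["misses"] or candidate == 0:
        return candidate

    # Otherwise take the shallower state with the maximum early_hits,
    # preferring the shallowest one when several share the maximum.
    best = max(s["early_hits"] for s in metrics[:candidate])
    return next(i for i in range(candidate)
                if metrics[i]["early_hits"] == best)
```

The early return on the latency limit mirrors the bracketed sentence in the text: once the constraint is hit, the deepest state seen so far is used as is.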
+
+Next, the governor takes several idle duration values observed most recently
+into consideration and if at least a half of them are greater than or equal to
+the target residency of the preselected idle state, that idle state becomes the
+final candidate to ask for. Otherwise, the average of the most recent idle
+duration values below the target residency of the preselected idle state is
+computed and the governor walks the idle states shallower than the preselected
+one and finds the deepest of them with the target residency within that average.
+That idle state is then taken as the final candidate to ask for.
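
Note on the paragraph above: the refinement based on recent idle durations can be sketched like this. The function name ``refine`` is hypothetical, the text does not say how many recent values are kept, and falling back to state 0 when no shallower state fits the average is an assumption of this sketch.

```python
def refine(residencies, preselected, recent_durations):
    """Return the final candidate state index (sketch).

    residencies      -- target residencies, shallowest state first
    preselected      -- index produced by the preselection step
    recent_durations -- most recently observed idle durations
    """
    target = residencies[preselected]
    below = [d for d in recent_durations if d < target]
    at_or_above = len(recent_durations) - len(below)

    # At least half of the recent values reach the target residency.
    if 2 * at_or_above >= len(recent_durations):
        return preselected

    # Average only the values below the preselected target residency.
    average = sum(below) / len(below)

    # Deepest shallower state whose target residency is within the average.
    for i in range(preselected - 1, -1, -1):
        if residencies[i] <= average:
            return i
    return 0  # assumption: fall back to the shallowest state
```

For example, with residencies ``[0, 20, 200]`` and state 2 preselected, the recent values ``[300, 30, 40, 50]`` give an average of 40 for the values below 200, so state 1 becomes the final candidate.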
+
+Still, at this point the governor may need to refine the idle state selection if
+it has not decided to `stop the scheduler tick <idle-cpus-and-tick_>`_. That
+generally happens if the target residency of the idle state selected so far is
+less than the tick period and the tick has not been stopped already (in a
+previous iteration of the idle loop). Then, like in the ``menu`` governor
+`case <menu-gov_>`_, the sleep length used in the previous computations may not
+reflect the real time until the closest timer event and if it really is greater
+than that time, a shallower state with a suitable target residency may need to
+be selected.
+
+
 .. _idle-states-representation:
 
 Representation of Idle States

Documentation/cpuidle/driver.txt

Lines changed: 0 additions & 37 deletions
This file was deleted.

Documentation/cpuidle/governor.txt

Lines changed: 0 additions & 28 deletions
This file was deleted.
