@@ -155,14 +155,14 @@ governor uses that information depends on what algorithm is implemented by it
and that is the primary reason for having more than one governor in the
``CPUIdle`` subsystem.

- There are two ``CPUIdle`` governors available, ``menu`` and ``ladder``. Which
- of them is used depends on the configuration of the kernel and in particular on
- whether or not the scheduler tick can be `stopped by the idle
- loop <idle-cpus-and-tick_>`_. It is possible to change the governor at run time
- if the ``cpuidle_sysfs_switch`` command line parameter has been passed to the
- kernel, but that is not safe in general, so it should not be done on production
- systems (that may change in the future, though). The name of the ``CPUIdle``
- governor currently used by the kernel can be read from the
+ There are three ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_
+ and ``ladder``. Which of them is used by default depends on the configuration
+ of the kernel and in particular on whether or not the scheduler tick can be
+ `stopped by the idle loop <idle-cpus-and-tick_>`_. It is possible to change the
+ governor at run time if the ``cpuidle_sysfs_switch`` command line parameter has
+ been passed to the kernel, but that is not safe in general, so it should not be
+ done on production systems (that may change in the future, though). The name of
+ the ``CPUIdle`` governor currently used by the kernel can be read from the
:file:`current_governor_ro` (or :file:`current_governor` if
``cpuidle_sysfs_switch`` is present in the kernel command line) file under
:file:`/sys/devices/system/cpu/cpuidle/` in ``sysfs``.
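As an illustration only (not part of the patch), the lookup described in this hunk can be sketched in Python. The helper name ``current_governor`` and the ``base`` parameter are hypothetical; only the directory and the two file names come from the text.

```python
from pathlib import Path

# Directory named in the documentation text.
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpuidle")


def current_governor(base=CPUIDLE_DIR):
    """Return the active CPUIdle governor name, or None if it cannot be read.

    Hypothetical helper: tries the read-only file first, then the writable
    one that is exposed when cpuidle_sysfs_switch is on the command line.
    """
    for name in ("current_governor_ro", "current_governor"):
        try:
            return (Path(base) / name).read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next candidate
    return None
```

On a machine without the ``CPUIdle`` sysfs directory the helper simply returns ``None``.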
@@ -256,6 +256,8 @@ the ``menu`` governor by default and if it is not tickless, the default
``CPUIdle`` governor on it will be ``ladder``.


+ .. _menu-gov:
+
The ``menu`` Governor
=====================

@@ -333,6 +335,92 @@ that time, the governor may need to select a shallower state with a suitable
target residency.


+ .. _teo-gov:
+
+ The Timer Events Oriented (TEO) Governor
+ ========================================
+
+ The timer events oriented (TEO) governor is an alternative ``CPUIdle`` governor
+ for tickless systems. It follows the same basic strategy as the ``menu`` `one
+ <menu-gov_>`_: it always tries to find the deepest idle state suitable for the
+ given conditions. However, it applies a different approach to that problem.
+
+ First, it does not use sleep length correction factors, but instead it attempts
+ to correlate the observed idle duration values with the available idle states
+ and use that information to pick up the idle state that is most likely to
+ "match" the upcoming CPU idle interval. Second, it does not take the tasks
+ that were running on the given CPU in the past and are waiting on some I/O
+ operations to complete now at all (there is no guarantee that they will run on
+ the same CPU when they become runnable again) and the pattern detection code in
+ it avoids taking timer wakeups into account. It also only uses idle duration
+ values less than the current time till the closest timer (with the scheduler
+ tick excluded) for that purpose.
+
+ Like in the ``menu`` governor `case <menu-gov_>`_, the first step is to obtain
+ the *sleep length*, which is the time until the closest timer event with the
+ assumption that the scheduler tick will be stopped (that also is the upper bound
+ on the time until the next CPU wakeup). That value is then used to preselect an
+ idle state on the basis of three metrics maintained for each idle state provided
+ by the ``CPUIdle`` driver: ``hits``, ``misses`` and ``early_hits``.
+
+ The ``hits`` and ``misses`` metrics measure the likelihood that a given idle
+ state will "match" the observed (post-wakeup) idle duration if it "matches" the
+ sleep length. They both are subject to decay (after a CPU wakeup) every time
+ the target residency of the idle state corresponding to them is less than or
+ equal to the sleep length and the target residency of the next idle state is
+ greater than the sleep length (that is, when the idle state corresponding to
+ them "matches" the sleep length). The ``hits`` metric is increased if the
+ former condition is satisfied and the target residency of the given idle state
+ is less than or equal to the observed idle duration and the target residency of
+ the next idle state is greater than the observed idle duration at the same time
+ (that is, it is increased when the given idle state "matches" both the sleep
+ length and the observed idle duration). In turn, the ``misses`` metric is
+ increased when the given idle state "matches" the sleep length only and the
+ observed idle duration is too short for its target residency.
+
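The ``hits`` and ``misses`` bookkeeping described above can be sketched as follows. This is an illustrative Python model, not the kernel code and not part of the patch: the names ``StateStats``, ``matches`` and ``update_hits_misses`` are hypothetical, and the ``PULSE`` and ``DECAY_SHIFT`` values are assumptions, since the text does not say how large the increment or the decay is. States are assumed to be sorted by increasing target residency.

```python
from dataclasses import dataclass

PULSE = 1024      # assumed increment size (not given in the text)
DECAY_SHIFT = 3   # assumed decay: lose 1/8 of the value per update


@dataclass
class StateStats:
    target_residency_us: int
    hits: int = 0
    misses: int = 0


def matches(states, i, duration_us):
    """True if state i is the deepest state whose target residency fits duration_us."""
    fits = states[i].target_residency_us <= duration_us
    next_too_deep = (i + 1 == len(states)
                     or states[i + 1].target_residency_us > duration_us)
    return fits and next_too_deep


def update_hits_misses(states, sleep_length_us, measured_us):
    """Update hits/misses after a wakeup, for the state matching the sleep length."""
    for i, st in enumerate(states):
        if not matches(states, i, sleep_length_us):
            continue
        # Both metrics decay whenever the state "matches" the sleep length.
        st.hits -= st.hits >> DECAY_SHIFT
        st.misses -= st.misses >> DECAY_SHIFT
        if matches(states, i, measured_us):
            st.hits += PULSE      # matched the observed idle duration too
        elif measured_us < st.target_residency_us:
            st.misses += PULSE    # observed idle duration was too short
```

For example, with target residencies of 0, 100 and 1000 microseconds, a sleep length of 500 and an observed idle duration of 300 increase ``hits`` of the middle state, while an observed duration of 50 increases its ``misses`` instead.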
+ The ``early_hits`` metric measures the likelihood that a given idle state will
+ "match" the observed (post-wakeup) idle duration if it does not "match" the
+ sleep length. It is subject to decay on every CPU wakeup and it is increased
+ when the idle state corresponding to it "matches" the observed (post-wakeup)
+ idle duration and the target residency of the next idle state is less than or
+ equal to the sleep length (i.e. the idle state "matching" the sleep length is
+ deeper than the given one).
+
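A minimal Python sketch of the ``early_hits`` update, under the same caveats: it is an illustration rather than the kernel implementation, ``update_early_hits`` is a hypothetical name, and ``PULSE`` and ``DECAY_SHIFT`` are assumed values. ``residencies`` is assumed to hold the target residencies in increasing order, one per idle state.

```python
PULSE = 1024      # assumed increment size (not given in the text)
DECAY_SHIFT = 3   # assumed decay: lose 1/8 of the value per wakeup


def update_early_hits(residencies, early_hits, sleep_length_us, measured_us):
    """Decay every early_hits value, then bump the one that qualifies (in place)."""
    last = len(residencies) - 1
    for i, residency in enumerate(residencies):
        # Decay happens on every CPU wakeup, for every state.
        early_hits[i] -= early_hits[i] >> DECAY_SHIFT
        has_next = i < last
        matches_measured = residency <= measured_us and (
            not has_next or residencies[i + 1] > measured_us)
        # Increase only if the next (deeper) state would still have fit the
        # sleep length, i.e. the wakeup came earlier than the timer suggested.
        if has_next and matches_measured and residencies[i + 1] <= sleep_length_us:
            early_hits[i] += PULSE
    return early_hits
```

The deepest state never gets an ``early_hits`` increase in this sketch, because there is no deeper state that could "match" the sleep length.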
+ The governor walks the list of idle states provided by the ``CPUIdle`` driver
+ and finds the last (deepest) one with the target residency less than or equal
+ to the sleep length. Then, the ``hits`` and ``misses`` metrics of that idle
+ state are compared with each other and it is preselected if the ``hits`` one is
+ greater (which means that that idle state is likely to "match" the observed idle
+ duration after CPU wakeup). If the ``misses`` one is greater, the governor
+ preselects the shallower idle state with the maximum ``early_hits`` metric
+ (or if there are multiple shallower idle states with equal ``early_hits``
+ metric which also is the maximum, the shallowest of them will be preselected).
+ [If there is a wakeup latency constraint coming from the `PM QoS framework
+ <cpu-pm-qos_>`_ which is hit before reaching the deepest idle state with the
+ target residency within the sleep length, the deepest idle state with the exit
+ latency within the constraint is preselected without consulting the ``hits``,
+ ``misses`` and ``early_hits`` metrics.]
+
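The preselection step can be modelled roughly as below. This Python sketch is illustrative only: ``IdleState`` and ``preselect`` are hypothetical names, states are assumed to be sorted from shallowest to deepest, and two behaviours the text leaves open are filled in by assumption (a ``hits``/``misses`` tie keeps the candidate, and state 0 is always considered acceptable).

```python
from dataclasses import dataclass


@dataclass
class IdleState:
    target_residency_us: int
    exit_latency_us: int
    hits: int = 0
    misses: int = 0
    early_hits: int = 0


def preselect(states, sleep_length_us, latency_limit_us):
    """Return the index of the preselected idle state."""
    candidate = 0  # assumption: the shallowest state is always allowed
    for i, st in enumerate(states):
        if st.target_residency_us > sleep_length_us:
            break  # this state and all deeper ones need a longer sleep
        if st.exit_latency_us > latency_limit_us:
            # Latency constraint hit first: take the deepest state that
            # satisfied it, without consulting the metrics.
            return candidate
        candidate = i
    st = states[candidate]
    if st.misses > st.hits and candidate > 0:
        # max() returns the first maximal element, so ties on early_hits
        # resolve to the shallowest state, as the text requires.
        return max(range(candidate), key=lambda i: states[i].early_hits)
    return candidate  # hits greater (or, by assumption, equal)
```

Passing a very large ``latency_limit_us`` models the case with no PM QoS constraint.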
+ Next, the governor takes several idle duration values observed most recently
+ into consideration and if at least a half of them are greater than or equal to
+ the target residency of the preselected idle state, that idle state becomes the
+ final candidate to ask for. Otherwise, the average of the most recent idle
+ duration values below the target residency of the preselected idle state is
+ computed and the governor walks the idle states shallower than the preselected
+ one and finds the deepest of them with the target residency within that average.
+ That idle state is then taken as the final candidate to ask for.
+
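The refinement based on recent idle durations can be sketched like this. Again this is an illustrative Python model, not kernel code: ``refine`` is a hypothetical name, ``residencies`` is assumed to list target residencies in increasing order, and falling back to state 0 when no shallower state fits the average is an assumption.

```python
def refine(residencies, preselected, recent_us):
    """Return the final candidate index given recently observed idle durations."""
    target = residencies[preselected]
    below = [d for d in recent_us if d < target]
    long_enough = len(recent_us) - len(below)
    # Keep the preselected state if at least half of the recent idle
    # durations reached its target residency.
    if 2 * long_enough >= len(recent_us):
        return preselected
    # More than half are below the target here, so `below` is non-empty.
    average = sum(below) / len(below)
    choice = 0  # assumed fallback: the shallowest state
    for i in range(preselected):
        if residencies[i] <= average:
            choice = i  # deepest shallower state that fits the average
    return choice
```

With target residencies of 0, 100 and 1000 microseconds and state 2 preselected, recent durations of 150, 200, 50 and 1200 yield an average of about 133 for the short ones, so the sketch falls back to state 1.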
+ Still, at this point the governor may need to refine the idle state selection if
+ it has not decided to `stop the scheduler tick <idle-cpus-and-tick_>`_. That
+ generally happens if the target residency of the idle state selected so far is
+ less than the tick period and the tick has not been stopped already (in a
+ previous iteration of the idle loop). Then, like in the ``menu`` governor
+ `case <menu-gov_>`_, the sleep length used in the previous computations may not
+ reflect the real time until the closest timer event and if it really is greater
+ than that time, a shallower state with a suitable target residency may need to
+ be selected.
+
+
.. _idle-states-representation:

Representation of Idle States