System and Devices Latency On Linux
Making Wireless
Introduction
Background
What is the 'latency' ?
There is some overhead when a part of the system goes to a low power mode in idle, both at suspend and resume times. The allowed latency needs to be taken into account when deciding the next low power state. 'A part of the system' = SW, HW SoC, HW external.
What is the point of controlling the latency ?
The point is to dynamically optimize the power consumption of all system components. Knowing the allowed latency (from the constraints) and the expected worst-case latency makes it possible to choose the optimum power state.
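As a minimal sketch of this selection, assume each power state is described by a worst-case wake-up latency and a power figure. The state names and numbers below are illustrative assumptions, not measured OMAP values, and `pick_state` is a hypothetical helper rather than a kernel API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    name: str
    worst_case_latency_us: int  # assumed worst-case wake-up latency
    power_mw: float             # assumed power drawn while in the state


def pick_state(states, allowed_latency_us):
    """Return the lowest-power state whose worst-case latency fits the constraint."""
    candidates = [s for s in states if s.worst_case_latency_us <= allowed_latency_us]
    if not candidates:
        raise ValueError("no power state satisfies the latency constraint")
    return min(candidates, key=lambda s: s.power_mw)


# Illustrative states only; the figures are made up for the example.
STATES = [
    PowerState("ON", 0, 100.0),
    PowerState("RETENTION", 300, 10.0),
    PowerState("OFF", 3000, 1.0),
]
```

With these figures, an allowed latency of 500 us rules out OFF, so the sketch settles on RETENTION.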
Terminology
Latency: time to react to an external event, e.g. the time taken to execute the handler code after an IRQ, or the time taken to execute driver code after an external wake-up event.
HW latency: latency introduced by the HW to transition between power states.
SW latency: time for the SW to execute the low power transition code, e.g. IP block save & restore, cache flush/invalidate etc.
System: 'everything needed to execute the kernel code', e.g. on OMAP3, system = CPU0 + CORE (main memory, caches, IRQ controller...).
Per-device latency: latency of a device (or peripheral). The per-device PM QoS framework controls the device states based on the allowed device latency.
Cpuidle: framework that controls the CPU low power states (C-states) based on the allowed system latency. Note: it is being abused to control the system state.
PM runtime: framework that allows the dynamic switching of resources.
OMAP SoC PM
Dynamic and hierarchical PM
The behavior of the voltage regulators and external oscillators depends on various system settings. The system settings can be dynamically controlled. E.g. OMAP <-> PMIC signals : SYS_CLKREQ, SYS_OFFMODE.
System Engineering Linux Development Center
Current model
From [1] : measuring the timing and the current consumption (thanks to the TI PSI team!) leads to the following graph of the energy spent vs time :
Identify the energy-wise interesting C-states and threshold values (C1, C3, C5, C9).
Aggregate the timing results: from the various sources of data, the following figures are derived for all C-states (timings in us).
Notes: producing the actual figures (to be used in the code) involves a lot of operations: interpolation, intersection (linear algebra) etc.
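The intersection step can be illustrated with a small sketch. It assumes a linear energy model per C-state, E(t) = transition energy + idle power x t, which is a simplification for illustration; the function name and the numbers are hypothetical and are not taken from the measurements.

```python
def break_even_time_us(shallow, deep):
    """Return the idle duration at which the deeper state starts saving energy.

    Each state is a (transition_energy_uj, idle_power_w) pair, so the energy
    spent over an idle period of t microseconds is
    transition_energy_uj + idle_power_w * t (in microjoules).
    """
    shallow_energy, shallow_power = shallow
    deep_energy, deep_power = deep
    if shallow_power <= deep_power:
        raise ValueError("the deeper state must draw less idle power")
    # Intersection of the two energy lines.
    return (deep_energy - shallow_energy) / (shallow_power - deep_power)


# Made-up figures: the deeper state costs more to enter but idles cheaper.
threshold_us = break_even_time_us(shallow=(10.0, 0.5), deep=(110.0, 0.25))
```

Below the returned threshold the shallower state spends less energy overall; above it the deeper state wins.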
Since cpuidle only manages the MPU and CORE, the wake-up latency values for the other power domains must be measured separately, by adjusting the target states of the power domains (in /debug/pm_debug/xxxx_pwrdm/suspend). The significant power domain latencies are derived from the measurements as follows:
Problems
There is no concept of 'overall latency'. There is no interdependency between PM frameworks.
Ex. on OMAP3: cpuidle manages only a subset of the power domains (MPU, CORE). Ex. on OMAP3: per-device PM QoS manages the other power domains. There is no relation between the frameworks; each framework has its own latency numbers.
Mainly because of the (lack of) SW support at the time of the measurement session. Ex. on OMAP3: voltage scaling in low power modes, sys_clkreq, sys_offmode and the interaction with the PowerIC.
The measured numbers are for a fixed setup, with predefined system settings. The measured numbers are constant.
The code is not generic enough: only the omap_device code has the feature implemented. The self-measurement results are not used at all (except to issue a 'New worst case (de)activate latency' debug message).
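Conceptually, self-measurement amounts to timing each (de)activation and keeping the maximum. The sketch below is an illustration only: `LatencyTracker` is a hypothetical name, not the omap_device code, and it keeps the recorded worst case available to callers instead of only logging it.

```python
import time


class LatencyTracker:
    """Track the worst-case duration of a device transition."""

    def __init__(self, clock_ns=time.monotonic_ns):
        # The clock is injectable so the behaviour can be checked deterministically.
        self._clock_ns = clock_ns
        self.worst_case_ns = 0

    def measure(self, transition):
        """Run `transition`, return its duration and update the worst case."""
        start = self._clock_ns()
        transition()
        elapsed = self._clock_ns() - start
        if elapsed > self.worst_case_ns:
            self.worst_case_ns = elapsed  # new worst-case latency
        return elapsed
```

A caller could then feed `worst_case_ns` into the latency figures used for state selection, which is the part the current code does not do.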
The measurement procedure needs to be re-run for every different HW (or possibly SW) setup. Measuring the latency of all power domains is difficult : take measurements, derive energy graphs, calculate intersections, adapt to missing key parameters etc.
Solution proposal
Overall latency calculation
We need a model which breaks down the overall latency into the latencies from every contributor :
latencySoC = time for the SoC HW to change an IP block state. Includes the Power Domain state transition, DPLL stop/relock etc.
latencyExternal HW = time to stop/restart the external HW. Ex : external crystal oscillator, external power supply etc.
Note: every latency factor might be divided into smaller factors. E.g. on OMAP a DPLL can feed multiple power domains.
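A minimal sketch of such a breakdown follows. It assumes the contributions simply add up (i.e. the transitions are strictly sequential) and includes the SW latency from the Terminology section as a third contributor. The sub-factor names and values are hypothetical.

```python
def overall_latency_us(contributors):
    """Sum the worst-case latency of every contributor and sub-factor.

    `contributors` maps a contributor name to a dict of sub-factor latencies
    in microseconds. Plain summation is an assumption: it treats all
    sub-factors as sequential, with no overlap between them.
    """
    return sum(sum(sub_factors.values()) for sub_factors in contributors.values())


# Made-up breakdown, for illustration only.
EXAMPLE = {
    "sw": {"context_save_restore": 150, "cache_flush": 50},
    "soc": {"power_domain_transition": 300, "dpll_relock": 100},
    "external_hw": {"oscillator_restart": 1000, "power_supply_ramp": 400},
}

total_us = overall_latency_us(EXAMPLE)
```

If some sub-factors run in parallel on real hardware, the inner sum would need to become a maximum for that group.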
ELC 2012 A new model for system and devices latency
New model
From the model, derive the independent factors for the overall latency. Differentiate the fixed factors from the variable ones (e.g. at HW level, the worst-case latency of a power domain transition is fixed).
Note: which data to pass from board files or DT? Cf. the discussions on the linux-arm-kernel and linux-omap mailing lists.
Introduce functions to calculate the devices and power domains worst case latency
Clean-up of the code that directly touches the HW settings which have an impact on the overall latency. When a HW setting is touched, re-calculate the overall worst case latency.
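The recalculation on every settings change can be sketched as follows. This is an illustration under stated assumptions: `LatencyModel` is a hypothetical class, each variable factor is modelled as an on/off setting that adds a fixed latency when enabled, and the latency figures are made up.

```python
class LatencyModel:
    """Keep the overall worst-case latency in sync with the HW settings."""

    def __init__(self, fixed_us, variable_us_by_setting):
        self._fixed_us = fixed_us                      # fixed HW factors
        self._variable = dict(variable_us_by_setting)  # latency added per setting
        self._enabled = {name: False for name in self._variable}
        self.worst_case_us = 0
        self._recalculate()

    def _recalculate(self):
        # Fixed part plus the contribution of every currently enabled setting.
        self.worst_case_us = self._fixed_us + sum(
            us for name, us in self._variable.items() if self._enabled[name]
        )

    def set_setting(self, name, enabled):
        """Single entry point for touching a setting; always recalculates."""
        if name not in self._variable:
            raise KeyError(name)
        self._enabled[name] = enabled
        self._recalculate()


model = LatencyModel(fixed_us=100, variable_us_by_setting={
    "sys_clkreq": 1000,   # assumed cost of restarting the external oscillator
    "sys_offmode": 5000,  # assumed cost of ramping the external supply
})
model.set_setting("sys_offmode", True)
```

Funnelling every settings change through one function is what keeps the worst-case figure from going stale.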
1. provide a reference implementation using the OMAP code,
2. bring the concept of multiple power domains states in the generic framework,
3. change OMAP code to use the generic power domains,
4. repeat 2-3 for clocks (hint : common clock framework) and voltages,
5. port the self-measurement feature in generic code (runtime PM)
Next steps
Links
Omappedia wiki
PM debug & profiling
http://www.omappedia.org/wiki/Power_Management_Debug_and_Profiling
Back-up slides
From [1] : measuring the timing and the current consumption (thanks to the TI PSI team!) leads to the following graph of the energy spent vs time :
Taking the minimum energy from the graph makes it possible to identify the 4 energy-wise interesting C-states (C1, C3, C5, C9) and the threshold time for those C-states to be efficient.
Aggregated timing results: from the various sources of data, the following figures are derived for all C-states (timings in us).
Notes: The power efficient C-states are identified as C1, C3, C5, C7
(1) When not measured, the threshold value is equal to that of the next power efficient C-state
(2) The threshold value is derived using the intersection of C3 and C4 in the graph
(3) No sys_clkoff is supported, this value needs to be corrected
(4) Addition of HW and SW parts, using [2]
(5) The threshold value calculation is the intersection of the lines in the graph, using linear algebra