Commit ca5b721

ickle authored and danvet committed
drm/i915: Limit the busy wait on requests to 5us not 10ms!
When waiting for high frequency requests, the finite amount of time required to set up the irq and wait upon it limits the response rate. By busywaiting on the request completion for a short while we can service the high frequency waits as quickly as possible. However, if it is a slow request, we want to sleep as quickly as possible. The tradeoff between waiting and sleeping is roughly the time it takes to sleep on a request, on the order of a microsecond. Based on measurements of synchronous workloads from across big core and little atom, I have set the limit for busywaiting as 10 microseconds. In most of the synchronous cases, we can reduce the limit down to as little as 2 microseconds, but that leaves quite a few test cases regressing by factors of 3 and more.

The code currently uses the jiffie clock, but that is far too coarse (on the order of 10 milliseconds) and results in poor interactivity as the CPU ends up being hogged by slow requests. To get microsecond resolution we need to use a high resolution timer, the cheapest of which is polling local_clock(). That, however, is only valid on the same CPU; if we switch CPUs because the task was preempted, we can also use that as an indicator that the system is too busy to waste cycles on spinning and we should sleep instead.

__i915_spin_request was introduced in commit 2def4ad [v4.2]:

Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 7 16:20:41 2015 +0100

    drm/i915: Optimistically spin for the request completion

v2: Drop full u64 for unsigned long - the timer is 32bit wraparound safe, so we can use native register sizes on smaller architectures. Mention the approximate microseconds units for elapsed time and add some extra comments describing the reason for busywaiting.

v3: Raise the limit to 10us.

v4: Now 5us.

Reported-by: Jens Axboe <axboe@kernel.dk>
Link: https://lkml.org/lkml/2015/11/12/621
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-3-git-send-email-chris@chris-wilson.co.uk
1 parent 91b0c35 commit ca5b721
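
The pattern the commit message describes (spin briefly on a cheap clock, treat a CPU switch as a cue to give up and sleep) can be sketched outside the kernel in plain C. Everything below is a hypothetical userspace analogue: now_us(), slow_wait(), wait_for_request() and SPIN_LIMIT_US are invented names, with POSIX clock_gettime() and sched_getcpu() standing in for local_clock() and get_cpu(). The actual i915 code is in the diff further down.

#define _GNU_SOURCE
#include <sched.h>      /* sched_getcpu() */
#include <stdatomic.h>  /* atomic_bool, atomic_load() */
#include <time.h>       /* clock_gettime(), nanosleep() */

#define SPIN_LIMIT_US 5 /* the v4 budget chosen by this patch */

/* Cheap, truncating ns -> "approximate us" conversion, mirroring
 * local_clock_us(): shifting by 10 divides by 1024 rather than 1000. */
static unsigned long now_us(int *cpu)
{
        struct timespec ts;

        *cpu = sched_getcpu();
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ((unsigned long)ts.tv_sec * 1000000000UL + ts.tv_nsec) >> 10;
}

/* Stand-in for the irq-driven sleep: block at a coarse interval. */
static void slow_wait(atomic_bool *done)
{
        const struct timespec ts = { .tv_sec = 0, .tv_nsec = 100000 };

        while (!atomic_load(done))
                nanosleep(&ts, NULL);
}

static void wait_for_request(atomic_bool *done)
{
        int cpu, this_cpu;
        unsigned long timeout = now_us(&cpu) + SPIN_LIMIT_US;

        /* Busywait briefly: a request that completes within ~5us is
         * serviced without paying the cost of sleeping and re-waking. */
        while (!atomic_load(done)) {
                if (now_us(&this_cpu) > timeout)
                        break;  /* slow request: stop burning cycles */
                if (this_cpu != cpu)
                        break;  /* migrated: system busy, back off */
        }

        if (!atomic_load(done))
                slow_wait(done);
}

The kernel version adds two structural bail-outs the sketch omits: need_resched() and signal_pending_state() also end the spin early, and busywait_stop() folds the timeout and CPU-switch checks together.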

File tree

1 file changed: +45, -2 lines changed

drivers/gpu/drm/i915/i915_gem.c

Lines changed: 45 additions & 2 deletions
@@ -1146,22 +1146,65 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
         return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
 }
 
+static unsigned long local_clock_us(unsigned *cpu)
+{
+        unsigned long t;
+
+        /* Cheaply and approximately convert from nanoseconds to microseconds.
+         * The result and subsequent calculations are also defined in the same
+         * approximate microseconds units. The principal source of timing
+         * error here is from the simple truncation.
+         *
+         * Note that local_clock() is only defined wrt to the current CPU;
+         * the comparisons are no longer valid if we switch CPUs. Instead of
+         * blocking preemption for the entire busywait, we can detect the CPU
+         * switch and use that as indicator of system load and a reason to
+         * stop busywaiting, see busywait_stop().
+         */
+        *cpu = get_cpu();
+        t = local_clock() >> 10;
+        put_cpu();
+
+        return t;
+}
+
+static bool busywait_stop(unsigned long timeout, unsigned cpu)
+{
+        unsigned this_cpu;
+
+        if (time_after(local_clock_us(&this_cpu), timeout))
+                return true;
+
+        return this_cpu != cpu;
+}
+
 static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 {
         unsigned long timeout;
+        unsigned cpu;
+
+        /* When waiting for high frequency requests, e.g. during synchronous
+         * rendering split between the CPU and GPU, the finite amount of time
+         * required to set up the irq and wait upon it limits the response
+         * rate. By busywaiting on the request completion for a short while we
+         * can service the high frequency waits as quick as possible. However,
+         * if it is a slow request, we want to sleep as quickly as possible.
+         * The tradeoff between waiting and sleeping is roughly the time it
+         * takes to sleep on a request, on the order of a microsecond.
+         */
 
         if (i915_gem_request_get_ring(req)->irq_refcount)
                 return -EBUSY;
 
-        timeout = jiffies + 1;
+        timeout = local_clock_us(&cpu) + 5;
         while (!need_resched()) {
                 if (i915_gem_request_completed(req, true))
                         return 0;
 
                 if (signal_pending_state(state, current))
                         break;
 
-                if (time_after_eq(jiffies, timeout))
+                if (busywait_stop(timeout, cpu))
                         break;
 
                 cpu_relax_lowlatency();
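
One arithmetic note on the local_clock() >> 10 line above (back-of-the-envelope numbers, not from the commit): the shift divides nanoseconds by 1024 rather than 1000, so one "approximate microsecond" is 1024ns and the +5 budget is really about 5.12us of real time, with truncation of the two clock reads adding up to roughly one more unit of slack. A cheap shift beats a 64-bit division in a hot loop, and a couple of percent of error is irrelevant against a 5us bound. A quick check:

#include <stdio.h>

int main(void)
{
        unsigned long budget_ns = 5UL << 10; /* 5 shifted units, back in ns */

        printf("+5 budget = %lu ns (~%.2f us)\n", budget_ns, budget_ns / 1000.0);
        printf("unit error vs. 1000ns: %.1f%%\n",
               (1024.0 - 1000.0) / 1000.0 * 100.0); /* ~2.4% */
        return 0;
}

For context, in this kernel version __i915_spin_request() is the optimistic pre-wait run by __i915_wait_request() before the interrupt is armed; the irq_refcount check above skips the spin when the interrupt is already set up for another waiter.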
