Skip to content

Commit 5a198b8

Browse files
committed
drm/i915: Do not overwrite the request with zero on reallocation
When using RCU lookup for the request, commit 0eafec6 ("drm/i915: Enable lockless lookup of request tracking via RCU"), we acknowledge that we may race with another thread that could have reallocated the request. In order for the first thread not to blow up, the second thread must not clear the request completed before overwriting it. In the RCU lookup, we allow for the engine/seqno to be replaced but we do not allow for it to be zeroed. The choice we make is to either add extra checking to the RCU lookup, or embrace the inherent races (as intended). It is more complicated as we need to manually clear everything we depend upon being zero initialised, but we benefit from not emiting the memset() to clear the entire frequently allocated structure (that memset turns up in throughput profiles). And at the same time, the lookup remains flexible for future adjustments. v2: Old style LRC requires another variable to be initialize. (The danger inherent in not zeroing everything.) v3: request->batch also needs to be cleared v4: signaling.tsk is no long used unset, but pid still exists Fixes: 0eafec6 ("drm/i915: Enable lockless lookup of request...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Goel, Akash" <akash.goel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470731014-6894-2-git-send-email-chris@chris-wilson.co.uk
1 parent edf6b76 commit 5a198b8

File tree

2 files changed

+47
-1
lines changed

2 files changed

+47
-1
lines changed

drivers/gpu/drm/i915/i915_gem_request.c

Lines changed: 36 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -355,7 +355,35 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
355355
if (req && i915_gem_request_completed(req))
356356
i915_gem_request_retire(req);
357357

358-
req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
358+
/* Beware: Dragons be flying overhead.
359+
*
360+
* We use RCU to look up requests in flight. The lookups may
361+
* race with the request being allocated from the slab freelist.
362+
* That is the request we are writing to here, may be in the process
363+
* of being read by __i915_gem_active_get_request_rcu(). As such,
364+
* we have to be very careful when overwriting the contents. During
365+
* the RCU lookup, we change chase the request->engine pointer,
366+
* read the request->fence.seqno and increment the reference count.
367+
*
368+
* The reference count is incremented atomically. If it is zero,
369+
* the lookup knows the request is unallocated and complete. Otherwise,
370+
* it is either still in use, or has been reallocated and reset
371+
* with fence_init(). This increment is safe for release as we check
372+
* that the request we have a reference to and matches the active
373+
* request.
374+
*
375+
* Before we increment the refcount, we chase the request->engine
376+
* pointer. We must not call kmem_cache_zalloc() or else we set
377+
* that pointer to NULL and cause a crash during the lookup. If
378+
* we see the request is completed (based on the value of the
379+
* old engine and seqno), the lookup is complete and reports NULL.
380+
* If we decide the request is not completed (new engine or seqno),
381+
* then we grab a reference and double check that it is still the
382+
* active request - which it won't be and restart the lookup.
383+
*
384+
* Do not use kmem_cache_zalloc() here!
385+
*/
386+
req = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);
359387
if (!req)
360388
return ERR_PTR(-ENOMEM);
361389

@@ -375,6 +403,13 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
375403
req->engine = engine;
376404
req->ctx = i915_gem_context_get(ctx);
377405

406+
/* No zalloc, must clear what we need by hand */
407+
req->previous_context = NULL;
408+
req->file_priv = NULL;
409+
req->batch_obj = NULL;
410+
req->pid = NULL;
411+
req->elsp_submitted = 0;
412+
378413
/*
379414
* Reserve space in the ring buffer for all the commands required to
380415
* eventually emit this request. This is to guarantee that the

drivers/gpu/drm/i915/i915_gem_request.h

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -51,6 +51,13 @@ struct intel_signal_node {
5151
* emission time to be associated with the request for tracking how far ahead
5252
* of the GPU the submission is.
5353
*
54+
* When modifying this structure be very aware that we perform a lockless
55+
* RCU lookup of it that may race against reallocation of the struct
56+
* from the slab freelist. We intentionally do not zero the structure on
57+
* allocation so that the lookup can use the dangling pointers (and is
58+
* cogniscent that those pointers may be wrong). Instead, everything that
59+
* needs to be initialised must be done so explicitly.
60+
*
5461
* The requests are reference counted.
5562
*/
5663
struct drm_i915_gem_request {
@@ -458,6 +465,10 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
458465
* just report the active tracker is idle. If the new request is
459466
* incomplete, then we acquire a reference on it and check that
460467
* it remained the active request.
468+
*
469+
* It is then imperative that we do not zero the request on
470+
* reallocation, so that we can chase the dangling pointers!
471+
* See i915_gem_request_alloc().
461472
*/
462473
do {
463474
struct drm_i915_gem_request *request;

0 commit comments

Comments
 (0)