Skip to content

Commit b46a33e

Browse files
committed
drm/i915/pmu: Expose a PMU interface for perf queries
From: Chris Wilson <chris@chris-wilson.co.uk> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> From: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> The first goal is to be able to measure GPU (and invidual ring) busyness without having to poll registers from userspace. (Which not only incurs holding the forcewake lock indefinitely, perturbing the system, but also runs the risk of hanging the machine.) As an alternative we can use the perf event counter interface to sample the ring registers periodically and send those results to userspace. Functionality we are exporting to userspace is via the existing perf PMU API and can be exercised via the existing tools. For example: perf stat -a -e i915/rcs0-busy/ -I 1000 Will print the render engine busynnes once per second. All the performance counters can be enumerated (perf list) and have their unit of measure correctly reported in sysfs. v1-v2 (Chris Wilson): v2: Use a common timer for the ring sampling. v3: (Tvrtko Ursulin) * Decouple uAPI from i915 engine ids. * Complete uAPI defines. * Refactor some code to helpers for clarity. * Skip sampling disabled engines. * Expose counters in sysfs. * Pass in fake regs to avoid null ptr deref in perf core. * Convert to class/instance uAPI. * Use shared driver code for rc6 residency, power and frequency. v4: (Dmitry Rogozhkin) * Register PMU with .task_ctx_nr=perf_invalid_context * Expose cpumask for the PMU with the single CPU in the mask * Properly support pmu->stop(): it should call pmu->read() * Properly support pmu->del(): it should call stop(event, PERF_EF_UPDATE) * Introduce refcounting of event subscriptions. * Make pmu.busy_stats a refcounter to avoid busy stats going away with some deleted event. * Expose cpumask for i915 PMU to avoid multiple events creation of the same type followed by counter aggregation by perf-stat. * Track CPUs getting online/offline to migrate perf context. If (likely) cpumask will initially set CPU0, CONFIG_BOOTPARAM_HOTPLUG_CPU0 will be needed to see effect of CPU status tracking. * End result is that only global events are supported and perf stat works correctly. * Deny perf driver level sampling - it is prohibited for uncore PMU. v5: (Tvrtko Ursulin) * Don't hardcode number of engine samplers. * Rewrite event ref-counting for correctness and simplicity. * Store initial counter value when starting already enabled events to correctly report values to all listeners. * Fix RC6 residency readout. * Comments, GPL header. v6: * Add missing entry to v4 changelog. * Fix accounting in CPU hotplug case by copying the approach from arch/x86/events/intel/cstate.c. (Dmitry Rogozhkin) v7: * Log failure message only on failure. * Remove CPU hotplug notification state on unregister. v8: * Fix error unwind on failed registration. * Checkpatch cleanup. v9: * Drop the energy metric, it is available via intel_rapl_perf. (Ville Syrjälä) * Use HAS_RC6(p). (Chris Wilson) * Handle unsupported non-engine events. (Dmitry Rogozhkin) * Rebase for intel_rc6_residency_ns needing caller managed runtime pm. * Drop HAS_RC6 checks from the read callback since creating those events will be rejected at init time already. * Add counter units to sysfs so perf stat output is nicer. * Cleanup the attribute tables for brevity and readability. v10: * Fixed queued accounting. v11: * Move intel_engine_lookup_user to intel_engine_cs.c * Commit update. (Joonas Lahtinen) v12: * More accurate sampling. (Chris Wilson) * Store and report frequency in MHz for better usability from perf stat. * Removed metrics: queued, interrupts, rc6 counters. * Sample engine busyness based on seqno difference only for less MMIO (and forcewake) on all platforms. (Chris Wilson) v13: * Comment spelling, use mul_u32_u32 to work around potential GCC issue and somne code alignment changes. (Chris Wilson) v14: * Rebase. v15: * Rebase for RPS refactoring. v16: * Use the dynamic slot in the CPU hotplug state machine so that we are free to setup our state as multi-instance. Previously we were re-using the CPUHP_AP_PERF_X86_UNCORE_ONLINE slot which is neither used as multi-instance, nor owned by our driver to start with. * Register the CPU hotplug handlers after the PMU, otherwise the callback will get called before the PMU is initialized which can end up in perf_pmu_migrate_context with an un-initialized base. * Added workaround for a probable bug in cpuhp core. v17: * Remove workaround for the cpuhp bug. v18: * Rebase for drm_i915_gem_engine_class getting upstream before us. v19: * Rebase. (trivial) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-2-tvrtko.ursulin@linux.intel.com
1 parent c84b270 commit b46a33e

File tree

9 files changed

+902
-0
lines changed

9 files changed

+902
-0
lines changed

drivers/gpu/drm/i915/Makefile

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,7 @@ i915-y := i915_drv.o \
4545

4646
i915-$(CONFIG_COMPAT) += i915_ioc32.o
4747
i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o
48+
i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o
4849

4950
# GEM code
5051
i915-y += i915_cmd_parser.o \

drivers/gpu/drm/i915/i915_drv.c

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,7 @@
4848

4949
#include "i915_drv.h"
5050
#include "i915_trace.h"
51+
#include "i915_pmu.h"
5152
#include "i915_vgpu.h"
5253
#include "intel_drv.h"
5354
#include "intel_uc.h"
@@ -1215,6 +1216,7 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
12151216
struct drm_device *dev = &dev_priv->drm;
12161217

12171218
i915_gem_shrinker_init(dev_priv);
1219+
i915_pmu_register(dev_priv);
12181220

12191221
/*
12201222
* Notify a valid surface after modesetting,
@@ -1269,6 +1271,7 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
12691271
intel_opregion_unregister(dev_priv);
12701272

12711273
i915_perf_unregister(dev_priv);
1274+
i915_pmu_unregister(dev_priv);
12721275

12731276
i915_teardown_sysfs(dev_priv);
12741277
i915_guc_log_unregister(dev_priv);

drivers/gpu/drm/i915/i915_drv.h

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,7 @@
4040
#include <linux/hash.h>
4141
#include <linux/intel-iommu.h>
4242
#include <linux/kref.h>
43+
#include <linux/perf_event.h>
4344
#include <linux/pm_qos.h>
4445
#include <linux/reservation.h>
4546
#include <linux/shmem_fs.h>
@@ -2290,6 +2291,8 @@ struct drm_i915_private {
22902291
struct i915_gem_context *kernel_context;
22912292
/* Context only to be used for injecting preemption commands */
22922293
struct i915_gem_context *preempt_context;
2294+
struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1]
2295+
[MAX_ENGINE_INSTANCE + 1];
22932296

22942297
struct drm_dma_handle *status_page_dmah;
22952298
struct resource mch_res;
@@ -2761,6 +2764,8 @@ struct drm_i915_private {
27612764
int irq;
27622765
} lpe_audio;
27632766

2767+
struct i915_pmu pmu;
2768+
27642769
/*
27652770
* NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
27662771
* will be rejected. Instead look for a better place.

0 commit comments

Comments
 (0)