
How to integrate with PyTorch OSS benchmark database


Prerequisite

The OSS benchmark database is hosted on ClickHouse Cloud at https://console.clickhouse.cloud, in the benchmark database and the oss_ci_benchmark_v3 table. It's a good idea to log in to ClickHouse Cloud and run a few simple queries to get familiar with the data there:

  1. Log in to https://console.clickhouse.cloud. For metamates, you can log in with your Meta email via SSO and request access. Read-only access is granted by default.
  2. Select benchmark database
  3. Run a sample query:
select
    head_branch,
    head_sha,
    benchmark,
    model.name as model,
    metric.name as name,
    arrayAvg(metric.benchmark_values) as value
from
    oss_ci_benchmark_v3
where
    tupleElement(benchmark, 'name') = 'TorchAO benchmark'
    and oss_ci_benchmark_v3.timestamp < 1733870813
    and oss_ci_benchmark_v3.timestamp > 1733784413
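If you prefer to query programmatically, the same query can be run from Python with the clickhouse-connect client. This is a minimal sketch, assuming you have credentials with read access to the benchmark database; the host, username, and password below are placeholders, not real values.

# A minimal sketch using clickhouse-connect (pip install clickhouse-connect).
# The host, username, and password are placeholders, not real credentials.
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="your-instance.clickhouse.cloud",  # placeholder ClickHouse Cloud endpoint
    username="your-username",
    password="your-password",
    secure=True,
    database="benchmark",
)

result = client.query(
    """
    select
        head_branch,
        head_sha,
        benchmark,
        model.name as model,
        metric.name as name,
        arrayAvg(metric.benchmark_values) as value
    from oss_ci_benchmark_v3
    where tupleElement(benchmark, 'name') = 'TorchAO benchmark'
        and timestamp < 1733870813
        and timestamp > 1733784413
    """
)

# Each row is a tuple of the selected columns
for row in result.result_rows:
    print(row)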

Output format

Your benchmark results need to be a list of metrics in the following format. All fields are optional unless specified otherwise.

// The list of all benchmark metrics
[
  {
    // Information about the benchmark
    benchmark: Tuple(
      name,  // Required. The name of the benchmark
      mode,  // Training or inference
      dtype,  // The dtype used by the benchmark
      extra_info: {},  // Any additional information about the benchmark
    ),

    // Information about the model or the test
    model: Tuple(
      name,  // Required. The model or the test name
      type,  // Additional information, for example whether this is a HF model or a custom micro-benchmark layer
      backend,  // Any delegation backend used here, e.g. XNNPACK
      origins,  // Tell us where this is from, e.g. HF
      extra_info: {},  // Any additional information about the model or the test
    ),

    // Information about the benchmark result
    metric: Tuple(
      name,  // Required. The name of the metric. It's a good practice to include its unit here too, e.g. compilation_time(ms)
      benchmark_values,  // List of floats. Required. The metric values. It's a list because a benchmark is usually run multiple times
      target_value,  // Float. The optional target value used to indicate if there is a regression
      extra_info: {},  // Any additional information about the benchmark result
    ),

    // Optional information about any inputs used by the benchmark
    inputs: {
      name: Tuple(
        dtype,  // The dtype of the input
        extra_info: {},  // Any additional information about the input
      )
    },
  },

  {
    ... Same structure as the first record
  },
  ...
]

Note that wrapping the records in a JSON list is optional. Writing one JSON record per line (JSONEachRow) is also accepted.
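For illustration, here is a minimal sketch of a Python script that emits results in this format. The benchmark, model, and metric values are made-up examples; only the field layout follows the schema above, and the output path matches the benchmark-results directory used by the sample workflows below.

# A minimal sketch that writes made-up benchmark results in the format above.
import json
import os

results = [
    {
        "benchmark": {
            "name": "my_benchmark",  # required
            "mode": "inference",
            "dtype": "bfloat16",
            "extra_info": {},
        },
        "model": {
            "name": "my_model",  # required
            "type": "micro-benchmark",
            "backend": "",
            "origins": ["HF"],
            "extra_info": {},
        },
        "metric": {
            "name": "latency(ms)",  # required, with the unit in the name
            "benchmark_values": [12.3, 12.5, 12.1],  # required, one value per run
            "target_value": 15.0,
            "extra_info": {},
        },
    },
]

# Write to the directory that the upload step reads from in the workflows below
output_dir = os.path.join(os.environ.get("RUNNER_TEMP", "/tmp"), "benchmark-results")
os.makedirs(output_dir, exist_ok=True)

# Either a single JSON list in one file ...
with open(os.path.join(output_dir, "benchmark-results.json"), "w") as f:
    json.dump(results, f, indent=2)

# ... or one JSON record per line (JSONEachRow) would also be accepted.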

Run the benchmark and upload the results

Prerequisites

  1. If you are using PyTorch AWS self-hosted runners, they already have permission to upload benchmark results. Nothing else needs to be prepared.
  2. If you are using non-AWS runners, for example ROCm runners, please reach out to the PyTorch Dev Infra team (POC @huydhn) to create a GitHub environment with permission to write to S3. The environment is called upload-benchmark-results; see android-perf.yml for an example.

A sample job on AWS self-hosted runners

name: A sample benchmark job that runs on all main commits
on:
  push:
    branches:
      - main

jobs:
  benchmark:
    runs-on: linux.2xlarge
    steps:
      - uses: actions/checkout@v3

      - name: Run your own benchmark logic
        shell: bash
        run: |
          set -eux

          mkdir -p "${{ runner.temp }}/benchmark-results"

          # Run your benchmark script and write the results to benchmark-results.json, whose format is defined in the previous section
          python run_my_benchmark_script.py > ${{ runner.temp }}/benchmark-results/benchmark-results.json

          # It's also ok to write the results into multiple JSON files, for example
          python run_my_benchmark_script.py --output-dir ${{ runner.temp }}/benchmark-results

      - name: Upload the benchmark results to OSS benchmark database for the dashboard
        uses: pytorch/test-infra/.github/actions/upload-benchmark-results@main
        with:
          benchmark-results-dir: ${{ runner.temp }}/benchmark-results
          dry-run: false
          schema-version: v3
          github-token: ${{ secrets.GITHUB_TOKEN }}

A sample job on non-AWS runners

name: A sample benchmark job that runs on all main commits
on:
  push:
    branches:
      - main

jobs:
  benchmark:
    runs-on: linux.rocm.gpu.2  # An example non-AWS runner
    environment: upload-benchmark-results  # The environment has write access to S3 to upload the results
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v3

      - name: Authenticate with AWS
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_upload-benchmark-results
          # The max duration enforced by the server side
          role-duration-seconds: 18000
          aws-region: us-east-1

      - name: Run your own benchmark logic
        shell: bash
        run: |
          set -eux

          mkdir -p "${{ runner.temp }}/benchmark-results"

          # Run your benchmark script and write the results to benchmark-results.json, whose format is defined in the previous section
          python run_my_benchmark_script.py > ${{ runner.temp }}/benchmark-results/benchmark-results.json

          # It's also ok to write the results into multiple JSON files, for example
          python run_my_benchmark_script.py --output-dir ${{ runner.temp }}/benchmark-results

      - name: Upload the benchmark results to OSS benchmark database for the dashboard
        uses: pytorch/test-infra/.github/actions/upload-benchmark-results@main
        with:
          benchmark-results-dir: ${{ runner.temp }}/benchmark-results
          dry-run: false
          schema-version: v3
          github-token: ${{ secrets.GITHUB_TOKEN }}