Recheck Autotune cache on Precompile serialization to prune compilation results #158656

jamesjwu · 2025-07-18T17:05:14Z

Stack from ghstack (oldest at bottom):

-> Recheck Autotune cache on Precompile serialization to prune compilation results #158656

This PR rechecks the autotune cache on PrecompileContext.serialize(), which lets us save autotune results for statically compiled triton kernels ahead of time, so that warm start does not need to check the autotune cache.

It has a few extra changes to make this work:

Storing source code in TritonBundler

- We now store the source_code for statically compiled triton kernels in TritonBundler instead of the hash of the source code, so that we can easily access it when rechecking the autotune cache in PrecompileContext.serialize. To check that this is not a significant space concern, I ran the entire Hugging Face benchmark on training. The total size of /tmp/torchinductor_jjwu/fxgraph before my change was 1185004 KB (1.18 GB). After my change, it increased to 1207312 KB (1.2 GB), a storage increase of ~1.9%, which seems safe.
- We now return early from recheck_autotune_cache if the number of triton kernels being compiled is 1, since there's no reason to check the cache at all in those cases.
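
The two changes above can be illustrated with a small toy model. This is a sketch only, not the code in this PR: `CompiledKernel`, its fields, the dict-based cache, and this standalone `recheck_autotune_cache` are hypothetical stand-ins, and keying the cache directly by source code is a simplifying assumption. It also assumes that "the number of triton kernels being compiled is 1" means there is a single compiled result for the kernel, so there is nothing to prune.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CompiledKernel:
    # Hypothetical stand-in for a statically compiled triton kernel entry.
    source_code: str    # the source itself is stored, not only its hash
    configs: List[str]  # names of the compiled candidate configs


def recheck_autotune_cache(
    kernel: CompiledKernel, autotune_cache: Dict[str, str]
) -> CompiledKernel:
    """Prune compiled results down to the cached autotune winner, if any."""
    # Early return: with a single compiled result there is nothing to prune.
    if len(kernel.configs) <= 1:
        return kernel
    # Assumption: the toy cache maps source code to the winning config name.
    best = autotune_cache.get(kernel.source_code)
    if best is None or best not in kernel.configs:
        # Cache miss: keep every compiled result unchanged.
        return kernel
    return CompiledKernel(kernel.source_code, [best])


def serialize(
    kernels: List[CompiledKernel], autotune_cache: Dict[str, str]
) -> List[CompiledKernel]:
    # Recheck at serialization time so the saved artifact is already pruned.
    return [recheck_autotune_cache(k, autotune_cache) for k in kernels]
```

In this model, a kernel whose source has a cached winner is saved with only that config, so a later warm start has no need to consult the autotune cache for it.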

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela

[ghstack-poisoned]

pytorch-bot · 2025-07-18T17:05:18Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158656

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ghstack-mergeability-check and Check labels failing with 'Resource not accessible by integration'

❌ 2 New Failures, 1 Unrelated Failure

As of commit 00b148e with merge base 50eac81:

NEW FAILURES - The following jobs have failed:

Check Labels / Check labels (gh)
RuntimeError: GraphQL query
Check mergeability of ghstack PR / ghstack-mergeability-check (gh)
RuntimeError: GraphQL query

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge, unstable) (gh) (#158876)
/var/lib/jenkins/workspace/xla/torch_xla/csrc/runtime/BUILD:476:14: Compiling torch_xla/csrc/runtime/xla_util_test.cpp failed: (Exit 1): gcc failed: error executing CppCompile command (from target //torch_xla/csrc/runtime:xla_util_test) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 229 arguments skipped)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jamesjwu · 2025-08-04T23:58:43Z

@pytorchbot rebase

pytorchmergebot · 2025-08-05T00:00:14Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

[ghstack-poisoned]

pytorchmergebot · 2025-08-05T00:00:26Z

Successfully rebased gh/jamesjwu/176/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/158656)

Update

261414d

[ghstack-poisoned]

jamesjwu mentioned this pull request Jul 18, 2025

Still run TritonBundler with BundledAOTAutogradCache, save autotune results #158048

Closed

jamesjwu added a commit that referenced this pull request Jul 18, 2025

Recheck Autotune cache on Precompile serialization to prune compilati…

40936f9

…on results ghstack-source-id: 8ddcb40 Pull-Request: #158656

pytorch-bot bot added ciflow/inductor module: dynamo module: inductor labels Jul 18, 2025

jamesjwu mentioned this pull request Jul 18, 2025

[Precompile] [easy] API For Editable PrecompileCacheArtifacts #158586

Closed

jamesjwu added the topic: not user facing topic category label Jul 18, 2025

jamesjwu requested review from oulgen and zhxchen17 July 21, 2025 13:35

jamesjwu marked this pull request as ready for review July 21, 2025 13:36

jamesjwu requested a review from bdhirsh as a code owner July 21, 2025 13:36

jamesjwu added a commit that referenced this pull request Jul 24, 2025

Recheck Autotune cache on Precompile serialization to prune compilati…

060e331

…on results ghstack-source-id: e746b11 Pull-Request: #158656

Update

00b148e

[ghstack-poisoned]

pytorchmergebot pushed a commit that referenced this pull request Aug 5, 2025

Recheck Autotune cache on Precompile serialization to prune compilati…

7773cf7

…on results ghstack-source-id: 32aaec2 Pull-Request: #158656
