[precompile] Add low level C API to load precompiled dynamo code on functions. #155329

Closed
wants to merge 5 commits into from

zhxchen17
Contributor

@zhxchen17 zhxchen17 commented Jun 6, 2025

Stack from ghstack (oldest at bottom):

When loading deserialized Dynamo state back from disk, precompile needs a direct way to access ExtraState and populate guarded bytecode as cache entries.

This diff adds two code-level APIs for loading precompiled guard + bytecode entries.

  1. _load_precompile_entry() appends an entry to a per-code-object list of precompile entries. These entries are looked up before normal compiled entries.
  2. _reset_precompile_entries() removes all installed entries. This prevents the list from growing without bound when a user calls the loading API multiple times.
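
To make the intended semantics concrete, here is a minimal pure-Python sketch of the lookup order and the reset behavior described above. It is a toy model only: `ToyExtraState`, `Entry`, `toy_load_precompile_entry`, and `toy_reset_precompile_entries` are hypothetical names invented for illustration, and the real APIs live in Dynamo's C layer with signatures that are not shown in this description.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Entry:
    guard: Callable[[Any], bool]  # returns True when the entry applies to the input
    run: Callable[[Any], Any]     # stand-in for the guarded bytecode


@dataclass
class ToyExtraState:
    """Toy stand-in for per-code-object state (not the real ExtraState)."""
    precompile_entries: List[Entry] = field(default_factory=list)
    cache_entries: List[Entry] = field(default_factory=list)

    def lookup(self, arg: Any) -> Optional[Callable[[Any], Any]]:
        # Precompile entries are consulted before normal compiled entries.
        for entry in self.precompile_entries + self.cache_entries:
            if entry.guard(arg):
                return entry.run
        return None


# Hypothetical registry keyed by code object.
_STATE: Dict[Any, ToyExtraState] = {}


def toy_load_precompile_entry(code, guard, run) -> None:
    # Appends; calling this repeatedly grows the list.
    _STATE.setdefault(code, ToyExtraState()).precompile_entries.append(Entry(guard, run))


def toy_reset_precompile_entries(code) -> None:
    # Drops only the precompile entries; normal cache entries are untouched.
    state = _STATE.get(code)
    if state is not None:
        state.precompile_entries.clear()


def f(x):
    return x + 1


code = f.__code__
state = _STATE.setdefault(code, ToyExtraState())
# A "normal" compiled entry that accepts any input.
state.cache_entries.append(Entry(lambda x: True, lambda x: x + 1))
# A precompile entry guarded on ints, with deliberately different behavior.
toy_load_precompile_entry(code, lambda x: isinstance(x, int), lambda x: x + 100)
```

In this model an `int` input hits the precompile entry first, any other input falls through to the normal entry, and a reset before each reload keeps the precompile list from accumulating stale entries.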

Differential Revision: D76083247

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames

pytorch-bot bot commented Jun 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155329

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 5c0677a with merge base be2ad70:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zhxchen17 added a commit that referenced this pull request Jun 6, 2025
ghstack-source-id: 288680553
Pull Request resolved: #155329
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D76083247

zhxchen17 added a commit that referenced this pull request Jun 6, 2025
ghstack-source-id: 288681062
@zhxchen17 zhxchen17 added the topic: not user facing topic category label Jun 6, 2025
zhxchen17 added a commit that referenced this pull request Jun 6, 2025
ghstack-source-id: 288681418

@pytorchmergebot
Collaborator

Rebased gh/zhxchen17/24/orig onto refs/remotes/origin/main because #155363 was rebased, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/155329)

Contributor

@jamesjwu jamesjwu left a comment

Can you add a test where, after loading the precompile entry, we pass in an input that does NOT pass the guard given by the guard manager? And then, in that case, I would expect the code to revert to the behavior of compiled_fn(x) rather than injected(x).
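
The shape of the requested test can be sketched without Dynamo at all. The snippet below is an illustrative model rather than the test added to this PR: `make_dispatcher` is a hypothetical helper, and the bodies of `compiled_fn` and `injected` are made up so that the two paths are distinguishable.

```python
from typing import Any, Callable, List, Tuple

GuardedEntry = Tuple[Callable[[Any], bool], Callable[[Any], Any]]


def make_dispatcher(
    precompiled: List[GuardedEntry],
    fallback: Callable[[Any], Any],
) -> Callable[[Any], Any]:
    """Hypothetical helper: try guarded entries in order, else use the fallback."""

    def dispatch(x: Any) -> Any:
        for guard, fn in precompiled:
            if guard(x):
                return fn(x)
        # No guard passed, so behave like the normally compiled function.
        return fallback(x)

    return dispatch


def compiled_fn(x):
    # Stand-in for the normally compiled function.
    return x + 1


def injected(x):
    # Stand-in for the loaded precompile entry; deliberately different.
    return x + 100


# The guard only admits ints, so a float input must miss it.
dispatch = make_dispatcher([(lambda x: isinstance(x, int), injected)], compiled_fn)
```

A test built along these lines would assert that an input passing the guard produces the `injected` result, and that an input failing the guard produces the same result as `compiled_fn`.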

Contributor

@jamesjwu jamesjwu left a comment

(Accepting to unblock otherwise)

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 6, 2025
@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #155363

pytorchmergebot pushed a commit that referenced this pull request Jun 10, 2025
…om precompile bytecode. (#155363)

Precompiled bytecode has the following layout:
```
pre-graph bytecode
...
compiled graph code
...
post-graph bytecode
```

The pre-graph bytecode contains calls to helper functions such as torch._dynamo.utils.call_size, which invoke @disable inside the bytecode.

Normally torch.compile() handles these frames fine, but precompile loads bytecode from a clean Dynamo state, and we want a way to assert that recompilation never happens. The current way to ensure this is set_stance("fail_on_recompile") (open to any other idea for testing this, but IMO this is the closest thing we have today).

This approach doesn't work when utility functions like call_size() are involved. This PR fixes several places so that "fail_on_recompile" can skip through the functions that are meant to be skipped during compilation.

Differential Revision: [D76156867](https://our.internmc.facebook.com/intern/diff/D76156867/)

Pull Request resolved: #155363
Approved by: https://github.com/jamesjwu, https://github.com/jansel
ghstack dependencies: #155329