[precompile] Add low level C API to load precompiled dynamo code on functions. #155329
…unctions.

While loading deserialized dynamo states back from disk, precompile needs a direct way to access ExtraState and populate guarded bytecode as cache entries. This diff adds two code-level APIs to load precompiled guard + bytecode entries.

1. _load_precompile_entry() appends an entry to a per-code-object precompile entry list. Precompile entries are looked up before normal compiled entries.
2. _reset_precompile_entries() cleans up all installed entries. This is useful when a user triggers loading multiple times, which would otherwise blow up the number of entries on the list.

Differential Revision: [D76083247](https://our.internmc.facebook.com/intern/diff/D76083247/)

[ghstack-poisoned]
This pull request was exported from Phabricator. Differential Revision: D76083247
Rebased
Can you add a test where, after loading the precompile entry, we pass in an input that does NOT pass the guard given by the guard manager? And then, in that case, I would expect the code to revert to the behavior of compiled_fn(x) rather than injected(x).
(Accepting to unblock otherwise)
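The guard-miss case requested in the review could be exercised roughly as follows. This is a pure-Python toy model, not the test from the PR: `ToyFunction` and its `load_precompile_entry` method are hypothetical stand-ins for the real C-level machinery, and the guard here is a plain predicate rather than a guard manager. Only the names `compiled_fn` and `injected` are taken from the comment above.

```python
from typing import Any, Callable, List, Tuple

Guard = Callable[[Any], bool]


class ToyFunction:
    """Hypothetical stand-in for a function with precompile entries attached."""

    def __init__(self, compiled_fn: Callable[[Any], Any]) -> None:
        self.compiled_fn = compiled_fn
        self.precompile_entries: List[Tuple[Guard, Callable[[Any], Any]]] = []

    def load_precompile_entry(self, guard: Guard, injected: Callable[[Any], Any]) -> None:
        # Append a (guard, code) pair; entries are tried in insertion order.
        self.precompile_entries.append((guard, injected))

    def __call__(self, x: Any) -> Any:
        # Precompile entries are consulted first.
        for guard, injected in self.precompile_entries:
            if guard(x):
                return injected(x)
        # No guard passed: fall back to the normal compiled behavior.
        return self.compiled_fn(x)


def compiled_fn(x):
    return x + 1


def injected(x):
    return x + 100


toy = ToyFunction(compiled_fn)
toy.load_precompile_entry(lambda x: isinstance(x, int), injected)

guard_hit = toy(1)      # int input passes the guard, so injected(x) runs
guard_miss = toy(1.5)   # float input fails the guard, so compiled_fn(x) runs
```

The second call is the interesting one: the input fails the guard, so the result should match `compiled_fn(x)` rather than `injected(x)`.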
Starting merge as part of PR stack under #155363
…om precompile bytecode. (#155363)

Precompiled bytecode looks like the following:

```
pre-graph bytecode
...
compiled graph code
...
post-graph bytecode
```

The pre-graph bytecode contains calls into helper functions like torch._dynamo.utils.call_size, which invoke @disable inside the bytecode. Normally torch.compile() handles these frames fine, but precompile loads bytecode from a clean dynamo state, and we want a way to assert that recompilation never happens. The current way to ensure this is set_stance("fail_on_recompile") (open to other ideas for testing this, but IMO it is the closest thing we have today). This approach doesn't work when utility functions like call_size() are involved, so this PR fixes a number of places to make sure "fail_on_recompile" can skip through the functions meant to be skipped during compilation.

Differential Revision: [D76156867](https://our.internmc.facebook.com/intern/diff/D76156867/)
Pull Request resolved: #155363
Approved by: https://github.com/jamesjwu, https://github.com/jansel
ghstack dependencies: #155329
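To illustrate the intended behavior, here is a toy model of a runtime in which a "fail_on_recompile" stance raises on any compilation attempt but lets functions marked as skipped run eagerly. Everything in it (`ToyRuntime`, `RecompileError`, the `skip` decorator, the local `call_size` helper) is a hypothetical simplification for illustration; it does not reproduce dynamo's frame evaluation or the real `@disable` decorator.

```python
from typing import Any, Callable, Dict, Set


class RecompileError(RuntimeError):
    """Raised by the toy runtime when compilation is attempted but forbidden."""


class ToyRuntime:
    def __init__(self, stance: str = "default") -> None:
        self.stance = stance
        self.cache: Dict[str, Callable[..., Any]] = {}
        self.skipped: Set[str] = set()
        self.compile_count = 0

    def skip(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        # Mark a helper as "never compile", loosely analogous to a disabled frame.
        self.skipped.add(fn.__name__)
        return fn

    def call(self, fn: Callable[..., Any], *args: Any) -> Any:
        name = fn.__name__
        if name in self.skipped:
            # Skipped helpers run eagerly and must not trip the stance check.
            return fn(*args)
        if name not in self.cache:
            if self.stance == "fail_on_recompile":
                raise RecompileError(f"compilation attempted for {name}")
            self.compile_count += 1
            self.cache[name] = fn  # "compiling" is a no-op in this toy
        return self.cache[name](*args)


rt = ToyRuntime(stance="fail_on_recompile")


@rt.skip
def call_size(shape, i):
    return shape[i]


def user_fn(x):
    return x * 2


size = rt.call(call_size, (3, 2), 0)  # skipped helper: runs without error

try:
    rt.call(user_fn, 4)               # not cached, not skipped: must fail
    raised = False
except RecompileError:
    raised = True
```

In this sketch the skip check comes before the stance check, which is the ordering the commit message describes as the goal: helpers meant to be skipped should pass through even when compilation is forbidden.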
Stack from ghstack (oldest at bottom):
While loading deserialized dynamo states back from disk, precompile will need a direct way to access ExtraState and populate guarded bytecode as cache entries.
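This diff adds two code-level APIs to load precompiled guard + bytecode entries:

1. _load_precompile_entry() appends an entry to a per-code-object precompile entry list. Precompile entries are looked up before normal compiled entries.
2. _reset_precompile_entries() cleans up all installed entries. This is useful when a user triggers loading multiple times, which would otherwise blow up the number of entries on the list.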
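As a rough mental model of the lookup order described above, the following pure-Python sketch keeps two lists per code object and consults the precompile list first. It is a toy under stated assumptions, not the C implementation: `ToyExtraState`, `Entry`, `toy_load_precompile_entry`, `toy_reset_precompile_entries`, and `toy_lookup` are hypothetical names, guards are plain predicates over a locals dict, and the reset is assumed to act on a single code object.

```python
from dataclasses import dataclass, field
from types import CodeType
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Entry:
    guard: Callable[[Dict[str, Any]], bool]  # receives the frame's locals
    target: Callable[..., Any]               # stands in for guarded bytecode


@dataclass
class ToyExtraState:
    precompile_entries: List[Entry] = field(default_factory=list)
    cache_entries: List[Entry] = field(default_factory=list)


_STATE: Dict[CodeType, ToyExtraState] = {}


def _state_for(code: CodeType) -> ToyExtraState:
    return _STATE.setdefault(code, ToyExtraState())


def toy_load_precompile_entry(code: CodeType, guard, target) -> None:
    # Append to the per-code-object precompile list.
    _state_for(code).precompile_entries.append(Entry(guard, target))


def toy_reset_precompile_entries(code: CodeType) -> None:
    # Drop every precompile entry; normal cache entries are untouched.
    _state_for(code).precompile_entries.clear()


def toy_lookup(code: CodeType, f_locals: Dict[str, Any]) -> Optional[Callable[..., Any]]:
    state = _state_for(code)
    # Precompile entries are checked before normal compiled entries.
    for entry in state.precompile_entries + state.cache_entries:
        if entry.guard(f_locals):
            return entry.target
    return None


def fn(x):
    return x + 1


def injected(x):
    return x + 100


# One "normal" compiled entry that always matches.
_state_for(fn.__code__).cache_entries.append(Entry(lambda L: True, fn))

# Loading twice grows the list, which is what the reset guards against.
toy_load_precompile_entry(fn.__code__, lambda L: isinstance(L["x"], int), injected)
toy_load_precompile_entry(fn.__code__, lambda L: isinstance(L["x"], int), injected)
entries_before_reset = len(_state_for(fn.__code__).precompile_entries)

result_precompiled = toy_lookup(fn.__code__, {"x": 1})(1)

toy_reset_precompile_entries(fn.__code__)
entries_after_reset = len(_state_for(fn.__code__).precompile_entries)
result_after_reset = toy_lookup(fn.__code__, {"x": 1})(1)
```

After the reset, the lookup falls through to the normal cache entry, so the same input produces the original function's result again.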
Differential Revision: D76083247
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames