Add support for param mutation under inference mode #159661

tugsbayasgalan · 2025-08-01T18:28:18Z

Summary:
In the HF model rwkv, parameters are mutated under inference mode, which should be safe. This PR does multiple things to make sure it works:

1. We execute global autograd state mutations while tracing, so that we can actually trace through in-place parameter mutation.
2. Add support for parameter mutation under inference mode in AOTAutograd.
3. Add support for parameter mutation under inference mode in export.
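For readers unfamiliar with the pattern, here is a minimal eager sketch of a parameter being mutated in place while grad mode is off. ScaledModule is a made-up toy module, not the rwkv code, and torch.no_grad is assumed here as a stand-in for the inference setting the summary refers to.

```python
import torch


class ScaledModule(torch.nn.Module):
    """Hypothetical toy module whose forward mutates a parameter in place."""

    def __init__(self):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.ones(4))

    def forward(self, x):
        # In-place update of a parameter. Eager allows this only while
        # grad mode is disabled.
        self.scale.mul_(2.0)
        return x * self.scale


module = ScaledModule()
with torch.no_grad():
    first = module(torch.ones(4))
    second = module(torch.ones(4))
```

The mutation persists on the module between calls, which is the behavior a traced graph has to reproduce.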

Test Plan:
test

Rollback Plan:

Differential Revision: D79460136

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv

pytorch-bot · 2025-08-01T18:28:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159661

⏳ 3 Pending, 1 Unrelated Failure

As of commit 1412ce7 with merge base bfc873d:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

inductor / unit-test / linux-jammy-cpu-py3.9-gcc11-inductor / test (inductor_avx2, 1, 2, linux.10xlarge.avx2) (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-08-01T18:28:29Z

This pull request was exported from Phabricator. Differential Revision: D79460136

tugsbayasgalan · 2025-08-01T18:30:58Z

torch/_functorch/_aot_autograd/input_output_analysis.py

@@ -460,6 +460,7 @@ def create_graph_signature(
        named_buffers=buffer_names,
        num_user_inputs=num_user_args,
        num_user_outputs=num_user_fw_outs,
+        trace_joint=trace_joint,


cc: @ezyang does this look ok in terms of your refactoring work?

The change looks reversible enough, but I can't really say I understand why you need yet another flag here. Additionally in your tests you are not even tracing out the joint!

This is because in the graph signature, we need to distinguish between joint IR vs inference IR. In the joint IR case, we can't have mutations on parameters.

One thing I'll say is that for a generic "this works when torch.compile works" export frontend, these cases also need to be accounted for.
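To make the joint-versus-inference distinction above concrete, here is a rough pure-Python sketch. MutationKind and classify_mutation are hypothetical names invented for illustration and are not the AOTAutograd implementation; the only rule taken from the thread is that parameter mutation is rejected when tracing the joint graph.

```python
from enum import Enum, auto


class MutationKind(Enum):
    """Local stand-in for the mutation output kinds discussed in this PR."""

    USER_INPUT_MUTATION = auto()
    BUFFER_MUTATION = auto()
    PARAMETER_MUTATION = auto()


def classify_mutation(target, parameters, buffers, trace_joint):
    """Classify a mutated input by name (hypothetical helper)."""
    if target in parameters:
        if trace_joint:
            # Per the thread: the joint IR cannot carry parameter mutations.
            raise RuntimeError(
                f"parameter mutation on {target!r} is not allowed in a joint graph"
            )
        return MutationKind.PARAMETER_MUTATION
    if target in buffers:
        return MutationKind.BUFFER_MUTATION
    return MutationKind.USER_INPUT_MUTATION


params = {"linear.weight"}
bufs = {"running_mean"}
inference_kind = classify_mutation("linear.weight", params, bufs, trace_joint=False)
```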

tugsbayasgalan · 2025-08-01T18:32:01Z

torch/fx/experimental/proxy_tensor.py

@@ -1430,9 +1430,11 @@ def __torch_function__(
                torch.amp.autocast_mode._exit_autocast,
            ]:
                node.meta["val"] = None
+            # For autocast, the python APIs run so we don't have to run them again 
+            # here. 
+            if func is torch._C._set_grad_enabled:


cc: @zou3519, @bdhirsh This is a behavioral change from before.

er what is going on here?

We are actually running the state change instead of just proxying it, so that param mutation under no-grad still works.

# Don't actually run the function! We just want to trace the calls
# into a graph. We don't actually want to change global autograd state.

So, at the very least, pre-dispatch tracing should not change the state after it runs. If there is only one single torch.set_grad_enabled(False) call in the graph, after one does the pre-dispatch trace, it should not be off.

Yep, I wrapped it in a context manager under export.
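A minimal sketch of the idea in this thread, assuming a hand-written context manager: restore_grad_state and body_with_dangling_toggle are hypothetical names, and the PR's own _maybe_restore_grad_state may be implemented differently. It only shows how a lone torch.set_grad_enabled(False) call can actually run during a trace without leaking into the caller.

```python
import contextlib

import torch


@contextlib.contextmanager
def restore_grad_state():
    """Hypothetical helper: put grad mode back to what it was on entry."""
    previous = torch.is_grad_enabled()
    try:
        yield
    finally:
        torch.set_grad_enabled(previous)


def body_with_dangling_toggle(x):
    # Function-style call: changes the global grad mode and never undoes it.
    torch.set_grad_enabled(False)
    return x + 1


before = torch.is_grad_enabled()
with restore_grad_state():
    body_with_dangling_toggle(torch.ones(2))
    inside = torch.is_grad_enabled()
after = torch.is_grad_enabled()
```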

facebook-github-bot · 2025-08-03T21:17:45Z

This pull request was exported from Phabricator. Differential Revision: D79460136

zou3519

your tests don't look happy

test/functorch/test_aotdispatch.py

ydwu4 · 2025-08-04T18:57:38Z

test/export/test_export.py

+        self.assertTrue("parameter" in val.values())
+
+        with self.assertRaisesRegex(RuntimeError, "leaf"):
+            ep.module()(torch.rand(4, 4))


Is the error that a leaf receives a mutation? Does it work if we set the parameters' requires_grad to False?

Yeah, we can do that, but we don't know when it is supposed to be True or False, which depends on whether we are under torch.no_grad or something else.

The requires_grad attribute of tensors doesn't change under "with torch.no_grad", I think. One way to test is to manually flip the attribute to False and see if eager works.

It seems weird that eager doesn't work but we can export successfully lol, which seems like divergent behavior and is hard to explain.
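The eager behavior discussed in this thread can be checked with a short sketch like the one below. It uses a bare parameter rather than the exported module from the test, so it illustrates the "leaf" error and the requires_grad point only, not the export path.

```python
import torch

param = torch.nn.Parameter(torch.ones(4))

# no_grad changes the global grad mode, not the tensor attribute.
with torch.no_grad():
    flag_inside_no_grad = param.requires_grad

# With grad mode on, an in-place op on a leaf that requires grad is rejected.
try:
    param.add_(1.0)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

# Flipping the attribute by hand makes the same in-place op legal in eager.
param.requires_grad_(False)
param.add_(1.0)
```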

torch/export/graph_signature.py

facebook-github-bot · 2025-08-05T18:45:28Z

This pull request was exported from Phabricator. Differential Revision: D79460136

facebook-github-bot · 2025-08-06T00:35:49Z

This pull request was exported from Phabricator. Differential Revision: D79460136

test/functorch/test_aotdispatch.py

ydwu4 · 2025-08-06T16:27:09Z

torch/_export/serde/serialize.py

@@ -1241,6 +1242,15 @@ def serialize_output_spec(self, spec: ep.OutputSpec) -> OutputSpec:
                    buffer_name=spec.target,
                )
            )
+        elif spec.kind == ep.OutputKind.PARAMETER_MUTATION:


I didn't see the logic for deserializing a parameter mutation OutputSpec. Why doesn't this trigger an error in the serde test?

ydwu4 · 2025-08-06T16:41:48Z

torch/_export/serde/schema.py

@@ -359,6 +365,7 @@ class OutputSpec(_Union):
    gradient_to_user_input: Annotated[GradientToUserInputSpec, 50]
    user_input_mutation: Annotated[UserInputMutationSpec, 60]
    token: Annotated[OutputTokenSpec, 70]
+    parameter_mutation: Annotated[Optional[ParameterMutationSpec], 80]


On second thought, I think a union type is probably already optional lol, meaning that we can always create the dataclass (using the new dataclass definition) from the JSON of an old artifact. Should we remove the Optional?

This is BC-compatible, but it breaks FC. I remember seeing internal pipelines that sometimes rely on FC; not sure if that's fixed or not. So we should probably check with internal folks to warn them, or do more testing. Also cc @yiming0416 @SherlockNoMad
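A toy model of the BC/FC point, under stated assumptions: the reader below is a hypothetical stand-in for a union deserializer that rejects unknown tags, the tag sets only mirror names visible in the diff, and parameter_name is an invented field name. It is not the torch._export.serde code.

```python
import json

# Tags an older reader would know about (illustrative subset).
OLD_OUTPUT_KINDS = {"gradient_to_user_input", "user_input_mutation", "token"}
# A newer reader also knows the tag added in this PR.
NEW_OUTPUT_KINDS = OLD_OUTPUT_KINDS | {"parameter_mutation"}


def read_output_spec(payload, known_kinds):
    """Decode a one-key union payload, rejecting tags the reader does not know."""
    record = json.loads(payload)
    kind, value = next(iter(record.items()))
    if kind not in known_kinds:
        raise ValueError(f"unknown OutputSpec kind: {kind}")
    return kind, value


old_artifact = json.dumps({"token": {}})
new_artifact = json.dumps({"parameter_mutation": {"parameter_name": "weight"}})

# BC: a new reader can still load an old artifact.
bc_kind, _ = read_output_spec(old_artifact, NEW_OUTPUT_KINDS)

# FC: an old reader cannot load an artifact that uses the new tag.
try:
    read_output_spec(new_artifact, OLD_OUTPUT_KINDS)
    fc_ok = True
except ValueError:
    fc_ok = False
```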

ydwu4 · 2025-08-06T16:44:57Z

test/export/test_export.py

+        self.assertTrue("parameter" in val.values())
+
+        with self.assertRaisesRegex(RuntimeError, "leaf"):
+            ep.module()(torch.rand(4, 4))


The requires_grad attribute of tensors doesn't change under "with torch.no_grad", I think. One way to test is to manually flip the attribute to False and see if eager works.

It seems weird that eager doesn't work but we can export successfully lol, which seems like divergent behavior and is hard to explain.

facebook-github-bot · 2025-08-11T15:50:01Z

This pull request was exported from Phabricator. Differential Revision: D79460136

ydwu4 · 2025-08-11T22:25:33Z

torch/export/_trace.py

+            with (
+                ctx,
+                override_getattribute_for_subclasses(flat_args),
+                _maybe_restore_grad_state(),


Is it accurate that parameter mutation is supported in export (for all the flags/modes)? We should document the semantics of all mutations, e.g. input, buffer, and parameter mutations, somewhere. Right now I don't have a clear mental model for them.

Summary: Pull Request resolved: pytorch#159661

In HF model rwkv, we have parameter mutation under inference mode which should be safe. This PR does multiple things to make sure it works:

1. We execute global autograd mutation while tracing so that we can actually trace through parameter inplace mutation
2. Add support for parameter mutation under inference mode in AOTAutograd
3. Add support for parameter mutation under inference mode in export.

Test Plan: test

Rollback Plan:

Reviewed By: ydwu4

Differential Revision: D79460136

facebook-github-bot · 2025-08-12T03:47:25Z

This pull request was exported from Phabricator. Differential Revision: D79460136

facebook-github-bot · 2025-08-12T03:49:20Z

This pull request was exported from Phabricator. Differential Revision: D79460136

tugsbayasgalan requested review from avikchaudhuri, zhxchen17, ydwu4, angelayi and bdhirsh as code owners August 1, 2025 18:28

pytorch-bot bot added the ciflow/inductor and release notes: export labels Aug 1, 2025

facebook-github-bot added the fx label Aug 1, 2025

facebook-github-bot added the fb-exported label Aug 1, 2025

tugsbayasgalan requested review from ezyang and zou3519 August 1, 2025 18:29

tugsbayasgalan commented Aug 1, 2025

tugsbayasgalan force-pushed the export-D79460136 branch from da3eb83 to 76593ac Compare August 3, 2025 21:17

tugsbayasgalan requested a review from Chillee as a code owner August 3, 2025 21:17

zou3519 reviewed Aug 4, 2025

ydwu4 reviewed Aug 4, 2025

tugsbayasgalan force-pushed the export-D79460136 branch from 76593ac to 098450f Compare August 5, 2025 18:45

tugsbayasgalan force-pushed the export-D79460136 branch from 098450f to b9bb196 Compare August 6, 2025 00:35

tugsbayasgalan requested review from zou3519 and ydwu4 August 6, 2025 00:37

ydwu4 reviewed Aug 6, 2025

tugsbayasgalan force-pushed the export-D79460136 branch from b9bb196 to e55fb54 Compare August 11, 2025 15:49

tugsbayasgalan requested a review from SherlockNoMad as a code owner August 11, 2025 15:49

tugsbayasgalan requested a review from ydwu4 August 11, 2025 18:06

ydwu4 approved these changes Aug 11, 2025

pytorch-bot bot added the ciflow/trunk label Aug 11, 2025

tugsbayasgalan force-pushed the export-D79460136 branch from e55fb54 to 5644567 Compare August 12, 2025 03:44

tugsbayasgalan force-pushed the export-D79460136 branch from 5644567 to fbe846d Compare August 12, 2025 03:46

tugsbayasgalan force-pushed the export-D79460136 branch from fbe846d to 1412ce7 Compare August 12, 2025 03:49
