
remove unnecessary sync point in AveragedModel update #158017


Open · gl3lan wants to merge 1 commit into main from export-D78074709

Conversation


@gl3lan gl3lan commented Jul 10, 2025

Summary:
The check `bool(self.n_averaged == 0)` is a CPU/GPU synchronization point that is hit on every update. It exists only to tell whether the AveragedModel copy has been initialized. This diff introduces a CPU-based boolean for that purpose. When loading from a checkpoint, we also make sure the flag is refreshed.

After this fix, each `update_parameters` call drops from 333ms to 6ms (a 98% reduction).
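The idea can be sketched as follows. This is an illustrative running-average wrapper, not the actual torch.optim.swa_utils.AveragedModel code — the class name `EmaSketch` and the `initialized` attribute are hypothetical — showing how a plain Python flag replaces the tensor comparison that forced a device-to-host sync:

```python
import copy
import torch

class EmaSketch:
    """Illustrative running-average wrapper; NOT the real AveragedModel."""

    def __init__(self, model):
        self.module = copy.deepcopy(model)
        self.n_averaged = 0       # bookkeeping for the running mean
        self.initialized = False  # CPU-side flag replacing bool(self.n_averaged == 0)

    @torch.no_grad()
    def update_parameters(self, model):
        for p_avg, p in zip(self.module.parameters(), model.parameters()):
            if not self.initialized:
                # First update: plain copy. Reading a Python bool here avoids
                # the device-to-host sync that bool(gpu_tensor == 0) forces.
                p_avg.copy_(p)
            else:
                # Running mean: avg += (p - avg) / (n + 1)
                p_avg.add_((p - p_avg) / (self.n_averaged + 1))
        self.initialized = True
        self.n_averaged += 1
```

The flag lives on the Python object, so checking it costs nothing, while the arithmetic stays on whatever device the parameters are on.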

Test Plan:
contbuild & OSS CI
Test plan from GitHub:
CI

Rollback Plan:

Differential Revision: D78074709

@gl3lan gl3lan requested review from albanD and janeyx99 as code owners July 10, 2025 09:30

pytorch-bot bot commented Jul 10, 2025

This appears to be a diff that was exported from Phabricator, but the PR author does not have sufficient permissions to run CI. @gl3lan, please complete step 2 of the internal wiki to get write access so you do not need CI approvals in the future. If you think this is a mistake, please contact the PyTorch Dev Infra team.


linux-foundation-easycla bot commented Jul 10, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: gl3lan / name: Gaël Le Lan (091aeb5)


pytorch-bot bot commented Jul 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158017

Note: Links to docs will display an error until the docs builds have been completed.

❌ 10 New Failures

As of commit 091aeb5 with merge base db78943:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D78074709


@albanD albanD removed their request for review July 10, 2025 13:46
@gl3lan gl3lan force-pushed the export-D78074709 branch from 6463fb5 to 5c3ca33 Compare July 10, 2025 16:58

@gl3lan gl3lan force-pushed the export-D78074709 branch from 5c3ca33 to f61c1f5 Compare July 14, 2025 08:36

gl3lan added a commit to gl3lan/pytorch that referenced this pull request Jul 14, 2025
@gl3lan gl3lan marked this pull request as draft July 14, 2025 08:38
@gl3lan gl3lan force-pushed the export-D78074709 branch from f61c1f5 to ec192fb Compare July 22, 2025 08:06

gl3lan added a commit to gl3lan/pytorch that referenced this pull request Jul 22, 2025
@gl3lan gl3lan marked this pull request as ready for review July 22, 2025 09:01
@gl3lan gl3lan force-pushed the export-D78074709 branch from ec192fb to 6a831d2 Compare August 3, 2025 20:58

gl3lan added a commit to gl3lan/pytorch that referenced this pull request Aug 3, 2025
@gl3lan gl3lan force-pushed the export-D78074709 branch from 6a831d2 to ce429ea Compare August 3, 2025 22:02
gl3lan added a commit to gl3lan/pytorch that referenced this pull request Aug 3, 2025

@gl3lan gl3lan force-pushed the export-D78074709 branch from ce429ea to 49f50ec Compare August 4, 2025 18:41
gl3lan added a commit to gl3lan/pytorch that referenced this pull request Aug 4, 2025

@gl3lan gl3lan force-pushed the export-D78074709 branch from 49f50ec to e63c89c Compare August 4, 2025 20:03
gl3lan added a commit to gl3lan/pytorch that referenced this pull request Aug 4, 2025

@gl3lan gl3lan marked this pull request as draft August 4, 2025 20:10
@gl3lan gl3lan force-pushed the export-D78074709 branch from e63c89c to 30603a7 Compare August 5, 2025 19:12
gl3lan added a commit to gl3lan/pytorch that referenced this pull request Aug 5, 2025

gl3lan added a commit to gl3lan/pytorch that referenced this pull request Aug 5, 2025
@gl3lan gl3lan force-pushed the export-D78074709 branch from 30603a7 to 97bb125 Compare August 5, 2025 19:16
Contributor

@janeyx99 janeyx99 left a comment


Huh, why is n_averaged even a tensor? It could just be a Python number and this sync would go away, right?

@gl3lan gl3lan force-pushed the export-D78074709 branch from 97bb125 to 64a7a4f Compare August 7, 2025 06:24

gl3lan added a commit to gl3lan/pytorch that referenced this pull request Aug 7, 2025
@gl3lan
Author

gl3lan commented Aug 7, 2025

Huh, why is n_averaged even a tensor? It could just be a Python number and this sync would go away, right?

Right, but we need it to be saved and reloaded when a job resumes, or else the EMA behavior will change after resuming.

@gl3lan gl3lan marked this pull request as ready for review August 7, 2025 14:57
@janeyx99
Contributor

janeyx99 commented Aug 7, 2025

I've just learned from @mikaylagawarecki that there are get_extra_state and set_extra_state hooks on modules that we can use to store a plain Python number for n_averaged: https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.get_extra_state

This would make the code simpler and more understandable, with the same perf win, if n_averaged were a Python number instead of a buffer.
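A minimal sketch of that suggestion, using a hypothetical `AveragedSketch` module (illustrative, not the actual swa_utils code): `get_extra_state`/`set_extra_state` let a plain Python int ride along in the state dict (under the module's `_extra_state` key), so it survives checkpointing without being a tensor buffer.

```python
import torch

class AveragedSketch(torch.nn.Module):
    """Hypothetical module keeping n_averaged as a plain Python int."""

    def __init__(self):
        super().__init__()
        self.n_averaged = 0  # plain int: no buffer, no sync to read it

    def get_extra_state(self):
        # Serialized into the state dict under the "_extra_state" key.
        return {"n_averaged": self.n_averaged}

    def set_extra_state(self, state):
        # Restored during load_state_dict, so EMA behavior survives resuming.
        self.n_averaged = state["n_averaged"]

m = AveragedSketch()
m.n_averaged = 7
restored = AveragedSketch()
restored.load_state_dict(m.state_dict())
```

The counter round-trips through `state_dict()`/`load_state_dict()` exactly like a buffer would, without ever living on the GPU.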

@gl3lan gl3lan force-pushed the export-D78074709 branch from 64a7a4f to 091aeb5 Compare August 9, 2025 20:00

@gl3lan
Author

gl3lan commented Aug 11, 2025

I've just learned from @mikaylagawarecki that there are get_extra_state and set_extra_state hooks on modules that we can use to store a plain Python number for n_averaged: https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.get_extra_state

This would make the code simpler and more understandable, with the same perf win, if n_averaged were a Python number instead of a buffer.

@janeyx99 it looks cleaner now, but it breaks loading of any pre-existing checkpoint because of the missing parameter. Any suggestion? Override load_state_dict, look up the tensor value, and pass it to the new attribute?

@janeyx99
Contributor

@gl3lan yeah, we should be able to register a hook https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_load_state_dict_pre_hook so that if the state dict has a Tensor n_averaged, we just convert it to a number. How does that sound to you?

I'm also okay with the option of overriding load_state_dict, whichever is simpler. Can you also add a test case to ensure we don't break existing users?
