
Move inductor workflows focal (ubuntu 20.04) -> jammy (ubuntu 22.04) #154153



Closed
wants to merge 5 commits into from

atalman
Contributor

atalman commented May 22, 2025

Trying to fix: #154157

atalman requested review from a team and jeffdaily as code owners May 22, 2025 21:40

pytorch-bot bot commented May 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154153

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 26 Pending, 3 Unrelated Failures

As of commit f5fa877 with merge base 413664b:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the topic: not user facing (topic category) label May 22, 2025
atalman changed the title [DRAFT] Move inductor workflow from focal to jammy to [WIP] Move inductor workflow from focal to jammy May 22, 2025
atalman added the ciflow/inductor and ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR) labels May 23, 2025
@huydhn
Contributor

huydhn commented May 23, 2025

TIL, we don't need to manually create new ECR entries anymore, which is super nice

atalman changed the title [WIP] Move inductor workflow from focal to jammy to Move inductor workflows from focal (ubuntu 20.04) -> jammy (ubuntu 22.04) May 23, 2025
atalman changed the title Move inductor workflows from focal (ubuntu 20.04) -> jammy (ubuntu 22.04) to Move inductor workflows focal (ubuntu 20.04) -> jammy (ubuntu 22.04) May 23, 2025
focal->jammy

[CI] migrate focal to jammy

fix

Revert "fix"

This reverts commit d70b171cdd5e2a7286aa4b80fd8e7ee67b9e867c.
atalman force-pushed the inductor_workflows_try_move branch from d70b171 to 4c0e74e May 23, 2025 15:45
@atalman
Contributor Author

atalman commented May 27, 2025

@pytorchmergebot merge -f "all required signal is green"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@malfet
Contributor

malfet commented May 27, 2025

@pytorchbot revert -m "Broke inductor tests, see https://hud.pytorch.org/hud/pytorch/pytorch/b8452e55bc50109a14add60397fee6c85486ae8d/1?per_page=50&name_filter=inductor_torchbench&mergeEphemeralLF=true" -c nosignal

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@atalman your PR has been successfully reverted.

pytorchmergebot added the Reverted and ci-no-td (Do not run TD on this PR) labels May 27, 2025
@atalman
Contributor Author

atalman commented May 27, 2025

Looks like it broke opacus_cifar10: #154446

@davidberard98
Contributor

Maybe this is just a new release of opacus? https://pypi.org/project/opacus/#history

@desertfire
Contributor

The fact that this is an eager failure is pretty suspicious given the nature of your PR:

2025-05-27T17:08:45.4796194Z cuda eval  opacus_cifar10                     
2025-05-27T17:08:45.4810045Z Traceback (most recent call last):
2025-05-27T17:08:45.4810789Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 1940, in validate_model
2025-05-27T17:08:45.4811457Z     self.model_iter_fn(model, example_inputs)
2025-05-27T17:08:45.4812106Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/torchbench.py", line 460, in forward_pass
2025-05-27T17:08:45.4812855Z     return mod(*inputs)
2025-05-27T17:08:45.4813593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1767, in _wrapped_call_impl
2025-05-27T17:08:45.4814246Z     return self._call_impl(*args, **kwargs)
2025-05-27T17:08:45.4814837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1778, in _call_impl
2025-05-27T17:08:45.4815433Z     return forward_call(*args, **kwargs)
2025-05-27T17:08:45.4816057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/opacus/grad_sample/grad_sample_module.py", line 149, in forward
2025-05-27T17:08:45.4816686Z     return self._module(*args, **kwargs)
2025-05-27T17:08:45.4817305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1767, in _wrapped_call_impl
2025-05-27T17:08:45.4817938Z     return self._call_impl(*args, **kwargs)
2025-05-27T17:08:45.4818524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1778, in _call_impl
2025-05-27T17:08:45.4819114Z     return forward_call(*args, **kwargs)
2025-05-27T17:08:45.4819698Z   File "/var/lib/jenkins/.local/lib/python3.10/site-packages/torchvision/models/resnet.py", line 285, in forward
2025-05-27T17:08:45.4820282Z     return self._forward_impl(x)
2025-05-27T17:08:45.4820870Z   File "/var/lib/jenkins/.local/lib/python3.10/site-packages/torchvision/models/resnet.py", line 270, in _forward_impl
2025-05-27T17:08:45.4821473Z     x = self.relu(x)
2025-05-27T17:08:45.4822039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1767, in _wrapped_call_impl
2025-05-27T17:08:45.4822671Z     return self._call_impl(*args, **kwargs)
2025-05-27T17:08:45.4823271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1778, in _call_impl
2025-05-27T17:08:45.4823870Z     return forward_call(*args, **kwargs)
2025-05-27T17:08:45.4824766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 135, in forward
2025-05-27T17:08:45.4825391Z     return F.relu(input, inplace=self.inplace)
2025-05-27T17:08:45.4825952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py", line 1711, in relu
2025-05-27T17:08:45.4826490Z     result = torch.relu_(input)
2025-05-27T17:08:45.4828147Z RuntimeError: Output 0 of BackwardHookFunctionBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.
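The error message names its own fix. Below is a minimal sketch (not opacus or benchmark code) that reproduces the same class of error, assuming the trigger is a custom `torch.autograd.Function` that returns its input unchanged, which is what the `BackwardHookFunctionBackward` frame suggests. `PassThrough` and `try_inplace_relu` are hypothetical names. The in-place `torch.relu_` fails on the Function's output and succeeds once that output is cloned.

```python
import torch


class PassThrough(torch.autograd.Function):
    """Hypothetical identity Function that returns its input as-is."""

    @staticmethod
    def forward(ctx, x):
        # Returning an input unchanged makes autograd wrap it as a view
        # created inside a custom Function.
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


def try_inplace_relu(clone_output: bool):
    """Return the RuntimeError message, or None if relu_ and backward succeed."""
    x = torch.randn(4, requires_grad=True)
    h = x * 2  # non-leaf input, so the leaf-view error cannot fire instead
    y = PassThrough.apply(h)
    if clone_output:
        y = y.clone()  # the fix suggested by the error message
    try:
        out = torch.relu_(y)  # in-place, like F.relu(..., inplace=True)
    except RuntimeError as err:
        return str(err)
    out.sum().backward()
    return None


print(try_inplace_relu(clone_output=False) is not None)  # error raised
print(try_inplace_relu(clone_output=True))  # None: clone avoids the error
```

Cloning costs one extra copy. Switching the ReLU to its out-of-place form also avoids the view+inplace pattern.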

@atalman
Contributor Author

atalman commented May 27, 2025

This does look like an opacus update:

The previous passing PR job (https://github.com/pytorch/pytorch/actions/runs/15272660608/job/42954922171#step:22:1478) used opacus==1.5.3.

@atalman
Contributor Author

atalman commented May 27, 2025

@pytorchmergebot merge -f "lint is green, this is a reland"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Labels
ci-no-td (Do not run TD on this PR), ciflow/h100, ciflow/inductor, ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR), Merged, Reverted, topic: not user facing (topic category)
Development

Successfully merging this pull request may close these issues.

Move all CI/CD workflows from focal to jammy
8 participants