[CI] Migrate focal (ubuntu 20.04) images to jammy (ubuntu 22.04) #154437
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154437
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 84 Pending as of commit 569ad47 with merge base 523b637.
NEW FAILURE - The following job has failed:
UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
fi

if [ -n "${UBUNTU_VERSION}" ]; then
  OS="ubuntu"
elif [ -n "${CENTOS_VERSION}" ]; then
  OS="centos"
There should be no CentOS in CI/CD anymore
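Given the review note above that CentOS should no longer exist in CI/CD, a minimal sketch (hypothetical, not from this PR) of how the detection could collapse once only Ubuntu is supported:

```shell
# Hypothetical simplification once CentOS support is dropped:
# only UBUNTU_VERSION needs checking; anything else is an error.
UBUNTU_VERSION="22.04"   # assumed example value

if [ -n "${UBUNTU_VERSION}" ]; then
  OS="ubuntu"
else
  echo "Unable to derive operating system" >&2
  exit 1
fi
echo "${OS}"
```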
@@ -370,14 +361,6 @@ esac

tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')

#when using cudnn version 8 install it separately from cuda
if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
  IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
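For context, the `tmp_tag` line kept in the diff generates a throwaway, Docker-safe tag. A standalone sketch of the same trick:

```shell
# mktemp -u prints a unique filename without creating the file;
# basename strips the directory part; tr lowercases the result,
# since Docker image tags must be lowercase.
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
echo "${tmp_tag}"
```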
Nvidia images were used for Focal builds. For Jammy builds these images are no longer used.
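A hedged sketch of what the removed logic implies: Focal CUDA builds started from NVIDIA's prebuilt CUDA+cuDNN devel image, while Jammy builds are assumed to start from a different base (the Jammy base is not shown in this diff; the `else` branch below is hypothetical):

```shell
# Assumed example values; only the 20.04 branch reflects the diff above.
CUDA_VERSION="12.1.1"
CUDNN_VERSION="8"
UBUNTU_VERSION="20.04"

if [[ "${UBUNTU_VERSION}" == "20.04" ]]; then
  # Focal: pull NVIDIA's prebuilt CUDA+cuDNN devel image (as in the removed lines)
  IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
else
  # Jammy: hypothetical fallback; the PR handles these builds differently
  IMAGE_NAME="ubuntu:${UBUNTU_VERSION}"
fi
echo "${IMAGE_NAME}"
```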
@pytorchbot rebase -b main
What do you think about splitting this into multiple separate PRs to derisk the switch:
PR 1: Builds the jammy docker images for all desired configs
PR 2: Switches over all workflows to use the jammy images
PR 3: Stops building the focal docker images
It'll cut down on the blast radius in case something goes wrong. For example, it'll let us make sure that all jammy docker builds succeed before we start taking CI dependencies on them.
@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here
Successfully rebased
Force-pushed from 29019c2 to 75c6e1b
@ZainRizvi I believe if the docker build + CI is successful, it's a lot easier to do in one shot, as was done here: #154153. Building Docker images by themselves is not really useful, since we most likely need to change them once we start migrating jobs over. To minimize the blast radius we can probably try to migrate in smaller chunks.
Force-pushed from 56c6445 to 2ad7c7d
@pytorchmergebot rebase -b main
@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here
Successfully rebased
Force-pushed from 2ad7c7d to 011c753
LGTM! Do you think we could just keep one version in CI going forward? For example, everything is using 22.04 (jammy) now, and then we would upgrade it all to 24.04 (the next LTS).
The following errors already exist on trunk: and test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_10000_backend_cublaslt_cuda_float16 (GH job link, HUD commit link)
@pytorchmergebot merge -f "lint is green, other jobs were already tested"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…orch#154437) Fixes pytorch#154157. Inductor workflows were moved from focal to jammy here: pytorch#154153. Pull Request resolved: pytorch#154437. Approved by: https://github.com/Skylion007, https://github.com/cyyever, https://github.com/davidberard98, https://github.com/huydhn
@pytorchbot revert -m "I could be wrong, but looks like it broke slow jobs, see https://hud.pytorch.org/hud/pytorch/pytorch/b0fbbef1361ccaab8a5aec8e7cd62150e7b361de/1?per_page=50&name_filter=slow&mergeEphemeralLF=true" -c nosignal
@pytorchbot successfully started a revert job. Check the current status here.
Reverting PR 154437 failed. Reason: Command
Details for Dev Infra team: Raised by workflow job
Fixes #154157
Inductor workflows were moved from focal to jammy here: #154153
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k