[vllm in torch ci][step 1/3] add build logics #159815
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159815
Note: Links to docs will display an error until the docs builds have been completed.
❌ 5 New Failures, 1 Unrelated Failure
As of commit 1184e6e with merge base e16c48a:
NEW FAILURES - The following jobs have failed:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Warning: Unknown label
Please add the new label to .github/pytorch-probot.yml
--build-arg vllm_fa_cmake_gpu_arches={cfg.vllm_fa_cmake_gpu_arches} \
--target {cfg.target} \
-t {cfg.tag_name} \
--progress=plain .
Wonder why this is the statement. Is there a reason why build.sh (line 344 at 49abc0e) says:
# Do not use cache and progress=plain when in CI
For caching: when we had non-ephemeral runners we'd typically run into situations where we'd use cached docker images where we didn't want them. This isn't really an issue today.
For progress=plain: if you don't use this you'll get a bunch of garbage output, since buildkit does a lot of output sugar for its main output.
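A minimal sketch of applying both flags only in CI, in the spirit of the diff snippet above (the `docker_build` helper and its parameters are hypothetical, not the PR's actual code; GitHub Actions sets CI=true):

```python
# Hypothetical sketch: apply the two flags discussed above only in CI.
import os
import subprocess

def docker_build(tag: str, target: str, context: str = ".") -> None:
    cmd = ["docker", "build", "--target", target, "-t", tag]
    if os.environ.get("CI"):
        # Skip the layer cache (stale caches bit non-ephemeral runners)
        # and use plain progress so buildkit's interactive output
        # doesn't garble CI logs.
        cmd += ["--no-cache", "--progress=plain"]
    cmd.append(context)
    subprocess.run(cmd, check=True)
```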
Okay, I appreciate the ambition in this PR, but I think we're going to need to split it up a bit to make it easier for people (like myself) to review:
- PR1: Introduce the new torch_cli
- PR2: Implement the vllm-specific parts of the torch_cli
- PR3: Add the workflows on top of this that will run
I'd recommend using ghstack to do this.
Sounds good! Thought it was small enough without the test cli, but will split it even smaller!
Description
Add the vllm build into the pytorch CI pipeline.
Details: how we do this
This PR sets up the vllm build logic, which passes wheels generated by docker build and docker run between CI steps. We set up a CLI tool with customized logic for external builds such as vllm; the sketch below outlines the build process in brief.
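A rough illustration of that wheel hand-off (the image tag, the in-container /wheels path, and the `build_and_extract_wheels` helper are hypothetical, not this PR's actual values):

```python
# Hedged sketch: docker build produces an image containing the built
# wheels; docker run then bind-mounts a host directory and copies the
# wheels out so later CI steps can install and test them.
import subprocess

def build_and_extract_wheels(tag: str, out_dir: str) -> None:
    # Build the external-library image (vllm in this PR's case).
    subprocess.run(["docker", "build", "-t", tag, "."], check=True)
    # Copy the wheels the build produced back onto the host.
    subprocess.run(
        ["docker", "run", "--rm", "-v", f"{out_dir}:/out",
         tag, "sh", "-c", "cp /wheels/*.whl /out/"],
        check=True,
    )
```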
Code implementation
This change covers three sections:
Dockerfile.torch_nightly in .github/docker/external/vllm/
Originally we made a copy of Dockerfile.torch_nightly from vllm to accelerate development; then an RCE happened last week regarding the dependencies of external repos such as vllm.
So the copy stays in torch for now, pending further discussion on installing external repos' dependencies in torch.
Cli tool setup: scripts/torch_cli
Sets up a cli tool for the torch CI build; this also establishes a way to do our builds and tests for new features (see the usage sketch after this list).
It normally comes with a config yml file to store specific parameters which affect the build/test results.
The cli tool must be run at the root of the pytorch repo.
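A hedged sketch of what that could look like; the flag names, the `vllm.yaml` default, and the config keys below are illustrative assumptions, not the actual torch_cli interface:

```python
# Illustrative-only CLI skeleton: read build/test parameters from a
# config yml file. Intended to be run from the pytorch repo root.
import argparse
import yaml  # pip install pyyaml

def main() -> None:
    parser = argparse.ArgumentParser(prog="torch_cli")
    parser.add_argument("--config", default="vllm.yaml",
                        help="yml file holding build/test parameters")
    args = parser.parse_args()
    with open(args.config) as f:
        cfg = yaml.safe_load(f)
    # cfg would carry fields like target, tag_name, and
    # vllm_fa_cmake_gpu_arches (seen in the diff snippet above).
    print(cfg)

if __name__ == "__main__":
    main()
```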
vllm yaml file
vllm.yaml: main yaml file to set up the vllm test workflow
_linux-external-build.yml: build yml file that handles external library builds such as vllm; it depends on the torch CI build from the PR.
next step
Add the vllm test step.
See experimental pr [not final, just a workable one]: https://github.com/pytorch/pytorch/actions/runs/16759050517/job/47453916017