
switch to using formal vllm-cpu image #511

Merged: 3 commits merged into kubernetes-sigs:main on Mar 17, 2025

Conversation

nirrozenbaum (Contributor)

No description provided.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 17, 2025
netlify bot commented Mar 17, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: a19e7a3
🔍 Latest deploy log: https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67d81c8882d6c00008f73e31
😎 Deploy Preview: https://deploy-preview-511--gateway-api-inference-extension.netlify.app

@nirrozenbaum (Contributor, Author)

cc @kfswain

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 17, 2025
@@ -14,7 +14,7 @@ spec:
   spec:
     containers:
     - name: lora
-      image: "seedjeffwan/vllm-cpu-env:bb392af4-20250203"
+      image: "public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.7.2" # formal images can be found in https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo
Review comment (Contributor):

Who owns the release of this image? vLLM? What does q9t5s3a7 refer to?

Reply (Contributor, Author):

Yes, vLLM owns this image. See the issue I've opened: vllm-project/vllm#14756
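For readers wondering how to read the new reference: q9t5s3a7 appears to be the registry alias that the ECR Public Gallery assigned to the vLLM project's namespace (per the linked issue). A small sketch of how such a reference decomposes; `parse_image_ref` is a hypothetical helper, not part of any tooling in this repo:

```shell
# Sketch: split an OCI image reference of the form registry/repo:tag.
parse_image_ref() {
  ref="$1"
  registry="${ref%%/*}"   # host part, e.g. public.ecr.aws
  rest="${ref#*/}"
  repo="${rest%:*}"       # registry alias + repository name
  tag="${ref##*:}"        # pinned version tag
  echo "registry=$registry repo=$repo tag=$tag"
}

parse_image_ref "public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.7.2"
# → registry=public.ecr.aws repo=q9t5s3a7/vllm-cpu-release-repo tag=v0.7.2
```

Pinning an explicit version tag, as the deployment does here, keeps the manifest reproducible even if the upstream publishing location changes later.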

@kfswain (Collaborator) commented Mar 17, 2025

/lgtm
/hold

2 questions that I think should be nonblocking:

  1. Is the hash directory q9t5s3a7 going to be persistent? Seems like it could be temporary
  2. I assume we have tested the LoRA adapters that are included?

Holding for @ahg-g to finish review and for questions, thanks Nir!!

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 17, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2025
@nirrozenbaum (Contributor, Author)

> /lgtm /hold
>
> 2 questions that I think should be nonblocking:
>
> 1. Is the hash directory q9t5s3a7 going to be persistent? Seems like it could be temporary
> 2. I assume we have tested the LoRA adapters that are included?
>
> Holding for @ahg-g to finish review and for questions, thanks Nir!!

  1. From my understanding, and from what they wrote in the issue I've opened, the vLLM team will in the longer term maintain the CPU images on Docker Hub. Once that happens we will need to update the image again, but until then I believe this hash is going to be stable; otherwise it would be hard for them to see adoption of their image.
  2. Yes, definitely. I've tested the LoRA adapters to verify they work.

@ahg-g (Contributor) commented Mar 17, 2025

This is still not working for me, and I get the same error (see below). Is this expected to work only on specific CPU architectures?

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 911, in <module>
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 230, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.

@ahg-g (Contributor) commented Mar 17, 2025

/approve
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 17, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2025
@k8s-ci-robot k8s-ci-robot merged commit ba867c5 into kubernetes-sigs:main Mar 17, 2025
8 checks passed
@ahg-g (Contributor) commented Mar 17, 2025

OK, I deployed on an Intel machine and it worked; previously I was testing on N2D (an AMD-architecture machine). We need to clarify that in the guide if the image is indeed built to work with a specific architecture.

@nirrozenbaum nirrozenbaum deleted the cpu branch March 18, 2025 09:12
@nayihz (Contributor) commented Apr 7, 2025

> ok, deployed on an intel machine and it worked, previously I was testing on N2D (an AMD architecture machine). We need to clarify that in the guide if indeed the image is built to work with specific architecture.

Is there an image that supports the AMD architecture?
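A note not from the thread: N2D machines are still x86-64, so the failure is more likely an instruction-set mismatch than an architecture mismatch per se. vLLM's x86 CPU builds are commonly compiled with AVX-512 enabled, which Intel N2 (Ice Lake) CPUs expose but AMD Zen2/Zen3 N2D CPUs do not; this is a plausible explanation, not something confirmed above. A node's CPU flags can be checked before deploying. `check_avx512` below is a hypothetical helper that takes the cpuinfo text as an argument so it can be exercised without a real `/proc`:

```shell
# Sketch (assumption): vLLM CPU images typically require the AVX-512
# foundation instructions, advertised as the avx512f cpuinfo flag.
check_avx512() {
  echo "$1" | grep -q 'avx512f' && echo "avx512: yes" || echo "avx512: no"
}

# On a real node:
#   check_avx512 "$(cat /proc/cpuinfo)"
check_avx512 "flags : fpu sse avx2 avx512f avx512dq"
# → avx512: yes
```

If the check reports "no", the engine-process crash shown earlier in the thread would be the expected symptom rather than a configuration error.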

kfswain pushed a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
* switch to formal vllm-cpu image

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* documentation of formal vllm-cpu image

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* minor updates to cpu deployment

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
5 participants