
switch to using formal vllm-cpu image #511

Merged: 3 commits merged into kubernetes-sigs:main on Mar 17, 2025

Conversation

nirrozenbaum (Contributor)

No description provided.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 17, 2025
netlify bot commented Mar 17, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: a19e7a3
🔍 Latest deploy log: https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67d81c8882d6c00008f73e31
😎 Deploy Preview: https://deploy-preview-511--gateway-api-inference-extension.netlify.app

@nirrozenbaum (Contributor, Author)

cc @kfswain

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 17, 2025
@@ -14,7 +14,7 @@ spec:
   spec:
     containers:
     - name: lora
-      image: "seedjeffwan/vllm-cpu-env:bb392af4-20250203"
+      image: "public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.7.2" # formal images can be found in https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo
Review comment (Contributor):

Who owns the release of this image? vLLM? What does q9t5s3a7 refer to?

Reply (Contributor, Author):

Yes, vLLM owns this image. See the issue I've opened: vllm-project/vllm#14756
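For readers wondering how to read the new reference: q9t5s3a7 appears to be the registry alias that the ECR Public Gallery assigned to the vLLM project's namespace (per the linked issue). A small sketch of how such a reference decomposes; `parse_image_ref` is a hypothetical helper, not part of any tooling in this repo:

```shell
# Sketch: split an OCI image reference of the form registry/repo:tag.
parse_image_ref() {
  ref="$1"
  registry="${ref%%/*}"   # host part, e.g. public.ecr.aws
  rest="${ref#*/}"
  repo="${rest%:*}"       # registry alias + repository name
  tag="${ref##*:}"        # pinned version tag
  echo "registry=$registry repo=$repo tag=$tag"
}

parse_image_ref "public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.7.2"
# → registry=public.ecr.aws repo=q9t5s3a7/vllm-cpu-release-repo tag=v0.7.2
```

Pinning an explicit version tag, as the deployment does here, keeps the manifest reproducible even if the upstream publishing location changes later.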

@kfswain (Collaborator) commented Mar 17, 2025

/lgtm
/hold

2 questions that I think should be nonblocking:

  1. Is the hash directory q9t5s3a7 going to be persistent? Seems like it could be temporary
  2. I assume we have tested the LoRA adapters that are included?

Holding for @ahg-g to finish review and for questions, thanks Nir!!

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 17, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2025
@nirrozenbaum (Contributor, Author)

> /lgtm /hold
>
> 2 questions that I think should be nonblocking:
>
> 1. Is the hash directory q9t5s3a7 going to be persistent? Seems like it could be temporary
> 2. I assume we have tested the LoRA adapters that are included?
>
> Holding for @ahg-g to finish review and for questions, thanks Nir!!

  1. From my understanding, and from what they wrote in the issue I've opened, the vLLM team will in the longer term maintain the CPU images on Docker Hub. Once that happens we will need to update the image again, but until then I believe this hash is going to be stable; otherwise it would be hard for them to see adoption of their image.
  2. Yes, definitely. I've tested the LoRA adapters to verify they work.

@ahg-g (Contributor) commented Mar 17, 2025

This is still not working for me, and I get the same error (see below). Is this expected to work only on specific CPU architectures?

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 911, in <module>
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 230, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.

@ahg-g (Contributor) commented Mar 17, 2025

/approve
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 17, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2025
@k8s-ci-robot k8s-ci-robot merged commit ba867c5 into kubernetes-sigs:main Mar 17, 2025
8 checks passed
@ahg-g (Contributor) commented Mar 17, 2025

OK, I deployed on an Intel machine and it worked; previously I was testing on N2D (an AMD-architecture machine). We need to clarify that in the guide if the image is indeed built to work with a specific architecture.

@nirrozenbaum nirrozenbaum deleted the cpu branch March 18, 2025 09:12
@nayihz (Contributor) commented Apr 7, 2025

> ok, deployed on an intel machine and it worked, previously I was testing on N2D (an AMD architecture machine). We need to clarify that in the guide if indeed the image is built to work with specific architecture.

Is there an image that supports the AMD architecture?
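A note not from the thread: N2D machines are still x86-64, so the failure is more likely an instruction-set mismatch than an architecture mismatch per se. vLLM's x86 CPU builds are commonly compiled with AVX-512 enabled, which Intel N2 (Ice Lake) CPUs expose but AMD Zen2/Zen3 N2D CPUs do not; this is a plausible explanation, not something confirmed above. A node's CPU flags can be checked before deploying. `check_avx512` below is a hypothetical helper that takes the cpuinfo text as an argument so it can be exercised without a real `/proc`:

```shell
# Sketch (assumption): vLLM CPU images typically require the AVX-512
# foundation instructions, advertised as the avx512f cpuinfo flag.
check_avx512() {
  echo "$1" | grep -q 'avx512f' && echo "avx512: yes" || echo "avx512: no"
}

# On a real node:
#   check_avx512 "$(cat /proc/cpuinfo)"
check_avx512 "flags : fpu sse avx2 avx512f avx512dq"
# → avx512: yes
```

If the check reports "no", the engine-process crash shown earlier in the thread would be the expected symptom rather than a configuration error.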

kfswain pushed a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
* switch to formal vllm-cpu image

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* documentation of formal vllm-cpu image

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

* minor updates to cpu deployment

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>

---------

Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
5 participants