feat: Add scripts for kubernetes dev env using vLLM and vLLM-p2p #60

Merged: 5 commits, Apr 29, 2025

kfirtoledo

Add support for a Kubernetes development environment using GIE (Gateway API Inference Extension) with KGateway and vLLM
This PR introduces support for the vllm mode, enabling integration testing of GIE with vLLM.
It also adds support for the vllm-p2p mode, which includes:

  1. Deployment of Redis and LMCache alongside the vLLM image
  2. Peer-to-peer (P2P) communication between vLLM instances
  3. Use of the EPP image to enable kv-cache-aware routing
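
For reviewers, a minimal sketch of how a kustomize overlay for the vllm-p2p mode might group the components listed above. This is illustrative only: the resource file names are hypothetical and are not taken from this PR's diff, and it assumes the LMCache settings live inside the vLLM deployment manifest.

```yaml
# Hypothetical kustomization.yaml for a vllm-p2p style overlay.
# Every file name under `resources` is an assumption made for illustration.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Redis instance deployed alongside vLLM and LMCache (assumed manifest names)
  - redis-deployment.yaml
  - redis-service.yaml
  # vLLM instances, assumed to carry the LMCache and P2P configuration
  - vllm-deployment.yaml
```

An overlay of this shape would typically be applied with `kubectl apply -k <overlay-dir>`, where `<overlay-dir>` is whichever directory holds the kustomization file.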

@kfirtoledo added the "help wanted" and "WIP" labels Apr 25, 2025
@kfirtoledo changed the title from "feat: add scripts for kubernetes dev env using vLLM and vLLM-p2p" to "feat: Add scripts for kubernetes dev env using vLLM and vLLM-p2p" Apr 25, 2025
@shaneutt (Collaborator) left a comment

Looking great. Most of my comments are smaller, but I do have some questions for other folks as to what effect this will have.

Also, cc @elevran @shmuelk who I think should take a look.

@@ -0,0 +1,11 @@
apiVersion: kustomize.config.k8s.io/v1beta1

Oh ok, I see what you're doing with the naming now. The difference now is that each of these deployments deploys only a working vLLM stack, and you then have to deploy your inference-gateway stack separately.

cc @tumido @Gregory-Pereira @vMaroon just wanting to check with you on how this will work with your Helm chart?

@kfirtoledo force-pushed the dev branch 2 times, most recently from fc98576 to 1a7fa8e (April 26, 2025 00:05)
@kfirtoledo force-pushed the dev branch 2 times, most recently from 390c50a to b189362 (April 26, 2025 23:52)
@kfirtoledo (Author)

@shaneutt, PTAL.

@kfirtoledo removed the "help wanted" and "WIP" labels Apr 27, 2025
@kfirtoledo (Author)

@shaneutt and @elevran, PTAL.

@shaneutt self-requested a review April 28, 2025 12:50
@shaneutt (Collaborator) left a comment

Approving to unblock.

Once @elevran is 👍, I'm 👍

@elevran (Collaborator) commented Apr 29, 2025

@kfirtoledo LGTM. Any idea what caused the CI/CD failure?

…tup for kvcache-aware)

Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
@kfirtoledo merged commit f67cc34 into neuralmagic:dev Apr 29, 2025
1 check failed
Issue that merging this pull request may close:

OpenShift Dev Environment - Full Gateway+GIE Stack Deployment with VLLM and VLLM-P2P mode
5 participants