Manifest updates #81
modify filter for LoRA affinity
/lgtm
Thanks!
@kaushikmitr, can you please address @liu-cong's comments on this one? All are minor.
Done.
/label tide/merge-method-squash
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahg-g, kaushikmitr.
* squashed modify filter for LoRA affinity
* update llm service and llm server pool yaml, readme
* remove unused method from metrics.go
* add flowchart image
* update size flowchart image
* remove image name
* update queueingThresholdLoRA to 50
* rollback filter related changes
* rollback filter related changes in docs
* addressing comments
* addressing comments
This pull request includes several changes to Kubernetes manifests and configurations to update service and gateway names, deployment configurations, and role-based access control (RBAC).
Service and Deployment Configuration Updates:

- `examples/poc/manifests/llmservice.yaml`: Added a new `LLMServerPool` kind and updated the `models` section to include new model names and objectives. Updated the `poolRef` to use `vllm-llama2-7b-pool`.
- `examples/poc/manifests/vllm/vllm-lora-deployment.yaml`: Added a new `Service` for `vllm-llama2-7b-pool` and updated the `Deployment` configuration to use `vllm-llama2-7b-pool` with reduced replicas.
- `examples/poc/manifests/vllm/vllm-lora-service.yaml`: Removed the old `Service` configuration for `vllm-lora`.

Role-Based Access Control (RBAC) Updates:

- `pkg/manifests/ext_proc.yaml`: Added a new `ClusterRole` and `ClusterRoleBinding` for pod read access. Updated the deployment arguments to use `vllm-llama2-7b-pool` and added a verbosity level.

Gateway Configuration Updates:

- `pkg/manifests/gateway.yaml`: Updated the gateway and gateway class names to `instance-gateway`.
- `pkg/manifests/patch_policy.yaml`: Updated the gateway name references in the patch policy to `instance-gateway`.
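The RBAC and gateway items correspond to standard Kubernetes RBAC and Gateway API objects and, assuming `patch_policy.yaml` holds an Envoy Gateway `EnvoyPatchPolicy`, an Envoy Gateway policy. The sketch shows their general shape only. The `pod-read` names, the `default` service account and namespace, the Envoy Gateway controller name, the listener port, and the policy name are hypothetical, and the actual JSON patches are omitted.

```yaml
# Sketch only: names marked "hypothetical" are not taken from this PR.

# Cluster-wide read-only access to Pods (Pods live in the core API group "").
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-read                 # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
---
# Grants the role above to the service account the ext-proc Pod runs as.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-read-binding         # hypothetical name
subjects:
  - kind: ServiceAccount
    name: default                # hypothetical: assumed service account
    namespace: default           # hypothetical: assumed namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pod-read
---
# GatewayClass and Gateway both use the new name, instance-gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: instance-gateway
spec:
  # Assumed: Envoy Gateway is the controller implementing this class.
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: instance-gateway
spec:
  gatewayClassName: instance-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 8080                 # hypothetical port
---
# Assumed: patch_policy.yaml holds an Envoy Gateway EnvoyPatchPolicy.
# Only the targetRef, which must name the renamed Gateway, is shown;
# the JSON patches themselves are omitted.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyPatchPolicy
metadata:
  name: ext-proc-patch-policy    # hypothetical name
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: instance-gateway
  type: JSONPatch
```

The `GatewayClass` name, the `Gateway`'s `gatewayClassName`, and the policy's `targetRef` must all agree on `instance-gateway`. This is why the rename touches `gateway.yaml` and `patch_policy.yaml` together.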