Skip to content

Implemented KVCacheAwareScorer #34

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 30, 2025
Merged

Conversation

vMaroon
Copy link
Member

@vMaroon vMaroon commented Apr 19, 2025

Summary

Implemented KVCacheAwareScorer which uses kvcache.Indexer as an embedded module.
The latter is configured with default configuration + an env settable Redis server address, and a HuggingFace token which is an acceptable practice.

Changes:

  • This PR introduces some changes to the scheduler flow since this is the first scorer that has to be prompt-aware.
  • This PR also introduces minor code refactoring to the previous work.
  • The Dockerfile was changed:
    • An implication of embedding the kvcache.Indexer directly is having to statically link the HuggingFace Rust tokenizer's Go bindings.
    • Due to the kvcache-manager repo being internal, having Go download its modules requires configuring access. Therefore GIT_NM_USER and NM_TOKEN have to be set.
  • The Makefile's image-build target was slightly refactored and updated to support the Dockerfile changes, and building from non amd64 archs.

Follow-up work:

  • Scorers configuration

@vMaroon vMaroon changed the base branch from kvcache-aware to dev April 23, 2025 21:59
@vMaroon
Copy link
Member Author

vMaroon commented Apr 23, 2025

The force-push rebases onto dev and reimplements the KVCacheScorer along with the new missing logic.

@vMaroon vMaroon force-pushed the kvcache-aware branch 3 times, most recently from cb10c4b to 83a91a6 Compare April 24, 2025 15:45
@vMaroon
Copy link
Member Author

vMaroon commented Apr 24, 2025

The force push undos a nolint directive that failed to fix the buggy CI linting. The linting error makes no sense, I think we can ignore it or followup later.

Copy link
Collaborator

@nirrozenbaum nirrozenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed mainly the scorer itself. there are additional changes around it that are mostly to support the scorer which seems ok (all of them).

overall this LGTM, just a few comments to consider, up to you.
the scorer interface changed in upstream in two main things:

  1. now all filters/scorers/picker do not return errors (cause we don't want to stop the scheduling cycle when that happens, you can log the error in your scorer).

  2. scorer gets all pods to score and not one by one. this change was driven by the kv scorer which access external service and is much more efficient when getting all pods in one call. accordingly the return value was changed to map[pod]float (map from pod to its score).

these changes are part of kubernetes-sigs#737 which is basically approved, but will be merged probably tomorrow (there were some minor comments that I addressed).

/lgtm

@vMaroon vMaroon force-pushed the kvcache-aware branch 3 times, most recently from 0e3fe98 to f5bb1ae Compare April 30, 2025 14:13
@clubanderson
Copy link
Collaborator

please rebase to dev to get latest pipeline - Thank you

vMaroon added 6 commits April 30, 2025 23:27
- added configuration

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
- configured maxscorepicker as default

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
@vMaroon
Copy link
Member Author

vMaroon commented Apr 30, 2025

/retest

@vMaroon vMaroon merged commit d08e32f into neuralmagic:dev Apr 30, 2025
1 of 2 checks passed
@vMaroon vMaroon deleted the kvcache-aware branch April 30, 2025 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants