Skip to content

feat: Add the invocation of the Post response plugins #113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 4, 2025

Conversation

shmuelk
Copy link
Collaborator

@shmuelk shmuelk commented May 4, 2025

This PR adds to the scheduler the ability to invoke the configured PostResponse plugins.

In addition any headers the PostResponse plugins add, are added to the response sent to the user.

A simple unit test was added. A more complex test should be part of a larger StreamingServer test.

@shmuelk shmuelk requested review from elevran and nirrozenbaum May 4, 2025 12:45
Copy link
Collaborator

@elevran elevran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main question I have is regarding the explicit extension of the Scheduler interface.
I leave that for your discretion and only request that the debug header addition is removed from the code base before merge.

@@ -66,6 +67,7 @@ type StreamingServer struct {

type Scheduler interface {
Schedule(ctx context.Context, b *schedulingtypes.LLMRequest) (result *schedulingtypes.Result, err error)
RunPostResponsePlugins(ctx context.Context, req *types.LLMRequest, tragetPodName string) (*schedulingtypes.Result, error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so far, all plugin invocations (pre, post, pick, ...) are done inside the pkg/epp/scheduling/Scheduler struct as private struct methods and are not defined as part of the interface itself.
Unless there's a good reason to deviate and make this an interface requirement, I would prefer we don't introduce new interface methods.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Schedule API is only used on the request side of things. PostResponse plugins are called in the response side of things

I can if you want add a parameter somewhere to indicate which "direction" the Schedule call is being made. Under the covers, My guess is that Schedule wil then wrap two functions, one foe request processing and the other for response processing.

I think this will complicate the P/D case where we have a special Scheduler that wraps two schedulers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Those are indeed two distinct flows.
There are several options (when you consider making this change upstream - seems reasonable to start a discussion on the issue to get a resolution in parallel with coding the change):

  • change the name of the RunPost... to be more generic
  • change both functions to make it clear that one handles the request and the other the response
  • introduce a new interface for doing post response processing and pass it from main, as is done for the Scheduler (the same object can implement both interfaces if needed)
  • etc.

},
llmReq := &schedulingtypes.LLMRequest{
Model: reqCtx.Model,
Headers: responseHeaders,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: Is there a need to differentiate between adding new headers and mutating existing ones, or is it all the same from Envoy's PoV?

Comment on lines +237 to +242
for _, plugin := range s.postResponsePlugins {
logger.V(logutil.DEBUG).Info("Running post-response plugin", "plugin", plugin.Name())
before := time.Now()
plugin.PostResponse(sCtx, targetPod)
metrics.RecordSchedulerPluginProcessingLatency(plugins.PostResponsePluginType, plugin.Name(), time.Since(before))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all other run* functions are considerably shorter (amounting to the highlighted code more or less.
Would be good to understand the difference. Might be related that insofar scheduling was called for a request and this is a first processing called on the response?
If so, might be good to highlight that in the comments. Also consider if we want to create some symmetry (in the upstream) so we call OnRequest/OnResponse instead of Schedule/RunPostResponse...

Copy link
Collaborator

@elevran elevran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@shmuelk shmuelk merged commit 220915f into neuralmagic:dev May 4, 2025
1 check passed
@shmuelk shmuelk deleted the post-response branch May 4, 2025 13:53
@@ -81,6 +80,7 @@ func NewSchedulerWithConfig(datastore Datastore, config *SchedulerConfig) *Sched
scorers: config.scorers,
picker: config.picker,
postSchedulePlugins: config.postSchedulePlugins,
postResponsePlugins: config.postResponsePlugins,
Copy link
Collaborator

@nirrozenbaum nirrozenbaum May 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading this and wondering if PostResponsePlugins should even be part of the scheduler config.
other than the fact that scorer(or other plugin) also implements the PostResponse, is there any reason for this to be part of the scheduler? is there any information PostResponse plugins need from scheduler?
I mean, we could do in main where we register the plugins something like (pseudo code):

my_plugin = scorer.MyScorer // implements also PostSchedule
...
...
Scheduler := NewSchedulerWithConfig(... , config) // config contains my_plugin
// pass my_plugin (and other plugins that implement PostResponse) to runserver to call PostResponse. 

@elevran @shmuelk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants