-
Notifications
You must be signed in to change notification settings - Fork 8
feat: Add the invocation of the Post response plugins #113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main question I have is regarding the explicit extension of the Scheduler interface.
I leave that for your discretion and only request that the debug header addition is removed from the code base before merge.
@@ -66,6 +67,7 @@ type StreamingServer struct { | |||
|
|||
type Scheduler interface { | |||
Schedule(ctx context.Context, b *schedulingtypes.LLMRequest) (result *schedulingtypes.Result, err error) | |||
RunPostResponsePlugins(ctx context.Context, req *types.LLMRequest, tragetPodName string) (*schedulingtypes.Result, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so far, all plugin invocations (pre, post, pick, ...) are done inside the pkg/epp/scheduling/Scheduler
struct as private struct methods and are not defined as part of the interface itself.
Unless there's a good reason to deviate and make this an interface requirement, I would prefer we don't introduce new interface methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Schedule API is only used on the request side of things. PostResponse plugins are called in the response side of things
I can if you want add a parameter somewhere to indicate which "direction" the Schedule call is being made. Under the covers, My guess is that Schedule wil then wrap two functions, one foe request processing and the other for response processing.
I think this will complicate the P/D case where we have a special Scheduler that wraps two schedulers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Those are indeed two distinct flows.
There are several options (when you consider making this change upstream - seems reasonable to start a discussion on the issue to get a resolution in parallel with coding the change):
- change the name of the RunPost... to be more generic
- change both functions to make it clear that one handles the request and the other the response
- introduce a new interface for doing post response processing and pass it from main, as is done for the Scheduler (the same object can implement both interfaces if needed)
- etc.
}, | ||
llmReq := &schedulingtypes.LLMRequest{ | ||
Model: reqCtx.Model, | ||
Headers: responseHeaders, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q: Is there a need to differentiate between adding new headers and mutating existing ones, or is it all the same from Envoy's PoV?
for _, plugin := range s.postResponsePlugins { | ||
logger.V(logutil.DEBUG).Info("Running post-response plugin", "plugin", plugin.Name()) | ||
before := time.Now() | ||
plugin.PostResponse(sCtx, targetPod) | ||
metrics.RecordSchedulerPluginProcessingLatency(plugins.PostResponsePluginType, plugin.Name(), time.Since(before)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all other run*
functions are considerably shorter (amounting to the highlighted code more or less.
Would be good to understand the difference. Might be related that insofar scheduling was called for a request and this is a first processing called on the response?
If so, might be good to highlight that in the comments. Also consider if we want to create some symmetry (in the upstream) so we call OnRequest/OnResponse instead of Schedule/RunPostResponse...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
@@ -81,6 +80,7 @@ func NewSchedulerWithConfig(datastore Datastore, config *SchedulerConfig) *Sched | |||
scorers: config.scorers, | |||
picker: config.picker, | |||
postSchedulePlugins: config.postSchedulePlugins, | |||
postResponsePlugins: config.postResponsePlugins, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reading this and wondering if PostResponsePlugins should even be part of the scheduler config.
other than the fact that scorer(or other plugin) also implements the PostResponse, is there any reason for this to be part of the scheduler? is there any information PostResponse plugins need from scheduler?
I mean, we could do in main where we register the plugins something like (pseudo code):
my_plugin = scorer.MyScorer // implements also PostSchedule
...
...
Scheduler := NewSchedulerWithConfig(... , config) // config contains my_plugin
// pass my_plugin (and other plugins that implement PostResponse) to runserver to call PostResponse.
This PR adds to the scheduler the ability to invoke the configured PostResponse plugins.
In addition any headers the PostResponse plugins add, are added to the response sent to the user.
A simple unit test was added. A more complex test should be part of a larger StreamingServer test.