scripts: synthetic prompt mode for server-bench.py #14695


Merged: 1 commit merged into ggml-org:master from server-bench-6 on Jul 16, 2025

Conversation

JohannesGaessler
Collaborator

This PR extends scripts/server-bench.py with synthetic prompts: simply random lists of tokens with a configurable random length. Each prompt is then assigned a configurable random number of tokens to generate, and the server is instructed to produce exactly that many tokens via n_predict and ignore_eos. This ensures that server performance can be tested easily and consistently for specific combinations of prompt length and generation length (see the sketch below).
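The following is a minimal sketch of the idea, not the actual code in scripts/server-bench.py; the function names and parameter ranges are assumptions for illustration, while n_predict and ignore_eos are the request fields named above.

```python
import numpy as np


def get_synthetic_prompts(rng: np.random.Generator, n_prompts: int,
                          n_prompt_max: int, n_vocab: int) -> list[list[int]]:
    """Build n_prompts random token lists, each with a random length in [1, n_prompt_max]."""
    prompts = []
    for _ in range(n_prompts):
        n_tokens = int(rng.integers(1, n_prompt_max + 1))
        prompts.append(rng.integers(0, n_vocab, n_tokens).tolist())
    return prompts


def make_request_body(prompt: list[int], n_predict: int) -> dict:
    """Ask the server to generate exactly n_predict tokens for this prompt."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,  # exact number of tokens to generate
        "ignore_eos": True,      # do not stop early on an end-of-sequence token
    }
```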

I think there is something wrong with how the server handles ignore_eos: I'm setting it in the JSON, but the server does not respect it. Also, the way it's handled seems incorrect; both slot_params and sampling_params have an ignore_eos field, and they are used inconsistently.

I changed the prompt latency to be based on the latency observed by the Python script instead of the value reported by the server; long-term we can consider extending the script with support for other APIs/projects. I also changed the number of workers back to the number of slots to avoid just measuring the time a worker spends in the queue waiting for a free slot (see the sketch below).
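As a hedged sketch of what this could look like (not the actual script code; the /completion endpoint and helper names here are assumptions), the latency is taken around the HTTP request itself and the thread pool is sized to the number of server slots:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests


def send_prompt(url: str, body: dict) -> float:
    """Return the client-observed wall-clock latency of one request, in seconds."""
    t_start = time.time()
    response = requests.post(f"{url}/completion", json=body)
    response.raise_for_status()
    return time.time() - t_start


def run_benchmark(url: str, bodies: list[dict], n_slots: int) -> list[float]:
    # One worker per server slot: extra workers would mostly measure client-side queueing.
    with ThreadPoolExecutor(max_workers=n_slots) as pool:
        return list(pool.map(lambda body: send_prompt(url, body), bodies))
```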

I changed the script to pass arguments to the server via environment variables. This way users can pass arbitrary arguments to the server without server-bench.py becoming bloated with dedicated flags (sketched below).
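A minimal sketch of that approach, assuming the server is launched as a subprocess that simply inherits the caller's environment; the LLAMA_ARG_* variable shown is only an illustration:

```python
import os
import subprocess


def start_server(path_server: str) -> subprocess.Popen:
    """Launch llama-server, letting it pick up settings from the inherited environment."""
    env = dict(os.environ)  # e.g. the user may have exported LLAMA_ARG_N_GPU_LAYERS=99
    return subprocess.Popen([path_server], env=env)
```

Usage would then look something like LLAMA_ARG_N_GPU_LAYERS=99 python scripts/server-bench.py ..., so any server option can be set without adding a corresponding flag to the benchmark script.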

@JohannesGaessler JohannesGaessler requested a review from ngxson as a code owner July 15, 2025 12:36
@github-actions github-actions bot added the script (Script related), examples, python (python script changes), and server labels on Jul 15, 2025
@JohannesGaessler JohannesGaessler force-pushed the server-bench-6 branch 2 times, most recently from 1660161 to fec4250, on July 15, 2025 12:47
@ggerganov
Member

I think there is something wrong with how the server handles ignore_eos: I'm setting it in the JSON, but the server does not respect it. Also, the way it's handled seems incorrect; both slot_params and sampling_params have an ignore_eos field, and they are used inconsistently.

Should be fixed in #14710

@JohannesGaessler JohannesGaessler merged commit 5cae766 into ggml-org:master Jul 16, 2025
8 checks passed