scripts: synthetic prompt mode for server-bench.py #14695

JohannesGaessler · 2025-07-15T12:36:35Z

This PR extends scripts/server-bench.py with synthetic prompts, which are simply random lists of tokens with a configurable random length. Each prompt is then assigned a configurable random number of tokens to generate; the server is instructed to generate exactly this number of tokens via n_predict and ignore_eos. This ensures that server performance can be tested easily and consistently for specific combinations of prompt lengths and generation lengths.
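A minimal sketch of how such synthetic requests could be built, not the code from this PR. The function name, its parameters, and the uniform sampling are assumptions for illustration; only the request fields prompt, n_predict, and ignore_eos come from the description above.

```python
import random


def make_synthetic_requests(n_prompts, prompt_len_min, prompt_len_max,
                            n_predict_min, n_predict_max, n_vocab, seed=0):
    """Build request payloads with random token prompts (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so benchmark runs are repeatable
    requests = []
    for _ in range(n_prompts):
        # Random prompt length, then that many random token ids in [0, n_vocab).
        prompt_len = rng.randint(prompt_len_min, prompt_len_max)
        prompt = [rng.randrange(n_vocab) for _ in range(prompt_len)]
        requests.append({
            "prompt": prompt,
            # Random generation length; ignore_eos asks the server not to
            # stop early, so exactly n_predict tokens should be generated.
            "n_predict": rng.randint(n_predict_min, n_predict_max),
            "ignore_eos": True,
        })
    return requests
```

Seeding the generator keeps the prompt and generation lengths identical across runs, which is what makes results comparable between server builds.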

I think there is something wrong with how the server handles ignore_eos. I'm setting it in the JSON but the server does not respect it. Also, the way it's being handled seems incorrect: both the slot_params and the sampling_params have a field ignore_eos, and they're used inconsistently.

I changed the prompt latency to be based on the latency as observed by the Python script instead of the value reported by the server. Since this no longer depends on server-side timings, long-term we can consider extending the script with support for other APIs/projects. I also changed the number of workers back to the number of slots to avoid just measuring the time a worker spends in the queue waiting for a free slot.
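Measuring latency on the client side can be sketched as follows. This is an illustration under assumptions, not the script's implementation: the function name is hypothetical, and the response is modeled as any lazy iterable of streamed chunks so that the request is only sent once iteration begins.

```python
import time


def measure_arrival_times(stream, clock=time.perf_counter):
    """Return the arrival time of each streamed chunk, relative to the start.

    `stream` is assumed to be a lazy iterable of response chunks.
    `clock` is injectable so the function can be tested without real delays.
    """
    t_start = clock()  # taken before the first chunk is requested
    arrival_times = []
    for _ in stream:
        arrival_times.append(clock() - t_start)
    return arrival_times
```

Under these assumptions the first entry approximates the prompt latency (time until the first generated token arrives), and the differences between later entries approximate the per-token generation latency as seen by the client.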

I changed the script to pass arguments to the server via environment variables. This way users can pass any arguments to the server without server-bench.py becoming bloated.
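One way to pass settings through the environment is sketched below. The helper names are hypothetical and the variable name in the usage note is only illustrative; check the server documentation for the environment variables it actually reads.

```python
import os
import subprocess


def build_server_env(overrides, base=None):
    """Merge benchmark-specific settings into a copy of the environment."""
    env = dict(os.environ if base is None else base)
    # Environment values must be strings, so convert ints and similar.
    env.update({key: str(value) for key, value in overrides.items()})
    return env


def start_server(binary, env):
    """Launch the server with no CLI flags; it reads its config from `env`."""
    return subprocess.Popen([binary], env=env)
```

For example, `build_server_env({"LLAMA_ARG_N_PARALLEL": 4})` returns the caller's environment plus that one setting. Anything the user exports before running the benchmark is forwarded unchanged, so the script does not need a flag of its own for each server option.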

ggerganov · 2025-07-16T04:51:43Z

> I think there is something wrong with how the server handles ignore_eos. I'm setting it in the JSON but the server does not respect it. Also, the way it's being handled seems incorrect: both the slot_params and the sampling_params have a field ignore_eos, and they're used inconsistently.

This should be fixed in #14710.

JohannesGaessler requested a review from ngxson as a code owner July 15, 2025 12:36

github-actions bot added the script, examples, python, and server labels Jul 15, 2025

JohannesGaessler force-pushed the server-bench-6 branch 2 times, most recently from 1660161 to fec4250 Compare July 15, 2025 12:47

scripts: synthetic prompt mode for server-bench.py

34fd4f8

JohannesGaessler force-pushed the server-bench-6 branch from fec4250 to 34fd4f8 Compare July 15, 2025 12:48

ggerganov mentioned this pull request Jul 16, 2025

server : fix handling of the ignore_eos flag #14710

Merged

ggerganov approved these changes Jul 16, 2025


JohannesGaessler merged commit 5cae766 into ggml-org:master Jul 16, 2025
8 checks passed
