update vllm_online_benchmark script to support long input (#12095)

* update vllm_online_benchmark script to support long input
* update guide
2024-09-20 14:18:30 +08:00
commit 1295898830
parent 9650bf616a
2 changed files with 187 additions and 5 deletions
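
The updated guide in this commit documents two new optional positional arguments, `input_length` and `output_length`, which fall back to 1024 and 512 when omitted. As a rough illustration of that behavior only, here is a minimal sketch of how such arguments could be read. The helper `parse_benchmark_args` and the `BenchmarkArgs` tuple are hypothetical names introduced for this example; the parsing code actually used in `vllm_online_benchmark.py` is not shown in this excerpt and may differ.

```python
from typing import List, NamedTuple

# Defaults as stated in the updated README.
DEFAULT_INPUT_LENGTH = 1024
DEFAULT_OUTPUT_LENGTH = 512


class BenchmarkArgs(NamedTuple):
    """Hypothetical container for the four positional arguments."""
    model_name: str
    max_seqs: int
    input_length: int
    output_length: int


def parse_benchmark_args(argv: List[str]) -> BenchmarkArgs:
    """Parse [model_name, max_seqs, input_length?, output_length?].

    `argv` excludes the script name, i.e. it corresponds to sys.argv[1:].
    This is an assumed parsing scheme, not the script's actual code.
    """
    if len(argv) < 2:
        raise ValueError(
            "usage: vllm_online_benchmark.py model_name max_seqs "
            "[input_length] [output_length]"
        )
    model_name = argv[0]
    max_seqs = int(argv[1])
    # Optional arguments fall back to the documented defaults.
    input_length = int(argv[2]) if len(argv) > 2 else DEFAULT_INPUT_LENGTH
    output_length = int(argv[3]) if len(argv) > 3 else DEFAULT_OUTPUT_LENGTH
    return BenchmarkArgs(model_name, max_seqs, input_length, output_length)
```

Under this sketch, calling `parse_benchmark_args(["my-model", "4"])` yields an input length of 1024 and an output length of 512, while passing all four values overrides both defaults.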
--- a/docker/llm/serving/xpu/docker/README.md
+++ b/docker/llm/serving/xpu/docker/README.md
@@ -85,8 +85,9 @@ We can benchmark the api_server to get an estimation about TPS (transactions per
 After starting vllm service, Sending reqs through `vllm_online_benchmark.py`
 ```bash
-python vllm_online_benchmark.py $model_name $max_seqs
+python vllm_online_benchmark.py $model_name $max_seqs $input_length $output_length
 ```
 If `input_length` and `output_length` are not provided, the script will use the default values of 1024 and 512, respectively.
 And it will output like this:
 ```bash
--- a/docker/llm/serving/xpu/docker/vllm_online_benchmark.py
+++ b/docker/llm/serving/xpu/docker/vllm_online_benchmark.py