add --block-size to doc and script (#12187)
This commit is contained in:
parent 6ffaec66a2
commit 49eb20613a
2 changed files with 2 additions and 0 deletions
@@ -19,6 +19,7 @@ python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
   --port 8000 \
   --model $model \
   --trust-remote-code \
+  --block-size 8 \
   --gpu-memory-utilization 0.9 \
   --device xpu \
   --dtype float16 \
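For context, the hunk above yields a launch command along the lines of the sketch below. The model path is a hypothetical placeholder, and only the flags visible in the hunk are shown; the surrounding script may set additional options after `--dtype float16`.

```bash
# Sketch of the serving command after this change. The model path is an
# illustrative placeholder; substitute your own local model directory.
model="/llm/models/Llama-2-7b-chat-hf"

python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --port 8000 \
  --model $model \
  --trust-remote-code \
  --block-size 8 \
  --gpu-memory-utilization 0.9 \
  --device xpu \
  --dtype float16
```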
@@ -103,6 +103,7 @@ Before performing benchmark or starting the service, you can refer to this [sect
 |`--max-model-len`| Model context length. If unspecified, will be automatically derived from the model config.|
 |`--max-num-batched-token`| Maximum number of batched tokens per iteration.|
 |`--max-num-seq`| Maximum number of sequences per iteration. Default: 256|
+|`--block-size`| vLLM block size. Set to 8 to achieve a performance boost.|
 
 #### Single card serving
 
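Once the server is launched with `--block-size 8`, a quick way to confirm it is up is the OpenAI-compatible completions route that vLLM's API server exposes; the model name below is a placeholder and must match whatever was passed via `--model`.

```bash
# Hypothetical smoke test against the server started above; replace the
# model name with the one actually passed to --model.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-2-7b-chat-hf", "prompt": "Hello,", "max_tokens": 16}'
```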