Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
This commit is contained in:
parent 59b01fa7d2
commit 6726b198fd
2 changed files with 4 additions and 0 deletions
@@ -123,6 +123,8 @@ To set up model serving using `IPEX-LLM` as backend using FastChat, you can refe
 --model-path /llm/models/Yi-1.5-34B \
 --device xpu \
 --enforce-eager \
+--disable-async-output-proc \
+--distributed-executor-backend ray \
 --dtype float16 \
 --load-in-low-bit fp8 \
 --tensor-parallel-size 4 \
@@ -852,6 +852,8 @@ We can set up model serving using `IPEX-LLM` as backend using FastChat, the foll
 --model-path /llm/models/Yi-1.5-34B \
 --device xpu \
 --enforce-eager \
+--disable-async-output-proc \
+--distributed-executor-backend ray \
 --dtype float16 \
 --load-in-low-bit fp8 \
 --tensor-parallel-size 4 \
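Put together, the two added flags slot into the FastChat vLLM worker launch command that both hunks modify. A minimal sketch is shown below; the `python -m ipex_llm.serving.fastchat.vllm_worker` entry point and the trailing values are assumptions for illustration and are not part of this diff, while the flags themselves are taken verbatim from the hunks.

```bash
# Hypothetical FastChat vLLM worker launch with the flags added in this commit.
# The module path below is an assumption based on the surrounding IPEX-LLM docs.
python -m ipex_llm.serving.fastchat.vllm_worker \
  --model-path /llm/models/Yi-1.5-34B \
  --device xpu \
  --enforce-eager \
  --disable-async-output-proc \
  --distributed-executor-backend ray \
  --dtype float16 \
  --load-in-low-bit fp8 \
  --tensor-parallel-size 4
```

`--disable-async-output-proc` switches off vLLM's asynchronous output processing introduced in the 0.6.x line, and `--distributed-executor-backend ray` selects Ray as the distributed executor used together with `--tensor-parallel-size 4`.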