Update readme & doc for the vllm upgrade to v0.6.2 (#12399)

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Author: Xu, Shuo <2024-11-14 10:28:15 +08:00>, committed by GitHub
parent 59b01fa7d2
commit 6726b198fd
2 changed files with 4 additions and 0 deletions


@@ -123,6 +123,8 @@ To set up model serving using `IPEX-LLM` as backend using FastChat, you can refer
 --model-path /llm/models/Yi-1.5-34B \
 --device xpu \
 --enforce-eager \
+--disable-async-output-proc \
+--distributed-executor-backend ray \
 --dtype float16 \
 --load-in-low-bit fp8 \
 --tensor-parallel-size 4 \


@@ -852,6 +852,8 @@ We can set up model serving using `IPEX-LLM` as backend using FastChat, the following
 --model-path /llm/models/Yi-1.5-34B \
 --device xpu \
 --enforce-eager \
+--disable-async-output-proc \
+--distributed-executor-backend ray \
 --dtype float16 \
 --load-in-low-bit fp8 \
 --tensor-parallel-size 4 \
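For context, the two added flags slot into the full worker launch command roughly as sketched below. This is a sketch only: the `vllm_worker` module path, the model path, and the surrounding flags before and after the additions are assumptions based on the diff context, not shown in this commit, and further flags elided by the trailing `\` in the diff are omitted here.

```shell
# Sketch of the FastChat/vLLM serving command after this change (module path
# and flag order are assumptions; only the two "+" flags come from this diff).
python -m ipex_llm.serving.fastchat.vllm_worker \
  --model-path /llm/models/Yi-1.5-34B \
  --device xpu \
  --enforce-eager \
  --disable-async-output-proc \
  --distributed-executor-backend ray \
  --dtype float16 \
  --load-in-low-bit fp8 \
  --tensor-parallel-size 4
```

`--disable-async-output-proc` turns off the asynchronous output processing introduced in the vLLM v0.6.x line, and `--distributed-executor-backend ray` selects Ray to coordinate the tensor-parallel workers implied by `--tensor-parallel-size 4`.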