Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
parent 59b01fa7d2
commit 6726b198fd

2 changed files with 4 additions and 0 deletions
@@ -123,6 +123,8 @@ To set up model serving using `IPEX-LLM` as backend using FastChat, you can refe
   --model-path /llm/models/Yi-1.5-34B \
   --device xpu \
   --enforce-eager \
+  --disable-async-output-proc \
+  --distributed-executor-backend ray \
   --dtype float16 \
   --load-in-low-bit fp8 \
   --tensor-parallel-size 4 \
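Both added flags are upstream vLLM engine arguments: `--disable-async-output-proc` turns off the asynchronous output processing introduced in the vLLM v0.6.x line, and `--distributed-executor-backend ray` selects Ray (rather than the multiprocessing backend) to coordinate the workers spawned for `--tensor-parallel-size 4`. The same two lines are added to the second file below.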
@@ -852,6 +852,8 @@ We can set up model serving using `IPEX-LLM` as backend using FastChat, the foll
   --model-path /llm/models/Yi-1.5-34B \
   --device xpu \
   --enforce-eager \
+  --disable-async-output-proc \
+  --distributed-executor-backend ray \
   --dtype float16 \
   --load-in-low-bit fp8 \
   --tensor-parallel-size 4 \
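Pieced together from the context lines, the launch command after this change looks roughly like the sketch below. The `ipex_llm.serving.fastchat.vllm_worker` module path is an assumption based on IPEX-LLM's FastChat serving docs and is not part of this diff; the original command also continues past `--tensor-parallel-size 4` with flags the hunk does not show.

# A minimal sketch of the command after this change. Assumed (not in the
# diff): the vllm_worker module path; any flags after --tensor-parallel-size
# are omitted because the hunk cuts off there.
python -m ipex_llm.serving.fastchat.vllm_worker \
  --model-path /llm/models/Yi-1.5-34B \
  --device xpu \
  --enforce-eager \
  --disable-async-output-proc \
  --distributed-executor-backend ray \
  --dtype float16 \
  --load-in-low-bit fp8 \
  --tensor-parallel-size 4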