Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
This commit is contained in:
parent 59b01fa7d2
commit 6726b198fd
2 changed files with 4 additions and 0 deletions
@@ -123,6 +123,8 @@ To set up model serving using `IPEX-LLM` as backend using FastChat, you can refe
 --model-path /llm/models/Yi-1.5-34B \
 --device xpu \
 --enforce-eager \
+--disable-async-output-proc \
+--distributed-executor-backend ray \
 --dtype float16 \
 --load-in-low-bit fp8 \
 --tensor-parallel-size 4 \
@@ -852,6 +852,8 @@ We can set up model serving using `IPEX-LLM` as backend using FastChat, the foll
 --model-path /llm/models/Yi-1.5-34B \
 --device xpu \
 --enforce-eager \
+--disable-async-output-proc \
+--distributed-executor-backend ray \
 --dtype float16 \
 --load-in-low-bit fp8 \
 --tensor-parallel-size 4 \
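Put together, the two added flags slot into the FastChat vLLM worker launch command that both hunks modify. A minimal sketch is shown below; the `python -m ipex_llm.serving.fastchat.vllm_worker` entry point and the trailing values are assumptions for illustration and are not part of this diff, while the flags themselves are taken verbatim from the hunks.

```bash
# Hypothetical FastChat vLLM worker launch with the flags added in this commit.
# The module path below is an assumption based on the surrounding IPEX-LLM docs.
python -m ipex_llm.serving.fastchat.vllm_worker \
  --model-path /llm/models/Yi-1.5-34B \
  --device xpu \
  --enforce-eager \
  --disable-async-output-proc \
  --distributed-executor-backend ray \
  --dtype float16 \
  --load-in-low-bit fp8 \
  --tensor-parallel-size 4
```

`--disable-async-output-proc` switches off vLLM's asynchronous output processing introduced in the 0.6.x line, and `--distributed-executor-backend ray` selects Ray as the distributed executor used together with `--tensor-parallel-size 4`.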