Update reference links in xpu/docker/readme.md (#12188)
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] mulit lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] add prefix caching experiment and result
* [REMOVE] rm cpu offloading chapter
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] mulit lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] mulit lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [UPDATE] update the link to new vllm-docker-quickstart
This commit is contained in:

parent fe3b5cd89b
commit b10fc892e1

1 changed file with 2 additions and 2 deletions
@@ -69,7 +69,7 @@ You can modify this script to using fastchat with either `ipex_llm_worker` or `v
 
 #### vLLM serving engine
 
-To run vLLM engine using `IPEX-LLM` as backend, you can refer to this [document](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/vLLM-Serving/README.md).
+To run vLLM engine using `IPEX-LLM` as backend, you can refer to this [document](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md).
 
 We have included multiple example files in `/llm/`:
 1. `vllm_offline_inference.py`: Used for vLLM offline inference example
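For context, the `vllm_offline_inference.py` example referenced in this hunk follows the standard vLLM offline-inference pattern. The sketch below is a minimal illustration of that pattern, assuming the upstream `vllm` package and a placeholder model path; the actual script shipped in the `/llm/` directory of the image may use IPEX-LLM-specific engine options.

```python
# Minimal vLLM offline-inference sketch (illustrative only).
# Assumes the upstream `vllm` package and a placeholder model path;
# the real vllm_offline_inference.py in /llm/ may differ.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# Decoding settings are illustrative.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Replace with the model path you actually want to run.
llm = LLM(model="YOUR_MODEL_PATH")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```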
@@ -79,7 +79,7 @@ We have included multiple example files in /llm/:
 
 ##### Online benchmark throurgh api_server
 
-We can benchmark the api_server to get an estimation about TPS (transactions per second). To do so, you need to start the service first according to the instructions in this [section](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/vLLM-Serving/README.md#service).
+We can benchmark the api_server to get an estimation about TPS (transactions per second). To do so, you need to start the service first according to the instructions in this [section](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md#Serving).
 
 ###### Online benchmark through benchmark_util
 
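As a rough illustration of the TPS estimate described in this hunk, one can time requests against the api_server's OpenAI-compatible `/v1/completions` endpoint once the service is running. The sketch below assumes the default port 8000 and a placeholder model name; the benchmark scripts in the linked quickstart remain the authoritative method.

```python
# Rough TPS estimate against a running vLLM api_server (illustrative only).
# Assumptions: OpenAI-compatible server at http://localhost:8000 and a
# placeholder model name; adjust both to match your deployment.
import time

import requests

URL = "http://localhost:8000/v1/completions"
PAYLOAD = {
    "model": "YOUR_MODEL_NAME",
    "prompt": "San Francisco is a",
    "max_tokens": 32,
}

n_requests = 20
start = time.time()
for _ in range(n_requests):
    resp = requests.post(URL, json=PAYLOAD, timeout=300)
    resp.raise_for_status()
elapsed = time.time() - start

print(f"{n_requests} sequential requests in {elapsed:.1f}s "
      f"-> {n_requests / elapsed:.2f} requests/sec")
```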