Update reference links in xpu/docker/readme.md (#12188)

* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] multi lora adapter tested successfully

* [ADD] add ipex-llm quantization doc

* [Merge] rebase main

* [REMOVE] rm tmp file

* [Merge] rebase main

* [ADD] add prefix caching experiment and result

* [REMOVE] rm cpu offloading chapter

* [UPDATE] update the link to new vllm-docker-quickstart
Jun Wang 2024-10-18 13:18:08 +08:00 committed by GitHub
parent fe3b5cd89b
commit b10fc892e1

@@ -69,7 +69,7 @@ You can modify this script to using fastchat with either `ipex_llm_worker` or `v
#### vLLM serving engine
-To run vLLM engine using `IPEX-LLM` as backend, you can refer to this [document](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/vLLM-Serving/README.md).
+To run vLLM engine using `IPEX-LLM` as backend, you can refer to this [document](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md).
We have included multiple example files in `/llm/`:
1. `vllm_offline_inference.py`: Used for vLLM offline inference example
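
This diff only touches the reference links, so the offline-inference script itself is not shown here. For orientation, the sketch below shows what a minimal offline run with the upstream vLLM Python API looks like; the actual `vllm_offline_inference.py` shipped in `/llm/` may add IPEX-LLM/XPU-specific engine arguments that are not reproduced here, and the model name is only a placeholder.

```python
# Minimal sketch of vLLM offline inference using the upstream Python API.
# The bundled /llm/vllm_offline_inference.py may differ (e.g. IPEX-LLM/XPU
# specific engine arguments); treat this purely as an illustration.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # placeholder model for illustration
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```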
@@ -79,7 +79,7 @@ We have included multiple example files in `/llm/`:
##### Online benchmark throurgh api_server
-We can benchmark the api_server to get an estimation about TPS (transactions per second). To do so, you need to start the service first according to the instructions in this [section](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/vLLM-Serving/README.md#service).
+We can benchmark the api_server to get an estimation about TPS (transactions per second). To do so, you need to start the service first according to the instructions in this [section](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md#Serving).
###### Online benchmark through benchmark_util
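
The excerpt ends at the `benchmark_util` heading, whose body is not part of this diff. As a quick illustration of the api_server benchmarking idea described above, the sketch below fires a few sequential requests at vLLM's OpenAI-compatible `/v1/completions` route and reports a rough request rate; the URL, model name, and request count are placeholder assumptions, and the repository's own benchmark tooling referenced in the quickstart remains the proper way to measure throughput.

```python
# Rough sequential-throughput probe against a running api_server.
# Assumes an OpenAI-compatible /v1/completions endpoint on localhost:8000;
# adjust URL, model name, and request count to match your deployment.
import time
import requests

URL = "http://localhost:8000/v1/completions"
payload = {
    "model": "YOUR_MODEL_NAME",   # placeholder: use the served model id
    "prompt": "San Francisco is a",
    "max_tokens": 128,
    "temperature": 0.0,
}

n_requests = 10
start = time.time()
for _ in range(n_requests):
    resp = requests.post(URL, json=payload, timeout=300)
    resp.raise_for_status()
elapsed = time.time() - start

print(f"{n_requests} sequential requests in {elapsed:.1f}s "
      f"(~{n_requests / elapsed:.2f} requests/s)")
```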