From b10fc892e100fd32c75a8f062039c182cd2645df Mon Sep 17 00:00:00 2001
From: Jun Wang
Date: Fri, 18 Oct 2024 13:18:08 +0800
Subject: [PATCH] Update new reference link of xpu/docker/readme.md (#12188)

* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] add prefix caching experiment and result
* [REMOVE] rm cpu offloading chapter
* [UPDATE] update the link to new vllm-docker-quickstart
---
 docker/llm/serving/xpu/docker/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docker/llm/serving/xpu/docker/README.md b/docker/llm/serving/xpu/docker/README.md
index 89d2ef3f..2d3d93cd 100644
--- a/docker/llm/serving/xpu/docker/README.md
+++ b/docker/llm/serving/xpu/docker/README.md
@@ -69,7 +69,7 @@ You can modify this script to using fastchat with either `ipex_llm_worker` or `v
 
 #### vLLM serving engine
 
-To run vLLM engine using `IPEX-LLM` as backend, you can refer to this [document](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/vLLM-Serving/README.md).
+To run vLLM engine using `IPEX-LLM` as backend, you can refer to this [document](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md).
 
 We have included multiple example files in `/llm/`:
 1. `vllm_offline_inference.py`: Used for vLLM offline inference example
@@ -79,7 +79,7 @@ We have included multiple example files in `/llm/`:
 
 ##### Online benchmark throurgh api_server
 
-We can benchmark the api_server to get an estimation about TPS (transactions per second). To do so, you need to start the service first according to the instructions in this [section](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/vLLM-Serving/README.md#service).
+We can benchmark the api_server to get an estimation about TPS (transactions per second). To do so, you need to start the service first according to the instructions in this [section](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md#Serving).
 
 ###### Online benchmark through benchmark_util
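
As a rough illustration of the TPS estimate described in the second hunk, the Python sketch below sends sequential requests to the OpenAI-compatible api_server and reports transactions per second; the URL, port, and model id here are assumptions for illustration only, not values taken from the patched README.

# Minimal TPS-estimate sketch (assumptions: api_server reachable on
# localhost:8000, OpenAI-compatible /v1/completions route, placeholder
# model id). Sends sequential requests and reports a rough rate.
import time
import requests

URL = "http://localhost:8000/v1/completions"  # assumed default port
MODEL = "YOUR_MODEL_ID"                       # replace with the model actually served
NUM_REQUESTS = 20

payload = {"model": MODEL, "prompt": "San Francisco is a", "max_tokens": 64}

start = time.time()
for _ in range(NUM_REQUESTS):
    resp = requests.post(URL, json=payload, timeout=300)
    resp.raise_for_status()
elapsed = time.time() - start

print(f"{NUM_REQUESTS} requests in {elapsed:.1f}s -> {NUM_REQUESTS / elapsed:.2f} TPS")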