Add pp_serving verified models (#11498)

* add verified models

* update

* verify large model

* update command
Wang, Jian4 2024-07-03 14:57:09 +08:00 committed by GitHub
parent 9274282ef7
commit 61c36ba085

@@ -6,6 +6,27 @@ This example demonstrates how to run IPEX-LLM serving on multiple [Intel GPUs](.
To run this example with IPEX-LLM on Intel GPUs, your machine should meet our recommended requirements; please refer to [here](../README.md#recommended-requirements) for more information. For this particular example, you will need at least two GPUs on your machine.
## Verified Models
- meta-llama/Llama-2-7b-chat-hf
- meta-llama/Llama-2-13b-chat-hf
- meta-llama/Meta-Llama-3-8B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- codellama/CodeLlama-13b-hf
- Qwen/Qwen2-7B-Instruct
- Qwen/Qwen1.5-7B-Chat
- Qwen/Qwen1.5-14B-Chat
- Qwen/Qwen1.5-32B-Chat
- Qwen/Qwen1.5-MoE-A2.7B-Chat
- Qwen/CodeQwen1.5-7B-Chat
- THUDM/chatglm3-6b
- THUDM/glm-4-9b-chat
- baichuan-inc/Baichuan2-7B-Chat
- baichuan-inc/Baichuan2-13B-Chat
- microsoft/Phi-3-mini-4k-instruct
- mistralai/Mistral-7B-v0.1
- mistralai/Mixtral-8x7B-Instruct-v0.1
- 01-ai/Yi-6B-Chat
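Before launching, it can be useful to confirm that the model you intend to serve is one of the verified entries above. A minimal sketch (the `VERIFIED` list is abbreviated here, and `MODEL_ID` is a placeholder you set yourself):

```shell
# Guard against unverified models before launching (sketch; extend
# VERIFIED with the full list from this README as needed).
VERIFIED="meta-llama/Llama-2-7b-chat-hf THUDM/glm-4-9b-chat Qwen/Qwen2-7B-Instruct"
MODEL_ID="THUDM/glm-4-9b-chat"   # placeholder: the model you want to serve
case " $VERIFIED " in
  *" $MODEL_ID "*) echo "verified" ;;
  *)               echo "not verified: $MODEL_ID" ;;
esac
```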
## Example
### 1. Install
@@ -23,6 +44,10 @@ pip install gradio # for gradio web UI
conda install -c conda-forge -y gperftools=2.10 # to enable tcmalloc
pip install transformers==4.37.0
# only for Qwen1.5-MoE-A2.7B
pip install transformers==4.40.0
pip install trl==0.8.1
```
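Note that the install block pins two different `transformers` versions: `4.37.0` in general, but `4.40.0` (plus `trl==0.8.1`) for Qwen1.5-MoE-A2.7B. A small sketch of choosing the pin based on the target model (`MODEL_ID` is a placeholder):

```shell
# Select the transformers pin for the model being served (sketch).
MODEL_ID="Qwen/Qwen1.5-MoE-A2.7B-Chat"   # placeholder
if [ "$MODEL_ID" = "Qwen/Qwen1.5-MoE-A2.7B-Chat" ]; then
  PIN="transformers==4.40.0"             # MoE model needs the newer pin
else
  PIN="transformers==4.37.0"             # default pin from the README
fi
echo "$PIN"
```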
### 2. Run pipeline parallel serving on multiple GPUs
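Since this example needs at least two GPUs, one common preparatory step is restricting the process to the devices used for the pipeline stages. A minimal sketch using the Level Zero affinity mask (an assumption about how you select devices on your machine; the actual launch command comes from the repository's serving scripts and is not reproduced here):

```shell
# Expose exactly NUM_GPUS devices to the serving process (sketch).
NUM_GPUS=2                                   # at least two GPUs are required
export ZE_AFFINITY_MASK=$(seq -s, 0 $((NUM_GPUS - 1)))
echo "$ZE_AFFINITY_MASK"                     # device list, e.g. 0,1
```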