Add pp_serving verified models (#11498)
* add verified models
* update
* verify large model
* update command
This commit is contained in:
parent 9274282ef7
commit 61c36ba085

1 changed file with 25 additions and 0 deletions

@@ -6,6 +6,27 @@ This example demonstrates how to run IPEX-LLM serving on multiple [Intel GPUs](.
 
 To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. For this particular example, you will need at least two GPUs on your machine.
 
+## Verified Models
+- meta-llama/Llama-2-7b-chat-hf
+- meta-llama/Llama-2-13b-chat-hf
+- meta-llama/Meta-Llama-3-8B-Instruct
+- meta-llama/Meta-Llama-3-70B-Instruct
+- codellama/CodeLlama-13b-hf
+- Qwen/Qwen2-7B-Instruct
+- Qwen/Qwen1.5-7B-Chat
+- Qwen/Qwen1.5-14B-Chat
+- Qwen/Qwen1.5-32B-Chat
+- Qwen/Qwen1.5-MoE-A2.7B-Chat
+- Qwen/CodeQwen1.5-7B-Chat
+- THUDM/chatglm3-6b
+- THUDM/glm-4-9b-chat
+- baichuan-inc/Baichuan2-7B-Chat
+- baichuan-inc/Baichuan2-13B-Chat
+- microsoft/Phi-3-mini-4k-instruct
+- mistralai/Mistral-7B-v0.1
+- mistralai/Mixtral-8x7B-Instruct-v0.1
+- 01-ai/Yi-6B-Chat
+
 ## Example
 
 ### 1. Install
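
The context in this hunk notes that the example needs at least two GPUs on the machine. A minimal sketch for checking what the SYCL runtime can see before running, assuming oneAPI is installed at the default `/opt/intel/oneapi` path (adjust if yours differs):

```bash
# Load the oneAPI environment (common default install path).
source /opt/intel/oneapi/setvars.sh

# List the devices the SYCL runtime can see; each physical GPU typically
# appears once per backend (e.g. level_zero and opencl), so look for at
# least two distinct GPU entries before running this example.
sycl-ls
```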

@@ -23,6 +44,10 @@ pip install gradio # for gradio web UI
 conda install -c conda-forge -y gperftools=2.10 # to enable tcmalloc
 
 pip install transformers==4.37.0
+
+# only for Qwen1.5-MoE-A2.7B
+pip install transformers==4.40.0
+pip install trl==0.8.1
 ```
 
 ### 2. Run pipeline parallel serving on multiple GPUs
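
The four added lines pin a different transformers version for a single model. A minimal sketch of how the two pins from this hunk could be selected per model; the `MODEL` variable and the case pattern are illustrative, not part of the repository:

```bash
# Illustrative helper (not from the repo): install the pinned versions
# from this hunk based on which verified model will be served.
MODEL="Qwen/Qwen1.5-MoE-A2.7B-Chat"   # hypothetical example value

case "$MODEL" in
  Qwen/Qwen1.5-MoE*)
    # Qwen1.5-MoE needs the newer pins added in this commit.
    pip install transformers==4.40.0
    pip install trl==0.8.1
    ;;
  *)
    # All other verified models keep the baseline pin.
    pip install transformers==4.37.0
    ;;
esac
```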