LLM: Update qwen readme (#10245)
Commit 6c74b99a28 (parent 15ad2fd72e)

2 changed files with 2 additions and 2 deletions
Changed file 1 of 2, the Qwen example README:

````diff
@@ -92,7 +92,7 @@ First token latency x.xxxxs
 ```
 
 ### 4. Accelerate with BIGDL_OPT_IPEX
 
+`Qwen/Qwen-72B-Chat` is currently not supported with IPEX.
 
 To accelerate speculative decoding on CPU, you can install our validated version of [IPEX 2.3.0+git004cd72d](https://github.com/intel/intel-extension-for-pytorch/tree/004cd72db60e87bb0712d42e3120bac9854bd77e) by following the steps below. (Other versions of IPEX may conflict and cannot accelerate speculative decoding correctly.)
 
 #### 4.1 Download IPEX installation script
````
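Only this specific IPEX build is validated, so before running it can help to confirm what is actually installed. A minimal sketch, assuming the validated build reports its version as `2.3.0+git004cd72d` (the exact string is an assumption):

```python
# Sanity check: warn if the installed IPEX build is not the validated one,
# since other builds may not accelerate speculative decoding correctly.
import intel_extension_for_pytorch as ipex

VALIDATED_VERSION = "2.3.0+git004cd72d"  # assumed version string of the validated build

if ipex.__version__ != VALIDATED_VERSION:
    print(f"Warning: found IPEX {ipex.__version__}, validated build is "
          f"{VALIDATED_VERSION}; speculative decoding may not be accelerated.")
```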
Changed file 2 of 2, the speculative decoding source (`def speculative_generate`):

````diff
@@ -475,7 +475,7 @@ def speculative_generate(self,
                 ("qwen" in self.config.model_type) or
                 ("chatglm" in self.config.model_type)):
             invalidInputError(False, "BigDL Speculative Decoding with IPEX BF16 only supports \
-                                      Llama, Baichuan2, Mistral and ChatGLM and Qwen models currently.")
+                              Llama, Baichuan2, Mistral, ChatGLM and Qwen models currently.")
         if "chatglm" in self.config.model_type:
             global query_group_size
             query_group_size = draft_model.config.num_attention_heads // \
````
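The message fixed here belongs to a guard near the top of `speculative_generate`: with IPEX BF16, decoding only proceeds for a short list of model families matched against `self.config.model_type`. A minimal standalone sketch of that kind of check; `check_ipex_bf16_support`, the model-type strings for Llama, Baichuan2, and Mistral, and the plain `ValueError` are illustrative stand-ins, not BigDL's API:

```python
# Illustrative guard: accept only model families validated for
# BigDL speculative decoding with IPEX BF16.
SUPPORTED_MODEL_TYPES = ("llama", "baichuan", "mistral", "qwen", "chatglm")

def check_ipex_bf16_support(model_type: str) -> None:
    """Raise if this model family is not validated for IPEX BF16 speculative decoding."""
    if not any(name in model_type for name in SUPPORTED_MODEL_TYPES):
        raise ValueError(
            "BigDL Speculative Decoding with IPEX BF16 only supports "
            "Llama, Baichuan2, Mistral, ChatGLM and Qwen models currently."
        )

check_ipex_bf16_support("qwen")    # passes silently
# check_ipex_bf16_support("gpt2")  # would raise ValueError
```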