LLM: Update qwen readme (#10245)
parent 15ad2fd72e
commit 6c74b99a28
2 changed files with 2 additions and 2 deletions
@@ -92,7 +92,7 @@ First token latency x.xxxxs
 ```

 ### 4. Accelerate with BIGDL_OPT_IPEX

+`Qwen/Qwen-72B-Chat` is not supported using ipex now.
 To accelerate speculative decoding on CPU, you can install our validated version of [IPEX 2.3.0+git004cd72d](https://github.com/intel/intel-extension-for-pytorch/tree/004cd72db60e87bb0712d42e3120bac9854bd77e) by following the steps below. (Other versions of IPEX may have conflicts and cannot accelerate speculative decoding correctly.)

 #### 4.1 Download IPEX installation script
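The README hunk above gates IPEX acceleration behind the `BIGDL_OPT_IPEX` switch named in the section heading and notes that `Qwen/Qwen-72B-Chat` is excluded. A minimal sketch of how a launcher might honor that environment variable, assuming the flag is read as the literal string "true" and using a hypothetical helper name (the repo's actual parsing may differ):

```python
import os


def ipex_acceleration_enabled(model_name: str) -> bool:
    """Return True only when BIGDL_OPT_IPEX is set and the model supports IPEX."""
    # Assumption: the toggle is the literal string "true"; BigDL's own
    # scripts may parse the variable differently.
    if os.environ.get("BIGDL_OPT_IPEX", "false").lower() != "true":
        return False
    # Per the README line added in this commit, Qwen/Qwen-72B-Chat is not
    # supported using ipex now.
    if model_name == "Qwen/Qwen-72B-Chat":
        return False
    return True


if __name__ == "__main__":
    os.environ["BIGDL_OPT_IPEX"] = "true"
    print(ipex_acceleration_enabled("Qwen/Qwen-7B-Chat"))   # True
    print(ipex_acceleration_enabled("Qwen/Qwen-72B-Chat"))  # False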
@@ -475,7 +475,7 @@ def speculative_generate(self,
                 ("qwen" in self.config.model_type) or
                 ("chatglm" in self.config.model_type)):
             invalidInputError(False, "BigDL Speculative Decoding with IPEX BF16 only supports \
-                Llama, Baichuan2, Mistral and ChatGLM and Qwen models currently.")
+                Llama, Baichuan2, Mistral, ChatGLM and Qwen models currently.")
         if "chatglm" in self.config.model_type:
             global query_group_size
             query_group_size = draft_model.config.num_attention_heads // \
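This hunk only fixes the wording of the unsupported-model error ("Mistral and ChatGLM and" becomes "Mistral, ChatGLM and"). The gate itself matches substrings of `config.model_type` against an allow-list; a self-contained sketch of that pattern, using a plain `ValueError` and a `SimpleNamespace` config as hypothetical stand-ins for BigDL's `invalidInputError` and the Hugging Face config object (the llama/mistral/baichuan branches sit just above the lines shown in the hunk):

```python
from types import SimpleNamespace

# Substrings mirroring the model families named in the error message; the
# exact condition in speculative.py also checks llama, mistral and baichuan
# on lines just above this hunk.
SUPPORTED_SUBSTRINGS = ("baichuan", "llama", "mistral", "qwen", "chatglm")


def check_supported(config) -> None:
    """Raise if model_type falls outside the IPEX BF16 allow-list."""
    if not any(name in config.model_type for name in SUPPORTED_SUBSTRINGS):
        raise ValueError("BigDL Speculative Decoding with IPEX BF16 only supports "
                         "Llama, Baichuan2, Mistral, ChatGLM and Qwen models currently.")


if __name__ == "__main__":
    check_supported(SimpleNamespace(model_type="qwen"))       # passes silently
    try:
        check_supported(SimpleNamespace(model_type="gpt_neox"))
    except ValueError as err:
        print(err)                                            # prints the message
```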