Add Qwen2.5 NPU Example (#12110)

* Add Qwen2.5 NPU Example
* fix
* Merge qwen2.py and qwen2.5.py into qwen.py
* Fix description

parent 657889e3e4
commit 2bedb17be7

2 changed files with 24 additions and 13 deletions
README.md

@@ -10,6 +10,7 @@ In this directory, you will find examples on how to directly run HuggingFace `transformers`
 | Chatglm3 | [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) |
 | Chatglm2 | [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) |
 | Qwen2 | [Qwen/Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct), [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) |
+| Qwen2.5 | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
 | MiniCPM | [openbmb/MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) |
 | Phi-3 | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
 | Stablelm | [stabilityai/stablelm-zephyr-3b](https://huggingface.co/stabilityai/stablelm-zephyr-3b) |
@@ -81,8 +82,9 @@ done
 The examples below show how to run the **_optimized HuggingFace model implementations_** on Intel NPU, including
 - [Llama2-7B](./llama.py)
 - [Llama3-8B](./llama.py)
-- [Qwen2-1.5B](./qwen2.py)
-- [Qwen2-7B](./qwen2.py)
+- [Qwen2-1.5B](./qwen.py)
+- [Qwen2-7B](./qwen.py)
+- [Qwen2.5-7B](./qwen.py)
 - [MiniCPM-1B](./minicpm.py)
 - [MiniCPM-2B](./minicpm.py)
 - [Baichuan2-7B](./baichuan2.py)
@@ -95,7 +97,7 @@ Supported models: Llama2-7B, Llama3-8B, Qwen2-1.5B, Qwen2-7B, MiniCPM-1B, MiniCP
 #### 32.0.100.2625
 Supported models: Llama2-7B, MiniCPM-1B, Baichuan2-7B
 #### 32.0.101.2715
-Supported models: Llama3-8B, MiniCPM-2B, Qwen2-7B, Qwen2-1.5B
+Supported models: Llama3-8B, MiniCPM-2B, Qwen2-7B, Qwen2-1.5B, Qwen2.5-7B
 
 ### Run
 ```cmd
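Note: the driver-version requirements above assume you know which NPU driver is installed. A minimal sketch for checking it from the Windows command line (querying `Win32_PnPSignedDriver` is one approach; the `*NPU*` device-name wildcard is an assumption, and Device Manager shows the same value):

```cmd
:: list NPU driver versions via PowerShell (adjust the wildcard if your NPU
:: enumerates under a different device name)
powershell -Command "Get-CimInstance Win32_PnPSignedDriver | Where-Object { $_.DeviceName -like '*NPU*' } | Select-Object DeviceName, DriverVersion"
```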
@@ -105,11 +107,14 @@ python llama.py
 :: to run Meta-Llama-3-8B-Instruct (LNL driver version: 32.0.101.2715)
 python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct
 
-:: to run Qwen2-1.5B-Instruct LNL driver version: 32.0.101.2715)
-python qwen2.py
+:: to run Qwen2-1.5B-Instruct (LNL driver version: 32.0.101.2715)
+python qwen.py
 
-:: to run Qwen2-7B-Instruct LNL driver version: 32.0.101.2715)
-python qwen2.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct
+:: to run Qwen2-7B-Instruct (LNL driver version: 32.0.101.2715)
+python qwen.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct
+
+:: to run Qwen2.5-7B-Instruct (LNL driver version: 32.0.101.2715)
+python qwen.py --repo-id-or-model-path Qwen/Qwen2.5-7B-Instruct
 
 :: to run MiniCPM-1B-sft-bf16
 python minicpm.py
@@ -133,7 +138,7 @@ Arguments info:
 ### Troubleshooting
 
 #### `TypeError: can't convert meta device type tensor to numpy.` Error
-If you encounter a `TypeError: can't convert meta device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.` error when loading a lowbit model, please try re-saving the lowbit model with the example script you are currently using. Please note that lowbit models saved by `qwen2.py`, `llama.py`, etc. cannot be loaded by `generate.py`.
+If you encounter a `TypeError: can't convert meta device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.` error when loading a lowbit model, please try re-saving the lowbit model with the example script you are currently using. Please note that lowbit models saved by `qwen.py`, `llama.py`, etc. cannot be loaded by `generate.py`.
 
 #### Output Problem
 If you encounter output problems, please try to disable the optimization of transposing the value cache with the following command:
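Note: a hedged sketch of the re-saving advice in the hunk above, assuming `--lowbit-path` converts and saves the model on first use and reloads it on later runs (`D:\llm-models\qwen-lowbit` is a hypothetical folder):

```cmd
:: first run converts and saves the lowbit model; later runs load it directly
:: (the save-on-first-use behavior of --lowbit-path is an assumption here)
python qwen.py --repo-id-or-model-path Qwen/Qwen2-1.5B-Instruct --lowbit-path D:\llm-models\qwen-lowbit
```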
@@ -145,10 +150,13 @@ python llama.py --disable-transpose-value-cache
 python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --disable-transpose-value-cache
 
 :: to run Qwen2-1.5B-Instruct (LNL driver version: 32.0.101.2715)
-python qwen2.py --disable-transpose-value-cache
+python qwen.py --disable-transpose-value-cache
 
 :: to run Qwen2-7B-Instruct (LNL driver version: 32.0.101.2715)
-python qwen2.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct --disable-transpose-value-cache
+python qwen.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct --disable-transpose-value-cache
+
+:: to run Qwen2.5-7B-Instruct (LNL driver version: 32.0.101.2715)
+python qwen.py --repo-id-or-model-path Qwen/Qwen2.5-7B-Instruct --disable-transpose-value-cache
 
 :: to run MiniCPM-1B-sft-bf16
 python minicpm.py --disable-transpose-value-cache
@@ -160,10 +168,13 @@ python minicpm.py --repo-id-or-model-path openbmb/MiniCPM-2B-sft-bf16 --disable-
 python baichuan2.py --disable-transpose-value-cache
 ```
 
-For [Qwen2-7B](./qwen2.py), you could also try to enable mixed precision optimization when encountering output problems:
+For [Qwen2-7B](./qwen.py) and [Qwen2.5-7B](./qwen.py), you could also try to enable mixed precision optimization when encountering output problems:
 
 ```cmd
-python qwen2.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct --mixed-precision
+python qwen.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct --mixed-precision
+```
+```cmd
+python qwen.py --repo-id-or-model-path Qwen/Qwen2.5-7B-Instruct --mixed-precision
 ```
 
 #### Better Performance with High CPU Utilization
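Note: a minimal sketch of what the `--mixed-precision` flag in the hunk above plausibly toggles inside `qwen.py` (the `mixed_precision` keyword and `sym_int4` format are assumptions, not confirmed by this diff):

```python
# Hedged sketch: the --mixed-precision flag presumably maps to a kwarg like this.
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    load_in_low_bit="sym_int4",  # assumed low-bit format of these examples
    mixed_precision=True,        # assumed kwarg: keep sensitive layers at higher precision
    trust_remote_code=True,
)
```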
qwen.py

@@ -34,7 +34,7 @@ if __name__ == "__main__":
         "--repo-id-or-model-path",
         type=str,
         default="Qwen/Qwen2-1.5B-Instruct",
-        help="The huggingface repo id for the Qwen2 model to be downloaded"
+        help="The huggingface repo id for the Qwen2 or Qwen2.5 model to be downloaded"
         ", or the path to the huggingface checkpoint folder",
     )
     parser.add_argument("--lowbit-path", type=str,
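Note: a self-contained sketch of how the arguments above plausibly drive model loading in `qwen.py` (the `load_low_bit`/`save_low_bit` calls and the `sym_int4` format are assumptions suggested by `--lowbit-path` and the troubleshooting notes, not confirmed by this diff):

```python
import argparse
import os

from ipex_llm.transformers.npu_model import AutoModelForCausalLM

parser = argparse.ArgumentParser()
parser.add_argument("--repo-id-or-model-path", type=str,
                    default="Qwen/Qwen2-1.5B-Instruct")
parser.add_argument("--lowbit-path", type=str, default="")
args = parser.parse_args()

if args.lowbit_path and os.path.isdir(args.lowbit_path):
    # Reload previously saved lowbit weights; weights saved by a different
    # example script (e.g. llama.py) cannot be loaded here.
    model = AutoModelForCausalLM.load_low_bit(args.lowbit_path,
                                              trust_remote_code=True)
else:
    # Convert the HuggingFace checkpoint to low-bit for the NPU.
    model = AutoModelForCausalLM.from_pretrained(args.repo_id_or_model_path,
                                                 load_in_low_bit="sym_int4",
                                                 trust_remote_code=True)
    if args.lowbit_path:
        # First run: persist the converted weights for faster reloads.
        model.save_low_bit(args.lowbit_path)
```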