Add Qwen2.5 NPU Example (#12110)

* Add Qwen2.5 NPU Example

* fix

* Merge qwen2.py and qwen2.5.py into qwen.py

* Fix description
Jin, Qiao 2024-09-25 15:20:03 +08:00 committed by GitHub
parent 657889e3e4
commit 2bedb17be7
2 changed files with 24 additions and 13 deletions

README.md

@@ -10,6 +10,7 @@ In this directory, you will find examples on how to directly run HuggingFace `tr
| Chatglm3 | [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) |
| Chatglm2 | [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) |
| Qwen2 | [Qwen/Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct), [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) |
+| Qwen2.5 | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
| MiniCPM | [openbmb/MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) |
| Phi-3 | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
| Stablelm | [stabilityai/stablelm-zephyr-3b](https://huggingface.co/stabilityai/stablelm-zephyr-3b) |
@@ -81,8 +82,9 @@ done
The examples below show how to run the **_optimized HuggingFace model implementations_** on Intel NPU, including
- [Llama2-7B](./llama.py)
- [Llama3-8B](./llama.py)
-- [Qwen2-1.5B](./qwen2.py)
-- [Qwen2-7B](./qwen2.py)
+- [Qwen2-1.5B](./qwen.py)
+- [Qwen2-7B](./qwen.py)
+- [Qwen2.5-7B](./qwen.py)
- [MiniCPM-1B](./minicpm.py)
- [MiniCPM-2B](./minicpm.py)
- [Baichuan2-7B](./baichuan2.py)
@@ -95,7 +97,7 @@ Supported models: Llama2-7B, Llama3-8B, Qwen2-1.5B, Qwen2-7B, MiniCPM-1B, MiniCP
#### 32.0.100.2625
Supported models: Llama2-7B, MiniCPM-1B, Baichuan2-7B
#### 32.0.101.2715
-Supported models: Llama3-8B, MiniCPM-2B, Qwen2-7B, Qwen2-1.5B
+Supported models: Llama3-8B, MiniCPM-2B, Qwen2-7B, Qwen2-1.5B, Qwen2.5-7B
### Run
```cmd
@@ -105,11 +107,14 @@ python llama.py
:: to run Meta-Llama-3-8B-Instruct (LNL driver version: 32.0.101.2715)
python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct
-:: to run Qwen2-1.5B-Instruct LNL driver version: 32.0.101.2715)
-python qwen2.py
-:: to run Qwen2-7B-Instruct LNL driver version: 32.0.101.2715)
-python qwen2.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct
+:: to run Qwen2-1.5B-Instruct (LNL driver version: 32.0.101.2715)
+python qwen.py
+:: to run Qwen2-7B-Instruct (LNL driver version: 32.0.101.2715)
+python qwen.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct
+:: to run Qwen2.5-7B-Instruct (LNL driver version: 32.0.101.2715)
+python qwen.py --repo-id-or-model-path Qwen/Qwen2.5-7B-Instruct
:: to run MiniCPM-1B-sft-bf16
python minicpm.py
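As a rough guide to what these commands run under the hood, below is a minimal sketch of how an NPU example script such as `qwen.py` plausibly loads and queries a model. It assumes ipex-llm's `npu_model` AutoModel wrapper; the exact keyword arguments (`optimize_model`, `load_in_low_bit`) are assumptions based on the surrounding examples, not code taken from this commit.

```python
# Minimal sketch (not verbatim from qwen.py): load a Qwen2/Qwen2.5 checkpoint
# on Intel NPU via ipex-llm's AutoModel wrapper and generate a short reply.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_path = "Qwen/Qwen2.5-7B-Instruct"  # or a local checkpoint folder

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    optimize_model=True,          # assumed flag: use the optimized NPU path
    load_in_low_bit="sym_int4",   # assumed low-bit weight format
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Build a chat-formatted prompt and generate a short completion.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is AI?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```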
@@ -133,7 +138,7 @@ Arguments info:
### Troubleshooting
#### `TypeError: can't convert meta device type tensor to numpy.` Error
-If you encounter `TypeError: can't convert meta device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.` error when loading lowbit model, please try re-saving the lowbit model with the example script you are currently using. Please note that lowbit models saved by `qwen2.py`, `llama.py`, etc. cannot be loaded by `generate.py`.
+If you encounter a `TypeError: can't convert meta device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.` error when loading a lowbit model, please try re-saving the lowbit model with the example script you are currently using. Please note that lowbit models saved by `qwen.py`, `llama.py`, etc. cannot be loaded by `generate.py`.
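Below is a hedged sketch of the re-saving step this note describes, assuming ipex-llm's usual `save_low_bit`/`load_low_bit` methods; the method names and keyword arguments are assumptions, not code from this commit.

```python
# Hedged sketch of the save/reload cycle; the key point from the note above is
# that a lowbit model must be reloaded by the same family of script that saved
# it (qwen.py here), not by generate.py.
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

lowbit_path = r".\qwen2.5-7b-lowbit"  # hypothetical output folder

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    load_in_low_bit="sym_int4",   # assumed kwarg
    optimize_model=True,          # assumed kwarg
    trust_remote_code=True,
)
model.save_low_bit(lowbit_path)   # re-save with the script you currently use

# Reload the lowbit checkpoint with the matching loader (assumed method name).
model = AutoModelForCausalLM.load_low_bit(lowbit_path, trust_remote_code=True)
```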
#### Output Problem
If you encounter output problems, please try disabling the optimization of transposing the value cache with the following command:
@@ -145,10 +150,13 @@ python llama.py --disable-transpose-value-cache
python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --disable-transpose-value-cache
:: to run Qwen2-1.5B-Instruct (LNL driver version: 32.0.101.2715)
-python qwen2.py --disable-transpose-value-cache
+python qwen.py --disable-transpose-value-cache
:: to run Qwen2-7B-Instruct (LNL driver version: 32.0.101.2715)
-python qwen2.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct --disable-transpose-value-cache
+python qwen.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct --disable-transpose-value-cache
+:: to run Qwen2.5-7B-Instruct (LNL driver version: 32.0.101.2715)
+python qwen.py --repo-id-or-model-path Qwen/Qwen2.5-7B-Instruct --disable-transpose-value-cache
:: to run MiniCPM-1B-sft-bf16
python minicpm.py --disable-transpose-value-cache
@@ -160,10 +168,13 @@ python minicpm.py --repo-id-or-model-path openbmb/MiniCPM-2B-sft-bf16 --disable-
python baichuan2.py --disable-transpose-value-cache
```
-For [Qwen2-7B](./qwen2.py), you could also try to enable mixed precision optimization when encountering output problems:
+For [Qwen2-7B](./qwen.py) and [Qwen2.5-7B](./qwen.py), you could also try enabling mixed precision optimization when encountering output problems:
```cmd
-python qwen2.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct --mixed-precision
+python qwen.py --repo-id-or-model-path Qwen/Qwen2-7B-Instruct --mixed-precision
+```
+```cmd
+python qwen.py --repo-id-or-model-path Qwen/Qwen2.5-7B-Instruct --mixed-precision
```
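For orientation, the sketch below shows how these troubleshooting flags plausibly map onto loader options; the `transpose_value_cache` and `mixed_precision` keyword names are assumptions inferred from the CLI flags, not confirmed by this commit.

```python
# Hedged sketch: how --disable-transpose-value-cache and --mixed-precision
# plausibly reach the NPU loader (keyword names are assumptions).
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    load_in_low_bit="sym_int4",   # assumed low-bit weight format
    optimize_model=True,          # assumed flag for the optimized NPU path
    transpose_value_cache=False,  # roughly what --disable-transpose-value-cache sets
    mixed_precision=True,         # roughly what --mixed-precision sets
    trust_remote_code=True,
)
```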
#### Better Performance with High CPU Utilization

qwen.py

@@ -34,7 +34,7 @@ if __name__ == "__main__":
        "--repo-id-or-model-path",
        type=str,
        default="Qwen/Qwen2-1.5B-Instruct",
-        help="The huggingface repo id for the Qwen2 model to be downloaded"
+        help="The huggingface repo id for the Qwen2 or Qwen2.5 model to be downloaded"
             ", or the path to the huggingface checkpoint folder",
    )
    parser.add_argument("--lowbit-path", type=str,