update llama3 npu example (#11933)

This commit is contained in:
Yina Chen 2024-08-27 08:03:18 +03:00 committed by GitHub
parent 14dddfc0d6
commit e246f1e258
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2 changed files with 11 additions and 4 deletions


@@ -78,12 +78,16 @@ done
 ## 4. Run Optimized Models (Experimental)
 The example below shows how to run the **_optimized model implementations_** on Intel NPU, including
-- [Llama2-7B](./llama2.py)
+- [Llama2-7B](./llama.py)
+- [Llama3-8B](./llama.py)
 - [Qwen2-1.5B](./qwen2.py)
-```
+```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py
+python llama.py
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct
 # to run Qwen2-1.5B-Instruct
 python qwen2.py
@@ -102,7 +106,10 @@ Arguments info:
 If you encounter output problems, please try disabling the optimization of transposing the value cache with the following command:
 ```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py --disable-transpose-value-cache
+python llama.py --disable-transpose-value-cache
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --disable-transpose-value-cache
 # to run Qwen2-1.5B-Instruct
 python qwen2.py --disable-transpose-value-cache
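
For context, the two flags the diff exercises (`--repo-id-or-model-path` and `--disable-transpose-value-cache`) could be parsed with a minimal `argparse` sketch like the one below. The flag names come straight from the diff; the default model path and help strings are assumptions for illustration, not the actual `llama.py` implementation.

```python
import argparse

def parse_args(argv=None):
    # Minimal sketch of the CLI surface shown in the diff above.
    # Flag names mirror the diff; the default value is an assumption.
    parser = argparse.ArgumentParser(
        description="Run an optimized LLM example on Intel NPU (sketch)"
    )
    parser.add_argument(
        "--repo-id-or-model-path",
        default="meta-llama/Llama-2-7b-chat-hf",  # assumed default
        help="Hugging Face repo id or local path of the model",
    )
    parser.add_argument(
        "--disable-transpose-value-cache",
        action="store_true",
        help="Disable the value-cache transpose optimization "
             "if you see output problems",
    )
    return parser.parse_args(argv)
```

With this sketch, `python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --disable-transpose-value-cache` would select the Llama3 checkpoint and turn the optimization off, matching the second hunk of the diff.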