update llama3 npu example (#11933)
parent 14dddfc0d6
commit e246f1e258
2 changed files with 11 additions and 4 deletions
@@ -78,12 +78,16 @@ done
 ## 4. Run Optimized Models (Experimental)
 The example below shows how to run the **_optimized model implementations_** on Intel NPU, including
-- [Llama2-7B](./llama2.py)
+- [Llama2-7B](./llama.py)
+- [Llama3-8B](./llama.py)
 - [Qwen2-1.5B](./qwen2.py)
 
-```
+```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py
+python llama.py
 
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct
+
 # to run Qwen2-1.5B-Instruct
 python qwen2.py
 
@@ -102,7 +106,10 @@ Arguments info:
 If you encounter output problem, please try to disable the optimization of transposing value cache with following command:
 ```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py --disable-transpose-value-cache
+python llama.py --disable-transpose-value-cache
 
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --disable-transpose-value-cache
+
 # to run Qwen2-1.5B-Instruct
 python qwen2.py --disable-transpose-value-cache
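The `--repo-id-or-model-path` and `--disable-transpose-value-cache` flags used in the commands above are ordinary CLI arguments. As a rough illustration only, a script like `llama.py` could parse them with `argparse` along these lines; the flag names mirror the diff, but the default value and everything else here are assumptions, not the repo's actual implementation:

```python
# Hypothetical sketch of argument parsing for a script like llama.py.
# The actual ipex-llm example may define these flags differently.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run a Llama model on Intel NPU")
    parser.add_argument(
        "--repo-id-or-model-path",
        type=str,
        default="meta-llama/Llama-2-7b-chat-hf",  # assumed default, per the Llama2 example
        help="Hugging Face repo id or local path of the model",
    )
    parser.add_argument(
        "--disable-transpose-value-cache",
        action="store_true",  # off by default; pass the flag to disable the optimization
        help="Disable the transposed value cache optimization",
    )
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(args.repo_id_or_model_path, args.disable_transpose_value_cache)
```

Note that `argparse` converts the dashes in flag names to underscores, so `--repo-id-or-model-path` becomes `args.repo_id_or_model_path`.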