update llama3 npu example (#11933)

parent 14dddfc0d6
commit e246f1e258

2 changed files with 11 additions and 4 deletions
@@ -78,12 +78,16 @@ done
 ## 4. Run Optimized Models (Experimental)
 
 The example below shows how to run the **_optimized model implementations_** on Intel NPU, including
-- [Llama2-7B](./llama2.py)
+- [Llama2-7B](./llama.py)
+- [Llama3-8B](./llama.py)
 - [Qwen2-1.5B](./qwen2.py)
 
-```
+```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py
+python llama.py
 
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct
+
 # to run Qwen2-1.5B-Instruct
 python qwen2.py
@@ -102,7 +106,10 @@ Arguments info:
 If you encounter output problem, please try to disable the optimization of transposing value cache with following command:
 ```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py --disable-transpose-value-cache
+python llama.py --disable-transpose-value-cache
+
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --disable-transpose-value-cache
 
 # to run Qwen2-1.5B-Instruct
 python qwen2.py --disable-transpose-value-cache
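For readers tracing what the renamed `llama.py` presumably does with these flags: a minimal sketch follows, assuming the `ipex_llm.transformers.npu_model.AutoModelForCausalLM` wrapper and a `transpose_value_cache` keyword that `--disable-transpose-value-cache` toggles. Neither appears in this diff, so treat the scripts themselves as the reference.

```python
# Minimal sketch (not taken from this diff) of how llama.py / qwen2.py
# presumably wire their CLI flags into the ipex-llm NPU loader. The
# npu_model import path and the transpose_value_cache keyword are
# assumptions; check the actual scripts for the real signature.
import argparse

from ipex_llm.transformers.npu_model import AutoModelForCausalLM  # assumed import path
from transformers import AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--repo-id-or-model-path", type=str,
                    default="meta-llama/Llama-2-7b-chat-hf")
parser.add_argument("--disable-transpose-value-cache", action="store_true")
args = parser.parse_args()

model = AutoModelForCausalLM.from_pretrained(
    args.repo_id_or_model_path,
    load_in_low_bit="sym_int4",   # assumed low-bit format used by these examples
    optimize_model=True,
    trust_remote_code=True,
    # --disable-transpose-value-cache presumably flips this optimization off:
    transpose_value_cache=not args.disable_transpose_value_cache,  # assumed kwarg
)
tokenizer = AutoTokenizer.from_pretrained(args.repo_id_or_model_path,
                                          trust_remote_code=True)
```

Per the README text in the second hunk, transposing the value cache is an optimization that can occasionally cause output problems, which is why the examples expose an opt-out flag rather than enabling it unconditionally.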