diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
index 52d71ed4..80c82880 100644
--- a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
+++ b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
@@ -78,12 +78,16 @@ done
 ## 4. Run Optimized Models (Experimental)
 The example below shows how to run the **_optimized model implementations_** on Intel NPU, including
-- [Llama2-7B](./llama2.py)
+- [Llama2-7B](./llama.py)
+- [Llama3-8B](./llama.py)
 - [Qwen2-1.5B](./qwen2.py)
 
-```
+```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py
+python llama.py
+
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct
 
 # to run Qwen2-1.5B-Instruct
 python qwen2.py
 
@@ -102,7 +106,10 @@ Arguments info:
 If you encounter output problem, please try to disable the optimization of transposing value cache with following command:
 ```bash
 # to run Llama-2-7b-chat-hf
-python llama2.py --disable-transpose-value-cache
+python llama.py --disable-transpose-value-cache
+
+# to run Meta-Llama-3-8B-Instruct
+python llama.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --disable-transpose-value-cache
 
 # to run Qwen2-1.5B-Instruct
 python qwen2.py --disable-transpose-value-cache
diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/llama2.py b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/llama.py
similarity index 100%
rename from python/llm/example/NPU/HF-Transformers-AutoModels/LLM/llama2.py
rename to python/llm/example/NPU/HF-Transformers-AutoModels/LLM/llama.py