From 431affd0a00cd9f03772465e1189fe42ad984da2 Mon Sep 17 00:00:00 2001
From: Jason Dai
Date: Thu, 29 Aug 2024 18:56:35 +0800
Subject: [PATCH] Update README.md (#11964)

---
 .../NPU/HF-Transformers-AutoModels/LLM/README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
index 8cfbf490..2b59d29f 100644
--- a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
+++ b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
@@ -1,5 +1,5 @@
-# Run Large Language Model on Intel NPU
-In this directory, you will find examples on how you could apply IPEX-LLM INT4 or INT8 optimizations on LLM models on [Intel NPUs](../../../README.md). See the table blow for verified models.
+# Run HuggingFace `transformers` Models on Intel NPU
+In this directory, you will find examples on how to directly run HuggingFace `transformers` models on Intel NPUs (leveraging *Intel NPU Acceleration Library*). See the table below for verified models.
 
 ## Verified Models
 
@@ -52,7 +52,7 @@ For optimal performance, it is recommended to set several environment variables.
 set BIGDL_USE_NPU=1
 ```
 
-## 3. Run models
+## 3. Run Models
 In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel NPUs.
 
 ```
@@ -77,7 +77,7 @@ done
 ```
 
 ## 4. Run Optimized Models (Experimental)
-The example below shows how to run the **_optimized model implementations_** on Intel NPU, including
+The examples below show how to run the **_optimized HuggingFace model implementations_** on Intel NPU, including
 - [Llama2-7B](./llama.py)
 - [Llama3-8B](./llama.py)
 - [Qwen2-1.5B](./qwen2.py)
@@ -92,7 +92,7 @@ Supported models: Llama2-7B, Qwen2-1.5B, Qwen2-7B, MiniCPM-1B, Baichuan2-7B
 #### 32.0.101.2715
 Supported models: Llama3-8B, MiniCPM-2B
 
-### Run Models
+### Run
 ```bash
 # to run Llama-2-7b-chat-hf
 python llama.py
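
For readers of this patch, a minimal sketch of the `generate.py` flow the README text above describes: loading a HuggingFace `transformers` model with IPEX-LLM INT4 optimizations targeting the Intel NPU. The model id, prompt, and exact `from_pretrained` argument values here are illustrative assumptions, not the repository's verbatim example.

```python
# Hypothetical sketch of the generate.py flow described in the README;
# model id, prompt, and argument values are illustrative assumptions.
import torch
from transformers import AutoTokenizer
# NPU-targeted AutoModel wrapper shipped with IPEX-LLM
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"  # assumed model id

# Load the model with INT4 weight quantization for the Intel NPU
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",  # IPEX-LLM INT4 optimization
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Predict the next N tokens using the standard generate() API
with torch.inference_mode():
    input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids
    output = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```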