From 794abe2ce86762d671bc1fe27434d2d359a9e0ef Mon Sep 17 00:00:00 2001
From: Zijie Li
Date: Thu, 22 Aug 2024 17:49:35 +0800
Subject: [PATCH] update npu-readme (#11900)

---
 .../example/NPU/HF-Transformers-AutoModels/LLM/README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
index 22292e5b..111f1480 100644
--- a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
+++ b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
@@ -79,6 +79,12 @@ done
 ## Example 2: Predict Tokens using `generate()` API using multi processes
 In the example [llama2.py](./llama2.py) and [qwen2.py](./qwen2.py), we show an experimental support for a Llama2 / Qwen2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimization and fused decoderlayer optimization on Intel NPUs.
+
+> [!IMPORTANT]
+> To run Qwen2 and Llama2 with IPEX-LLM on Intel NPUs, we recommend using version **32.0.100.2540** of the Intel NPU driver.
+>
+> Go to https://www.intel.com/content/www/us/en/download/794734/825735/intel-npu-driver-windows.html to download and unzip the driver. Then follow the same steps in [Requirements](#0-requirements).
+
 ### 1. Install
 #### 1.1 Installation on Windows
 We suggest using conda to manage environment: