From 794abe2ce86762d671bc1fe27434d2d359a9e0ef Mon Sep 17 00:00:00 2001
From: Zijie Li
Date: Thu, 22 Aug 2024 17:49:35 +0800
Subject: [PATCH] update npu-readme (#11900)

---
 .../example/NPU/HF-Transformers-AutoModels/LLM/README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
index 22292e5b..111f1480 100644
--- a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
+++ b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
@@ -79,6 +79,12 @@ done
 ## Example 2: Predict Tokens using `generate()` API using multi processes
 In the example [llama2.py](./llama2.py) and [qwen2.py](./qwen2.py), we show an experimental support for a Llama2 / Qwen2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimization and fused decoderlayer optimization on Intel NPUs.
+
+> [!IMPORTANT]
+> To run Qwen2 and Llama2 with IPEX-LLM on Intel NPUs, we recommend using version **32.0.100.2540** of the Intel NPU driver.
+>
+> Go to https://www.intel.com/content/www/us/en/download/794734/825735/intel-npu-driver-windows.html to download and unzip the driver. Then follow the same steps in [Requirements](#0-requirements).
+
 ### 1. Install
 #### 1.1 Installation on Windows
 We suggest using conda to manage environment: