update npu-readme (#11900)

Zijie Li 2024-08-22 17:49:35 +08:00 committed by GitHub
parent 278b191dc1
commit 794abe2ce8


@@ -79,6 +79,12 @@ done
## Example 2: Predict Tokens using `generate()` API with multiple processes
In the examples [llama2.py](./llama2.py) and [qwen2.py](./qwen2.py), we show experimental support for a Llama2 / Qwen2 model to predict the next N tokens using the `generate()` API, with IPEX-LLM INT4 optimization and fused decoder-layer optimization on Intel NPUs; a minimal usage sketch follows the note below.
> [!IMPORTANT]
> To run Qwen2 and Llama2 with IPEX-LLM on Intel NPUs, we recommend using Intel NPU driver version **32.0.100.2540**.
>
> Go to https://www.intel.com/content/www/us/en/download/794734/825735/intel-npu-driver-windows.html to download and unzip the driver, then follow the same steps as in [Requirements](#0-requirements).
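
In outline, both scripts follow the pattern sketched below. This is a minimal, hedged sketch: the model path and the exact `from_pretrained` arguments (such as `load_in_low_bit="sym_int4"`) are assumptions here, so treat [llama2.py](./llama2.py) and [qwen2.py](./qwen2.py) as the authoritative reference.

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM  # NPU-targeted loader from ipex-llm

# Assumed model id; any supported Llama2 / Qwen2 checkpoint or local path should work
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load the model with IPEX-LLM INT4 optimization for the Intel NPU
# (argument names below are assumptions based on the usual IPEX-LLM pattern)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    load_in_low_bit="sym_int4",   # INT4 weight quantization
    attn_implementation="eager",
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "What is AI?"
inputs = tokenizer(prompt, return_tensors="pt")

# Predict the next N tokens with the standard `generate()` API
with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that the fused decoder-layer optimization and the multi-process runner are handled inside the example scripts themselves and are not shown in this sketch.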
### 1. Install
#### 1.1 Installation on Windows
We suggest using conda to manage the environment: