diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md index f2b57eb4..316a9652 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md @@ -21,27 +21,30 @@ conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` + #### 1.2 Installation on Windows We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. - ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux
@@ -52,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -63,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -82,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -90,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md index b68ff6df..3cdfd4b7 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md @@ -27,22 +27,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. - ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux
@@ -53,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -64,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -83,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -91,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md index 105e1c0b..b56e272d 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md @@ -14,6 +14,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation ``` @@ -22,23 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -49,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -60,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md index d7de8ab0..3ecfa959 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md @@ -14,6 +14,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation ``` @@ -22,23 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -49,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -60,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -79,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md index af784432..e6181f47 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md @@ -21,22 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. - ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux
@@ -47,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -58,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -77,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md index 9f7fbcb7..bf01503c 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md @@ -17,25 +17,29 @@ conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` + #### 1.2 Installation on Windows We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
@@ -47,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -58,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -77,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md index 607a7a33..8d235b12 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md @@ -22,31 +22,35 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. - ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux
For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series
+
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -58,11 +62,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -77,7 +93,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +101,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` @@ -149,20 +158,23 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. @@ -174,6 +186,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -185,11 +198,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -204,7 +229,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -212,15 +237,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples **Stream Chat using `stream_chat()` API**: diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md index 08c49e99..86ea6ddb 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md @@ -21,20 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. + ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. 
Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -45,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -56,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -75,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -83,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md index f5f31406..4c6a6903 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md @@ -14,6 +14,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` @@ -22,22 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md index dd69d009..8822ff45 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md @@ -8,38 +8,41 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: - +We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 conda activate llm -# below command will install intel_extension_for_pytorch==2.0.110+xpu as default -# you can install specific ipex/torch version for your need +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.35.2 # required by DeciLM-7B ``` + #### 1.2 Installation on Windows We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.35.2 # required by DeciLM-7B ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. + ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -61,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md index d747ba53..03446bd6 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md @@ -12,8 +12,7 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 conda activate llm -# below command will install intel_extension_for_pytorch==2.0.110+xpu as default -# you can install specific ipex/torch version for your need +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -22,20 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. + ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
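Since the recommended flags differ per device class, a small helper can make the choice explicit. This is an illustrative sketch, not part of IPEX-LLM; the flag values are copied from the Linux subsections above:

```python
import os

def recommended_env(device: str) -> dict:
    """Map a device class to the Linux runtime flags suggested above (illustrative only)."""
    env = {"SYCL_CACHE_PERSISTENT": "1"}           # common to all device classes
    if device == "arc":                            # Intel Arc A-Series / Flex
        env["USE_XETLA"] = "OFF"
        env["SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS"] = "1"
    elif device == "igpu":                         # Intel iGPU
        env["BIGDL_LLM_XMX_DISABLED"] = "1"
    return env

os.environ.update(recommended_env("igpu"))
```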
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md index 664e67aa..70c07725 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md @@ -9,14 +9,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install datasets soundfile librosa # required by audio processing ``` @@ -25,22 +24,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install datasets soundfile librosa # required by audio processing ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -51,6 +54,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -62,11 +66,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -81,7 +97,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -89,15 +105,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md index 027ff4e8..cdf488a6 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md @@ -23,22 +23,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -49,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
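If you prefer not to export these variables globally, they can be scoped to a single run by launching the example in a child process. A stdlib-only sketch (the inline command is a placeholder for `python generate.py ...`):

```python
import os
import subprocess
import sys

# Copy the current environment and add the iGPU flags from above,
# so only the child process sees them.
env = dict(os.environ, SYCL_CACHE_PERSISTENT="1", BIGDL_LLM_XMX_DISABLED="1")

# Placeholder command; in the real example this would launch generate.py.
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['BIGDL_LLM_XMX_DISABLED'])"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())
```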
+ #### 3.2 Configurations for Windows
@@ -79,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md index 5ab0cf0e..f515d6e8 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md @@ -21,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md index c5e96f1c..b1472e47 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md @@ -15,6 +15,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for falcon-7b-instruct to conduct generation ``` @@ -23,8 +24,12 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for falcon-7b-instruct to conduct generation ``` @@ -53,18 +58,17 @@ print(f'tiiuae/falcon-7b-instruct checkpoint is downloaded to {model_path}') #### 2.2 Replace `modelling_RW.py` For `tiiuae/falcon-7b-instruct`, you should replace the `modelling_RW.py` with [falcon-7b-instruct/modelling_RW.py](./falcon-7b-instruct/modelling_RW.py). +### 3. 
Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. -### 3. Configures OneAPI environment variables -#### 3.1 Configurations for Linux ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 3.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 4. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 4.1 Configurations for Linux @@ -75,6 +79,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -86,11 +91,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 4.2 Configurations for Windows
@@ -105,7 +122,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -113,15 +130,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 5. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md index f73665f6..16a0f106 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md index 98b775f1..a65bba80 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md @@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Gemma model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -29,6 +26,9 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -36,17 +36,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte pip install transformers==4.38.1 ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -57,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -68,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -87,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -95,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md index c8659217..74405ad0 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md @@ -21,22 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. - ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -47,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -58,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -77,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -85,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md index 0404daf0..114fcdad 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md @@ -21,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md index a6e32dd8..ea314a1d 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md @@ -21,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md index 97b6deee..62f582e6 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md @@ -21,20 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. + ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -45,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -56,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -75,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -83,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md index 78413419..53c0d4f6 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md @@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -29,6 +26,9 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -36,17 +36,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte pip install transformers==4.34.0 ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -57,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -68,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -87,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -95,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md index 47c9e728..a93c8bf8 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md @@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -29,6 +26,9 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -36,17 +36,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte pip install transformers==4.36.0 ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -57,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -68,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -87,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -95,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md index 99092cf9..0f1caed7 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md @@ -14,6 +14,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for mpt-7b-chat and mpt-30b-chat to conduct generation ``` @@ -22,21 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + +pip install einops # additional package required for mpt-7b-chat and mpt-30b-chat to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -47,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -58,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -77,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -85,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md index 98868833..e7522b85 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md @@ -14,6 +14,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for phi-1_5 to conduct generation ``` @@ -22,22 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for phi-1_5 to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md index 353d6e51..12ff1653 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md @@ -14,27 +14,34 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for phi-2 to conduct generation ``` + #### 1.2 Installation on Windows We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + +pip install einops # additional package required for phi-2 to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. @@ -46,7 +53,9 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ``` +
@@ -56,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+ +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -74,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -82,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md index e91daf29..4f6861ce 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md @@ -21,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md index f8d67544..f7b9cb12 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md @@ -8,14 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation
 ```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md
index b475d831..127ece18 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install tiktoken einops transformers_stream_generator # additional package required for Qwen-7B-Chat to conduct generation
 ```
@@ -22,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install tiktoken einops transformers_stream_generator # additional package required for Qwen-7B-Chat to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md
index 830d4d26..3c3d6233 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers==4.37.0 # install transformers which supports Qwen2
 ```
@@ -22,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-pip install transformers==4.37.2 # install transformers which supports Qwen2
+
+pip install transformers==4.37.0 # install transformers which supports Qwen2
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md
index 201046af..5af141a5 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md
@@ -8,9 +8,7 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for an redpajama model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11
 conda activate llm
@@ -23,21 +21,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,9 +49,9 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
-
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md
index 9e6930a5..adb93547 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for an Replit model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install "transformers<4.35"
 ```
@@ -24,21 +23,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -49,9 +51,9 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
-
@@ -61,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -80,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md
index 5ec3e3f0..cc3a6210 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md
@@ -17,25 +17,29 @@ conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
+
 #### 1.2 Installation on Windows
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
@@ -47,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -58,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -77,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md
index c924fc25..62ac598c 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md
@@ -17,25 +17,30 @@ conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
+
 #### 1.2 Installation on Windows
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
+
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -46,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -57,10 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -75,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -83,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
 python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md
index 72a3562d..ed66eca3 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers==4.35.2 # required by SOLAR
 ```
@@ -22,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers==4.35.2 # required by SOLAR
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/stablelm/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/stablelm/README.md
index ce694c49..e000b7a3 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/stablelm/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/stablelm/README.md
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a StableLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -27,6 +24,9 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -34,17 +34,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
 pip install transformers==4.38.0
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -55,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -66,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -85,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -93,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md
index d0c6a257..3b6f4053 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md
@@ -21,21 +21,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md
index 9b719625..81f91d4e 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md
@@ -23,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -49,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -60,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -79,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md
index 74afab59..44fdfff2 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md
@@ -16,6 +16,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install librosa soundfile datasets
 pip install accelerate
 pip install SpeechRecognition sentencepiece colorama
@@ -28,25 +29,29 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install librosa soundfile datasets
 pip install accelerate
 pip install SpeechRecognition sentencepiece colorama
 pip install PyAudio inquirer
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -57,6 +62,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -68,11 +74,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -87,7 +105,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -95,15 +113,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md
index 377b8592..3925bc44 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md
@@ -15,6 +15,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install datasets soundfile librosa # required by audio processing
 ```
@@ -23,23 +24,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install datasets soundfile librosa # required by audio processing
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +54,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,11 +66,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -80,7 +97,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +105,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
 python ./recognize.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --repo-id-or-data-path REPO_ID_OR_DATA_PATH --language LANGUAGE
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md
index cb020717..9bfa9637 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for Yi-6B to conduct generation
 ```
@@ -25,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for Yi-6B to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -51,9 +53,9 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
-
@@ -63,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -82,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -90,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md
index d67ac916..98bca49e 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md
@@ -15,31 +15,37 @@ We suggest using conda to manage environment:
 conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
-pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option
+pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for Yuan2 to conduct generation
 pip install pandas # additional package required for Yuan2 to conduct generation
 ```
+
 #### 1.2 Installation on Windows
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for Yuan2 to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +56,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,10 +68,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -79,7 +99,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +107,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md
index 32da14ea..c81fc6d7 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
@@ -24,21 +21,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md b/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md
index be7501ec..41a986e5 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation
 ```
@@ -25,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -62,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md
index 11e5dad8..9f62c2fd 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers_stream_generator # additional package required for Baichuan2-7B-Chat to conduct generation
 ```
@@ -25,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers_stream_generator # additional package required for Baichuan2-7B-Chat to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -62,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -81,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -89,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md b/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md
index 05f34949..31eb9afc 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
 In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for Bark model to synthesize speech based on the given text, with IPEX-LLM INT4 optimizations.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install scipy
 ```
@@ -25,23 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install scipy
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -52,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -63,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -82,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -90,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md b/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md index fc6f47fb..301e3525 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md index afda5bb6..6ad1407d 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash @@ -132,13 +138,10 @@ Inference time: xxxx s In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). - -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -148,21 +151,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -173,6 +179,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -184,10 +191,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. + +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -202,7 +222,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -210,15 +230,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples **Stream Chat using `stream_chat()` API**: diff --git a/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md b/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md index 278888b9..a7077195 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash @@ -131,13 +137,10 @@ AI stands for Artificial Intelligence. It refers to the development of computer In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM3 model to stream chat, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). - -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -147,21 +150,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. 
+ +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -172,6 +178,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -183,10 +190,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. + +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -201,7 +221,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -209,15 +229,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples **Stream Chat using `stream_chat()` API**: ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md b/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md index 01115cef..617b8bf5 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md @@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a CodeLlama model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` @@ -25,22 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -62,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -81,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -89,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md b/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md index 644c0205..3bbdc081 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md @@ -8,17 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: - +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - -# below command will install intel_extension_for_pytorch==2.0.110+xpu as default -# you can install specific ipex/torch version for your need +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.35.2 # required by DeciLM-7B ``` @@ -27,20 +23,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. + ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -51,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -62,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -81,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -89,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples diff --git a/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md b/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md index d3c76f9f..3f777467 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md @@ -8,15 +8,11 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Deepseek model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - -# below command will install intel_extension_for_pytorch==2.0.110+xpu as default -# you can install specific ipex/torch version for your need +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -25,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -50,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -61,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -80,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -88,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md b/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md index d72abcf3..f5407486 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md @@ -9,14 +9,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install datasets soundfile librosa # required by audio processing ``` @@ -25,22 +24,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install datasets soundfile librosa # required by audio processing ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -51,6 +54,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -62,11 +66,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -81,7 +97,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -89,15 +105,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md index 4f80a814..006c68dd 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md index 28dab67b..8de52c9c 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md index d42a7cb2..d4a45ec4 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md index a6e32dd8..ea314a1d 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md @@ -21,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md index b801c7fb..05f24e86 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,30 +21,35 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux
For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series + ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -59,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -78,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -86,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md b/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md index 424f3ec5..06449d2b 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md @@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a LLaVA model to start a multi-turn chat centered around an image using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # install dependencies required by llava pip install transformers==4.36.2 @@ -31,8 +29,12 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # install dependencies required by llava pip install transformers==4.36.2 @@ -40,29 +42,30 @@ git clone https://github.com/haotian-liu/LLaVA.git # clone the llava library copy generate.py .\LLaVA\ # copy our example to the LLaVA folder cd LLaVA # change the working directory to the LLaVA folder git checkout tags/v1.2.0 -b 1.2.0 # Get the branch which is compatible with transformers 4.36 - ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux
For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series + ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -74,11 +77,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -93,7 +108,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -101,15 +116,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md b/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md index 7c30497a..d6c4f53a 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md @@ -7,33 +7,107 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ ## Example: Predict Tokens using `generate()` API In the example [generate.py](./generate.py), we show a basic use case for a Mamba model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). - -After installing conda, create a Python environment for IPEX-LLM: +#### 1.1 Installation on Linux +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - -# below command will install intel_extension_for_pytorch==2.0.110+xpu as default -# you can install specific ipex/torch version for your need +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # package required by Mamba ``` -### 2. 
Configures OneAPI environment variables +#### 1.2 Installation on Windows +We suggest using conda to manage environment: +```bash +conda create -n llm python=3.11 libuv +conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default +pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + +pip install einops # package required by Mamba +``` + +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -### 3. Run +### 3. Runtime Configurations +For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. +#### 3.1 Configurations for Linux +
-For optimal performance on Arc, it is recommended to set several environment variables. +For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ``` +
+ +
+ +For Intel Data Center GPU Max Series + +```bash +export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so +export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 +export ENABLE_SDP_FUSION=1 +``` +> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ +#### 3.2 Configurations for Windows +
+ +For Intel iGPU + +```cmd +set SYCL_CACHE_PERSISTENT=1 +set BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ +
+ +For Intel Arc™ A-Series Graphics + +```cmd +set SYCL_CACHE_PERSISTENT=1 +``` + +
+ +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples + ```bash python ./generate.py ``` @@ -45,7 +119,7 @@ In the example, several arguments can be passed to satisfy your requirements: - `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It defaults to `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It defaults to `32`. -#### 2.3 Sample Output +#### 4.3 Sample Output #### [state-spaces/mamba-1.4b](https://huggingface.co/state-spaces/mamba-1.4b) ```log Inference time: xxxx s diff --git a/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md b/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md index 565470e5..9423f514 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md @@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -29,21 +26,27 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
+# Refer to https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting, please make sure you are using a stable version of Transformers, 4.34.0 or newer.
 pip install transformers==4.34.0
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
+
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -54,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -65,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -84,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -92,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md b/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md
index 8f4a4dab..c54a257f 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md
@@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -29,6 +26,9 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -36,26 +36,28 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
 pip install transformers==4.36.0
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
 For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series
+
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -67,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -86,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -94,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md
index 54a72a07..945d2665 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
 In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
+# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for phi-1_5 to conduct generation
 ```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for phi-1_5 to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -79,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md
index 4a201625..7c0707f9 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md
@@ -8,36 +8,39 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
 In the example [generate.py](./generate.py), we show a basic use case for a phi-2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
+# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for phi-2 to conduct generation
 ```
+
 #### 1.2 Installation on Windows
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,7 +51,9 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
+
@@ -58,10 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -76,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -84,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md b/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md
index 9f1a33be..37068e0d 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
 In the example [generate.py](./generate.py), we show a basic use case for a phixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
+# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for phixtral to conduct generation
 ```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for phixtral to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -79,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md b/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md
index 473cac40..6d0bc5b5 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with IPEX-LLM 'optimize_model' API on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation
 ```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md
index 86b0f8c7..e8ca82f2 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers==4.37.0 # install transformers which supports Qwen2
 ```
@@ -22,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-pip install transformers==4.37.2 # install transformers which supports Qwen2
+
+pip install transformers==4.37.0 # install transformers which supports Qwen2
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md b/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md
index 8ad73633..3627c58e 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Replit model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install "transformers<4.35"
 ```
@@ -25,21 +23,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -47,13 +48,10 @@ For optimal performance, it is recommended to set several environment variables.
 For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series
-### 3. Run
-
-For optimal performance on Arc, it is recommended to set several environment variables.
-
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -65,10 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -83,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -91,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md b/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md
index e0802db7..bd549640 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a SOLAR model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.35.2 # required by SOLAR ``` @@ -25,22 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.35.2 # required by SOLAR ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -62,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. + +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md index a0a1020c..bec5410a 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md @@ -8,15 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for SpeechT5 model to synthesize speech based on the given text, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install "datasets<2.18" soundfile # additional package required for SpeechT5 to conduct generation ``` @@ -25,23 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install "datasets<2.18" soundfile # additional package required for SpeechT5 to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux @@ -52,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -63,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -82,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -90,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/stablelm/README.md b/python/llm/example/GPU/PyTorch-Models/Model/stablelm/README.md index f322d64f..bb68e00a 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/stablelm/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/stablelm/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a StableLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -27,6 +24,9 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -34,17 +34,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte pip install transformers==4.38.0 ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -55,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -66,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -85,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -93,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md b/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md index 9580c1a8..aec53ec9 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a StarCoder model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md b/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md index bac21baf..b3f751af 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md @@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for Yi-6B to conduct generation ``` @@ -25,31 +23,37 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for Yi-6B to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux
For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series + ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -61,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md index 2def531d..8c562b22 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md @@ -10,38 +10,42 @@ In addition, you need to modify some files in Yuan2-2B-hf folder, since Flash at In the example [generate.py](./generate.py), we show a basic use case for an Yuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 conda activate llm +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default +pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ -pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for Yuan2 to conduct generation pip install pandas # additional package required for Yuan2 to conduct generation ``` + #### 1.2 Installation on Windows We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for Yuan2 to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables.
Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,12 +53,12 @@ For optimal performance, it is recommended to set several environment variables. For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series -For optimal performance on Arc, it is recommended to set several environment variables. - ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ``` +
@@ -64,10 +68,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+ +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -82,7 +99,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -90,15 +107,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash