diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md index f2b57eb4..316a9652 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md @@ -21,27 +21,30 @@ conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` + #### 1.2 Installation on Windows We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. - ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux
@@ -52,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -63,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -82,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -90,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md index b68ff6df..3cdfd4b7 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md @@ -27,22 +27,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. - ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux
@@ -53,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -64,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -83,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -91,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md index 105e1c0b..b56e272d 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md @@ -14,6 +14,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation ``` @@ -22,23 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -49,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -60,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md index d7de8ab0..3ecfa959 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md @@ -14,6 +14,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation ``` @@ -22,23 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -49,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -60,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -79,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md index af784432..e6181f47 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md @@ -21,22 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. - ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux
@@ -47,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -58,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -77,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md index 9f7fbcb7..bf01503c 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md @@ -17,25 +17,29 @@ conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` + #### 1.2 Installation on Windows We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
@@ -47,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -58,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -77,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md index 607a7a33..8d235b12 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md @@ -22,31 +22,35 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. - ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux
For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series
+
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -58,11 +62,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -77,7 +93,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +101,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` @@ -149,20 +158,23 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. @@ -174,6 +186,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -185,11 +198,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -204,7 +229,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -212,15 +237,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples **Stream Chat using `stream_chat()` API**: diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md index 08c49e99..86ea6ddb 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md @@ -21,20 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. + ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. 
Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -45,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -56,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -75,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -83,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md index f5f31406..4c6a6903 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md @@ -14,6 +14,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` @@ -22,22 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
#### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md index dd69d009..8822ff45 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md @@ -8,38 +8,41 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: - +We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 conda activate llm -# below command will install intel_extension_for_pytorch==2.0.110+xpu as default -# you can install specific ipex/torch version for your need +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.35.2 # required by DeciLM-7B ``` + #### 1.2 Installation on Windows We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.35.2 # required by DeciLM-7B ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. + ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -61,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md index d747ba53..03446bd6 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md @@ -12,8 +12,7 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 conda activate llm -# below command will install intel_extension_for_pytorch==2.0.110+xpu as default -# you can install specific ipex/torch version for your need +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -22,20 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. + ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
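Since the recommended flags differ per device class, a small helper can make the choice explicit. This is an illustrative sketch, not part of IPEX-LLM; the flag values are copied from the Linux subsections above:

```python
import os

def recommended_env(device: str) -> dict:
    """Map a device class to the Linux runtime flags suggested above (illustrative only)."""
    env = {"SYCL_CACHE_PERSISTENT": "1"}           # common to all device classes
    if device == "arc":                            # Intel Arc A-Series / Flex
        env["USE_XETLA"] = "OFF"
        env["SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS"] = "1"
    elif device == "igpu":                         # Intel iGPU
        env["BIGDL_LLM_XMX_DISABLED"] = "1"
    return env

os.environ.update(recommended_env("igpu"))
```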
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md index 664e67aa..70c07725 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md @@ -9,14 +9,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install datasets soundfile librosa # required by audio processing ``` @@ -25,22 +24,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install datasets soundfile librosa # required by audio processing ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -51,6 +54,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -62,11 +66,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -81,7 +97,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -89,15 +105,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md index 027ff4e8..cdf488a6 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md @@ -23,22 +23,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -49,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
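If you prefer not to export these variables globally, they can be scoped to a single run by launching the example in a child process. A stdlib-only sketch (the inline command is a placeholder for `python generate.py ...`):

```python
import os
import subprocess
import sys

# Copy the current environment and add the iGPU flags from above,
# so only the child process sees them.
env = dict(os.environ, SYCL_CACHE_PERSISTENT="1", BIGDL_LLM_XMX_DISABLED="1")

# Placeholder command; in the real example this would launch generate.py.
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['BIGDL_LLM_XMX_DISABLED'])"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())
```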
+ #### 3.2 Configurations for Windows
@@ -79,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md index 5ab0cf0e..f515d6e8 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md @@ -21,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md index c5e96f1c..b1472e47 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md @@ -15,6 +15,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for falcon-7b-instruct to conduct generation ``` @@ -23,8 +24,12 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for falcon-7b-instruct to conduct generation ``` @@ -53,18 +58,17 @@ print(f'tiiuae/falcon-7b-instruct checkpoint is downloaded to {model_path}') #### 2.2 Replace `modelling_RW.py` For `tiiuae/falcon-7b-instruct`, you should replace the `modelling_RW.py` with [falcon-7b-instruct/modelling_RW.py](./falcon-7b-instruct/modelling_RW.py). +### 3. 
Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. -### 3. Configures OneAPI environment variables -#### 3.1 Configurations for Linux ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 3.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 4. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 4.1 Configurations for Linux @@ -75,6 +79,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -86,11 +91,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 4.2 Configurations for Windows
@@ -105,7 +122,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -113,15 +130,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 5. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md index f73665f6..16a0f106 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md index 98b775f1..a65bba80 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md @@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Gemma model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -29,6 +26,9 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -36,17 +36,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte pip install transformers==4.38.1 ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -57,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -68,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -87,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -95,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md index c8659217..74405ad0 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md @@ -21,22 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. - ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -47,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -58,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -77,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -85,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md index 0404daf0..114fcdad 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md @@ -21,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md index a6e32dd8..ea314a1d 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md @@ -21,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md index 97b6deee..62f582e6 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md @@ -21,20 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. + ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -45,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -56,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -75,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -83,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md index 78413419..53c0d4f6 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md @@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -29,6 +26,9 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -36,17 +36,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte pip install transformers==4.34.0 ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -57,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -68,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -87,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -95,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md index 47c9e728..a93c8bf8 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md @@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -29,6 +26,9 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -36,17 +36,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte pip install transformers==4.36.0 ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -57,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -68,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -87,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -95,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md index 99092cf9..0f1caed7 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md @@ -14,6 +14,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for mpt-7b-chat and mpt-30b-chat to conduct generation ``` @@ -22,21 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + +pip install einops # additional package required for mpt-7b-chat and mpt-30b-chat to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -47,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -58,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -77,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -85,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md index 98868833..e7522b85 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md @@ -14,6 +14,7 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for phi-1_5 to conduct generation ``` @@ -22,22 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for phi-1_5 to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md index 353d6e51..12ff1653 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md @@ -14,27 +14,34 @@ conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for phi-2 to conduct generation ``` + #### 1.2 Installation on Windows We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + +pip install einops # additional package required for phi-2 to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. @@ -46,7 +53,9 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ``` +
@@ -56,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+ +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -74,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -82,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md index e91daf29..4f6861ce 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md @@ -21,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md index f8d67544..f7b9cb12 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md @@ -8,14 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation
 ```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md
index b475d831..127ece18 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install tiktoken einops transformers_stream_generator # additional package required for Qwen-7B-Chat to conduct generation
 ```
@@ -22,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install tiktoken einops transformers_stream_generator # additional package required for Qwen-7B-Chat to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md
index 830d4d26..3c3d6233 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers==4.37.0 # install transformers which supports Qwen2
 ```
@@ -22,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-pip install transformers==4.37.2 # install transformers which supports Qwen2
+
+pip install transformers==4.37.0 # install transformers which supports Qwen2
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md
index 201046af..5af141a5 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md
@@ -8,9 +8,7 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for an redpajama model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11
 conda activate llm
@@ -23,21 +21,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,9 +49,9 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
-
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md
index 9e6930a5..adb93547 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for an Replit model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install "transformers<4.35"
 ```
@@ -24,21 +23,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -49,9 +51,9 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
-
@@ -61,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -80,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md
index 5ec3e3f0..cc3a6210 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md
@@ -17,25 +17,29 @@ conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
+
 #### 1.2 Installation on Windows
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
@@ -47,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -58,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -77,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md
index c924fc25..62ac598c 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md
@@ -17,25 +17,30 @@ conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
+
 #### 1.2 Installation on Windows
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
+
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -46,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -57,10 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -75,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -83,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
 python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md
index 72a3562d..ed66eca3 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers==4.35.2 # required by SOLAR
 ```
@@ -22,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers==4.35.2 # required by SOLAR
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/stablelm/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/stablelm/README.md
index ce694c49..e000b7a3 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/stablelm/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/stablelm/README.md
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a StableLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -27,6 +24,9 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -34,17 +34,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
 pip install transformers==4.38.0
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -55,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -66,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 
+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
 #### 3.2 Configurations for Windows
@@ -85,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 
 <details>
 
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -93,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1
 ```
 
 </details>
 
-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md
index d0c6a257..3b6f4053 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md
@@ -21,21 +21,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md
index 9b719625..81f91d4e 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md
@@ -23,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -49,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -60,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -79,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md
index 74afab59..44fdfff2 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md
@@ -16,6 +16,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install librosa soundfile datasets
 pip install accelerate
 pip install SpeechRecognition sentencepiece colorama
@@ -28,25 +29,29 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install librosa soundfile datasets
 pip install accelerate
 pip install SpeechRecognition sentencepiece colorama
 pip install PyAudio inquirer
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -57,6 +62,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -68,11 +74,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -87,7 +105,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -95,15 +113,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md
index 377b8592..3925bc44 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md
@@ -15,6 +15,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install datasets soundfile librosa # required by audio processing
 ```
@@ -23,23 +24,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install datasets soundfile librosa # required by audio processing
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +54,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,11 +66,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -80,7 +97,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +105,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
 python ./recognize.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --repo-id-or-data-path REPO_ID_OR_DATA_PATH --language LANGUAGE
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md
index cb020717..9bfa9637 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for Yi-6B to conduct generation
 ```
@@ -25,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for Yi-6B to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -51,9 +53,9 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
-
@@ -63,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -82,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -90,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md
index d67ac916..98bca49e 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md
@@ -15,31 +15,37 @@ We suggest using conda to manage environment:
 conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
-pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option
+pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for Yuan2 to conduct generation
 pip install pandas # additional package required for Yuan2 to conduct generation
 ```
+
 #### 1.2 Installation on Windows
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for Yuan2 to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +56,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,10 +68,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -79,7 +99,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +107,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md
index 32da14ea..c81fc6d7 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
@@ -24,21 +21,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md b/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md
index be7501ec..41a986e5 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation
 ```
@@ -25,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -62,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md
index 11e5dad8..9f62c2fd 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers_stream_generator # additional package required for Baichuan2-7B-Chat to conduct generation
 ```
@@ -25,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers_stream_generator # additional package required for Baichuan2-7B-Chat to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -62,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -81,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -89,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md b/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md
index 05f34949..31eb9afc 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
 In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for Bark model to synthesize speech based on the given text, with IPEX-LLM INT4 optimizations.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install scipy
 ```
@@ -25,23 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install scipy
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -52,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -63,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
 #### 3.2 Configurations for Windows
@@ -82,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -90,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md b/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md index fc6f47fb..301e3525 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md index afda5bb6..6ad1407d 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash @@ -132,13 +138,10 @@ Inference time: xxxx s In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). - -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -148,21 +151,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -173,6 +179,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -184,10 +191,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. + +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -202,7 +222,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -210,15 +230,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples **Stream Chat using `stream_chat()` API**: diff --git a/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md b/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md index 278888b9..a7077195 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash @@ -131,13 +137,10 @@ AI stands for Artificial Intelligence. It refers to the development of computer In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM3 model to stream chat, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). - -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -147,21 +150,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. 
+ +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -172,6 +178,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -183,10 +190,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. + +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -201,7 +221,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -209,15 +229,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples **Stream Chat using `stream_chat()` API**: ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md b/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md index 01115cef..617b8bf5 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md @@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a CodeLlama model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` @@ -25,22 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -62,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -81,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -89,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md b/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md index 644c0205..3bbdc081 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md @@ -8,17 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: - +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - -# below command will install intel_extension_for_pytorch==2.0.110+xpu as default -# you can install specific ipex/torch version for your need +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.35.2 # required by DeciLM-7B ``` @@ -27,20 +23,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. + ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -51,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -62,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -81,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -89,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples diff --git a/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md b/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md index d3c76f9f..3f777467 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md @@ -8,15 +8,11 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Deepseek model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - -# below command will install intel_extension_for_pytorch==2.0.110+xpu as default -# you can install specific ipex/torch version for your need +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -25,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -50,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -61,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -80,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -88,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md b/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md index d72abcf3..f5407486 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md @@ -9,14 +9,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install datasets soundfile librosa # required by audio processing ``` @@ -25,22 +24,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install datasets soundfile librosa # required by audio processing ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -51,6 +54,7 @@ For optimal performance, it is recommended to set several environment variables. 
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -62,11 +66,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -81,7 +97,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -89,15 +105,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md index 4f80a814..006c68dd 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md index 28dab67b..8de52c9c 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md index d42a7cb2..d4a45ec4 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md index a6e32dd8..ea314a1d 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md @@ -21,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. 
#### 3.1 Configurations for Linux @@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md index b801c7fb..05f24e86 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,30 +21,35 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux
For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series + ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -59,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -78,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -86,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md b/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md index 424f3ec5..06449d2b 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md @@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a LLaVA model to start a multi-turn chat centered around an image using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # install dependencies required by llava pip install transformers==4.36.2 @@ -31,8 +29,12 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # install dependencies required by llava pip install transformers==4.36.2 @@ -40,29 +42,30 @@ git clone https://github.com/haotian-liu/LLaVA.git # clone the llava library copy generate.py .\LLaVA\ # copy our example to the LLaVA folder cd LLaVA # change the working directory to the LLaVA folder git checkout tags/v1.2.0 -b 1.2.0 # Get the branch which is compatible with transformers 4.36 - ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. 
+ ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux
For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series + ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -74,11 +77,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -93,7 +108,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -101,15 +116,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md b/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md index 7c30497a..d6c4f53a 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md @@ -7,33 +7,107 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ ## Example: Predict Tokens using `generate()` API In the example [generate.py](./generate.py), we show a basic use case for a Mamba model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). - -After installing conda, create a Python environment for IPEX-LLM: +#### 1.1 Installation on Linux +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - -# below command will install intel_extension_for_pytorch==2.0.110+xpu as default -# you can install specific ipex/torch version for your need +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # package required by Mamba ``` -### 2. 
Configures OneAPI environment variables +#### 1.2 Installation on Windows +We suggest using conda to manage environment: +```bash +conda create -n llm python=3.11 libuv +conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default +pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + +pip install einops # package required by Mamba +``` + +### 2. Configures OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -### 3. Run +### 3. Runtime Configurations +For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. +#### 3.1 Configurations for Linux +
-For optimal performance on Arc, it is recommended to set several environment variables. +For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ``` +
+ +
+ +For Intel Data Center GPU Max Series + +```bash +export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so +export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 +export ENABLE_SDP_FUSION=1 +``` +> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ +#### 3.2 Configurations for Windows +
+ +For Intel iGPU + +```cmd +set SYCL_CACHE_PERSISTENT=1 +set BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ +
+ +For Intel Arc™ A-Series Graphics + +```cmd +set SYCL_CACHE_PERSISTENT=1 +``` + +
+ +> [!NOTE] +> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. ### 4. Running examples + ```bash python ./generate.py ``` @@ -45,7 +119,7 @@ In the example, several arguments can be passed to satisfy your requirements: - `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It defaults to `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It defaults to `32`. -#### 2.3 Sample Output +#### 4.3 Sample Output #### [state-spaces/mamba-1.4b](https://huggingface.co/state-spaces/mamba-1.4b) ```log Inference time: xxxx s diff --git a/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md b/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md index 565470e5..9423f514 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md @@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -29,21 +26,27 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
+# Refer to https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting, please make sure you are using a stable version of Transformers, 4.34.0 or newer.
 pip install transformers==4.34.0
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
+
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -54,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -65,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -84,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -92,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md b/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md
index 8f4a4dab..c54a257f 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md
@@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -29,6 +26,9 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -36,26 +36,28 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
 pip install transformers==4.36.0
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
 For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series
+
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -67,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -86,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -94,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md
index 54a72a07..945d2665 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
 In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
+# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for phi-1_5 to conduct generation
 ```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for phi-1_5 to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -79,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md
index 4a201625..7c0707f9 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md
@@ -8,36 +8,39 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
 In the example [generate.py](./generate.py), we show a basic use case for a phi-2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
+# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for phi-2 to conduct generation
 ```
+
 #### 1.2 Installation on Windows
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,7 +51,9 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
+
@@ -58,10 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -76,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -84,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md b/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md
index 9f1a33be..37068e0d 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
 In the example [generate.py](./generate.py), we show a basic use case for a phixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
+# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for phixtral to conduct generation
 ```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install einops # additional package required for phixtral to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -79,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md b/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md
index 473cac40..6d0bc5b5 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with IPEX-LLM 'optimize_model' API on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation
 ```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -61,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md
index 86b0f8c7..e8ca82f2 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install transformers==4.37.0 # install transformers which supports Qwen2
 ```
@@ -22,22 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-pip install transformers==4.37.2 # install transformers which supports Qwen2
+
+pip install transformers==4.37.0 # install transformers which supports Qwen2
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md b/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md
index 8ad73633..3627c58e 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a Replit model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
 ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
 conda activate llm
-
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
 pip install "transformers<4.35"
 ```
@@ -25,21 +23,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -47,13 +48,10 @@ For optimal performance, it is recommended to set several environment variables.
 For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series
-### 3. Run
-
-For optimal performance on Arc, it is recommended to set several environment variables.
-
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
@@ -65,10 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+
+
+For Intel iGPU
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+
+
 #### 3.2 Configurations for Windows
@@ -83,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60
+For Intel Arc™ A-Series Graphics
 
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -91,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
-
-
-For other Intel dGPU Series
-
-There is no need to set further environment variables.
-
-
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 ### 4. Running examples
 ```bash
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md b/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md
index e0802db7..bd549640 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
 In the example [generate.py](./generate.py), we show a basic use case for a SOLAR model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.35.2 # required by SOLAR ``` @@ -25,22 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install transformers==4.35.2 # required by SOLAR ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -62,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. + +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md index a0a1020c..bec5410a 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md @@ -8,15 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for SpeechT5 model to synthesize speech based on the given text, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install "datasets<2.18" soundfile # additional package required for SpeechT5 to conduct generation ``` @@ -25,23 +23,26 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install "datasets<2.18" soundfile # additional package required for SpeechT5 to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux @@ -52,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -63,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -82,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -90,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/stablelm/README.md b/python/llm/example/GPU/PyTorch-Models/Model/stablelm/README.md index f322d64f..bb68e00a 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/stablelm/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/stablelm/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a StableLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -27,6 +24,9 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ @@ -34,17 +34,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte pip install transformers==4.38.0 ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -55,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -66,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -85,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -93,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md b/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md index 9580c1a8..aec53ec9 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md @@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a StarCoder model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` @@ -24,21 +21,24 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables. ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md b/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md index bac21baf..b3f751af 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md @@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash -conda create -n llm python=3.11 # recommend to use Python 3.11 +conda create -n llm python=3.11 conda activate llm - # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for Yi-6B to conduct generation ``` @@ -25,31 +23,37 @@ We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for Yi-6B to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. #### 3.1 Configurations for Linux
For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series + ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ```
@@ -61,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash diff --git a/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md index 2def531d..8c562b22 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md @@ -10,38 +10,42 @@ In addition, you need to modify some files in Yuan2-2B-hf folder, since Flash at In the example [generate.py](./generate.py), we show a basic use case for an Yuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux -We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
- -After installing conda, create a Python environment for IPEX-LLM: +We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 conda activate llm +# below command will install intel_extension_for_pytorch==2.1.10+xpu as default +pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ -pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for Yuan2 to conduct generation pip install pandas # additional package required for Yuan2 to conduct generation ``` + #### 1.2 Installation on Windows We suggest using conda to manage environment: ```bash conda create -n llm python=3.11 libuv conda activate llm +# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0 +pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 + # below command will install intel_extension_for_pytorch==2.1.10+xpu as default pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install einops # additional package required for Yuan2 to conduct generation ``` -### 2. Configures OneAPI environment variables -#### 2.1 Configurations for Linux +### 2. Configure OneAPI environment variables for Linux + +> [!NOTE] +> Skip this step if you are running on Windows. + +This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for pip-installed oneAPI. + ```bash source /opt/intel/oneapi/setvars.sh ``` -#### 2.2 Configurations for Windows -```cmd -call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" -``` -> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported. ### 3. Runtime Configurations For optimal performance, it is recommended to set several environment variables.
Please check out the suggestions based on your device. #### 3.1 Configurations for Linux @@ -49,12 +53,12 @@ For optimal performance, it is recommended to set several environment variables. For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series -For optimal performance on Arc, it is recommended to set several environment variables. - ```bash export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 ``` +
@@ -64,10 +68,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ```bash export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +export SYCL_CACHE_PERSISTENT=1 export ENABLE_SDP_FUSION=1 ``` > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
+ +
+ +For Intel iGPU + +```bash +export SYCL_CACHE_PERSISTENT=1 +export BIGDL_LLM_XMX_DISABLED=1 +``` + +
+ #### 3.2 Configurations for Windows
@@ -82,7 +99,7 @@ set BIGDL_LLM_XMX_DISABLED=1
-For Intel Arc™ A300-Series or Pro A60 +For Intel Arc™ A-Series Graphics ```cmd set SYCL_CACHE_PERSISTENT=1 @@ -90,15 +107,8 @@ set SYCL_CACHE_PERSISTENT=1
-
- -For other Intel dGPU Series - -There is no need to set further environment variables. - -
- -> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. +> [!NOTE] +> The first time each model runs on an Intel iGPU or Intel Arc™ A-Series Graphics, it may take several minutes to compile. ### 4. Running examples ```bash