GPU configuration update for examples (windows pip installer, etc.) (#10762)

* renew chatglm3-6b gpu example readme

fix

fix

fix

* fix for comments

* fix

* fix

* fix

* fix

* fix

* apply on HF-Transformers-AutoModels

* apply on PyTorch-Models

* fix

* fix
Author: Jin Qiao, 2024-04-15 17:42:52 +08:00 (committed by GitHub)
Parent: 1bd431976d
Commit: 73a67804a4
Signature: GPG key ID B5690EEEBB952194 (no known key found for this signature in database)
74 changed files with 2253 additions and 1532 deletions

@@ -21,27 +21,30 @@ conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 
 #### 1.2 Installation on Windows
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -52,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
 </details>
@@ -63,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 </details>
+
+<details>
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
 
 #### 3.2 Configurations for Windows
 <details>
@@ -82,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 <details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -90,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1
 </details>
 
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
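The recurring Linux hunk above adds `SYCL_CACHE_PERSISTENT=1` to the recommended Arc/Flex settings. As an illustration only (this script is not part of the commit), the resulting full set of variables can be collected into one sourceable snippet:

```shell
# Illustrative sketch, not part of the commit: the complete set of Linux
# runtime variables the updated READMEs recommend for Intel Arc A-Series
# Graphics or Intel Data Center GPU Flex Series.
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1  # newly added by this commit

# These only affect processes started from this shell, so source the
# snippet before launching Python.
env | grep SYCL_CACHE_PERSISTENT
```

Sourcing the snippet (rather than running it as a subprocess) is what makes the variables visible to the Python process launched afterwards.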

@@ -27,22 +27,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -53,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
 </details>
@@ -64,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 </details>
+
+<details>
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
 
 #### 3.2 Configurations for Windows
 <details>
@@ -83,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 <details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -91,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1
 </details>
 
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
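The same per-device table of recommended variables repeats across every README in this commit. A small Python sketch (the helper name and mapping structure are hypothetical, not part of ipex-llm) can make the device-to-variables mapping explicit; note that these must be set before `torch`/`ipex-llm` is imported for them to take effect:

```python
import os

# Hypothetical helper mirroring the Linux recommendations in the updated
# READMEs (section 3.1). Not part of ipex-llm; for illustration only.
RECOMMENDED_ENV = {
    "arc_or_flex": {
        "USE_XETLA": "OFF",
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",  # newly added by this commit
    },
    "igpu": {
        "SYCL_CACHE_PERSISTENT": "1",
        "BIGDL_LLM_XMX_DISABLED": "1",
    },
}

def apply_recommended_env(device: str) -> None:
    """Set the recommended variables for `device` in the current process."""
    for key, value in RECOMMENDED_ENV[device].items():
        os.environ[key] = value

apply_recommended_env("igpu")
print(os.environ["BIGDL_LLM_XMX_DISABLED"])  # prints "1"
```

In practice the READMEs set these in the shell (or CMD) before launching Python, which has the same effect as setting `os.environ` at the very top of the script.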

@@ -14,6 +14,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation
 ```
@@ -22,23 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation
 ```
 
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -49,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
 </details>
@@ -60,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 </details>
+
+<details>
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
 
 #### 3.2 Configurations for Windows
 <details>
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 <details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
 </details>
 
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```

@@ -14,6 +14,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation
 ```
@@ -22,23 +23,26 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation
 ```
 
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -49,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
 </details>
@@ -60,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 </details>
+
+<details>
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
 
 #### 3.2 Configurations for Windows
 <details>
@@ -79,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 <details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
 </details>
 
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
``` ```

@@ -21,22 +21,24 @@ We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
 #### 3.1 Configurations for Linux
@@ -47,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
 </details>
@@ -58,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 </details>
+
+<details>
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
 
 #### 3.2 Configurations for Windows
 <details>
@@ -77,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 <details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
 </details>
 
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```bash

@@ -17,25 +17,29 @@ conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 
 #### 1.2 Installation on Windows
 We suggest using conda to manage environment:
 ```bash
 conda create -n llm python=3.11 libuv
 conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
 ```bash
 source /opt/intel/oneapi/setvars.sh
 ```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
 
 ### 3. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
@@ -47,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 ```
 </details>
@@ -58,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```bash
 export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
 export ENABLE_SDP_FUSION=1
 ```
 > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
 </details>
+
+<details>
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
 
 #### 3.2 Configurations for Windows
 <details>
@@ -77,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
 <details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
 ```cmd
 set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
 </details>
 
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 ### 4. Running examples
 ```
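A detail worth noticing in the repeated Windows hunks: the commit pins three specific oneAPI 2024.0 components rather than installing the whole Base Toolkit. A tiny sketch (the parsing helper is hypothetical, written here only to spell out the pins) shows the exact name/version pairs being installed:

```python
# The exact pip requirement strings the updated READMEs add for the
# pip-installed oneAPI components on Windows (from the diff above).
ONEAPI_PINS = [
    "dpcpp-cpp-rt==2024.0.2",
    "mkl-dpcpp==2024.0.0",
    "onednn==2024.0.0",
]

def parse_pin(req: str) -> tuple:
    """Split a `name==version` requirement string (illustrative helper)."""
    name, _, version = req.partition("==")
    return name, version

versions = dict(map(parse_pin, ONEAPI_PINS))
print(versions["dpcpp-cpp-rt"])  # prints "2024.0.2"
```

Pinning exact versions keeps the pip-installed runtime libraries in step with the `intel_extension_for_pytorch==2.1.10+xpu` build that the same commands install.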

@ -22,31 +22,35 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
<details> <details>
<summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary> <summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@ -58,11 +62,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@ -77,7 +93,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@ -85,15 +101,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
``` ```
@ -149,20 +158,23 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
@ -174,6 +186,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed with `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
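The Linux variants above differ only in which variables they export. A small helper like the following can keep them straight; this is a hypothetical convenience of our own, and the device keywords (`arc`, `max`, `igpu`) are shorthand for this sketch, not official names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export the recommended variables for a given device class.

set_ipex_llm_env() {
  case "$1" in
    arc)   # discrete Arc-style GPU
      export USE_XETLA=OFF
      export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
      export SYCL_CACHE_PERSISTENT=1
      ;;
    max)   # data-center-class GPU with tcmalloc preload
      export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
      export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
      export SYCL_CACHE_PERSISTENT=1
      export ENABLE_SDP_FUSION=1
      ;;
    igpu)  # Intel integrated GPU
      export SYCL_CACHE_PERSISTENT=1
      export BIGDL_LLM_XMX_DISABLED=1
      ;;
    *)
      echo "usage: set_ipex_llm_env {arc|max|igpu}" >&2
      return 1
      ;;
  esac
}
```

Source the file and call, e.g., `set_ipex_llm_env igpu` before launching an example.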
#### 3.2 Configurations for Windows
<details>
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
```
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
**Stream Chat using `stream_chat()` API**:

#### 1.2 Installation on Windows
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configures OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed with `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
```
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```

```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher versions of transformers
```
#### 1.2 Installation on Windows
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher versions of transformers
```
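Since the examples pin `transformers==4.34.1` specifically for `CodeLlamaTokenizer`, a quick post-install check can confirm the class is importable. This is a sketch of our own; it assumes only that the pin above was installed.

```python
# Sanity check: the transformers pin above should make CodeLlamaTokenizer importable.

def has_code_llama_tokenizer() -> bool:
    """Return True when the installed transformers release ships CodeLlamaTokenizer."""
    try:
        from transformers import CodeLlamaTokenizer  # noqa: F401
        return True
    except ImportError:
        return False

print("CodeLlamaTokenizer available:", has_code_llama_tokenizer())
```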
### 2. Configures OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed with `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
```
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```

In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
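The flow that example implements can be sketched roughly as follows. This is an illustrative sketch, not the shipped code: the `ipex_llm.transformers` class and the `load_in_4bit`/`"xpu"` usage follow the IPEX-LLM transformers-style API, and `build_prompt` is a hypothetical helper with a made-up template.

```python
# Illustrative sketch of an INT4 generate() flow on an Intel GPU (not the shipped example).

def build_prompt(user_message: str) -> str:
    """Wrap a user message in a simple instruction template (hypothetical format)."""
    return f"### Instruction:\n{user_message}\n\n### Response:\n"

def generate_on_xpu(model_path: str, question: str, n_predict: int = 32) -> str:
    """Load an INT4-quantized model onto the Intel GPU and predict the next N tokens."""
    # Heavy imports live inside the function so the sketch can be read without a GPU.
    import torch
    from transformers import AutoTokenizer
    from ipex_llm.transformers import AutoModelForCausalLM  # assumed IPEX-LLM API

    model = AutoModelForCausalLM.from_pretrained(
        model_path, load_in_4bit=True, trust_remote_code=True
    ).to("xpu")  # move the quantized model to the Intel GPU
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to("xpu")
    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=n_predict)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Something like `generate_on_xpu("Deci/DeciLM-7B", "What is AI?")` would then run on a machine with a working XPU setup.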
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.35.2 # required by DeciLM-7B
```
#### 1.2 Installation on Windows
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.35.2 # required by DeciLM-7B
```
### 2. Configures OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed with `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
```
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples

We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
#### 1.2 Installation on Windows
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configures OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed with `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
```
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples

In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with IPEX-LLM INT4 optimizations.
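The long-audio transcription flow described above can be sketched roughly as follows. This is an illustrative sketch, not the shipped example: the `ipex_llm.transformers` class name and the `load_in_4bit`/`"xpu"` usage follow the IPEX-LLM transformers-style API, and `pick_chunk_length` is a hypothetical helper.

```python
# Illustrative sketch of long-audio transcription with pipeline() (not the shipped example).

def pick_chunk_length(audio_seconds: float, max_chunk_s: int = 15) -> int:
    """Return 0 (no chunking) for short clips, else a fixed chunk length in seconds."""
    return 0 if audio_seconds <= max_chunk_s else max_chunk_s

def transcribe(audio_path: str, audio_seconds: float,
               model_path: str = "distil-whisper/distil-large-v2") -> str:
    # Heavy imports live inside the function so the sketch can be read without a GPU.
    from transformers import AutoProcessor, pipeline
    from ipex_llm.transformers import AutoModelForSpeechSeq2Seq  # assumed IPEX-LLM API

    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_path, load_in_4bit=True
    ).to("xpu")  # INT4-quantized model on the Intel GPU
    processor = AutoProcessor.from_pretrained(model_path)
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        chunk_length_s=pick_chunk_length(audio_seconds),  # 0 disables chunked inference
    )
    return pipe(audio_path)["text"]
```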
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install datasets soundfile librosa # required by audio processing
```
#### 1.2 Installation on Windows
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install datasets soundfile librosa # required by audio processing
```
### 2. Configures OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed with `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
```
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```

#### 1.2 Installation on Windows
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configures OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed with `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
```
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```

#### 1.2 Installation on Windows
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configures OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
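Because a mistyped `export` fails silently, a tiny pre-flight check in a launch script can catch missing variables before a long model load. This is an optional sketch, not part of the example code: the variable list is just the Arc-series recommendation quoted above, and you would swap in whichever set matches your device.

```python
import os

# Variable names copied from the Linux recommendations above for
# Intel Arc A-Series; substitute the list for your own device class.
RECOMMENDED_VARS = [
    "USE_XETLA",
    "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS",
    "SYCL_CACHE_PERSISTENT",
]

def missing_vars(env=None, names=RECOMMENDED_VARS):
    """Return the recommended variable names that are not set in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if name not in env]

if __name__ == "__main__":
    for name in missing_vars():
        print(f"warning: {name} is not set; performance may be suboptimal")
```

An empty result means every recommended variable is visible to the process.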
#### 3.2 Configurations for Windows
<details>
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
```
@@ -15,6 +15,7 @@ conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for falcon-7b-instruct to conduct generation
```
@@ -23,8 +24,12 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for falcon-7b-instruct to conduct generation
```
@@ -53,18 +58,17 @@ print(f'tiiuae/falcon-7b-instruct checkpoint is downloaded to {model_path}')
#### 2.2 Replace `modelling_RW.py`
For `tiiuae/falcon-7b-instruct`, you should replace the `modelling_RW.py` with [falcon-7b-instruct/modelling_RW.py](./falcon-7b-instruct/modelling_RW.py).
### 3. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline-installed oneAPI. Skip this step for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 4. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 4.1 Configurations for Linux
@@ -75,6 +79,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -86,11 +91,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 4.2 Configurations for Windows
<details>
@@ -105,7 +122,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -113,15 +130,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 5. Running examples
```
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
@@ -24,21 +21,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline-installed oneAPI. Skip this step for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
```bash
@@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Gemma model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -29,6 +26,9 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -36,17 +36,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
pip install transformers==4.38.1
```
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline-installed oneAPI. Skip this step for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -57,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -68,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -87,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -95,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
```bash
@@ -21,22 +21,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline-installed oneAPI. Skip this step for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -47,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -58,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -77,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
```
@@ -21,21 +21,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline-installed oneAPI. Skip this step for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
```
@@ -21,21 +21,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline-installed oneAPI. Skip this step for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
```
@@ -21,20 +21,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline-installed oneAPI. Skip this step for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -45,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -56,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -75,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -83,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
```
@@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -29,6 +26,9 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -36,17 +36,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
pip install transformers==4.34.0
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
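If one launcher script should serve both APT/offline and pip installs of oneAPI, the sourcing step can be guarded on the presence of `setvars.sh`. This is a hedged sketch of ours (the function name is not from the README):

```shell
# Source oneAPI's setvars.sh only when a system-wide (APT/offline)
# install is present; a pip-installed oneAPI needs no extra step.
maybe_source_oneapi() {
    if [ -f /opt/intel/oneapi/setvars.sh ]; then
        # shellcheck disable=SC1091
        . /opt/intel/oneapi/setvars.sh
    fi
}

maybe_source_oneapi
```

On a machine without a system-wide install the function is a no-op, so the same script stays portable.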
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -57,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -68,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -87,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -95,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```bash
@@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -29,6 +26,9 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -36,17 +36,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
pip install transformers==4.36.0
```
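Because this example pins `transformers==4.36.0`, a mismatched environment is a common source of confusing errors. A small check can fail fast before loading the model; this is a hedged sketch of ours (the helper name is not from the README), and it uses only the standard library:

```python
from importlib.metadata import PackageNotFoundError, version

def has_pinned(package: str, expected: str) -> bool:
    """True only if `package` is installed at exactly version `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # Not installed at all counts as a mismatch.
        return False

# Warn (rather than crash) when the pin from this README is not satisfied.
if not has_pinned("transformers", "4.36.0"):
    print("warning: this example expects `pip install transformers==4.36.0`")
```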
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -57,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -68,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -87,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -95,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```bash
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for mpt-7b-chat and mpt-30b-chat to conduct generation
```
@@ -22,21 +23,26 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for mpt-7b-chat and mpt-30b-chat to conduct generation
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -47,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -58,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -77,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -85,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for phi-1_5 to conduct generation
```
@@ -22,22 +23,26 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for phi-1_5 to conduct generation
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```
@@ -14,27 +14,34 @@ conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for phi-2 to conduct generation
```
#### 1.2 Installation on Windows
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for phi-2 to conduct generation
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
@@ -46,7 +53,9 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
<details>
@@ -56,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -74,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -82,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```
@@ -21,21 +21,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation
```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@ -61,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
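The Linux per-device recommendations above can be gathered into one small helper. This is an illustrative sketch only: the device keys (`arc`, `max`, `igpu`) and the grouping are our assumptions for the sketch, not names used by IPEX-LLM.

```python
import os

# Recommended runtime env vars per device class, mirroring section 3.1.
# LD_PRELOAD is omitted on purpose: it only takes effect if set before the
# Python process starts, so it cannot be applied from inside the process.
RECOMMENDED = {
    "arc": {
        "USE_XETLA": "OFF",
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
    },
    "max": {
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
        "ENABLE_SDP_FUSION": "1",
    },
    "igpu": {
        "SYCL_CACHE_PERSISTENT": "1",
        "BIGDL_LLM_XMX_DISABLED": "1",
    },
}


def apply_runtime_env(device: str) -> dict:
    """Set the recommended variables for `device` and return them."""
    env = RECOMMENDED[device]
    os.environ.update(env)
    return env
```

For example, `apply_runtime_env("igpu")` sets `SYCL_CACHE_PERSISTENT=1` and `BIGDL_LLM_XMX_DISABLED=1` for the current process and any model it loads afterwards.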
#### 3.2 Configurations for Windows

<details>

@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
View file
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install tiktoken einops transformers_stream_generator # additional packages required for Qwen-7B-Chat to conduct generation
```

@@ -22,22 +23,26 @@ We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0

# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install tiktoken einops transformers_stream_generator # additional packages required for Qwen-7B-Chat to conduct generation
```

### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux

@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>

@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

<details>
<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>

#### 3.2 Configurations for Windows

<details>

@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
View file
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.37.0 # install transformers which supports Qwen2
```

@@ -22,22 +23,26 @@ We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0

# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.37.2 # install transformers which supports Qwen2
```
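The `transformers` pin above exists because Qwen2 support first shipped in `transformers` 4.37. A minimal illustrative check of that constraint (the helper name is ours, not part of the example):

```python
def transformers_supports_qwen2(version: str) -> bool:
    """True when `version` is at least 4.37, where the Qwen2 classes landed."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (4, 37)


print(transformers_supports_qwen2("4.37.2"))  # True: meets the pin above
print(transformers_supports_qwen2("4.35.2"))  # False: predates Qwen2 support
```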
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux

@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>

@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

<details>
<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>

#### 3.2 Configurations for Windows

<details>

@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
View file
@@ -8,9 +8,7 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements
In the example [generate.py](./generate.py), we show a basic use case for a RedPajama model to predict the next N tokens using the `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.

### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11
conda activate llm

@@ -23,21 +21,24 @@ We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0

# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux

@@ -48,9 +49,9 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>

<details>

@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

<details>
<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>

#### 3.2 Configurations for Windows

<details>

@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
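The example's generate.py loads the model through ipex-llm's `AutoModelForCausalLM` with the INT4 optimization enabled at load time. A sketch of the load step follows; the helper itself is illustrative, and the commented calls assume the `ipex-llm[xpu]` install from section 1:

```python
def int4_load_kwargs(trust_remote_code: bool = True) -> dict:
    """Keyword arguments for from_pretrained() enabling IPEX-LLM INT4."""
    return {
        "load_in_4bit": True,            # quantize weights to INT4 on load
        "trust_remote_code": trust_remote_code,
    }


# On a machine with ipex-llm[xpu] installed (see section 1), usage looks like:
#   from ipex_llm.transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, **int4_load_kwargs())
#   model = model.to("xpu")   # move the quantized model to the Intel GPU
```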
View file
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements
In the example [generate.py](./generate.py), we show a basic use case for a Replit model to predict the next N tokens using the `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.

### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install "transformers<4.35"
```

@@ -24,21 +23,24 @@ We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0

# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux

@@ -49,9 +51,9 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>

<details>

@@ -61,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

<details>
<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>

#### 3.2 Configurations for Windows

<details>

@@ -80,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -88,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
View file
@@ -17,25 +17,29 @@ conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

#### 1.2 Installation on Windows
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0

# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
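The rule in section 2 (source `setvars.sh` only for APT or offline installed oneAPI on Linux, and nothing extra for pip-installed oneAPI or for Windows with the pip packages above) can be sketched as a small decision helper. The function and its name are illustrative, not part of the examples:

```python
import platform
from typing import Optional


def oneapi_activation_command(pip_installed: bool) -> Optional[str]:
    """Return the shell command (if any) needed before running the examples."""
    if pip_installed:
        return None  # pip-installed oneAPI is picked up automatically
    if platform.system() == "Linux":
        return "source /opt/intel/oneapi/setvars.sh"
    return None  # on Windows these instructions rely on pip-installed oneAPI
```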
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

@@ -47,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>

@@ -58,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

<details>
<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>

#### 3.2 Configurations for Windows

<details>

@@ -77,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -85,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
View file
@@ -17,25 +17,30 @@ conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

#### 1.2 Installation on Windows
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0

# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux

@@ -46,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>

@@ -57,10 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

<details>
<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>

#### 3.2 Configurations for Windows

<details>

@@ -75,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -83,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples

```
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
```
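The invocation above can also be driven from a script (for example, a small benchmark harness). The builder below is an illustrative sketch: the flag names come from the command shown, while the default prompt and token count are our placeholders:

```python
def build_generate_cmd(model_path: str,
                       prompt: str = "What is AI?",
                       n_predict: int = 32) -> list:
    """Argument vector for generate.py using the flags shown above."""
    return [
        "python", "./generate.py",
        "--repo-id-or-model-path", model_path,
        "--prompt", prompt,
        "--n-predict", str(n_predict),
    ]


# e.g. subprocess.run(build_generate_cmd("REPO_ID_OR_MODEL_PATH"))
```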
View file
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
conda activate llm

# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.35.2 # required by SOLAR
```
@@ -22,22 +23,26 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.35.2 # required by SOLAR
```

-### 2. Configures OneAPI environment variables
-
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

-#### 2.2 Configurations for Windows
-
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```

</details>
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
#### 3.2 Configurations for Windows

<details>

@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
---
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ

In the example [generate.py](./generate.py), we show a basic use case for a StableLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.

### 1. Install

#### 1.1 Installation on Linux

-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:

```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm

# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -27,6 +24,9 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -34,17 +34,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
pip install transformers==4.38.0
```

-### 2. Configures OneAPI environment variables
-
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

-#### 2.2 Configurations for Windows
-
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux
@@ -55,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```

</details>
@@ -66,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
#### 3.2 Configurations for Windows

<details>

@@ -85,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -93,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
---
@@ -21,21 +21,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

-### 2. Configures OneAPI environment variables
-
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

-#### 2.2 Configurations for Windows
-
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux
@@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```

</details>
@@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
#### 3.2 Configurations for Windows

<details>

@@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
---
@@ -23,22 +23,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

-### 2. Configures OneAPI environment variables
-
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

-#### 2.2 Configurations for Windows
-
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux
@@ -49,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```

</details>
@@ -60,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
#### 3.2 Configurations for Windows

<details>

@@ -79,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
---
@@ -16,6 +16,7 @@ conda create -n llm python=3.11
conda activate llm

# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install librosa soundfile datasets
pip install accelerate
pip install SpeechRecognition sentencepiece colorama
@@ -28,25 +29,29 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install librosa soundfile datasets
pip install accelerate
pip install SpeechRecognition sentencepiece colorama
pip install PyAudio inquirer
```

-### 2. Configures OneAPI environment variables
-
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

-#### 2.2 Configurations for Windows
-
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux
@@ -57,6 +62,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```

</details>
@@ -68,11 +74,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
#### 3.2 Configurations for Windows

<details>

@@ -87,7 +105,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -95,15 +113,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
---
@@ -15,6 +15,7 @@ conda create -n llm python=3.11
conda activate llm

# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install datasets soundfile librosa # required by audio processing
```
@@ -23,23 +24,26 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install datasets soundfile librosa # required by audio processing
```

-### 2. Configures OneAPI environment variables
-
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

-#### 2.2 Configurations for Windows
-
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux
@@ -50,6 +54,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```

</details>
@@ -61,11 +66,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
#### 3.2 Configurations for Windows

<details>

@@ -80,7 +97,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +105,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples

```
python ./recognize.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --repo-id-or-data-path REPO_ID_OR_DATA_PATH --language LANGUAGE
```
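The runtime configurations above can also be staged from Python before the model libraries are imported, since the SYCL runtime reads these variables at initialization. A minimal sketch using the variable names documented above — the helper name is ours, not part of ipex-llm:

```python
import os

def apply_igpu_runtime_config() -> dict:
    """Set the recommended Intel iGPU environment variables if not already set.

    Must run before torch / ipex-llm are imported; setdefault keeps any value
    the user already exported in the shell.
    """
    defaults = {
        "SYCL_CACHE_PERSISTENT": "1",
        "BIGDL_LLM_XMX_DISABLED": "1",
    }
    applied = {}
    for name, value in defaults.items():
        applied[name] = os.environ.setdefault(name, value)
    return applied

config = apply_igpu_runtime_config()
print(config)
```

Shell-exported values still take precedence, so this does not override an existing configuration.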
---
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ

In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.

### 1. Install

#### 1.1 Installation on Linux

-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:

```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm

# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for Yi-6B to conduct generation
```
@@ -25,22 +23,26 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for Yi-6B to conduct generation
```

-### 2. Configures OneAPI environment variables
-
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+
+> [!NOTE]
+> Skip this step if you are running on Windows.
+
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

-#### 2.2 Configurations for Windows
-
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
-
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux
@@ -51,9 +53,9 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```

</details>

<details>
@@ -63,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>

+<details>
+
+<summary>For Intel iGPU</summary>
+
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+
+</details>
+
#### 3.2 Configurations for Windows

<details>

@@ -82,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -90,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

-<details>
-
-<summary>For other Intel dGPU Series</summary>
-
-There is no need to set further environment variables.
-
-</details>
-
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
---
@@ -15,31 +15,37 @@ We suggest using conda to manage environment:
conda create -n llm python=3.11 conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
-pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option
+pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for Yuan2 to conduct generation pip install einops # additional package required for Yuan2 to conduct generation
pip install pandas # additional package required for Yuan2 to conduct generation pip install pandas # additional package required for Yuan2 to conduct generation
``` ```
#### 1.2 Installation on Windows #### 1.2 Installation on Windows
We suggest using conda to manage environment: We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for Yuan2 to conduct generation pip install einops # additional package required for Yuan2 to conduct generation
``` ```
-### 2. Configures OneAPI environment variables
+### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
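For APT or offline oneAPI installs it is easy to forget to source `setvars.sh`. A stdlib-only sanity check sketch; it treats the presence of `ONEAPI_ROOT` as the signal that `setvars.sh` has run in this shell (the exact variable name is an assumption, not an official contract):

```python
import os

def oneapi_env_active(env=None) -> bool:
    """Best-effort check that oneAPI's setvars.sh has been sourced."""
    env = os.environ if env is None else env
    # setvars.sh exports ONEAPI_ROOT among other variables (assumption).
    return "ONEAPI_ROOT" in env

# A shell where setvars.sh was sourced vs. a bare one:
print(oneapi_env_active({"ONEAPI_ROOT": "/opt/intel/oneapi"}))  # True
print(oneapi_env_active({}))                                    # False
```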
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@@ -50,6 +56,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@@ -61,10 +68,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@@ -79,7 +99,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +107,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash
View file
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
@@ -24,21 +21,24 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
-### 2. Configures OneAPI environment variables
+### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
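The per-device recommendations above can be encoded as plain data for setup scripts. A sketch; the device keys and groupings mirror the `<details>` sections of this README and are illustrative, not an official API (the tcmalloc `LD_PRELOAD` step for Max-series GPUs is omitted because it is not a simple key/value variable):

```python
# Recommended Linux runtime variables per device class, as data.
RECOMMENDED_ENV = {
    "arc_flex": {  # Intel Arc A-Series / Data Center GPU Flex (assumed grouping)
        "USE_XETLA": "OFF",
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
    },
    "max": {  # Intel Data Center GPU Max (assumed grouping)
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
        "ENABLE_SDP_FUSION": "1",
    },
    "igpu": {
        "SYCL_CACHE_PERSISTENT": "1",
        "BIGDL_LLM_XMX_DISABLED": "1",
    },
}

def env_for(device: str) -> dict:
    """Return a copy of the recommended variables for one device class."""
    return dict(RECOMMENDED_ENV[device])

print(sorted(env_for("igpu")))
```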
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash
View file
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation
``` ```
@@ -25,22 +23,26 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation
``` ```
-### 2. Configures OneAPI environment variables
+### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@@ -62,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash
View file
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers_stream_generator # additional package required for Baichuan2-7B-Chat to conduct generation pip install transformers_stream_generator # additional package required for Baichuan2-7B-Chat to conduct generation
``` ```
@@ -25,22 +23,26 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers_stream_generator # additional package required for Baichuan2-7B-Chat to conduct generation pip install transformers_stream_generator # additional package required for Baichuan2-7B-Chat to conduct generation
``` ```
-### 2. Configures OneAPI environment variables
+### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@@ -62,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
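When scripting repeated runs, the same variables can be passed per-process instead of exported globally in the shell. A sketch; `generate.py` is the example script referenced by this README, and the helper name is illustrative:

```python
import os
import subprocess
import sys

def build_cmd_and_env(extra_env: dict):
    """Merge extra variables into a copy of the environment for one run."""
    env = dict(os.environ)  # copy, so the parent process is untouched
    env.update(extra_env)
    cmd = [sys.executable, "generate.py"]  # example script from this README
    return cmd, env

cmd, env = build_cmd_and_env({"SYCL_CACHE_PERSISTENT": "1"})
# On a machine with an Intel GPU and ipex-llm installed, launch it with:
# subprocess.run(cmd, env=env, check=True)
print(cmd[-1], env["SYCL_CACHE_PERSISTENT"])
```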
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@@ -81,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@@ -89,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash
View file
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for Bark model to synthesize speech based on the given text, with IPEX-LLM INT4 optimizations. In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for Bark model to synthesize speech based on the given text, with IPEX-LLM INT4 optimizations.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install scipy pip install scipy
``` ```
@@ -25,23 +23,26 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install scipy pip install scipy
``` ```
-### 2. Configures OneAPI environment variables
+### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@@ -52,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@@ -63,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@@ -82,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@@ -90,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash
View file
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash ```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
@@ -24,21 +21,24 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
-### 2. Configures OneAPI environment variables
+### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
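Because the Linux and Windows instructions diverge at this point, a wrapper script that automates setup can branch on the platform first. A minimal stdlib-only sketch:

```python
import platform

def config_section() -> str:
    """Pick which runtime-configuration section of this README applies."""
    system = platform.system()
    if system == "Linux":
        return "3.1 Configurations for Linux"
    if system == "Windows":
        return "3.2 Configurations for Windows"
    return "unsupported"

print(config_section())
```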
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash
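For readers of this commit, here is a condensed, hypothetical sketch of what a `generate.py`-style example in these READMEs does. The prompt template and model id are illustrative assumptions, and the model-loading calls (the ipex-llm `AutoModel.from_pretrained(..., load_in_4bit=True)` pattern) are left commented out because they require the GPU environment prepared above:

```python
# Minimal sketch of a generate.py-style script; NOT the exact example code.
# The commented lines follow the ipex-llm transformers-style API these
# READMEs describe and need an Intel GPU environment to run.

def build_prompt(question: str) -> str:
    """Wrap a user question in a simple chat template (illustrative only --
    the real examples use each model's own prompt format)."""
    return f"Question: {question}\nAnswer:"

def main() -> str:
    prompt = build_prompt("What is AI?")
    # from ipex_llm.transformers import AutoModel   # assumed import path
    # from transformers import AutoTokenizer
    # model = AutoModel.from_pretrained("THUDM/chatglm2-6b",      # hypothetical model id
    #                                   load_in_4bit=True,        # IPEX-LLM INT4
    #                                   trust_remote_code=True).to("xpu")
    # tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b",
    #                                           trust_remote_code=True)
    # ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")
    # output = model.generate(ids, max_new_tokens=32)
    return prompt

if __name__ == "__main__":
    print(main())
```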
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
@@ -24,21 +21,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
+<details>
+<summary>For Intel iGPU</summary>
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+</details>
#### 3.2 Configurations for Windows
<details>
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```bash
@@ -132,13 +138,10 @@ Inference time: xxxx s
In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with IPEX-LLM INT4 optimizations.
### 1. Install
#### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
@@ -148,21 +151,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -173,6 +179,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -184,10 +191,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
+<details>
+<summary>For Intel iGPU</summary>
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+</details>
#### 3.2 Configurations for Windows
<details>
@@ -202,7 +222,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -210,15 +230,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
**Stream Chat using `stream_chat()` API**:
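The `stream_chat()` flow that this section's example exercises prints each newly generated fragment as it arrives. A minimal sketch of that printing loop, with a stand-in generator in place of the real model call (which needs the GPU environment above); the generator and its canned text are purely illustrative:

```python
from typing import Iterator

def fake_stream_chat(prompt: str) -> Iterator[str]:
    """Stand-in for a stream_chat() call: yields the growing response text,
    the way ChatGLM-style streaming APIs do."""
    words = ["AI", " stands", " for", " Artificial", " Intelligence."]
    partial = ""
    for w in words:
        partial += w
        yield partial

def run_stream(prompt: str) -> str:
    printed = ""
    for response in fake_stream_chat(prompt):
        new_text = response[len(printed):]  # emit only the newly generated part
        print(new_text, end="", flush=True)
        printed = response
    print()
    return printed
```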
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
@@ -24,21 +21,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
+<details>
+<summary>For Intel iGPU</summary>
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+</details>
#### 3.2 Configurations for Windows
<details>
@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```bash
@@ -131,13 +137,10 @@ AI stands for Artificial Intelligence. It refers to the development of computer
In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM3 model to stream chat, with IPEX-LLM INT4 optimizations.
### 1. Install
#### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
@@ -147,21 +150,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -172,6 +178,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -183,10 +190,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
+<details>
+<summary>For Intel iGPU</summary>
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+</details>
#### 3.2 Configurations for Windows
<details>
@@ -201,7 +221,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -209,15 +229,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
**Stream Chat using `stream_chat()` API**:
```
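Taken together, the Linux runtime hunks this commit touches amount to three exports for an Arc A-Series dGPU. A sketch that applies them from Python before importing ipex-llm (the iGPU case would set `BIGDL_LLM_XMX_DISABLED=1` instead, as the added `<details>` blocks show):

```python
import os

# The Linux runtime settings these READMEs recommend for an Intel Arc
# A-Series dGPU, applied programmatically before importing ipex-llm.
ARC_RUNTIME_ENV = {
    "USE_XETLA": "OFF",
    "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
    "SYCL_CACHE_PERSISTENT": "1",  # the variable this commit adds throughout
}

def apply_runtime_env(env: dict = ARC_RUNTIME_ENV) -> None:
    """Copy the recommended settings into the process environment."""
    os.environ.update(env)

apply_runtime_env()
```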
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a CodeLlama model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers
```
@@ -25,22 +23,26 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers
```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -62,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
+<details>
+<summary>For Intel iGPU</summary>
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+</details>
#### 3.2 Configurations for Windows
<details>
@@ -81,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -89,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```bash
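Since the Windows install path now pulls the oneAPI runtime in through pip as well, a quick way to confirm what actually landed in the environment is to query package metadata. A small standard-library-only sketch (package names taken from the pip commands above; which ones are present will vary by platform):

```python
from importlib import metadata

def installed_version(pkg: str) -> str:
    """Return the installed version of pkg, or a hint when it is absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return "not installed"

# Distributions referenced by the install commands in these READMEs.
for pkg in ("ipex-llm", "intel-extension-for-pytorch", "dpcpp-cpp-rt", "transformers"):
    print(f"{pkg}: {installed_version(pkg)}")
```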
@@ -8,17 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm
-# below command will install intel_extension_for_pytorch==2.0.110+xpu as default
-# you can install specific ipex/torch version for your need
+# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.35.2 # required by DeciLM-7B
```
@@ -27,20 +23,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
-### 2. Configures OneAPI environment variables
-#### 2.1 Configurations for Linux
+### 2. Configures OneAPI environment variables for Linux
+> [!NOTE]
+> Skip this step if you are running on Windows.
+This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
-#### 2.2 Configurations for Windows
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -51,6 +51,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -62,11 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
+<details>
+<summary>For Intel iGPU</summary>
+```bash
+export SYCL_CACHE_PERSISTENT=1
+export BIGDL_LLM_XMX_DISABLED=1
+```
+</details>
#### 3.2 Configurations for Windows
<details>
@@ -81,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
-<summary>For Intel Arc™ A300-Series or Pro A60</summary>
+<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -89,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
-<details>
-<summary>For other Intel dGPU Series</summary>
-There is no need to set further environment variables.
-</details>
-> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
+> [!NOTE]
+> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
@@ -8,15 +8,11 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Deepseek model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
-We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for IPEX-LLM:
+We suggest using conda to manage environment:
```bash
-conda create -n llm python=3.11 # recommend to use Python 3.11
+conda create -n llm python=3.11
conda activate llm
-# below command will install intel_extension_for_pytorch==2.0.110+xpu as default
-# you can install specific ipex/torch version for your need
+# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
@@ -25,21 +21,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
+# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
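After the pip step above, a one-line import check confirms the package resolved in the active environment. This is purely illustrative; "ok" requires that `ipex-llm[xpu]` installed cleanly, and the check does not validate GPU drivers:

```shell
#!/bin/sh
# Verify that ipex-llm imports in the active conda environment.
if python -c "import ipex_llm" 2>/dev/null; then
    status=ok
else
    status=missing
fi
echo "ipex-llm: $status"
```

If this prints `missing`, re-activate the `llm` environment and repeat the install command before moving on to the runtime configuration.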
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
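Sourcing `setvars.sh` normally defines `ONEAPI_ROOT` in the current shell, which gives a cheap way to tell whether the step succeeded. The check below is illustrative (the variable name is the toolkit's convention, and a pip-installed oneAPI does not need `setvars.sh` at all):

```shell
#!/bin/sh
# After `source /opt/intel/oneapi/setvars.sh`, ONEAPI_ROOT is normally set.
if [ -n "${ONEAPI_ROOT:-}" ]; then
    oneapi_state="active ($ONEAPI_ROOT)"
else
    oneapi_state="not configured"
fi
echo "oneAPI environment: $oneapi_state"
```

Because `setvars.sh` modifies the current shell, remember to source it (not execute it) in every new terminal before running the examples.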
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@ -50,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@ -61,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@ -80,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@ -88,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash


@ -9,14 +9,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with IPEX-LLM INT4 optimizations. In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with IPEX-LLM INT4 optimizations.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). We suggest using conda to manage environment:
After installing conda, create a Python environment for IPEX-LLM:
```bash ```bash
conda create -n llm python=3.11 conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install datasets soundfile librosa # required by audio processing pip install datasets soundfile librosa # required by audio processing
``` ```
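The audio examples depend on all three extra packages from the pip command above, and a partial install fails only at runtime. A short report script can surface that earlier; this is a sketch that assumes a `python` (or `python3`) interpreter on `PATH`:

```shell
#!/bin/sh
# Report which audio-processing dependencies currently import;
# re-run the pip install step for any module marked "missing".
PYBIN=$(command -v python || command -v python3)
audio_report=$("$PYBIN" - <<'PY'
for mod in ("datasets", "soundfile", "librosa"):
    try:
        __import__(mod)
        print(mod, "ok")
    except Exception:
        print(mod, "missing")
PY
)
echo "$audio_report"
```

Each line names one dependency followed by `ok` or `missing`, so the output doubles as a checklist against the install command.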
@ -25,22 +24,26 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install datasets soundfile librosa # required by audio processing pip install datasets soundfile librosa # required by audio processing
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@ -51,6 +54,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@ -62,11 +66,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@ -81,7 +97,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@ -89,15 +105,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
``` ```


@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). We suggest using conda to manage environment:
After installing conda, create a Python environment for IPEX-LLM:
```bash ```bash
conda create -n llm python=3.11 # recommend to use Python 3.11 conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
@ -24,21 +21,24 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash


@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). We suggest using conda to manage environment:
After installing conda, create a Python environment for IPEX-LLM:
```bash ```bash
conda create -n llm python=3.11 # recommend to use Python 3.11 conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
@ -24,21 +21,24 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash


@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). We suggest using conda to manage environment:
After installing conda, create a Python environment for IPEX-LLM:
```bash ```bash
conda create -n llm python=3.11 # recommend to use Python 3.11 conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
@ -24,21 +21,24 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash


@ -21,21 +21,24 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@ -46,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@ -57,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@ -76,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@ -84,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
``` ```


@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). We suggest using conda to manage environment:
After installing conda, create a Python environment for IPEX-LLM:
```bash ```bash
conda create -n llm python=3.11 # recommend to use Python 3.11 conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
@ -24,30 +21,35 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
<details> <details>
<summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary> <summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@@ -59,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@@ -78,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a LLaVA model to start a multi-turn chat centered around an image using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a LLaVA model to start a multi-turn chat centered around an image using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). We suggest using conda to manage environment:
After installing conda, create a Python environment for IPEX-LLM:
```bash ```bash
conda create -n llm python=3.11 # recommend to use Python 3.11 conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # install dependencies required by llava pip install einops # install dependencies required by llava
pip install transformers==4.36.2 pip install transformers==4.36.2
@@ -31,8 +29,12 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # install dependencies required by llava pip install einops # install dependencies required by llava
pip install transformers==4.36.2 pip install transformers==4.36.2
@@ -40,29 +42,30 @@ git clone https://github.com/haotian-liu/LLaVA.git # clone the llava library git clone https://github.com/haotian-liu/LLaVA.git # clone the llava library
copy generate.py .\LLaVA\ # copy our example to the LLaVA folder copy generate.py .\LLaVA\ # copy our example to the LLaVA folder
cd LLaVA # change the working directory to the LLaVA folder cd LLaVA # change the working directory to the LLaVA folder
git checkout tags/v1.2.0 -b 1.2.0 # Get the branch which is compatible with transformers 4.36 git checkout tags/v1.2.0 -b 1.2.0 # Get the branch which is compatible with transformers 4.36
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
<details> <details>
<summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary> <summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@@ -74,11 +77,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@@ -93,7 +108,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@@ -101,15 +116,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash
@@ -7,33 +7,107 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
## Example: Predict Tokens using `generate()` API ## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Mamba model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a Mamba model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). #### 1.1 Installation on Linux
We suggest using conda to manage environment:
After installing conda, create a Python environment for IPEX-LLM:
```bash ```bash
conda create -n llm python=3.11 # recommend to use Python 3.11 conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
# below command will install intel_extension_for_pytorch==2.0.110+xpu as default
# you can install specific ipex/torch version for your need
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # package required by Mamba pip install einops # package required by Mamba
``` ```
### 2. Configures OneAPI environment variables #### 1.2 Installation on Windows
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # package required by Mamba
```
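Before moving on, it can help to verify that the packages installed above actually landed in the active conda environment. The helper below is a hypothetical sketch (not part of ipex-llm); the package names are taken from the install commands above:

```python
import importlib.util

# Hypothetical sanity-check helper (not part of ipex-llm): the package
# names below are the ones installed by the commands above.
REQUIRED = ("torch", "intel_extension_for_pytorch", "ipex_llm", "einops")

def missing_packages(packages=REQUIRED):
    """Return the subset of `packages` that cannot be imported."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Environment looks ready.")
```

If any package is reported missing, re-run the corresponding `pip install` command in the activated `llm` environment before continuing.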
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
### 3. Run ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
<details>
For optimal performance on Arc, it is recommended to set several environment variables. <summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details>
<details>
<summary>For Intel Data Center GPU Max Series</summary>
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
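The per-device variables above can be collected in one place. The sketch below is a hypothetical helper: the device keys are invented for this example, while the variable names and values come from the sections above. `LD_PRELOAD` for the Max Series is omitted because its value depends on the conda prefix, and note that these variables must be set before the GPU runtime is initialized (e.g. before importing torch):

```python
import os

# Hypothetical helper summarising the Linux runtime settings above.
# The device keys ("arc_or_flex", "max_series", "igpu") are made up for
# this sketch; the variable names come from the README sections above.
RUNTIME_ENV = {
    "arc_or_flex": {
        "USE_XETLA": "OFF",
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
    },
    "max_series": {
        # LD_PRELOAD of libtcmalloc.so is omitted: its path depends on
        # the conda prefix, so set it in the shell instead.
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
        "ENABLE_SDP_FUSION": "1",
    },
    "igpu": {
        "SYCL_CACHE_PERSISTENT": "1",
        "BIGDL_LLM_XMX_DISABLED": "1",
    },
}

def apply_runtime_env(device: str) -> dict:
    """Set the recommended variables for `device` and return them."""
    env = RUNTIME_ENV[device]
    os.environ.update(env)
    return env
```

Calling `apply_runtime_env("igpu")` at the very top of a script (before importing torch or ipex_llm) has the same effect as exporting the variables in the shell beforehand.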
#### 3.2 Configurations for Windows
<details>
<summary>For Intel iGPU</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1
```
</details>
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
```
</details>
> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```bash ```bash
python ./generate.py python ./generate.py
``` ```
@@ -45,7 +119,7 @@ In the example, several arguments can be passed to satisfy your requirements:
- `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It defaults to `'What is AI?'`. - `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It defaults to `'What is AI?'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It defaults to `32`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It defaults to `32`.
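For illustration, the two arguments above could be wired with `argparse` roughly as follows; this is only a sketch, and the actual parser in generate.py may differ:

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of the CLI described above; the real
    # generate.py may define additional arguments.
    parser = argparse.ArgumentParser(
        description="Predict the next N tokens with the generate() API")
    parser.add_argument("--prompt", type=str, default="What is AI?",
                        help="prompt to be inferred")
    parser.add_argument("--n-predict", type=int, default=32,
                        help="max number of tokens to predict")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"prompt={args.prompt!r}, n_predict={args.n_predict}")
```

`argparse` maps `--n-predict` to the attribute `args.n_predict`, so the defaults shown above are what the script uses when no flags are passed.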
#### 2.3 Sample Output #### 4.3 Sample Output
#### [state-spaces/mamba-1.4b](https://huggingface.co/state-spaces/mamba-1.4b) #### [state-spaces/mamba-1.4b](https://huggingface.co/state-spaces/mamba-1.4b)
```log ```log
Inference time: xxxx s Inference time: xxxx s
@@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). We suggest using conda to manage environment:
After installing conda, create a Python environment for IPEX-LLM:
```bash ```bash
conda create -n llm python=3.11 # recommend to use Python 3.11 conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -29,21 +26,27 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
# Refer to https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting, please make sure you are using a stable version of Transformers, 4.34.0 or newer.
pip install transformers==4.34.0 pip install transformers==4.34.0
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@@ -54,6 +57,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@@ -65,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@@ -84,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@@ -92,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash
@@ -10,13 +10,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). We suggest using conda to manage environment:
After installing conda, create a Python environment for IPEX-LLM:
```bash ```bash
conda create -n llm python=3.11 # recommend to use Python 3.11 conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -29,6 +26,9 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -36,26 +36,28 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
pip install transformers==4.36.0 pip install transformers==4.36.0
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
<details> <details>
<summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary> <summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@@ -67,11 +69,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@@ -86,7 +100,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@@ -94,15 +108,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
```bash ```bash
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install ### 1. Install
#### 1.1 Installation on Linux #### 1.1 Installation on Linux
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). We suggest using conda to manage environment:
After installing conda, create a Python environment for IPEX-LLM:
```bash ```bash
conda create -n llm python=3.11 # recommend to use Python 3.11 conda create -n llm python=3.11
conda activate llm conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for phi-1_5 to conduct generation pip install einops # additional package required for phi-1_5 to conduct generation
``` ```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
```bash ```bash
conda create -n llm python=3.11 libuv conda create -n llm python=3.11 libuv
conda activate llm conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for phi-1_5 to conduct generation pip install einops # additional package required for phi-1_5 to conduct generation
``` ```
### 2. Configures OneAPI environment variables ### 2. Configures OneAPI environment variables for Linux
#### 2.1 Configurations for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash ```bash
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
``` ```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.
### 3. Runtime Configurations ### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device. For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux #### 3.1 Configurations for Linux
@@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash ```bash
export USE_XETLA=OFF export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
``` ```
</details> </details>
@@ -61,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash ```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1 export ENABLE_SDP_FUSION=1
``` ```
> Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`. > Note: Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details> </details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows #### 3.2 Configurations for Windows
<details> <details>
@@ -79,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details> <details>
<summary>For Intel Arc™ A300-Series or Pro A60</summary> <summary>For Intel Arc™ A-Series Graphics</summary>
```cmd ```cmd
set SYCL_CACHE_PERSISTENT=1 set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details> </details>
<details> > [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
<summary>For other Intel dGPU Series</summary>
There is no need to set further environment variables.
</details>
> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
``` ```
@@ -8,36 +8,39 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
In the example [generate.py](./generate.py), we show a basic use case for a phi-2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for phi-2 to conduct generation
```
#### 1.2 Installation on Windows
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This step is required on Linux when oneAPI was installed via APT or the offline installer; skip it for pip-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
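The `setvars.sh` step only applies when oneAPI was installed system-wide. As a minimal sketch (our own illustration, not part of this README; the `ONEAPI_SETVARS` and `ONEAPI_SOURCE` names are assumptions), the sourcing can be guarded so the same shell profile also works on machines where the oneAPI runtime came in through pip wheels:

```shell
# Minimal sketch: only source the oneAPI script when a system-wide
# (APT/offline) install is actually present.
ONEAPI_SETVARS="/opt/intel/oneapi/setvars.sh"   # default APT/offline install path

if [ -f "$ONEAPI_SETVARS" ]; then
    # shellcheck disable=SC1090
    . "$ONEAPI_SETVARS"
    ONEAPI_SOURCE="system"   # environment configured by setvars.sh
else
    ONEAPI_SOURCE="pip"      # rely on pip-installed oneAPI runtime libraries
fi
echo "oneAPI environment source: $ONEAPI_SOURCE"
```

Sourcing this from `~/.bashrc` (or a per-project activation script) avoids remembering which kind of oneAPI install a given machine has.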
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -48,7 +51,9 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
<details>
@@ -58,10 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
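Across the device-specific blocks above, only the set of exported variables differs. As an illustrative sketch (the function name and device labels are our own assumptions, not part of this README), the exports can be folded into one helper that is sourced once:

```shell
# Hypothetical helper: pick the runtime variables by device type instead of
# retyping the export lines. The exported variables come from the README
# sections above; the labels arc/flex/max/igpu are our own shorthand.
set_ipex_llm_env() {
    case "$1" in
        arc|flex)  # Intel Arc A-Series / Data Center GPU Flex Series
            export USE_XETLA=OFF
            export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
            export SYCL_CACHE_PERSISTENT=1
            ;;
        max)       # assumed to map to the tcmalloc/SDP-fusion block above
            export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
            export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
            export SYCL_CACHE_PERSISTENT=1
            export ENABLE_SDP_FUSION=1
            ;;
        igpu)      # Intel integrated GPU
            export SYCL_CACHE_PERSISTENT=1
            export BIGDL_LLM_XMX_DISABLED=1
            ;;
        *)
            echo "usage: set_ipex_llm_env {arc|flex|max|igpu}" >&2
            return 1
            ;;
    esac
}

set_ipex_llm_env igpu
echo "SYCL_CACHE_PERSISTENT=$SYCL_CACHE_PERSISTENT BIGDL_LLM_XMX_DISABLED=$BIGDL_LLM_XMX_DISABLED"
```

Calling `set_ipex_llm_env arc` (for example) before launching an example then replaces the manual `export` lines for that device.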
#### 3.2 Configurations for Windows
<details>
@@ -76,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -84,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> The first time each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
In the example [generate.py](./generate.py), we show a basic use case for a phixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for phixtral to conduct generation
```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for phixtral to conduct generation
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This step is required on Linux when oneAPI was installed via APT or the offline installer; skip it for pip-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -61,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -79,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -87,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> The first time each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```
@@ -8,14 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with IPEX-LLM `optimize_model` API on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional packages required for Qwen-VL-Chat to conduct generation
```
@@ -24,22 +23,26 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional packages required for Qwen-VL-Chat to conduct generation
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This step is required on Linux when oneAPI was installed via APT or the offline installer; skip it for pip-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -50,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -61,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> The first time each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```
@@ -14,6 +14,7 @@ conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.37.0 # install transformers which supports Qwen2
```
@@ -22,22 +23,26 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.37.2 # install transformers which supports Qwen2
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This step is required on Linux when oneAPI was installed via APT or the offline installer; skip it for pip-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -48,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -59,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -78,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -86,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> The first time each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a Replit model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install "transformers<4.35"
```
@@ -25,21 +23,24 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This step is required on Linux when oneAPI was installed via APT or the offline installer; skip it for pip-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -47,13 +48,10 @@ For optimal performance, it is recommended to set several environment variables.
<summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -65,10 +63,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -83,7 +94,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -91,15 +102,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> The first time each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```bash
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ
In the example [generate.py](./generate.py), we show a basic use case for a SOLAR model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.35.2 # required by SOLAR
```
@@ -25,22 +23,26 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.35.2 # required by SOLAR
```
### 2. Configures OneAPI environment variables for Linux
> [!NOTE]
> Skip this step if you are running on Windows.
This step is required on Linux when oneAPI was installed via APT or the offline installer; skip it for pip-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
@@ -51,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```
</details>
@@ -62,10 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>
<details>
<summary>For Intel iGPU</summary>
```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```
</details>
#### 3.2 Configurations for Windows
<details>
@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1
<details>
<summary>For Intel Arc™ A-Series Graphics</summary>
```cmd
set SYCL_CACHE_PERSISTENT=1
@@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1
</details>
> [!NOTE]
> The first time each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
```bash
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM, we have some recommended requirements for y
In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for SpeechT5 model to synthesize speech based on the given text, with IPEX-LLM INT4 optimizations.
### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install "datasets<2.18" soundfile # additional packages required for SpeechT5 to conduct generation
```
@@ -25,23 +23,26 @@ We suggest using conda to manage environment:

```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install "datasets<2.18" soundfile # additional packages required for SpeechT5 to conduct generation
```
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This step is required on Linux only for oneAPI installed via APT or the offline installer; skip it for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
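Since sourcing is only needed for APT/offline oneAPI installs, a launch script can guard the call instead of sourcing unconditionally. A sketch, assuming bash and the default `/opt/intel/oneapi` install path (the `maybe_source_oneapi` helper name is ours):

```shell
#!/usr/bin/env bash
# Source oneAPI setvars.sh only when it exists and has not already run;
# pip-installed oneAPI needs no sourcing at all.
maybe_source_oneapi() {
    if [ -n "${SETVARS_COMPLETED:-}" ]; then
        echo "already-sourced"
    elif [ -f /opt/intel/oneapi/setvars.sh ]; then
        # shellcheck disable=SC1091
        source /opt/intel/oneapi/setvars.sh && echo "sourced"
    else
        echo "pip-install-or-missing"
    fi
}
RESULT="$(maybe_source_oneapi)"
echo "$RESULT"
```

`SETVARS_COMPLETED` is set by `setvars.sh` itself after a successful run, which makes the guard idempotent across nested scripts.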
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux

@@ -52,6 +53,7 @@ For optimal performance, it is recommended to set several environment variables.

```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>
@@ -63,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>
<details>

<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>
#### 3.2 Configurations for Windows

<details>

@@ -82,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -90,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples

---
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ

In the example [generate.py](./generate.py), we show a basic use case for a StableLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.

### 1. Install

#### 1.1 Installation on Linux

We suggest using conda to manage the environment:

```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
@@ -27,6 +24,9 @@ We suggest using conda to manage environment:

```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

@@ -34,17 +34,17 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte

pip install transformers==4.38.0
```
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This step is required on Linux only for oneAPI installed via APT or the offline installer; skip it for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
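After installation and environment setup, it can help to verify that the XPU device is actually visible before running the examples. A sketch that degrades gracefully when the XPU stack is absent (the `probe_xpu` name is ours, not an ipex-llm API):

```python
def probe_xpu():
    """Return True when a PyTorch XPU device is usable, False otherwise."""
    try:
        # intel_extension_for_pytorch registers the 'xpu' backend on import;
        # recent torch builds may also ship torch.xpu natively.
        import torch
    except ImportError:
        return False
    xpu = getattr(torch, "xpu", None)
    return bool(xpu is not None and xpu.is_available())

print("XPU available:", probe_xpu())
```

A `False` result on a machine with an Intel GPU usually points at a missing driver or an environment where `setvars.sh` has not been sourced.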
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux

@@ -55,6 +55,7 @@ For optimal performance, it is recommended to set several environment variables.

```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>
@@ -66,11 +67,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>
<details>

<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>
#### 3.2 Configurations for Windows

<details>

@@ -85,7 +98,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -93,15 +106,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples

---
@@ -8,13 +8,10 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ

In the example [generate.py](./generate.py), we show a basic use case for a StarCoder model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.

### 1. Install

#### 1.1 Installation on Linux

We suggest using conda to manage the environment:

```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
@@ -24,21 +21,24 @@ We suggest using conda to manage environment:

```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This step is required on Linux only for oneAPI installed via APT or the offline installer; skip it for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
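Because Linux shells use `export` while Windows CMD uses `set`, any tooling that emits these settings has to pick the right syntax for the target shell. A small illustrative generator (the `env_line` helper is ours, not part of ipex-llm):

```python
import platform

def env_line(key, value, system=None):
    """Render one environment-variable assignment for the target shell."""
    system = system or platform.system()
    if system == "Windows":
        return f"set {key}={value}"     # cmd.exe syntax
    return f"export {key}={value}"      # POSIX shell syntax

print(env_line("SYCL_CACHE_PERSISTENT", "1", system="Linux"))
# -> export SYCL_CACHE_PERSISTENT=1
```

When `system` is omitted, the helper falls back to the current platform, so the same script works in both environments.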
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux

@@ -49,6 +49,7 @@ For optimal performance, it is recommended to set several environment variables.

```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>
@@ -60,11 +61,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>
<details>

<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>
#### 3.2 Configurations for Windows

<details>

@@ -79,7 +92,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -87,15 +100,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples

---
@@ -8,15 +8,13 @@ To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requ

In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.

### 1. Install

#### 1.1 Installation on Linux

We suggest using conda to manage the environment:

```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for Yi-6B to conduct generation
```
@@ -25,31 +23,37 @@ We suggest using conda to manage environment:

```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for Yi-6B to conduct generation
```
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This step is required on Linux only for oneAPI installed via APT or the offline installer; skip it for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux

<details>

<summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>

```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>
@@ -61,11 +65,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>
<details>

<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>
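The Linux recommendations above differ by device class, so launch scripts can collect them into a single lookup to stay in sync with this README. The device keys and the `settings_for` helper below are illustrative, not an ipex-llm API:

```python
# Per-device Linux settings as listed in this README (illustrative only).
LINUX_SETTINGS = {
    "arc_or_flex": {  # Intel Arc A-Series / Data Center GPU Flex Series
        "USE_XETLA": "OFF",
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        "SYCL_CACHE_PERSISTENT": "1",
    },
    "igpu": {  # Intel integrated GPU
        "SYCL_CACHE_PERSISTENT": "1",
        "BIGDL_LLM_XMX_DISABLED": "1",
    },
}

def settings_for(device):
    """Look up the recommended settings, failing loudly on unknown devices."""
    try:
        return LINUX_SETTINGS[device]
    except KeyError:
        raise ValueError(f"unknown device class: {device!r}") from None

print(settings_for("igpu"))
```

Failing loudly on an unknown key is deliberate: silently running without the recommended variables is exactly the misconfiguration the README is trying to prevent.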
#### 3.2 Configurations for Windows

<details>

@@ -80,7 +96,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -88,15 +104,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples

---
@@ -10,38 +10,42 @@ In addition, you need to modify some files in Yuan2-2B-hf folder, since Flash at

In the example [generate.py](./generate.py), we show a basic use case for a Yuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.

### 1. Install

#### 1.1 Installation on Linux

We suggest using conda to manage the environment:

```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for Yuan2 to conduct generation
pip install pandas # additional package required for Yuan2 to conduct generation
```
#### 1.2 Installation on Windows

We suggest using conda to manage the environment:

```bash
conda create -n llm python=3.11 libuv
conda activate llm
# below command will use pip to install the Intel oneAPI Base Toolkit 2024.0
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install einops # additional package required for Yuan2 to conduct generation
```
### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This step is required on Linux only for oneAPI installed via APT or the offline installer; skip it for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```
### 3. Runtime Configurations

For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

#### 3.1 Configurations for Linux

@@ -49,12 +53,12 @@ For optimal performance, it is recommended to set several environment variables.

<summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>

```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>

<details>
@@ -64,10 +68,23 @@ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```

> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.

</details>
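The `LD_PRELOAD` line above appends `libtcmalloc.so` from the active conda environment; the same value can be computed in Python when composing a launcher script. A sketch, assuming `CONDA_PREFIX` is set by conda (the `tcmalloc_preload` helper is ours):

```python
import os

def tcmalloc_preload(existing=None, prefix=None):
    """Build the LD_PRELOAD value the README sets via shell.

    Unlike the raw shell line, this skips the leading colon when
    LD_PRELOAD starts out empty.
    """
    existing = os.environ.get("LD_PRELOAD", "") if existing is None else existing
    prefix = os.environ.get("CONDA_PREFIX", "") if prefix is None else prefix
    lib = os.path.join(prefix, "lib", "libtcmalloc.so")
    return f"{existing}:{lib}" if existing else lib

print(tcmalloc_preload(existing="", prefix="/opt/conda"))
# -> /opt/conda/lib/libtcmalloc.so
```

Note that the computed value only takes effect for child processes launched after it is written into their environment; `LD_PRELOAD` cannot retroactively affect the already-running interpreter.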
<details>

<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>
#### 3.2 Configurations for Windows

<details>

@@ -82,7 +99,7 @@ set BIGDL_LLM_XMX_DISABLED=1

<details>

<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

@@ -90,15 +107,8 @@ set SYCL_CACHE_PERSISTENT=1

</details>

> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples