diff --git a/README.md b/README.md
index 90bcea27..259dbd54 100644
--- a/README.md
+++ b/README.md
@@ -9,14 +9,16 @@ _**Fast, Distributed, Secure AI for Big Data**_
---
## Latest News
-- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
+- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop or GPU using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), etc., and supports any Hugging Face Transformers model)*
-- **[Update] Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models).
+- **[Update] `bigdl-llm` now supports Intel Arc and Flex GPUs; see the latest GPU examples [here](https://github.com/jason-dai/BigDL/tree/main/python/llm/example/gpu).**
+
+- **Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models).
---
## Overview
diff --git a/python/llm/README.md b/python/llm/README.md
index 29293084..7e3e98a7 100644
--- a/python/llm/README.md
+++ b/python/llm/README.md
@@ -1,9 +1,12 @@
## BigDL-LLM
-**`bigdl-llm`** is a library for running ***LLM*** (large language model) on your Intel ***laptop*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
+**`bigdl-llm`** is a library for running ***LLM*** (large language model) on your Intel ***laptop*** or ***GPU*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
->*(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.)*
+> *It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.*
+### Latest update
+ - `bigdl-llm` now supports Intel Arc and Flex GPUs; see the latest GPU examples [here](https://github.com/jason-dai/BigDL/tree/main/python/llm/example/gpu).
+
### Demos
See the ***optimized performance*** of `chatglm2-6b`, `llama-2-13b-chat`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
diff --git a/python/llm/example/gpu/README.md b/python/llm/example/gpu/README.md
new file mode 100644
index 00000000..d5c4dff3
--- /dev/null
+++ b/python/llm/example/gpu/README.md
@@ -0,0 +1,24 @@
+# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs
+You can use BigDL-LLM to run almost any Hugging Face *Transformers* model with INT4 optimizations on machines with Intel GPUs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.
+
+## Verified Hardware Platforms
+
+- Intel Arc™ A-Series Graphics
+- Intel Data Center GPU Flex Series
+
+## Recommended Requirements
+To apply Intel GPU acceleration, there are several steps for tool installation and environment preparation.
+
+Step 1: only Linux is supported for now; Ubuntu 22.04 is preferred.
+
+Step 2: follow the [driver installation](https://dgpu-docs.intel.com/driver/installation.html) guide to enable general-purpose GPU capabilities.
+
+Step 3: download and install the [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html); oneMKL and the DPC++ compiler are required, while the other components are optional.
+> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0.
+
+## Best Known Configuration on Linux
+For better performance, it is recommended to set the following environment variables on Linux:
+```bash
+export USE_XETLA=OFF
+export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+```
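+
+Putting the steps above together, a typical shell session before launching one of the examples might look like the following sketch (the `setvars.sh` path assumes the default oneAPI install location; adjust it to match your installation):
+
+```bash
+# Activate the oneAPI environment (DPC++ compiler and oneMKL);
+# path below assumes the default installation prefix /opt/intel/oneapi.
+source /opt/intel/oneapi/setvars.sh
+
+# Apply the recommended runtime settings, then run an example.
+export USE_XETLA=OFF
+export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+python ./generate.py
+```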
diff --git a/python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md b/python/llm/example/gpu/baichuan/README.md
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md
rename to python/llm/example/gpu/baichuan/README.md
diff --git a/python/llm/example/transformers/transformers_int4/GPU/baichuan/generate.py b/python/llm/example/gpu/baichuan/generate.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/baichuan/generate.py
rename to python/llm/example/gpu/baichuan/generate.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md b/python/llm/example/gpu/chatglm2/README.md
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md
rename to python/llm/example/gpu/chatglm2/README.md
diff --git a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/generate.py b/python/llm/example/gpu/chatglm2/generate.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/chatglm2/generate.py
rename to python/llm/example/gpu/chatglm2/generate.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/streamchat.py b/python/llm/example/gpu/chatglm2/streamchat.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/chatglm2/streamchat.py
rename to python/llm/example/gpu/chatglm2/streamchat.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/falcon/README.md b/python/llm/example/gpu/falcon/README.md
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/falcon/README.md
rename to python/llm/example/gpu/falcon/README.md
diff --git a/python/llm/example/transformers/transformers_int4/GPU/falcon/falcon-7b-instruct/modelling_RW.py b/python/llm/example/gpu/falcon/falcon-7b-instruct/modelling_RW.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/falcon/falcon-7b-instruct/modelling_RW.py
rename to python/llm/example/gpu/falcon/falcon-7b-instruct/modelling_RW.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/falcon/generate.py b/python/llm/example/gpu/falcon/generate.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/falcon/generate.py
rename to python/llm/example/gpu/falcon/generate.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/internlm/README.md b/python/llm/example/gpu/internlm/README.md
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/internlm/README.md
rename to python/llm/example/gpu/internlm/README.md
diff --git a/python/llm/example/transformers/transformers_int4/GPU/internlm/generate.py b/python/llm/example/gpu/internlm/generate.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/internlm/generate.py
rename to python/llm/example/gpu/internlm/generate.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/llama2/README.md b/python/llm/example/gpu/llama2/README.md
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/llama2/README.md
rename to python/llm/example/gpu/llama2/README.md
diff --git a/python/llm/example/transformers/transformers_int4/GPU/llama2/generate.py b/python/llm/example/gpu/llama2/generate.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/llama2/generate.py
rename to python/llm/example/gpu/llama2/generate.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/mpt/README.md b/python/llm/example/gpu/mpt/README.md
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/mpt/README.md
rename to python/llm/example/gpu/mpt/README.md
diff --git a/python/llm/example/transformers/transformers_int4/GPU/mpt/generate.py b/python/llm/example/gpu/mpt/generate.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/mpt/generate.py
rename to python/llm/example/gpu/mpt/generate.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/qwen/README.md b/python/llm/example/gpu/qwen/README.md
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/qwen/README.md
rename to python/llm/example/gpu/qwen/README.md
diff --git a/python/llm/example/transformers/transformers_int4/GPU/qwen/generate.py b/python/llm/example/gpu/qwen/generate.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/qwen/generate.py
rename to python/llm/example/gpu/qwen/generate.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/starcoder/generate.py b/python/llm/example/gpu/starcoder/generate.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/starcoder/generate.py
rename to python/llm/example/gpu/starcoder/generate.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md b/python/llm/example/gpu/starcoder/readme.md
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md
rename to python/llm/example/gpu/starcoder/readme.md
diff --git a/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md b/python/llm/example/gpu/voiceassistant/README.md
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md
rename to python/llm/example/gpu/voiceassistant/README.md
diff --git a/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/generate.py b/python/llm/example/gpu/voiceassistant/generate.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/voiceassistant/generate.py
rename to python/llm/example/gpu/voiceassistant/generate.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md b/python/llm/example/gpu/whisper/readme.md
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md
rename to python/llm/example/gpu/whisper/readme.md
diff --git a/python/llm/example/transformers/transformers_int4/GPU/whisper/recognize.py b/python/llm/example/gpu/whisper/recognize.py
similarity index 100%
rename from python/llm/example/transformers/transformers_int4/GPU/whisper/recognize.py
rename to python/llm/example/gpu/whisper/recognize.py
diff --git a/python/llm/example/transformers/transformers_int4/GPU/README.md b/python/llm/example/transformers/transformers_int4/GPU/README.md
index d5c4dff3..f12e7824 100644
--- a/python/llm/example/transformers/transformers_int4/GPU/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/README.md
@@ -1,24 +1 @@
-# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs
-You can use BigDL-LLM to run almost every Huggingface Transformer models with INT4 optimizations on your laptops with Intel GPUs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.
-
-## Verified Hardware Platforms
-
-- Intel Arc™ A-Series Graphics
-- Intel Data Center GPU Flex Series
-
-## Recommended Requirements
-To apply Intel GPU acceleration, there’re several steps for tools installation and environment preparation.
-
-Step 1, only Linux system is supported now, Ubuntu 22.04 is prefered.
-
-Step 2, please refer to our [drive installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities.
-
-Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed, others are optional.
-> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0.
-
-## Best Known Configuration on Linux
-For better performance, it is recommended to set environment variables on Linux:
-```bash
-export USE_XETLA=OFF
-export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
-```
+### The GPU examples for `bigdl-llm` have been moved [here](../../../gpu).