diff --git a/docs/readthedocs/source/doc/LLM/Overview/examples.rst b/docs/readthedocs/source/doc/LLM/Overview/examples.rst index e61f1c4b..c531d8b7 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/examples.rst +++ b/docs/readthedocs/source/doc/LLM/Overview/examples.rst @@ -1,7 +1,7 @@ BigDL-LLM Examples ================================ -You can use BigDL-LLM to run any Huggingface *Transfomers* models with INT4 optimizations on either servers or laptops. +You can use BigDL-LLM to run any PyTorch model with INT4 optimizations on Intel XPU (from Laptop to GPU to Cloud). Here, we provide examples to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Please refer to the appropriate guide based on your device: diff --git a/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md b/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md index 7fdb934b..462231fb 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md +++ b/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md @@ -6,21 +6,59 @@ To run these examples, please first refer to [here](./install_cpu.html) for more The following models have been verified on either servers or laptops with Intel CPUs. 
-| Model | Example | -|-----------|----------------------------------------------------------| -| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/native_int4), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/vicuna) | -| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/llama2) | -| MPT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/mpt) | -| Falcon | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/falcon) | -| ChatGLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/chatglm) | -| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/chatglm2) | -| Qwen | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/qwen) | -| MOSS | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/moss) | -| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/baichuan) | -| Dolly-v1 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/dolly_v1) | -| Dolly-v2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/dolly_v2) | -| RedPajama | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/native_int4), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/redpajama) | -| Phoenix | 
[link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/native_int4), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/phoenix) |
-| StarCoder | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/native_int4), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/starcoder) |
-| InternLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/internlm) |
-| Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/whisper) |
+## Example of PyTorch API
+
+| Model | Example of PyTorch API |
+|------------|-------------------------------------------------------|
+| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/llama2) |
+| ChatGLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/chatglm) |
+| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/mistral) |
+| Bark | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/bark) |
+| BERT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/bert) |
+| OpenAI Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/openai-whisper) |
+
+```eval_rst
+.. important::
+
+   In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through PyTorch API as `example `_.
+```
+
+
+## Example of `transformers`-style API
+
+| Model | Example of `transformers`-style API |
+|------------|-------------------------------------------------------|
+| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna) |
+| LLaMA 2 | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2) |
+| ChatGLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm) |
+| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2) |
+| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mistral) |
+| Falcon | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon) |
+| MPT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt) |
+| Dolly-v1 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1) |
+| Dolly-v2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2) |
+| Replit Code|
[link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit) | +| RedPajama | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama) | +| Phoenix | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix) | +| StarCoder | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder) | +| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan) | +| Baichuan2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2) | +| InternLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm) | +| Qwen | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen) | +| Aquila | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila) | +| MOSS | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss) | +| Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper) | + +```eval_rst +.. important:: + + In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). 
You may apply other low bit optimizations through ``transformers``-style API as `example `_. +``` + + +```eval_rst +.. seealso:: + + See the complete examples `here `_. +``` + diff --git a/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md b/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md index 48b83b59..b5504cbb 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md +++ b/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md @@ -12,15 +12,59 @@ To run these examples, please first refer to [here](./install_gpu.html) for more The following models have been verified on either servers or laptops with Intel GPUs. -| Model | Example | -|-----------|----------------------------------------------------------| -| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/llama2) | -| MPT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/mpt) | -| Falcon | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/falcon) | -| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/chatglm2) | -| Qwen | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/qwen) | -| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/baichuan) | -| StarCoder | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/starcoder) | -| InternLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/internlm) | -| Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/whisper) | -| GPT-J | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/gpt-j) | +## Example of PyTorch API + +| Model | Example of PyTorch API | +|------------|-------------------------------------------------------| +| LLaMA 2 | 
[link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/llama2) |
+| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/chatglm2) |
+| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/mistral) |
+| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/baichuan) |
+| Baichuan2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/baichuan2) |
+| Replit | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/replit) |
+| StarCoder | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/starcoder) |
+| Dolly-v1 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1) |
+| Dolly-v2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2) |
+
+```eval_rst
+.. important::
+
+   In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through PyTorch API as `example `_.
+```
+
+
+## Example of `transformers`-style API
+
+| Model | Example of `transformers`-style API |
+|------------|-------------------------------------------------------|
+| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna) |
+| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2) |
+| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2) |
+| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral) |
+| Falcon | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon) |
+| MPT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt) |
+| Dolly-v1 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly_v1) |
+| Dolly-v2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly_v2) |
+| Replit | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit) |
+| StarCoder | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder) |
+| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan) |
+| Baichuan2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2) |
+| InternLM |
[link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm) | +| Qwen | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen) | +| Aquila | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila) | +| Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper) | +| Chinese Llama2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2) | +| GPT-J | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j) | + +```eval_rst +.. important:: + + In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through ``transformers``-style API as `example `_. +``` + + +```eval_rst +.. seealso:: + + See the complete examples `here `_. +```
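The pages above contrast a "PyTorch API" with a "`transformers`-style API" but never show either in code. The following is a minimal sketch of both styles, based on BigDL-LLM's documented entry points (`bigdl.llm.optimize_model` and `bigdl.llm.transformers.AutoModelForCausalLM`); it assumes `bigdl-llm` is installed (`pip install bigdl-llm[all]`), and the checkpoint path is a placeholder, not a real file.

```python
# Sketch of BigDL-LLM's two API styles for low-bit inference.
# Assumptions: bigdl-llm is installed; MODEL_PATH is a hypothetical
# placeholder for a local Hugging Face checkpoint directory.

MODEL_PATH = "path/to/llama-2-7b-chat-hf"  # hypothetical local checkpoint


def load_transformers_style(model_path: str):
    """`transformers`-style API: AutoModel drop-in, INT4 applied at load time."""
    from bigdl.llm.transformers import AutoModelForCausalLM
    # load_in_4bit=True applies the default INT4 optimization; passing
    # load_in_low_bit="nf4" (or "sym_int8", ...) selects another precision.
    return AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)


def load_pytorch_style(model_path: str):
    """PyTorch API: load any PyTorch model first, then optimize it."""
    from transformers import AutoModelForCausalLM
    from bigdl.llm import optimize_model
    model = AutoModelForCausalLM.from_pretrained(model_path)
    # low_bit defaults to "sym_int4"; other values ("nf4", "sym_int8", ...)
    # correspond to the low-bit options mentioned in the notes above.
    return optimize_model(model, low_bit="sym_int4")
```

Either loader returns a model whose `generate` method is used exactly as with stock `transformers`; for the GPU examples, the optimized model is additionally moved to the device (e.g. `model.to("xpu")`) before generation.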