diff --git a/README.md b/README.md
index 4c6110fe..ab006507 100644
--- a/README.md
+++ b/README.md
@@ -1,38 +1,31 @@
-
+## IPEX-LLM
-
-
-
- ---- -## BigDL-LLM - -**`bigdl-llm`** is a library for running **LLM** (large language model) on Intel **XPU** (from *Laptop* to *GPU* to *Cloud*) using **INT4/FP4/INT8/FP8** with very low latency[^1] (for any **PyTorch** model). +**`ipex-llm`** is a library for running **LLM** (large language model) on Intel **XPU** (from *Laptop* to *GPU* to *Cloud*) using **INT4/FP4/INT8/FP8** with very low latency[^1] (for any **PyTorch** model). > *It is built on the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq](https://github.com/IST-DASLab/gptq), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [awq](https://github.com/mit-han-lab/llm-awq), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [vLLM](https://github.com/vllm-project/vllm), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.* ### Latest update 🔥 -- [2024/03] **LangChain** added support for `bigdl-llm`; see the details [here](https://python.langchain.com/docs/integrations/llms/bigdl). -- [2024/02] `bigdl-llm` now supports directly loading model from [ModelScope](python/llm/example/GPU/ModelScope-Models) ([魔搭](python/llm/example/CPU/ModelScope-Models)). -- [2024/02] `bigdl-llm` added inital **INT2** support (based on llama.cpp [IQ2](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF-IQ2) mechanism), which makes it possible to run large-size LLM (e.g., Mixtral-8x7B) on Intel GPU with 16GB VRAM. -- [2024/02] Users can now use `bigdl-llm` through [Text-Generation-WebUI](https://github.com/intel-analytics/text-generation-webui) GUI. -- [2024/02] `bigdl-llm` now supports *[Self-Speculative Decoding](https://bigdl.readthedocs.io/en/latest/doc/LLM/Inference/Self_Speculative_Decoding.html)*, which in practice brings **~30% speedup** for FP16 and BF16 inference latency on Intel [GPU](python/llm/example/GPU/Speculative-Decoding) and [CPU](python/llm/example/CPU/Speculative-Decoding) respectively. -- [2024/02] `bigdl-llm` now supports a comprehensive list of LLM finetuning on Intel GPU (including [LoRA](python/llm/example/GPU/LLM-Finetuning/LoRA), [QLoRA](python/llm/example/GPU/LLM-Finetuning/QLoRA), [DPO](python/llm/example/GPU/LLM-Finetuning/DPO), [QA-LoRA](python/llm/example/GPU/LLM-Finetuning/QA-LoRA) and [ReLoRA](python/llm/example/GPU/LLM-Finetuning/ReLora)). -- [2024/01] Using `bigdl-llm` [QLoRA](python/llm/example/GPU/LLM-Finetuning/QLoRA), we managed to finetune LLaMA2-7B in **21 minutes** and LLaMA2-70B in **3.14 hours** on 8 Intel Max 1550 GPU for [Standford-Alpaca](python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora) (see the blog [here](https://www.intel.com/content/www/us/en/developer/articles/technical/finetuning-llms-on-intel-gpus-using-bigdl-llm.html)). -- [2024/01] 🔔🔔🔔 ***The default `bigdl-llm` GPU Linux installation has switched from PyTorch 2.0 to PyTorch 2.1, which requires new oneAPI and GPU driver versions. 
(See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.)***
-- [2023/12] `bigdl-llm` now supports [ReLoRA](python/llm/example/GPU/LLM-Finetuning/ReLora) (see *["ReLoRA: High-Rank Training Through Low-Rank Updates"](https://arxiv.org/abs/2307.05695)*).
-- [2023/12] `bigdl-llm` now supports [Mixtral-8x7B](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) on both Intel [GPU](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) and [CPU](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral).
-- [2023/12] `bigdl-llm` now supports [QA-LoRA](python/llm/example/GPU/LLM-Finetuning/QA-LoRA) (see *["QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models"](https://arxiv.org/abs/2309.14717)*).
-- [2023/12] `bigdl-llm` now supports [FP8 and FP4 inference](python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types) on Intel ***GPU***.
-- [2023/11] Initial support for directly loading [GGUF](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF), [AWQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ) and [GPTQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ) models into `bigdl-llm` is available.
-- [2023/11] `bigdl-llm` now supports [vLLM continuous batching](python/llm/example/GPU/vLLM-Serving) on both Intel [GPU](python/llm/example/GPU/vLLM-Serving) and [CPU](python/llm/example/CPU/vLLM-Serving).
-- [2023/10] `bigdl-llm` now supports [QLoRA finetuning](python/llm/example/GPU/LLM-Finetuning/QLoRA) on both Intel [GPU](python/llm/example/GPU/LLM-Finetuning/QLoRA) and [CPU](python/llm/example/CPU/QLoRA-FineTuning).
-- [2023/10] `bigdl-llm` now supports [FastChat serving](python/llm/src/bigdl/llm/serving) on on both Intel CPU and GPU.
-- [2023/09] `bigdl-llm` now supports [Intel GPU](python/llm/example/GPU) (including iGPU, Arc, Flex and MAX).
-- [2023/09] `bigdl-llm` [tutorial](https://github.com/intel-analytics/bigdl-llm-tutorial) is released.
-- [2023/09] Over 40 models have been optimized/verified on `bigdl-llm`, including *LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, WizardCoder, Dolly, Whisper, Baichuan/Baichuan2, InternLM, Skywork, QWen/Qwen-VL, Aquila, MOSS,* and more; see the complete list [here](#verified-models).
-
-### `bigdl-llm` Demos
+- [2024/03] **LangChain** added support for `ipex-llm`; see the details [here](https://python.langchain.com/docs/integrations/llms/bigdl).
+- [2024/02] `ipex-llm` now supports directly loading models from [ModelScope](python/llm/example/GPU/ModelScope-Models) ([魔搭](python/llm/example/CPU/ModelScope-Models)).
+- [2024/02] `ipex-llm` added initial **INT2** support (based on the llama.cpp [IQ2](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF-IQ2) mechanism), which makes it possible to run large LLMs (e.g., Mixtral-8x7B) on an Intel GPU with 16GB VRAM.
+- [2024/02] Users can now use `ipex-llm` through the [Text-Generation-WebUI](https://github.com/intel-analytics/text-generation-webui) GUI.
+- [2024/02] `ipex-llm` now supports *[Self-Speculative Decoding](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Inference/Self_Speculative_Decoding.html)*, which in practice brings **~30% speedup** for FP16 and BF16 inference latency on Intel [GPU](python/llm/example/GPU/Speculative-Decoding) and [CPU](python/llm/example/CPU/Speculative-Decoding) respectively.
+- [2024/02] `ipex-llm` now supports a comprehensive list of LLM finetuning techniques on Intel GPU (including [LoRA](python/llm/example/GPU/LLM-Finetuning/LoRA), [QLoRA](python/llm/example/GPU/LLM-Finetuning/QLoRA), [DPO](python/llm/example/GPU/LLM-Finetuning/DPO), [QA-LoRA](python/llm/example/GPU/LLM-Finetuning/QA-LoRA) and [ReLoRA](python/llm/example/GPU/LLM-Finetuning/ReLora)).
+- [2024/01] Using `ipex-llm` [QLoRA](python/llm/example/GPU/LLM-Finetuning/QLoRA), we managed to finetune LLaMA2-7B in **21 minutes** and LLaMA2-70B in **3.14 hours** on 8 Intel Max 1550 GPUs for [Stanford-Alpaca](python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora) (see the blog [here](https://www.intel.com/content/www/us/en/developer/articles/technical/finetuning-llms-on-intel-gpus-using-bigdl-llm.html)).
+- [2024/01] 🔔🔔🔔 ***The default `ipex-llm` GPU Linux installation has switched from PyTorch 2.0 to PyTorch 2.1, which requires new oneAPI and GPU driver versions. (See the [GPU installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.)***
+- [2023/12] `ipex-llm` now supports [ReLoRA](python/llm/example/GPU/LLM-Finetuning/ReLora) (see *["ReLoRA: High-Rank Training Through Low-Rank Updates"](https://arxiv.org/abs/2307.05695)*).
+- [2023/12] `ipex-llm` now supports [Mixtral-8x7B](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) on both Intel [GPU](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) and [CPU](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral).
+- [2023/12] `ipex-llm` now supports [QA-LoRA](python/llm/example/GPU/LLM-Finetuning/QA-LoRA) (see *["QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models"](https://arxiv.org/abs/2309.14717)*).
+- [2023/12] `ipex-llm` now supports [FP8 and FP4 inference](python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types) on Intel ***GPU***.
+- [2023/11] Initial support for directly loading [GGUF](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF), [AWQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ) and [GPTQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ) models into `ipex-llm` is available.
+- [2023/11] `ipex-llm` now supports [vLLM continuous batching](python/llm/example/GPU/vLLM-Serving) on both Intel [GPU](python/llm/example/GPU/vLLM-Serving) and [CPU](python/llm/example/CPU/vLLM-Serving).
+- [2023/10] `ipex-llm` now supports [QLoRA finetuning](python/llm/example/GPU/LLM-Finetuning/QLoRA) on both Intel [GPU](python/llm/example/GPU/LLM-Finetuning/QLoRA) and [CPU](python/llm/example/CPU/QLoRA-FineTuning).
+- [2023/10] `ipex-llm` now supports [FastChat serving](python/llm/src/ipex_llm/llm/serving) on both Intel CPU and GPU.
+- [2023/09] `ipex-llm` now supports [Intel GPU](python/llm/example/GPU) (including iGPU, Arc, Flex and MAX).
+- [2023/09] `ipex-llm` [tutorial](https://github.com/intel-analytics/ipex-llm-tutorial) is released.
+- [2023/09] Over 40 models have been optimized/verified on `ipex-llm`, including *LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, WizardCoder, Dolly, Whisper, Baichuan/Baichuan2, InternLM, Skywork, QWen/Qwen-VL, Aquila, MOSS,* and more; see the complete list [here](#verified-models).
+
+### `ipex-llm` Demos
 See the ***optimized performance*** of `chatglm2-6b` and `llama-2-13b-chat` models on 12th Gen Intel Core CPU and Intel Arc GPU below.
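The updates above reference the low-bit (INT4/FP4/INT8/FP8) path that the CPU and GPU quickstarts later in this diff exercise. As a minimal sketch of that flow, assuming the post-rename Python package is importable as `ipex_llm` (matching the `python/llm/src/ipex_llm/...` paths in this PR) and using a placeholder model path, INT4 inference on CPU looks roughly like this:

```python
# Minimal sketch of ipex-llm INT4 inference on CPU; the import path and flags
# are assumed from the quickstart snippets elsewhere in this diff.
from ipex_llm.transformers import AutoModelForCausalLM  # assumed post-rename import
from transformers import AutoTokenizer

model_path = "/path/to/model/"  # placeholder, as in the README examples

# `load_in_4bit=True` applies the symmetric INT4 optimization; other low-bit
# formats use `load_in_low_bit="sym_int8"`, "nf4", etc. (shown later in the diff).
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
input_ids = tokenizer.encode("What is AI?", return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.batch_decode(output_ids))
```

For the GPU path, the quickstart additionally moves the model and inputs to the `xpu` device, which is why its snippet decodes `output_ids.cpu()`.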
@@ -62,11 +55,11 @@ See the ***optimized performance*** of `chatglm2-6b` and `llama-2-13b-chat` mode
-### `bigdl-llm` quickstart +### `ipex-llm` quickstart -- [Windows GPU installation](https://bigdl.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html) -- [Run BigDL-LLM in Text-Generation-WebUI](https://bigdl.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html) -- [Run BigDL-LLM using Docker](docker/llm) +- [Windows GPU installation](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html) +- [Run IPEX-LLM in Text-Generation-WebUI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html) +- [Run IPEX-LLM using Docker](docker/llm) - [CPU INT4](#cpu-int4) - [GPU INT4](#gpu-int4) - [More Low-Bit support](#more-low-bit-support) @@ -74,12 +67,12 @@ See the ***optimized performance*** of `chatglm2-6b` and `llama-2-13b-chat` mode #### CPU INT4 ##### Install -You may install **`bigdl-llm`** on Intel CPU as follows: -> Note: See the [CPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_cpu.html) for more details. +You may install **`ipex-llm`** on Intel CPU as follows: +> Note: See the [CPU installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_cpu.html) for more details. ```bash -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] ``` -> Note: `bigdl-llm` has been tested on Python 3.9, 3.10 and 3.11 +> Note: `ipex-llm` has been tested on Python 3.9, 3.10 and 3.11 ##### Run Model You may apply INT4 optimizations to any Hugging Face *Transformers* models as follows. @@ -100,13 +93,13 @@ output = tokenizer.batch_decode(output_ids) #### GPU INT4 ##### Install -You may install **`bigdl-llm`** on Intel GPU as follows: -> Note: See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details. +You may install **`ipex-llm`** on Intel GPU as follows: +> Note: See the [GPU installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details. ```bash # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` -> Note: `bigdl-llm` has been tested on Python 3.9, 3.10 and 3.11 +> Note: `ipex-llm` has been tested on Python 3.9, 3.10 and 3.11 ##### Run Model You may apply INT4 optimizations to any Hugging Face *Transformers* models as follows. @@ -130,7 +123,7 @@ output = tokenizer.batch_decode(output_ids.cpu()) #### More Low-Bit Support ##### Save and load -After the model is optimized using `bigdl-llm`, you may save and load the model as follows: +After the model is optimized using `ipex-llm`, you may save and load the model as follows: ```python model.save_low_bit(model_path) new_model = AutoModelForCausalLM.load_low_bit(model_path) @@ -138,7 +131,7 @@ new_model = AutoModelForCausalLM.load_low_bit(model_path) *See the complete example [here](python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load).* ##### Additonal data types - + In addition to INT4, You may apply other low bit optimizations (such as *INT8*, *INT5*, *NF4*, etc.) 
as follows: ```python model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int8") @@ -146,470 +139,62 @@ model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit= *See the complete example [here](python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types).* #### Verified Models -Over 40 models have been optimized/verified on `bigdl-llm`, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, Mistral, Falcon, MPT, Baichuan/Baichuan2, InternLM, QWen* and more; see the example list below. - -| Model | CPU Example | GPU Example | -|------------|----------------------------------------------------------------|-----------------------------------------------------------------| -| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link1](python/llm/example/CPU/Native-Models), [link2](python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna) |[link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna)| -| LLaMA 2 | [link1](python/llm/example/CPU/Native-Models), [link2](python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2) | [link1](python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2), [link2-low GPU memory example](python/llm/example/GPU/PyTorch-Models/Model/llama2#example-2---low-memory-version-predict-tokens-using-generate-api) | -| ChatGLM | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm) | | -| ChatGLM2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2) | -| ChatGLM3 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm3) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3) | -| Mistral | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mistral) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral) | -| Mixtral | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) | -| Falcon | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon) | -| MPT | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt) | -| Dolly-v1 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1) | -| Dolly-v2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2) | -| Replit Code| [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit) | -| RedPajama | [link1](python/llm/example/CPU/Native-Models), [link2](python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama) | | -| Phoenix | [link1](python/llm/example/CPU/Native-Models), [link2](python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix) | | -| StarCoder | [link1](python/llm/example/CPU/Native-Models), [link2](python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder) | -| Baichuan | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan) | -| Baichuan2 | 
[link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2) | -| InternLM | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm) | -| Qwen | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen) | -| Qwen1.5 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen1.5) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5) | -| Qwen-VL | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen-vl) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl) | -| Aquila | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila) | -| Aquila2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2) | -| MOSS | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss) | | -| Whisper | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper) | -| Phi-1_5 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-1_5) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5) | -| Flan-t5 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/flan-t5) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5) | -| LLaVA | [link](python/llm/example/CPU/PyTorch-Models/Model/llava) | [link](python/llm/example/GPU/PyTorch-Models/Model/llava) | -| CodeLlama | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/codellama) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama) | -| Skywork | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/skywork) | | -| InternLM-XComposer | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm-xcomposer) | | -| WizardCoder-Python | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/wizardcoder-python) | | -| CodeShell | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/codeshell) | | -| Fuyu | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/fuyu) | | -| Distil-Whisper | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/distil-whisper) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper) | -| Yi | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/yi) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi) | -| BlueLM | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/bluelm) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm) | -| Mamba | [link](python/llm/example/CPU/PyTorch-Models/Model/mamba) | [link](python/llm/example/GPU/PyTorch-Models/Model/mamba) | -| SOLAR | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/solar) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar) | -| Phixtral | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/phixtral) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral) | -| InternLM2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm2) | 
[link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2) | -| RWKV4 | | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4) | -| RWKV5 | | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5) | -| Bark | [link](python/llm/example/CPU/PyTorch-Models/Model/bark) | [link](python/llm/example/GPU/PyTorch-Models/Model/bark) | -| SpeechT5 | | [link](python/llm/example/GPU/PyTorch-Models/Model/speech-t5) | -| DeepSeek-MoE | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek-moe) | | -| Ziya-Coding-34B-v1.0 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/ziya) | | -| Phi-2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2) | -| Yuan2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/yuan2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2) | -| Gemma | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/gemma) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma) | -| DeciLM-7B | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/deciLM-7b) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b) | -| Deepseek | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek) | - - -***For more details, please refer to the `bigdl-llm` [Document](https://test-bigdl-llm.readthedocs.io/en/main/doc/LLM/index.html), [Readme](python/llm), [Tutorial](https://github.com/intel-analytics/bigdl-llm-tutorial) and [API Doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/LLM/index.html).*** - ---- -## Overview of the complete BigDL project - -BigDL seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries: - -- [LLM](python/llm): Low-bit (INT3/INT4/INT5/INT8) large language model library for Intel CPU/GPU - -- [Orca](#orca): Distributed Big Data & AI (TF & PyTorch) Pipeline on Spark and Ray - -- [Nano](#nano): Transparent Acceleration of Tensorflow & PyTorch Programs on Intel CPU/GPU - -- [DLlib](#dllib): “Equivalent of Spark MLlib” for Deep Learning - -- [Chronos](#chronos): Scalable Time Series Analysis using AutoML - -- [Friesian](#friesian): End-to-End Recommendation Systems - -- [PPML](#ppml): Secure Big Data and AI (with SGX/TDX Hardware Security) - -For more information, you may [read the docs](https://bigdl.readthedocs.io/). - ---- - -## Choosing the right BigDL library -```mermaid -flowchart TD; - Feature1{{HW Secured Big Data & AI?}}; - Feature1-- No -->Feature2{{Python vs. 
Scala/Java?}}; - Feature1-- "Yes" -->ReferPPML([PPML]); - Feature2-- Python -->Feature3{{What type of application?}}; - Feature2-- Scala/Java -->ReferDLlib([DLlib]); - Feature3-- "Large Language Model" -->ReferLLM([LLM]); - Feature3-- "Big Data + AI (TF/PyTorch)" -->ReferOrca([Orca]); - Feature3-- Accelerate TensorFlow / PyTorch -->ReferNano([Nano]); - Feature3-- DL for Spark MLlib -->ReferDLlib2([DLlib]); - Feature3-- High Level App Framework -->Feature4{{Domain?}}; - Feature4-- Time Series -->ReferChronos([Chronos]); - Feature4-- Recommender System -->ReferFriesian([Friesian]); - - click ReferLLM "https://github.com/intel-analytics/bigdl/tree/main/python/llm" - click ReferNano "https://github.com/intel-analytics/bigdl#nano" - click ReferOrca "https://github.com/intel-analytics/bigdl#orca" - click ReferDLlib "https://github.com/intel-analytics/bigdl#dllib" - click ReferDLlib2 "https://github.com/intel-analytics/bigdl#dllib" - click ReferChronos "https://github.com/intel-analytics/bigdl#chronos" - click ReferFriesian "https://github.com/intel-analytics/bigdl#friesian" - click ReferPPML "https://github.com/intel-analytics/bigdl#ppml" - - classDef ReferStyle1 fill:#5099ce,stroke:#5099ce; - classDef Feature fill:#FFF,stroke:#08409c,stroke-width:1px; - class ReferLLM,ReferNano,ReferOrca,ReferDLlib,ReferDLlib2,ReferChronos,ReferFriesian,ReferPPML ReferStyle1; - class Feature1,Feature2,Feature3,Feature4,Feature5,Feature6,Feature7 Feature; - -``` ---- -## Installing - - - To install BigDL, we recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) environment: - - ```bash - conda create -n my_env - conda activate my_env - pip install bigdl - ``` - To install latest nightly build, use `pip install --pre --upgrade bigdl`; see [Python](https://bigdl.readthedocs.io/en/latest/doc/UserGuide/python.html) and [Scala](https://bigdl.readthedocs.io/en/latest/doc/UserGuide/scala.html) user guide for more details. - - - To install each individual library, such as Chronos, use `pip install bigdl-chronos`; see the [document website](https://bigdl.readthedocs.io/) for more details. ---- - -## Getting Started -### Orca - -- The _Orca_ library seamlessly scales out your single node **TensorFlow**, **PyTorch** or **OpenVINO** programs across large clusters (so as to process distributed Big Data). - -
Show Orca example -
- - You can build end-to-end, distributed data processing & AI programs using _Orca_ in 4 simple steps: - - ```python - # 1. Initilize Orca Context (to run your program on K8s, YARN or local laptop) - from bigdl.orca import init_orca_context, OrcaContext - sc = init_orca_context(cluster_mode="k8s", cores=4, memory="10g", num_nodes=2) - - # 2. Perform distribtued data processing (supporting Spark DataFrames, - # TensorFlow Dataset, PyTorch DataLoader, Ray Dataset, Pandas, Pillow, etc.) - spark = OrcaContext.get_spark_session() - df = spark.read.parquet(file_path) - df = df.withColumn('label', df.label-1) - ... - - # 3. Build deep learning models using standard framework APIs - # (supporting TensorFlow, PyTorch, Keras, OpenVino, etc.) - from tensorflow import keras - ... - model = keras.models.Model(inputs=[user, item], outputs=predictions) - model.compile(...) - - # 4. Use Orca Estimator for distributed training/inference - from bigdl.orca.learn.tf.estimator import Estimator - est = Estimator.from_keras(keras_model=model) - est.fit(data=df, - feature_cols=['user', 'item'], - label_cols=['label'], - ...) - ``` - -
- - *See Orca [user guide](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/orca.html), as well as [TensorFlow](https://bigdl.readthedocs.io/en/latest/doc/Orca/Howto/tf2keras-quickstart.html) and [PyTorch](https://bigdl.readthedocs.io/en/latest/doc/Orca/Howto/pytorch-quickstart.html) quickstarts, for more details.* - -- In addition, you can also run standard **Ray** programs on Spark cluster using _**RayOnSpark**_ in Orca. - -
Show RayOnSpark example -
- - You can not only run Ray program on Spark cluster, but also write Ray code inline with Spark code (so as to process the in-memory Spark RDDs or DataFrames) using _RayOnSpark_ in Orca. - - ```python - # 1. Initilize Orca Context (to run your program on K8s, YARN or local laptop) - from bigdl.orca import init_orca_context, OrcaContext - sc = init_orca_context(cluster_mode="yarn", cores=4, memory="10g", num_nodes=2, init_ray_on_spark=True) - - # 2. Distribtued data processing using Spark - spark = OrcaContext.get_spark_session() - df = spark.read.parquet(file_path).withColumn(...) - - # 3. Convert Spark DataFrame to Ray Dataset - from bigdl.orca.data import spark_df_to_ray_dataset - dataset = spark_df_to_ray_dataset(df) - - # 4. Use Ray to operate on Ray Datasets - import ray - - @ray.remote - def consume(data) -> int: - num_batches = 0 - for batch in data.iter_batches(batch_size=10): - num_batches += 1 - return num_batches - - print(ray.get(consume.remote(dataset))) - ``` - -
- - *See RayOnSpark [user guide](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/ray.html) and [quickstart](https://bigdl.readthedocs.io/en/latest/doc/Orca/Howto/ray-quickstart.html) for more details.* -### Nano -You can transparently accelerate your TensorFlow or PyTorch programs on your laptop or server using *Nano*. With minimum code changes, *Nano* automatically applies modern CPU optimizations (e.g., SIMD, multiprocessing, low precision, etc.) to standard TensorFlow and PyTorch code, with up-to 10x speedup. - -
Show Nano inference example -
- -You can automatically optimize a trained PyTorch model for inference or deployment using _Nano_: - -```python -model = ResNet18().load_state_dict(...) -train_dataloader = ... -val_dataloader = ... -def accuracy (pred, target): - ... - -from bigdl.nano.pytorch import InferenceOptimizer -optimizer = InferenceOptimizer() -optimizer.optimize(model, - training_data=train_dataloader, - validation_data=val_dataloader, - metric=accuracy) -new_model, config = optimizer.get_best_model() - -optimizer.summary() -``` -The output of `optimizer.summary()` will be something like: -``` - -------------------------------- ---------------------- -------------- ---------------------- -| method | status | latency(ms) | metric value | - -------------------------------- ---------------------- -------------- ---------------------- -| original | successful | 45.145 | 0.975 | -| bf16 | successful | 27.549 | 0.975 | -| static_int8 | successful | 11.339 | 0.975 | -| jit_fp32_ipex | successful | 40.618 | 0.975* | -| jit_fp32_ipex_channels_last | successful | 19.247 | 0.975* | -| jit_bf16_ipex | successful | 10.149 | 0.975 | -| jit_bf16_ipex_channels_last | successful | 9.782 | 0.975 | -| openvino_fp32 | successful | 22.721 | 0.975* | -| openvino_int8 | successful | 5.846 | 0.962 | -| onnxruntime_fp32 | successful | 20.838 | 0.975* | -| onnxruntime_int8_qlinear | successful | 7.123 | 0.981 | - -------------------------------- ---------------------- -------------- ---------------------- -* means we assume the metric value of the traced model does not change, so we don't recompute metric value to save time. -Optimization cost 60.8s in total. -``` - -
- -
Show Nano Training example -
-You may easily accelerate PyTorch training (e.g., IPEX, BF16, Multi-Instance Training, etc.) using Nano: - -```python -model = ResNet18() -optimizer = torch.optim.SGD(...) -train_loader = ... -val_loader = ... - -from bigdl.nano.pytorch import TorchNano - -# Define your training loop inside `TorchNano.train` -class Trainer(TorchNano): - def train(self): - # call `setup` to prepare for model, optimizer(s) and dataloader(s) for accelerated training - model, optimizer, (train_loader, val_loader) = self.setup(model, optimizer, - train_loader, val_loader) - - for epoch in range(num_epochs): - model.train() - for data, target in train_loader: - optimizer.zero_grad() - output = model(data) - # replace the loss.backward() with self.backward(loss) - loss = loss_fuc(output, target) - self.backward(loss) - optimizer.step() - -# Accelerated training (IPEX, BF16 and Multi-Instance Training) -Trainer(use_ipex=True, precision='bf16', num_processes=2).train() -``` - -
- -*See Nano [user guide](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/nano.html) and [tutotial](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial) for more details.* - -### DLlib - -With _DLlib_, you can write distributed deep learning applications as standard (**Scala** or **Python**) Spark programs, using the same **Spark DataFrames** and **ML Pipeline** APIs. - -
Show DLlib Scala example -
- -You can build distributed deep learning applications for Spark using *DLlib* Scala APIs in 3 simple steps: - -```scala -// 1. Call `initNNContext` at the beginning of the code: -import com.intel.analytics.bigdl.dllib.NNContext -val sc = NNContext.initNNContext() - -// 2. Define the deep learning model using Keras-style API in DLlib: -import com.intel.analytics.bigdl.dllib.keras.layers._ -import com.intel.analytics.bigdl.dllib.keras.Model -val input = Input[Float](inputShape = Shape(10)) -val dense = Dense[Float](12).inputs(input) -val output = Activation[Float]("softmax").inputs(dense) -val model = Model(input, output) - -// 3. Use `NNEstimator` to train/predict/evaluate the model using Spark DataFrame and ML pipeline APIs -import org.apache.spark.sql.SparkSession -import org.apache.spark.ml.feature.MinMaxScaler -import org.apache.spark.ml.Pipeline -import com.intel.analytics.bigdl.dllib.nnframes.NNEstimator -import com.intel.analytics.bigdl.dllib.nn.CrossEntropyCriterion -import com.intel.analytics.bigdl.dllib.optim.Adam -val spark = SparkSession.builder().getOrCreate() -val trainDF = spark.read.parquet("train_data") -val validationDF = spark.read.parquet("val_data") -val scaler = new MinMaxScaler().setInputCol("in").setOutputCol("value") -val estimator = NNEstimator(model, CrossEntropyCriterion()) - .setBatchSize(128).setOptimMethod(new Adam()).setMaxEpoch(5) -val pipeline = new Pipeline().setStages(Array(scaler, estimator)) - -val pipelineModel = pipeline.fit(trainDF) -val predictions = pipelineModel.transform(validationDF) -``` - -
- -
Show DLlib Python example -
- -You can build distributed deep learning applications for Spark using *DLlib* Python APIs in 3 simple steps: - -```python -# 1. Call `init_nncontext` at the beginning of the code: -from bigdl.dllib.nncontext import init_nncontext -sc = init_nncontext() - -# 2. Define the deep learning model using Keras-style API in DLlib: -from bigdl.dllib.keras.layers import Input, Dense, Activation -from bigdl.dllib.keras.models import Model -input = Input(shape=(10,)) -dense = Dense(12)(input) -output = Activation("softmax")(dense) -model = Model(input, output) - -# 3. Use `NNEstimator` to train/predict/evaluate the model using Spark DataFrame and ML pipeline APIs -from pyspark.sql import SparkSession -from pyspark.ml.feature import MinMaxScaler -from pyspark.ml import Pipeline -from bigdl.dllib.nnframes import NNEstimator -from bigdl.dllib.nn.criterion import CrossEntropyCriterion -from bigdl.dllib.optim.optimizer import Adam -spark = SparkSession.builder.getOrCreate() -train_df = spark.read.parquet("train_data") -validation_df = spark.read.parquet("val_data") -scaler = MinMaxScaler().setInputCol("in").setOutputCol("value") -estimator = NNEstimator(model, CrossEntropyCriterion())\ - .setBatchSize(128)\ - .setOptimMethod(Adam())\ - .setMaxEpoch(5) -pipeline = Pipeline(stages=[scaler, estimator]) - -pipelineModel = pipeline.fit(train_df) -predictions = pipelineModel.transform(validation_df) -``` - -
- -*See DLlib [NNFrames](https://bigdl.readthedocs.io/en/latest/doc/DLlib/Overview/nnframes.html) and [Keras API](https://bigdl.readthedocs.io/en/latest/doc/DLlib/Overview/keras-api.html) user guides for more details.* - -### Chronos - -The *Chronos* library makes it easy to build fast, accurate and scalable **time series analysis** applications (with AutoML). - -
Show Chronos example -
- -You can train a time series forecaster using _Chronos_ in 3 simple steps: - -```python -from bigdl.chronos.forecaster import TCNForecaster -from bigdl.chronos.data.repo_dataset import get_public_dataset - -# 1. Process time series data using `TSDataset` -tsdata_train, tsdata_val, tsdata_test = get_public_dataset(name='nyc_taxi') -for tsdata in [tsdata_train, tsdata_val, tsdata_test]: - data.roll(lookback=100, horizon=1) - -# 2. Create a `TCNForecaster` (automatically configured based on train_data) -forecaster = TCNForecaster.from_tsdataset(train_data) - -# 3. Train the forecaster for prediction -forecaster.fit(train_data) - -pred = forecaster.predict(test_data) -``` - -To apply AutoML, use `AutoTSEstimator` instead of normal forecasters. -```python -# Create and fit an `AutoTSEstimator` -from bigdl.chronos.autots import AutoTSEstimator -autotsest = AutoTSEstimator(model="tcn", future_seq_len=10) - -tsppl = autotsest.fit(data=tsdata_train, validation_data=tsdata_val) -pred = tsppl.predict(tsdata_test) -``` - -
- -*See Chronos [user guide](https://bigdl.readthedocs.io/en/latest/doc/Chronos/index.html) and [quick start](https://bigdl.readthedocs.io/en/latest/doc/Chronos/QuickStart/chronos-autotsest-quickstart.html) for more details.* - -### Friesian -The *Friesian* library makes it easy to build end-to-end, large-scale **recommedation system** (including *offline* feature transformation and traning, *near-line* feature and model update, and *online* serving pipeline). - -*See Freisian [readme](https://github.com/intel-analytics/BigDL/blob/main/python/friesian/README.md) for more details.* - -### PPML - -*BigDL PPML* provides a **hardware (Intel SGX) protected** *Trusted Cluster Environment* for running distributed Big Data & AI applications (in a secure fashion on private or public cloud). - -*See PPML [user guide](https://bigdl.readthedocs.io/en/latest/doc/PPML/Overview/ppml.html) and [tutorial](https://github.com/intel-analytics/BigDL/blob/main/ppml/README.md) for more details.* - -## Getting Support - -- [Mail List](mailto:bigdl-user-group+subscribe@googlegroups.com) -- [User Group](https://groups.google.com/forum/#!forum/bigdl-user-group) -- [Github Issues](https://github.com/intel-analytics/BigDL/issues) ---- - -## Citation - -If you've found BigDL useful for your project, you may cite our papers as follows: - -- *[BigDL 2.0](https://arxiv.org/abs/2204.01715): Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster* - ``` - @INPROCEEDINGS{9880257, - title={BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster}, - author={Dai, Jason Jinquan and Ding, Ding and Shi, Dongjie and Huang, Shengsheng and Wang, Jiao and Qiu, Xin and Huang, Kai and Song, Guoqiong and Wang, Yang and Gong, Qiyuan and Song, Jiaming and Yu, Shan and Zheng, Le and Chen, Yina and Deng, Junwei and Song, Ge}, - booktitle={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, - year={2022}, - pages={21407-21414}, - doi={10.1109/CVPR52688.2022.02076} - } - ``` - -[^1]: Performance varies by use, configuration and other factors. `bigdl-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex. - -- *[BigDL](https://arxiv.org/abs/1804.05839): A Distributed Deep Learning Framework for Big Data* - ``` - @INPROCEEDINGS{10.1145/3357223.3362707, - title = {BigDL: A Distributed Deep Learning Framework for Big Data}, - author = {Dai, Jason Jinquan and Wang, Yiheng and Qiu, Xin and Ding, Ding and Zhang, Yao and Wang, Yanzhang and Jia, Xianyan and Zhang, Cherry Li and Wan, Yan and Li, Zhichao and Wang, Jiao and Huang, Shengsheng and Wu, Zhongyuan and Wang, Yang and Yang, Yuhao and She, Bowen and Shi, Dongjie and Lu, Qi and Huang, Kai and Song, Guoqiong}, - booktitle = {Proceedings of the ACM Symposium on Cloud Computing (SoCC)}, - year = {2019}, - pages = {50–60}, - doi = {10.1145/3357223.3362707} - } - ``` - +Over 40 models have been optimized/verified on `ipex-llm`, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, Mistral, Falcon, MPT, Baichuan/Baichuan2, InternLM, QWen* and more; see the example list below. 
+ +| Model | CPU Example | GPU Example | +| ---------------------------------------- | ---------------------------------------- | ---------------------------------------- | +| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link1](python/llm/example/CPU/Native-Models), [link2](python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna) | +| LLaMA 2 | [link1](python/llm/example/CPU/Native-Models), [link2](python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2) | [link1](python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2), [link2-low GPU memory example](python/llm/example/GPU/PyTorch-Models/Model/llama2#example-2---low-memory-version-predict-tokens-using-generate-api) | +| ChatGLM | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm) | | +| ChatGLM2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2) | +| ChatGLM3 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm3) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3) | +| Mistral | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mistral) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral) | +| Mixtral | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) | +| Falcon | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon) | +| MPT | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt) | +| Dolly-v1 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1) | +| Dolly-v2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2) | +| Replit Code | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit) | +| RedPajama | [link1](python/llm/example/CPU/Native-Models), [link2](python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama) | | +| Phoenix | [link1](python/llm/example/CPU/Native-Models), [link2](python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix) | | +| StarCoder | [link1](python/llm/example/CPU/Native-Models), [link2](python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder) | +| Baichuan | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan) | +| Baichuan2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2) | +| InternLM | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm) | +| Qwen | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen) | +| Qwen1.5 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen1.5) 
| [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5) | +| Qwen-VL | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen-vl) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl) | +| Aquila | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila) | +| Aquila2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2) | +| MOSS | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss) | | +| Whisper | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper) | +| Phi-1_5 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-1_5) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5) | +| Flan-t5 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/flan-t5) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5) | +| LLaVA | [link](python/llm/example/CPU/PyTorch-Models/Model/llava) | [link](python/llm/example/GPU/PyTorch-Models/Model/llava) | +| CodeLlama | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/codellama) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama) | +| Skywork | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/skywork) | | +| InternLM-XComposer | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm-xcomposer) | | +| WizardCoder-Python | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/wizardcoder-python) | | +| CodeShell | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/codeshell) | | +| Fuyu | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/fuyu) | | +| Distil-Whisper | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/distil-whisper) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper) | +| Yi | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/yi) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi) | +| BlueLM | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/bluelm) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm) | +| Mamba | [link](python/llm/example/CPU/PyTorch-Models/Model/mamba) | [link](python/llm/example/GPU/PyTorch-Models/Model/mamba) | +| SOLAR | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/solar) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar) | +| Phixtral | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/phixtral) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral) | +| InternLM2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2) | +| RWKV4 | | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4) | +| RWKV5 | | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5) | +| Bark | [link](python/llm/example/CPU/PyTorch-Models/Model/bark) | [link](python/llm/example/GPU/PyTorch-Models/Model/bark) | +| SpeechT5 | | [link](python/llm/example/GPU/PyTorch-Models/Model/speech-t5) | +| DeepSeek-MoE | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek-moe) | | +| Ziya-Coding-34B-v1.0 | 
[link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/ziya) | | +| Phi-2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2) | +| Yuan2 | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/yuan2) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2) | +| Gemma | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/gemma) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma) | +| DeciLM-7B | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/deciLM-7b) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b) | +| Deepseek | [link](python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek) | [link](python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek) | + + +***For more details, please refer to the `ipex-llm` [Document](https://test-ipex-llm.readthedocs.io/en/main/doc/LLM/index.html), [Readme](python/llm), [Tutorial](https://github.com/intel-analytics/ipex-llm-tutorial) and [API Doc](https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/index.html).*** diff --git a/docs/readthedocs/source/doc/Application/blogs.md b/docs/readthedocs/source/doc/Application/blogs.md deleted file mode 100644 index 783bc202..00000000 --- a/docs/readthedocs/source/doc/Application/blogs.md +++ /dev/null @@ -1,49 +0,0 @@ -Blogs ---- -**2023** -- [Large-scale Offline Book Recommendation with BigDL at Dangdang.com](https://www.intel.com/content/www/us/en/developer/articles/technical/dangdang-offline-recommendation-service-with-bigdl.html) - -**2022** -- [Optimized Large-Scale Item Search with Intel BigDL at Yahoo! JAPAN Shopping](https://www.intel.com/content/www/us/en/developer/articles/technical/offline-item-search-with-bigdl-at-yahoo-japan.html) -- [Tencent Trusted Computing Solution on SGX with Intel BigDL PPML](https://www.intel.com/content/www/us/en/developer/articles/technical/tencent-trusted-computing-solution-with-bigdl-ppml.html) -- [BigDL Privacy Preserving Machine Learning with Occlum OSS on Azure Confidential Computing](https://techcommunity.microsoft.com/t5/azure-confidential-computing/bigdl-privacy-preserving-machine-learning-with-occlum-oss-on/ba-p/3658667) -- ["AI at Scale" in Mastercard with BigDL](https://www.intel.com/content/www/us/en/developer/articles/technical/ai-at-scale-in-mastercard-with-bigdl.html) -- [BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster](https://arxiv.org/abs/2204.01715) -- [Project Bose: A smart way to enable sustainable 5G networks in Capgemini](https://www.capgemini.com/insights/expert-perspectives/project-bose-a-smart-way-to-enable-sustainable-5g-networks/) -- [Intelligent Power Prediction Solution in Goldwind](https://www.intel.com/content/www/us/en/customer-spotlight/stories/goldwind-customer-story.html) -- [5G Core Network Power Saving using BigDL Chronos Framework in China Unicom](https://www.intel.cn/content/www/cn/zh/customer-spotlight/cases/china-unicom-bigdl-chronos-framework-5gc.html) (in Chinese) - -**2021** -- [From Ray to Chronos: Build end-to-end AI use cases using BigDL on top of Ray](https://www.anyscale.com/blog/from-ray-to-chronos-build-end-to-end-ai-use-cases-using-bigdl-on-top-of-ray) -- [Scalable AutoXGBoost Using Analytics Zoo AutoML](https://medium.com/intel-analytics-software/scalable-autoxgboost-using-analytics-zoo-automl-30d576cb138a) -- [Intelligent 5G L2 MAC Scheduler: Powered by Capgemini NetAnticipate 
5G on Intel Architecture](https://networkbuilders.intel.com/solutionslibrary/intelligent-5g-l2-mac-scheduler-powered-by-capgemini-netanticipate-5g-on-intel-architecture) -- [Better Together: Privacy-Preserving Machine Learning Powered by Intel SGX and Intel DL Boost](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/alibaba-privacy-preserving-machine-learning.html) - -**2020** -- [SK Telecom, Intel Build AI Pipeline to Improve Network Quality](https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality) -- [Build End-to-End AI Pipelines Using Ray and Apache Spark](https://medium.com/distributed-computing-with-ray/build-end-to-end-ai-pipeline-using-ray-and-apache-spark-23f70f36115e) -- [Tencent Cloud Leverages Analytics Zoo to Improve Performance of TI-ONE ML Platform](https://www.intel.com/content/www/us/en/developer/articles/technical/tencent-cloud-leverages-analytics-zoo-to-improve-performance-of-ti-one-ml-platform.html) -- [Context-Aware Fast Food Recommendation at Burger King with RayOnSpark](https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d) -- [Seamlessly Scaling AI for Distributed Big Data](https://medium.com/swlh/seamlessly-scaling-ai-for-distributed-big-data-5b589ead2434) -- [Distributed Inference Made Easy with Analytics Zoo Cluster Serving](https://www.intel.com/content/www/us/en/developer/articles/technical/distributed-inference-made-easy-with-analytics-zoo-cluster-serving.html) - -**2019** -- [BigDL: A Distributed Deep-Learning Framework for Big Data](https://arxiv.org/abs/1804.05839) -- [Scalable AutoML for Time-Series Prediction Using Ray and BigDL & Analytics Zoo](https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-analytics-zoo-b79a6fd08139) -- [RayOnSpark: Run Emerging AI Applications on Big Data Clusters with Ray and BigDL & Analytics Zoo](https://medium.com/riselab/rayonspark-running-emerging-ai-applications-on-big-data-clusters-with-ray-and-analytics-zoo-923e0136ed6a) -- [Real-time Product Recommendations for Office Depot Using Apache Spark and Analytics Zoo on AWS](https://www.intel.com/content/www/us/en/developer/articles/technical/real-time-product-recommendations-for-office-depot-using-apache-spark-and-analytics-zoo-on.html) -- [Machine Learning Pipelines for High Energy Physics Using Apache Spark with BigDL and Analytics Zoo](https://db-blog.web.cern.ch/blog/luca-canali/machine-learning-pipelines-high-energy-physics-using-apache-spark-bigdl) -- [Deep Learning with Analytic Zoo Optimizes Mastercard Recommender AI Service](https://www.intel.com/content/www/us/en/developer/articles/technical/deep-learning-with-analytic-zoo-optimizes-mastercard-recommender-ai-service.html) -- [Using Intel Analytics Zoo to Inject AI into Customer Service Platform (Part II)](https://www.infoq.com/articles/analytics-zoo-qa-module/) -- [Talroo Uses Analytics Zoo and AWS to Leverage Deep Learning for Job Recommendations](https://www.intel.com/content/www/us/en/developer/articles/technical/talroo-uses-analytics-zoo-and-aws-to-leverage-deep-learning-for-job-recommendations.html) - -**2018** -- [Analytics Zoo: Unified Analytics + AI Platform for Distributed Tensorflow, and BigDL on Apache Spark](https://www.infoq.com/articles/analytics-zoo/) -- [Industrial Inspection Platform in Midea and KUKA: Using Distributed TensorFlow on Analytics 
Zoo](https://www.intel.com/content/www/us/en/developer/articles/technical/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics.html) -- [Use Analytics Zoo to Inject AI Into Customer Service Platforms on Microsoft Azure](https://www.intel.com/content/www/us/en/developer/articles/technical/use-analytics-zoo-to-inject-ai-into-customer-service-platforms-on-microsoft-azure-part-1.html) -- [LSTM-Based Time Series Anomaly Detection Using Analytics Zoo for Apache Spark and BigDL at Baosight](https://www.intel.com/content/www/us/en/developer/articles/technical/lstm-based-time-series-anomaly-detection-using-analytics-zoo-for-apache-spark-and-bigdl.html) - -**2017** -- [Accelerating Deep-Learning Training with BigDL and Drizzle on Apache Spark](https://rise.cs.berkeley.edu/blog/accelerating-deep-learning-training-with-bigdl-and-drizzle-on-apache-spark) -- [Using BigDL to Build Image Similarity-Based House Recommendations](https://www.intel.com/content/www/us/en/developer/articles/technical/using-bigdl-to-build-image-similarity-based-house-recommendations.html) -- [Building Large-Scale Image Feature Extraction with BigDL at JD.com](https://www.intel.com/content/www/us/en/developer/articles/technical/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom.html) diff --git a/docs/readthedocs/source/doc/Application/index.rst b/docs/readthedocs/source/doc/Application/index.rst deleted file mode 100644 index 7ec694eb..00000000 --- a/docs/readthedocs/source/doc/Application/index.rst +++ /dev/null @@ -1,2 +0,0 @@ -Real-World Application -========================= \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Application/powered-by.md b/docs/readthedocs/source/doc/Application/powered-by.md deleted file mode 100644 index 61c6b7a4..00000000 --- a/docs/readthedocs/source/doc/Application/powered-by.md +++ /dev/null @@ -1,93 +0,0 @@ -# Powered By ---- - -* __Alibaba__ -
• [Alibaba Cloud and Intel synergize BigDL PPML and Alibaba Cloud Data Trust to protect E2E privacy of AI and big data](https://www.intel.com/content/www/us/en/customer-spotlight/stories/alibaba-cloud-ppml-customer-story.html) -
• [Better Together: Alibaba Cloud Realtime Compute and Distributed AI Inference](https://www.intel.cn/content/dam/www/central-libraries/cn/zh/documents/better-together-alibaba-cloud-realtime-compute-and-distibuted-ai-inference.pdf) (in Chinese) -
• [Better Together: Privacy-Preserving Machine Learning](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/alibaba-privacy-preserving-machine-learning.html) -* __AsiaInfo__ -
• [AsiaInfo Technology Leverages Hardware and Software Products and Technologies to Create New Intelligent Energy Saving Solutions for 5G Cloud Based Base Station Products](https://www.intel.cn/content/www/cn/zh/communications/asiainfo-create-intelligent-energy-saving-solution.html) (in Chinese) -
• [Network AI Applications using BigDL and oneAPI toolkit on Intel Xeon](https://www.intel.cn/content/www/cn/zh/customer-spotlight/cases/asiainfo-taps-intelligent-network-applications.html) -* __Baosight__ -
• [LSTM-Based Time Series Anomaly Detection Using Analytics Zoo for Apache Spark and BigDL at Baosight](https://www.intel.com/content/www/us/en/developer/articles/technical/lstm-based-time-series-anomaly-detection-using-analytics-zoo-for-apache-spark-and-bigdl.html) -* __BBVA__ -
• [A Graph Convolutional Network Implementation](https://emartinezs44.medium.com/graph-convolutions-networks-ad8295b3ce57) -* __Burger King__ -
• [Context-Aware Fast Food Recommendation at Burger King with RayOnSpark](https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d) -
• [How Intel and Burger King built an order recommendation system that preserves customer privacy](https://venturebeat.com/2021/04/06/how-intel-and-burger-king-built-an-order-recommendation-system-that-preserves-customer-privacy/) -
• [Burger King: Context-Aware Recommendations (video)](https://www.intel.com/content/www/us/en/customer-spotlight/stories/burger-king-ai-customer-story.html) -* __Capgemini__ -
• [Project Bose: A smart way to enable sustainable 5G networks in Capgemini](https://www.capgemini.com/insights/expert-perspectives/project-bose-a-smart-way-to-enable-sustainable-5g-networks/) -
• [Intelligent 5G L2 MAC Scheduler: Powered by Capgemini NetAnticipate 5G on Intel Architecture](https://networkbuilders.intel.com/solutionslibrary/intelligent-5g-l2-mac-scheduler-powered-by-capgemini-netanticipate-5g-on-intel-architecture) -* __China Unicom__ -
• [China Unicom Data Center Energy Saving and Emissions Reduction with Intel Intelligent Energy Management](https://www.intel.com/content/www/us/en/content-details/768821/china-unicom-data-center-energy-saving-and-emissions-reduction-with-intel-intelligent-energy-management.html) -
• [Cloud Data Center Power Saving using BigDL Chronos in China Unicom](https://www.intel.cn/content/www/cn/zh/customer-spotlight/cases/china-unicom-bigdl-chronos-framework-5gc.html) -* __CERN__ -
• [Deep Learning Pipelines for High Energy Physics using Apache Spark with Distributed Keras on Analytics Zoo](https://databricks.com/session_eu19/deep-learning-pipelines-for-high-energy-physics-using-apache-spark-with-distributed-keras-on-analytics-zoo) -
• [Topology classification at CERN's Large Hadron Collider using Analytics Zoo](https://db-blog.web.cern.ch/blog/luca-canali/machine-learning-pipelines-high-energy-physics-using-apache-spark-bigdl) -
• [Deep Learning on Apache Spark at CERN's Large Hadron Collider with Intel Technologies](https://databricks.com/session/deep-learning-on-apache-spark-at-cerns-large-hadron-collider-with-intel-technologies) -* __China Telecom__ -
• [Face Recognition Application and Practice Based on Intel Analytics Zoo: Part 1](https://mp.weixin.qq.com/s/FEiXoTDi-yy04PJ2Mlfl4A) (in Chinese) -
• [Face Recognition Application and Practice Based on Intel Analytics Zoo: Part 2](https://mp.weixin.qq.com/s/VIyWRORTAVAAsC4v6Fi0xw) (in Chinese) -* __Cray__ -
• [A deep learning approach for precipitation nowcasting with RNN using Analytics Zoo in Cray](https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/69413) -* __Dangdang__ -
• [Large-scale Offline Book Recommendation with BigDL at Dangdang.com](https://www.intel.com/content/www/us/en/developer/articles/technical/dangdang-offline-recommendation-service-with-bigdl.html) -* __Dell EMC__ -
• [AI-assisted Radiology Using Distributed Deep -Learning on Apache Spark and Analytics Zoo](https://www.dellemc.com/resources/en-us/asset/white-papers/solutions/h17686_hornet_wp.pdf) -
• [Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest X-rays](https://databricks.com/session/using-deep-learning-on-apache-spark-to-diagnose-thoracic-pathology-from-chest-x-rays) -* __GoldWind__ -
• [Goldwind SE: Intelligent Power Prediction Solution](https://www.intel.com/content/www/us/en/customer-spotlight/stories/goldwind-customer-story.html) -
• [Intel big data analysis + AI platform helps GoldWind to build a new energy intelligent power prediction solution](https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/create-power-forecasting-solutions.html) -* __Inspur__ -
• [Inspur’s Big Data Intelligent Computing AIO Solution Based on Intel Architecture](https://dpgresources.intel.com/asset-library/inspur-insight-big-data-platform-solution-icx-prc/) -
• [Inspur E2E Smart Transportation CV application](https://jason-dai.github.io/cvpr2021/slides/Inspur%20E2E%20Smart%20Transportation%20CV%20application%20-CVPR21.pdf) -
• [Inspur End-to-End Smart Computing Solution with Intel Analytics Zoo](https://dpgresources.intel.com/asset-library/inspur-end-to-end-smart-computing-solution-with-intel-analytics-zoo/) -* __JD__ -
• [Object Detection and Image Feature Extraction at JD.com](https://software.intel.com/en-us/articles/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom) -* __MasterCard__ -
• ["AI at Scale" in Mastercard with BigDL](https://www.intel.com/content/www/us/en/developer/articles/technical/ai-at-scale-in-mastercard-with-bigdl0.html) -
• [Deep Learning with Analytic Zoo Optimizes Mastercard Recommender AI Service](https://www.intel.com/content/www/us/en/developer/articles/technical/deep-learning-with-analytic-zoo-optimizes-mastercard-recommender-ai-service.html) -* __Microsoft Azure__ -
• [Use Analytics Zoo to Inject AI Into Customer Service Platforms on Microsoft Azure: Part 1](https://www.intel.com/content/www/us/en/developer/articles/technical/use-analytics-zoo-to-inject-ai-into-customer-service-platforms-on-microsoft-azure-part-1.html) -
• [Use Analytics Zoo to Inject AI Into Customer Service Platforms on Microsoft Azure: Part 2](https://www.infoq.com/articles/analytics-zoo-qa-module/?from=timeline&isappinstalled=0) -* __Midea__ -
• [Industrial Inspection Platform in Midea and KUKA: Using Distributed TensorFlow on Analytics Zoo](https://www.intel.com/content/www/us/en/developer/articles/technical/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics.html) -
• [Ability to add "eyes" and "brains" to smart manufacturing](https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/midea-case-study.html) (in Chinese) -* __MLSListings__ -
• [Image Similarity-Based House Recommendations and Search](https://www.intel.com/content/www/us/en/developer/articles/technical/using-bigdl-to-build-image-similarity-based-house-recommendations.html) -* __NeuSoft/BMW__ -
• [Neusoft RealSight APM partners with Intel to create an application performance management platform with active defense capabilities](https://platform.neusoft.com/2020/01/17/xw-intel.html) (in Chinese) -* __NeuSoft/Mazda__ -
• [JD, Neusoft and Intel Jointly Building Intelligent and Connected Vehicle Cloud for HaiMa(former Hainan Mazda)](https://www.neusoft.com/Products/Platforms/2472/4735110231.html) -
• [JD, Neusoft and Intel Jointly Building Intelligent and Connected Vehicle Cloud for Hainan-Mazda](https://platform.neusoft.com/2020/06/11/jjfa-haimaqiche.html) (in Chinese) -* __Office Depot__ -
• [Real-time Product Recommendations for Office Depot Using Apache Spark and Analytics Zoo on AWS](https://www.intel.com/content/www/us/en/developer/articles/technical/real-time-product-recommendations-for-office-depot-using-apache-spark-and-analytics-zoo-on.html) -
• [Office Depot product recommender using Analytics Zoo on AWS](https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/73079) -* __SK Telecom__ -
• [Reference Architecture for Confidential Computing on SKT 5G MEC](https://networkbuilders.intel.com/solutionslibrary/reference-architecture-for-confidential-computing-on-skt-5g-mec) -
• [SK Telecom, Intel Build AI Pipeline to Improve Network Quality](https://networkbuilders.intel.com/solutionslibrary/sk-telecom-intel-build-ai-pipeline-to-improve-network-quality) -
• [Vectorized Deep Learning Acceleration from Preprocessing to Inference and Training on Apache Spark in SK Telecom](https://databricks.com/session_na20/vectorized-deep-learning-acceleration-from-preprocessing-to-inference-and-training-on-apache-spark-in-sk-telecom) -
• [Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction with Geospatial Visualization](https://databricks.com/session_eu19/apache-spark-ai-use-case-in-telco-network-quality-analysis-and-prediction-with-geospatial-visualization) - * __Talroo__ -
• [Uses Analytics Zoo and AWS to Leverage Deep Learning for Job Recommendations](https://www.intel.com/content/www/us/en/developer/articles/technical/talroo-uses-analytics-zoo-and-aws-to-leverage-deep-learning-for-job-recommendations.html) -
• [Job recommendations leveraging deep learning using Analytics Zoo on Apache Spark and BigDL](https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/69113) -* __Telefonica__ -
• [Running Analytics Zoo jobs on Telefónica Open Cloud’s MRS Service](https://medium.com/@fernando.delaiglesia/running-analytics-zoo-jobs-on-telef%C3%B3nica-open-clouds-mrs-service-2e64bc823c50) -* __Tencent__ -
• [Tencent Trusted Computing Solution on SGX with Intel BigDL PPML](https://www.intel.com/content/www/us/en/developer/articles/technical/tencent-trusted-computing-solution-with-bigdl-ppml.html) -
• [Analytics Zoo helps Tencent Cloud improve the performance of its intelligent titanium machine learning platform](https://www.intel.cn/content/www/cn/zh/service-providers/analytics-zoo-helps-tencent-cloud-improve-ti-ml-platform-performance.html) -
• [Tencent Cloud Leverages Analytics Zoo to Improve Performance of TI-ONE ML Platform](https://software.intel.com/content/www/us/en/develop/articles/tencent-cloud-leverages-analytics-zoo-to-improve-performance-of-ti-one-ml-platform.html) -
• [Enhance Tencent's TUSI Identity Practice with Intel Analytics Zoo](https://mp.weixin.qq.com/s?__biz=MzAwNzc5NzM5Mw==&mid=2651030944&idx=1&sn=d6e06c6e14a7355971953a501689b232&chksm=808f8a5eb7f80348fc8e88c4c9e415341bf43ef6bdf3fd4f3001da89e2c9ba7fa2ed5deeb09a&mpshare=1&scene=1&srcid=0412WxM3eWdsLLoO2TYJGWbS&pass_ticket=E6l%2FfOZNKjhr05lsU7inAVCi7mAy5LFEehvEJOS2ZGdHg6%2FH%2BeBQisHA9sfXDOoy#rd) (in Chinese) -* __UC Berkeley RISELab__ -
• [RayOnSpark: Running Emerging AI Applications on Big Data Clusters with Ray and Analytics Zoo](https://medium.com/riselab/rayonspark-running-emerging-ai-applications-on-big-data-clusters-with-ray-and-analytics-zoo-923e0136ed6a) -
• [Scalable AutoML for Time Series Prediction Using Ray and Analytics Zoo](https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-analytics-zoo-b79a6fd08139) -* __UnionPay__ -
• [Technical Verification of SGX and BigDL Based Privacy Computing for Multi Source Financial Big Data](https://www.intel.cn/content/www/cn/zh/now/data-centric/sgx-bigdl-financial-big-data.html) (in Chinese) -* __World Bank__ -
• [Using Crowdsourced Images to Create Image Recognition Models with Analytics Zoo using BigDL](https://databricks.com/session/using-crowdsourced-images-to-create-image-recognition-models-with-bigdl) -* __Yahoo! JAPAN__ -
• [Optimized Large-Scale Item Search with Intel BigDL at Yahoo! JAPAN Shopping](https://www.intel.com/content/www/us/en/developer/articles/technical/offline-item-search-with-bigdl-at-yahoo-japan.html) -* __Yunda__ -
• [Intelligent transformation brings "quality change" to the express delivery industry](https://www.intel.cn/content/www/cn/zh/analytics/artificial-intelligence/yunda-brings-quality-change-to-the-express-delivery-industry.html) (in Chinese) diff --git a/docs/readthedocs/source/doc/Application/presentations.md b/docs/readthedocs/source/doc/Application/presentations.md deleted file mode 100644 index 2a87e6f5..00000000 --- a/docs/readthedocs/source/doc/Application/presentations.md +++ /dev/null @@ -1,99 +0,0 @@ -# Presentations ---- - -**Tutorial:** -- Seamlessly Scaling out Big Data AI on Ray and Apache Spark, [CVPR 2021](https://cvpr2021.thecvf.com/program) [tutorial](https://jason-dai.github.io/cvpr2021/), June 2021 ([slides](https://jason-dai.github.io/cvpr2021/slides/End-to-End%20Big%20Data%20AI%20Pipeline%20using%20Analytics%20Zoo%20-%20CVPR21.pdf)) - -- Automated Machine Learning Workflow for Distributed Big Data Using Analytics Zoo, [CVPR 2020](https://cvpr2020.thecvf.com/program/tutorials) [tutorial](https://jason-dai.github.io/cvpr2020/), June 2020 ([slides](https://jason-dai.github.io/cvpr2020/slides/AIonBigData_cvpr20.pdf)) - -- Building Deep Learning Applications for Big Data, [AAAI 2019]( https://aaai.org/Conferences/AAAI-19/aaai19tutorials/#sp2) [tutorial](https://jason-dai.github.io/aaai2019/), January 2019 ([slides](https://jason-dai.github.io/aaai2019/slides/AI%20on%20Big%20Data%20(Jason%20Dai).pdf)) - -- Analytics Zoo: Distributed TensorFlow and Keras on Apache Spark, [AI conference](https://conferences.oreilly.com/artificial-intelligence/ai-ca-2019/public/schedule/detail/77069), Sep 2019, San Jose ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Tutorial%20Analytics%20ZOO.pdf)) - -- Building Deep Learning Applications on Big Data Platforms, [CVPR 2018](https://cvpr2018.thecvf.com/) [tutorial](https://jason-dai.github.io/cvpr2018/), June 2018 ([slides](https://jason-dai.github.io/cvpr2018/slides/BigData_DL_Jason-CVPR.pdf)) - -**Talks:** -- BigDL 2.0: Seamlessly scaling end-to-end AI pipelines, [Ray Summit 2022](https://www.anyscale.com/ray-summit-2022/agenda/sessions/174), August 2022 ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/BigDL-2.0-Seamlessly-scaling-end-to-end-AI-pipelines.pdf)) - -- Exploration on Confidential Computing for Big Data & AI, [oneAPI DevSummit for AI 2022](https://www.oneapi.io/event-sessions/exploration-on-confidential-computing-for-big-data-ai-ai-2022/), July 2022 ([slides](https://simplecore.intel.com/oneapi-io/wp-content/uploads/sites/98/Qiyuan-Gong-and-Chunyang-Hui-Exploration-on-Confidential-Computing-for-Big-Data-AI.pdf)) - -- Privacy Preserving Machine Learning and Big Data Analytics Using Apache Spark, [Data + AI Summit 2022](https://www.databricks.com/dataaisummit/session/privacy-preserving-machine-learning-and-big-data-analytics-using-apache-spark), June 2022 ([slides](https://microsites.databricks.com/sites/default/files/2022-07/Privacy-Preserving-Machine-Learning-and-Big-Data-Analytics-Using-Apache-Spark.pdf)) - -- E2E Smart Transportation CV application in Inspur (using Insight Data-Intelligence platform), [CVPR 2021](https://jason-dai.github.io/cvpr2021/), July 2021 ([slides](https://jason-dai.github.io/cvpr2021/slides/Inspur%20E2E%20Smart%20Transportation%20CV%20application%20-CVPR21.pdf)) - -- Mobile Order Click-Through Rate (CTR) Recommendation with Ray on Apache Spark at Burger King, [Ray Summit 
2021](https://www.anyscale.com/events/2021/06/22/mobile-order-click-through-rate-ctr-recommendation-with-ray-on-apache-spark-at-burger-king), June 2021 ([slides](https://files.speakerdeck.com/presentations/1870110b5adf4bfc8f0c76255a417f09/Kai_Huang_and_Luyang_Wang.pdf)) - -- Deep Reinforcement Learning Recommenders using RayOnSpark, *Data + AI Summit 2021*, May 2021 ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/210527DeepReinforcementLearningRecommendersUsingRayOnSpark2.pdf)) - -- Cluster Serving: Deep Learning Model Serving for Big Data, *Data + AI Summit 2021*, May 2021 ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/210526Cluster-Serving.pdf)) - -- Offer Recommendation System with Apache Spark at Burger King, [Data + AI Summit 2021](https://databricks.com/session_na21/offer-recommendation-system-with-apache-spark-at-burger-king), May 2021 ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/20210526Offer%20Recommendation.pdf)) - -- Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King, [Data + AI Summit Europe 2020](https://databricks.com/session_eu20/context-aware-fast-food-recommendation-with-ray-on-apache-spark-at-burger-king), November 2020 ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/1118%20Context-aware%20Fast%20Food%20Recommendation%20with%20Ray%20on%20Apache%20Spark%20at%20Burger%20King.pdf)) - -- Cluster Serving: Distributed Model Inference using Apache Flink in Analytics Zoo, [Flink Forward 2020](https://www.flink-forward.org/global-2020/conference-program#cluster-serving--distributed-model-inference-using-apache-flink-in-analytics-zoo), October 2020 ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/1020%20Cluster%20Serving%20Distributed%20Model%20Inference%20using%20Apache%20Flink%20in%20Analytics%20Zoo%20.pdf)) - -- Project Zouwu: Scalable AutoML for Telco Time Series Analysis using Ray and Analytics Zoo, [Ray Summit Connect 2020](https://anyscale.com/blog/videos-and-slides-for-the-fourth-ray-summit-connect-august-12-2020/), August 2020 ([slides](https://anyscale.com/wp-content/uploads/2020/08/Ding-Ding-Connect-slides.pdf)) - -- Cluster Serving: Distributed Model Inference using Big Data Streaming in Analytics Zoo, [OpML 2020](https://www.usenix.org/conference/opml20/presentation/song), July 2020 ([slides](https://www.usenix.org/sites/default/files/conference/protected-files/opml20_talks_43_slides_song.pdf)) - -- Scalable AutoML for Time Series Forecasting using Ray, [OpML 2020](https://www.usenix.org/conference/opml20/presentation/huang), July 2020 ([slides](https://www.usenix.org/sites/default/files/conference/protected-files/opml20_talks_84_slides_huang.pdf)) - -- Scalable AutoML for Time Series Forecasting using Ray, [Spark + AI Summit 2020](https://databricks.com/session_na20/scalable-automl-for-time-series-forecasting-using-ray), June 2020 ([slides](https://www.slideshare.net/databricks/scalable-automl-for-time-series-forecasting-using-ray)) - -- Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark, [Spark + AI Summit 2020](https://databricks.com/session_na20/running-emerging-ai-applications-on-big-data-platforms-with-ray-on-apache-spark), June 2020 ([slides](https://www.slideshare.net/databricks/running-emerging-ai-applications-on-big-data-platforms-with-ray-on-apache-spark)) - -- Vectorized Deep 
Learning Acceleration from Preprocessing to Inference and Training on Apache Spark in SK Telecom, [Spark + AI Summit 2020](https://databricks.com/session_na20/vectorized-deep-learning-acceleration-from-preprocessing-to-inference-and-training-on-apache-spark-in-sk-telecom), June 2020 ([slides](https://www.slideshare.net/databricks/vectorized-deep-learning-acceleration-from-preprocessing-to-inference-and-training-on-apache-spark-in-sk-telecom?from_action=save)) - -- Architecture and practice of big data analysis and deep learning model inference using Analytics Zoo on Flink, [Flink Forward Asia 2019](https://developer.aliyun.com/special/ffa2019-conference?spm=a2c6h.13239638.0.0.21f27955PCNMUB#), Nov 2019, Beijing ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Architecture%20and%20practice%20of%20big%20data%20analysis%20and%20deep%20learning%20model%20inference%20using%20Analytics%20Zoo%20on%20Flink(FFA2019)%20.pdf)) - -- Data analysis + AI platform technology and case studies, [AICon BJ 2019](https://aicon.infoq.cn/2019/beijing/), Nov 2019, Beijing ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/AICON%20AZ%20Cluster%20Serving%20Beijing%20Qiyuan_v5.pdf)) - -- Architectural practices for building a unified big data AI application with Analytics-Zoo, [QCon SH 2019](https://qcon.infoq.cn/2019/shanghai/presentation/1921), Oct 2019, Shanghai ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Architectural%20practices%20for%20building%20a%20unified%20big%20data%20AI%20application%20with%20Analytics-Zoo.pdf)) - -- Building AI to play the FIFA video game using distributed TensorFlow, [TensorFlow World](https://conferences.oreilly.com/tensorflow/tf-ca/public/schedule/detail/78309), Oct 2019, Santa Clara ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Building%20AI%20to%20play%20the%20FIFA%20video%20game%20using%20distributed%20TensorFlow.pdf)) - -- Deep Learning Pipelines for High Energy Physics using Apache Spark with Distributed Keras on Analytics Zoo, [Spark+AI Summit](https://databricks.com/session_eu19/deep-learning-pipelines-for-high-energy-physics-using-apache-spark-with-distributed-keras-on-analytics-zoo), Oct 2019, Amsterdam ([slides](https://www.slideshare.net/databricks/deep-learning-pipelines-for-high-energy-physics-using-apache-spark-with-distributed-keras-on-analytics-zoo)) - -- Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction with Geospatial Visualization, [Spark+AI Summit](https://databricks.com/session_eu19/apache-spark-ai-use-case-in-telco-network-quality-analysis-and-prediction-with-geospatial-visualization), Oct 2019, Amsterdam ([slides](https://www.slideshare.net/databricks/apache-spark-ai-use-case-in-telco-network-quality-analysis-and-prediction-with-geospatial-visualization)) - -- LSTM-based time series anomaly detection using Analytics Zoo for Spark and BigDL, [Strata Data conference](https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/74077), May 2019, London ([slides](https://cdn.oreillystatic.com/en/assets/1/event/292/LSTM-based%20time%20series%20anomaly%20detection%20using%20Analytics%20Zoo%20for%20Spark%20and%20BigDL%20Presentation.pptx)) - -- Game Playing Using AI on Apache Spark, [Spark+AI Summit](https://databricks.com/session/game-playing-using-ai-on-apache-spark), April 2019, San Francisco 
([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/game-playing-using-ai-on-apache-spark.pdf)) - -- Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest X-rays in DELL EMC, [Spark+AI Summit](https://databricks.com/session/using-deep-learning-on-apache-spark-to-diagnose-thoracic-pathology-from-chest-x-rays), April 2019, San Francisco ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Using%20Deep%20Learning%20on%20Apache%20Spark%20to%20diagnose%20thoracic%20pathology%20from%20.._.pdf)) - -- Leveraging NLP and Deep Learning for Document Recommendation in the Cloud, [Spark+AI Summit](https://databricks.com/session/leveraging-nlp-and-deep-learning-for-document-recommendations-in-the-cloud), April 2019, San Francisco ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Leveraging%20NLP%20and%20Deep%20Learning%20for%20Document%20Recommendation%20in%20the%20Cloud.pdf)) - -- Analytics Zoo: Distributed Tensorflow, Keras and BigDL in production on Apache Spark, [Strata Data conference](https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/72802), March 2019, San Francisco ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Analytics%20Zoo-Distributed%20Tensorflow%2C%20Keras%20and%20BigDL%20in%20production%20on%20Apache%20Spark.pdf)) - -- User-based real-time product recommendations leveraging deep learning using Analytics Zoo on Apache Spark in Office Depot, [Strata Data conference](https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/73079), March 2019, San Francisco ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/User-based%20real-time%20product%20recommendations%20leveraging%20deep%20learning%20using%20Analytics%20Zoo%20on%20Apache%20Spark%20and%20BigDL%20Presentation.pdf)) - -- Analytics Zoo: Unifying Big Data Analytics and AI for Apache Spark, [Shanghai Apache Spark + AI meetup](https://www.meetup.com/Shanghai-Apache-Spark-AI-Meetup/events/255788956/), Nov 2018, Shanghai ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Analytics%20Zoo-Unifying%20Big%20Data%20Analytics%20and%20AI%20for%20Apache%20Spark.pdf)) - -- Use Intel Analytics Zoo to build an intelligent QA Bot for Microsoft Azure, [Shanghai Apache Spark + AI meetup](https://www.meetup.com/Shanghai-Apache-Spark-AI-Meetup/events/255788956/), Nov 2018, Shanghai ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Use%20Intel%20Analytics%20Zoo%20to%20build%20an%20intelligent%20QA%20Bot%20for%20Microsoft%20Azure.pdf)) - -- A deep learning approach for precipitation nowcasting with RNN using Analytics Zoo in Cray, [Strata Data conference](https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/69413), Sep 2018, New York ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/A%20deep%20learning%20approach%20for%20precipitation%20nowcasting%20with%20RNN%20using%20Analytics%20Zoo%20on%20BigDL.pdf)) - -- Job recommendations leveraging deep learning using Analytics Zoo on Apache Spark in Talroo, [Strata Data conference](https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/69113), Sep 2018, New York 
([slides](https://cdn.oreillystatic.com/en/assets/1/event/278/Job%20recommendations%20leveraging%20deep%20learning%20using%20Analytics%20Zoo%20on%20Apache%20Spark%20and%20BigDL%20Presentation.pdf)) - -- Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark, [Spark + AI Summit](https://databricks.com/session/accelerating-deep-learning-training-with-bigdl-and-drizzle-on-apache-spark), June 2018, San Francisco ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Accelerating%20deep%20learning%20on%20apache%20spark%20Using%20BigDL%20with%20coarse-grained%20scheduling.pdf)) - -- Using Crowdsourced Images to Create Image Recognition Models with Analytics Zoo in World Bank, [Spark + AI Summit](https://databricks.com/session/using-crowdsourced-images-to-create-image-recognition-models-with-bigdl), June 2018, San Francisco ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Using%20Crowdsourced%20Images%20to%20Create%20Image%20Recognition%20Models%20with%20Analytics%20Zoo%20using%20BigDL.pdf)) - -- Building Deep Reinforcement Learning Applications on Apache Spark with Analytics Zoo using BigDL, [Spark + AI Summit](https://databricks.com/session/building-deep-reinforcement-learning-applications-on-apache-spark-using-bigdl), June 2018, San Francisco ([slides](https://github.com/analytics-zoo/analytics-zoo.github.io/blob/master/presentations/Building%20Deep%20Reinforcement%20Learning%20Applications%20on%20Apache%20Spark%20with%20Analytics%20Zoo%20using%20BigDL.pdf)) - -- Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience at Scale, [Spark + AI Summit](https://databricks.com/session/using-bigdl-on-apache-spark-to-improve-the-mls-real-estate-search-experience-at-scale), June 2018, San Francisco - -- Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL, [Spark + AI Summit](https://databricks.com/session/analytics-zoo-building-analytics-and-ai-pipeline-for-apache-spark-and-bigdl), June 2018, San Francisco - -- Using Siamese CNNs for removing duplicate entries from real estate listing databases, [Strata Data conference](https://conferences.oreilly.com/strata/strata-eu-2018/public/schedule/detail/65518), May 2018, London ([slides](https://cdn.oreillystatic.com/en/assets/1/event/267/Using%20Siamese%20CNNs%20for%20removing%20duplicate%20entries%20from%20real%20estate%20listing%20databases%20Presentation.pdf)) - -- Classifying images on Spark in World Bank, [AI conference](https://conferences.oreilly.com/artificial-intelligence/ai-ny-2018/public/schedule/detail/64939), May 2018, New York ([slides](https://cdn.oreillystatic.com/en/assets/1/event/280/Classifying%20images%20in%20Spark%20Presentation.pdf)) - -- Improving user-merchant propensity modeling using neural collaborative filtering and wide and deep models on Spark BigDL in Mastercard, [Strata Data conference](https://conferences.oreilly.com/strata/strata-ca-2018/public/schedule/detail/63897), March 2018, San Jose ([slides](https://cdn.oreillystatic.com/en/assets/1/event/269/Improving%20user-merchant%20propensity%20modeling%20using%20neural%20collaborative%20filtering%20and%20wide%20and%20deep%20models%20on%20Spark%20BigDL%20at%20scale%20Presentation.pdf)) - -- Accelerating deep learning on Apache Spark using BigDL with coarse-grained scheduling, [Strata Data conference](https://conferences.oreilly.com/strata/strata-ca-2018/public/schedule/detail/63960), March 2018, San Jose 
([slides](https://cdn.oreillystatic.com/en/assets/1/event/269/Accelerating%20deep%20learning%20on%20Apache%20Spark%20using%20BigDL%20with%20coarse-grained%20scheduling%20Presentation.pptx)) - -- Automatic 3D MRI knee damage classification with 3D CNN using BigDL on Spark in UCSF, [Strata Data conference](https://conferences.oreilly.com/strata/strata-ca-2018/public/schedule/detail/64023), March 2018, San Jose ([slides](https://cdn.oreillystatic.com/en/assets/1/event/269/Automatic%203D%20MRI%20knee%20damage%20classification%20with%203D%20CNN%20using%20BigDL%20on%20Spark%20Presentation.pdf)) - diff --git a/docs/readthedocs/source/doc/Chronos/Howto/docker_guide_single_node.md b/docs/readthedocs/source/doc/Chronos/Howto/docker_guide_single_node.md deleted file mode 100644 index fb5933ed..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/docker_guide_single_node.md +++ /dev/null @@ -1,139 +0,0 @@ -# Use Chronos in Container (docker) -This page helps users build and use a docker image with the Chronos nightly build deployed. - -## Download image from Docker Hub -We provide a docker image with the Chronos nightly build deployed on [Docker Hub](https://hub.docker.com/r/intelanalytics/bigdl-chronos/tags). You can download it directly by running the command: -```bash -docker pull intelanalytics/bigdl-chronos:latest -``` - -## Build an image (Optional) -**If you have downloaded the docker image, you can skip this part and go on to [Use Chronos](#use-chronos).** - -First clone the repo `BigDL` to your local machine. -```bash -git clone https://github.com/intel-analytics/BigDL.git -``` -Then `cd` to the root directory of `BigDL`, and copy the Dockerfile to it. -```bash -cd BigDL -cp docker/chronos-nightly/Dockerfile ./Dockerfile -``` -When building the image, you can specify some build args to install Chronos with the necessary dependencies according to your own needs. -The build args are similar to the install options in the [Chronos documentation](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/install.html). - -``` -model: which model or framework you want. - value: pytorch - tensorflow - prophet - arima - ml (default, for machine learning models). - -auto_tuning: whether to enable auto tuning. - value: y (for yes) - n (default, for no). - -hardware: run chronos on a single machine or a cluster. - value: single (default) - cluster - -inference: whether to install dependencies for inference optimization (e.g. onnx, openvino, ...). - value: y (for yes) - n (default, for no) - -extra_dep: whether to install some extra dependencies. - value: y (for yes) - n (default, for no) - if specified to y, the following dependencies will be installed: - tsfresh, pyarrow, prometheus_pandas, xgboost, jupyter, matplotlib -``` - -If you want to build the image with the default options, you can simply use the following command: -```bash -sudo docker build -t intelanalytics/bigdl-chronos:latest . # You may choose any NAME:TAG you want. -``` - -You can also build with other options by specifying the build args: -```bash -sudo docker build \ - --build-arg model=pytorch \ - --build-arg auto_tuning=y \ - --build-arg hardware=single \ - --build-arg inference=n \ - --build-arg extra_dep=n \ - -t intelanalytics/bigdl-chronos:latest . # You may choose any NAME:TAG you want. -``` - -(Optional) If you need a proxy, you can add two additional build args to specify it: -```bash -# typically, you need a proxy for building since there will be some downloading. 
-sudo docker build \ - --build-arg http_proxy=http://<proxy-host>:<proxy-port> \ #optional - --build-arg https_proxy=http://<proxy-host>:<proxy-port> \ #optional - -t intelanalytics/bigdl-chronos:latest . # You may choose any NAME:TAG you want. -``` -Depending on your network, this build will take roughly **15-30 mins**. - -**Tips:** Errors such as `failed: Connection timed out.` are usually caused by a bad network connection. Please build with a proxy. - -## Run the image -```bash -sudo docker run -it --rm --net=host intelanalytics/bigdl-chronos:latest bash -``` - -## Use Chronos -A conda environment is created for you automatically. `bigdl-chronos` and the necessary dependencies (based on the build args used when you built the image) are installed inside this environment. -```bash -(chronos) root@icx-5:/opt/work# -``` -```eval_rst -.. important:: - - Considering the image size, we build the docker image with the default args and upload it to Docker Hub. If you use it directly, only ``bigdl-chronos`` is installed inside this environment. There are two methods to install other necessary dependencies according to your own needs: - - 1. Make sure network is available and run the install command following `Install using Conda `_ , such as ``pip install --pre --upgrade bigdl-chronos[pytorch]``. - - 2. Make sure network is available and bash ``/opt/install-python-env.sh`` with build args. The values are introduced in `Build an image <#build-an-image-optional>`_. - - .. code-block:: python - - # bash /opt/install-python-env.sh ${model} ${auto_tuning} ${hardware} ${inference} ${extra_dep} - # For example, if you want to install bigdl-chronos[pytorch,inference] - bash /opt/install-python-env.sh pytorch n single y n - -``` - -## Run unittest examples on Jupyter Notebook for a quick start -> Note: To use jupyter notebook, you need to specify the build arg `extra_dep` to `y`. - -You can run these examples in Jupyter Notebook on a single-node server for a quick start with Chronos. -```bash -(chronos) root@icx-5:/opt/work# cd /opt/work/colab-notebook #Unittest examples are here. -``` -```bash -(chronos) root@icx-5:/opt/work/colab-notebook# jupyter notebook --notebook-dir=./ --ip=* --allow-root #Start the Jupyter Notebook services. -``` -After the Jupyter Notebook service is successfully started, you can connect to the Jupyter Notebook service from a browser. -1. Get the IP address of the container -2. Launch a browser, and connect to the Jupyter Notebook service with the URL: 
`https://container-ip-address:port-number/?token=your-token` -
As a result, you will see the Jupyter Notebook opened. -3. Open one of these `.ipynb` files, run through the example and learn how to use Chronos to predict time series. - -## Shut down docker container -You should shut down the BigDL Docker container after using it. -1. First, use `ctrl+p+q` to quit the container when you are still in it. -2. Then, you can list all the active Docker containers by command line: - ```bash - sudo docker ps - ``` - You will see your docker containers: - ```bash - CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES - ef133bd732d1 intelanalytics/bigdl-chronos:latest "bash" 2 hours ago Up 2 hours happy_babbage - ``` -3. Shut down the corresponding docker container by its ID: - ```bash - sudo docker rm -f ef133bd732d1 - ``` diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_choose_forecasting_alg.md b/docs/readthedocs/source/doc/Chronos/Howto/how_to_choose_forecasting_alg.md deleted file mode 100644 index a1b5fa4f..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_choose_forecasting_alg.md +++ /dev/null @@ -1,48 +0,0 @@ -# Choose proper forecasting model - -How to choose a forecasting model among so many built-in models (or build one by yourself) in Chronos? That's a common question when users want to build their first forecasting model. Different forecasting models are more suitable for different data and different metrics (accuracy or performance). - -The flowchart below is designed to guide users on which forecasting model to try on their own data. Click on the blocks in the chart below to see their documentation/examples. - -```eval_rst -.. note:: - - The following flowchart may need some time to load. -``` - - -```eval_rst -.. mermaid:: - - flowchart TD - StartPoint[I want to build a forecasting model] - StartPoint-- always start from --> TCN[TCNForecaster] - TCN -- performance is not satisfying --> TCN_OPT[Make sure optimizations are deployed] - TCN_OPT -- further performance improvement is needed --> SER[Performance-aware Hyperparameter Optimization] - SER -- only 1 step to be predicted --> LSTMForecaster - SER -- only 1 var to be predicted --> NBeatsForecaster - LSTMForecaster -- does not work --> CUS[customized model] - NBeatsForecaster -- does not work --> CUS[customized model] - - TCN -- accuracy is not satisfying --> Tune[Hyperparameter Optimization] - Tune -- only 1 step to be predicted --> LSTMForecaster2[LSTMForecaster] - LSTMForecaster2 -- does not work --> AutoformerForecaster - Tune -- more than 1 step to be predicted --> AutoformerForecaster - AutoformerForecaster -- does not work --> Seq2SeqForecaster - Seq2SeqForecaster -- does not work --> CUS[customized model] - - click TCN "https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/forecasting.html#tcnforecaster" - click LSTMForecaster "https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/forecasting.html#lstmforecaster" - click LSTMForecaster2 "https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/forecasting.html#lstmforecaster" - click NBeatsForecaster "https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/forecasting.html#nbeatsforecaster" - click Seq2SeqForecaster "https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/forecasting.html#seq2seqforecaster" - click AutoformerForecaster "https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/forecasting.html#AutoformerForecaster" - - click TCN_OPT "https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/speed_up.html" - click SER 
"https://github.com/intel-analytics/BigDL/blob/main/python/chronos/example/hpo/muti_objective_hpo_with_builtin_latency_tutorial.ipynb" - click Tune "https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_tune_forecaster_model.html" - click CUS "https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/speed_up.html" - - classDef Model fill:#FFF,stroke:#0f29ba,stroke-width:1px; - class TCN,LSTMForecaster,NBeatsForecaster,LSTMForecaster2,AutoformerForecaster,Seq2SeqForecaster Model; -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_create_forecaster.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_create_forecaster.nblink deleted file mode 100644 index 6a1c5320..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_create_forecaster.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how-to-create-forecaster.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_evaluate_a_forecaster.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_evaluate_a_forecaster.nblink deleted file mode 100644 index 917ed6ec..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_evaluate_a_forecaster.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_evaluate_a_forecaster.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_data_processing_pipeline_to_torchscript.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_data_processing_pipeline_to_torchscript.nblink deleted file mode 100644 index eadc2331..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_data_processing_pipeline_to_torchscript.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_export_data_processing_pipeline_to_torchscript.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_onnx_files.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_onnx_files.nblink deleted file mode 100644 index 744723e3..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_onnx_files.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_export_onnx_files.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_openvino_files.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_openvino_files.nblink deleted file mode 100644 index a139b146..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_openvino_files.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_export_openvino_files.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_torchscript_files.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_torchscript_files.nblink deleted file mode 100644 index a4deeb6e..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_export_torchscript_files.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_export_torchscript_files.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_generate_confidence_interval_for_prediction.nblink 
b/docs/readthedocs/source/doc/Chronos/Howto/how_to_generate_confidence_interval_for_prediction.nblink deleted file mode 100644 index 21a5df68..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_generate_confidence_interval_for_prediction.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_generate_confidence_interval_for_prediction.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_optimize_a_forecaster.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_optimize_a_forecaster.nblink deleted file mode 100644 index 7785f137..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_optimize_a_forecaster.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_optimize_a_forecaster.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_preprocess_my_data.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_preprocess_my_data.nblink deleted file mode 100644 index 6a0cef76..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_preprocess_my_data.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_preprocess_my_data.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_process_data_in_production_environment.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_process_data_in_production_environment.nblink deleted file mode 100644 index 50c5564c..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_process_data_in_production_environment.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_process_data_in_production_environment.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_save_and_load_forecaster.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_save_and_load_forecaster.nblink deleted file mode 100644 index cf0b97af..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_save_and_load_forecaster.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_save_and_load_forecaster.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_speedup_inference_of_forecaster_through_ONNXRuntime.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_speedup_inference_of_forecaster_through_ONNXRuntime.nblink deleted file mode 100644 index 3c6a4e9c..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_speedup_inference_of_forecaster_through_ONNXRuntime.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_speedup_inference_of_forecaster_through_ONNXRuntime.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_speedup_inference_of_forecaster_through_OpenVINO.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_speedup_inference_of_forecaster_through_OpenVINO.nblink deleted file mode 100644 index 32cb876c..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_speedup_inference_of_forecaster_through_OpenVINO.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_speedup_inference_of_forecaster_through_OpenVINO.ipynb" -} \ No newline at end 
of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_train_forecaster_on_one_node.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_train_forecaster_on_one_node.nblink deleted file mode 100644 index cf39d394..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_train_forecaster_on_one_node.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_train_forecaster_on_one_node.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_tune_forecaster_model.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_tune_forecaster_model.nblink deleted file mode 100644 index 10d6ab10..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_tune_forecaster_model.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_tune_forecaster_model.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_benchmark_tool.md b/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_benchmark_tool.md deleted file mode 100644 index 87032714..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_benchmark_tool.md +++ /dev/null @@ -1,174 +0,0 @@ -# Use Chronos benchmark tool -This page demonstrates how to use the Chronos benchmark tool to benchmark forecasting performance on your platform. - -## Basic Usage -The benchmark tool is installed automatically when `bigdl-chronos` is installed and reports performance information (currently for forecasting only) on your own machine. - -Run the benchmark tool with default options using the following command: -```bash -benchmark-chronos -l 96 -o 720 -``` -```eval_rst -.. note:: - **Required Options**: - - ``-l/--lookback`` and ``-o/--horizon`` are required options for the Chronos benchmark tool. Use ``-l/--lookback`` to specify the history time steps and ``-o/--horizon`` to specify the output time steps. For more details, please refer to `here `_. -``` -By default, the tool will load the `tsinghua_electricity` dataset and train a `TCNForecaster` with the input lookback and horizon parameters under the `PyTorch` framework. As it loads, it prints information about hardware, environment variables and benchmark parameters. When benchmarking is completed, it reports the average throughput during the training process. Users may be able to improve forecasting performance by following the suggested changes to Nano environment variables. - -Besides the default usage, more execution parameters can be set to obtain more benchmark results. Read on to learn more about the configuration options available in the Chronos benchmark tool. - -## Configuration Options -The benchmark tool provides various options for configuring execution parameters. Some key configuration options are introduced in this part and a list of all options is given in [**Advanced Options**](#advanced-options). - -### Model -The tool provides several built-in time series forecasting models, including TCN, LSTM, Seq2Seq, NBeats and Autoformer. To specify which model to use, run the benchmark tool with `-m/--model`. If not specified, TCN is used as the default. -```bash -benchmark-chronos -m lstm -l 96 -o 720 -``` - -### Stage -For a model, the training and inference stages are of most concern. By setting the `-s/--stage` parameter, users can obtain the throughput during training (`-s train`), accuracy after training (`-s accuracy`), 
throughput during inference (`-s throughput`) and latency of inference (`-s latency`). If not specified, train is used as the default. -```bash -benchmark-chronos -s latency -l 96 -o 720 -``` -```eval_rst -.. note:: - **More About Accuracy Results**: - - After setting ``-s accuracy``, the tool will load the dataset and split it into train, validation and test sets with a ratio of 7:1:2. Then the validation loss is monitored during training epochs and the checkpoint of the epoch with the smallest loss is loaded after training. With the trained forecaster, evaluation results corresponding to ``--metrics`` are obtained. -``` - -### Dataset -Several built-in datasets can be chosen, including nyc_taxi and tsinghua_electricity. If you have a poor Internet connection and find it hard to download the dataset, run the benchmark tool with `-d synthetic_dataset` to use a synthetic dataset. The default is tsinghua_electricity if the `-d/--dataset` parameter is not specified. -```bash -benchmark-chronos -d nyc_taxi -l 96 -o 720 -``` -```eval_rst -.. note:: - **Download tsinghua_electricity Dataset**: - - The tsinghua_electricity dataset does not support automatic downloading. Users can download manually from `here `_ to path "~/.chronos/dataset/". -``` - -### Framework -PyTorch and TensorFlow are both supported and can be specified by setting `-f torch` or `-f tensorflow`. The default framework is PyTorch. -```bash -benchmark-chronos -f tensorflow -l 96 -o 720 -``` -```eval_rst -.. note:: - NBeats and Autoformer do not support the TensorFlow backend now. -``` - -### Core number -By default, the benchmark tool will run on all physical cores. Users can explicitly specify the number of cores through the `-c/--cores` parameter. -```bash -benchmark-chronos -c 4 -l 96 -o 720 -``` - -### Lookback -Forecasting aims at predicting the future by using the knowledge from the history. The required option `-l/--lookback` corresponds to the length of historical data along time. -```bash -benchmark-chronos -l 96 -o 720 -``` - -### Horizon -Forecasting aims at predicting the future by using the knowledge from the history. The required option `-o/--horizon` corresponds to the length of predicted data along time. -```bash -benchmark-chronos -l 96 -o 720 -``` - -## Advanced Options -When `-s/--stage accuracy` is set, users can further specify evaluation metrics through `--metrics`, which defaults to mse and mae. -```bash -benchmark-chronos --stage accuracy --metrics mse rmse -l 96 -o 720 -``` - -To improve model accuracy, the tool provides a normalization trick to alleviate distribution shift. Once `--normalization` is enabled, the normalization trick will be applied to the forecaster. -```bash -benchmark-chronos --stage accuracy --normalization -l 96 -o 720 -``` -```eval_rst -.. note:: - Only TCNForecaster supports the normalization trick now. -``` - -Besides, the number of processes and epochs can be set by `--training_processes` and `--training_epochs`. Users can also tune the batch size during training and inference through `--training_batchsize` and `--inference_batchsize` respectively. -```bash -benchmark-chronos --training_processes 2 --training_epochs 3 --training_batchsize 32 --inference_batchsize 128 -l 96 -o 720 -``` - -To speed up inference, accelerators like ONNXRuntime and OpenVINO are usually used. 
To benchmark inference performance with or without an accelerator, run the tool with `--inference_framework` to specify no accelerator (`--inference_framework torch`) or ONNXRuntime (`--inference_framework onnx`) or OpenVINO (`--inference_framework openvino`) or jit (`--inference_framework jit`). -```bash -benchmark-chronos --inference_framework onnx -l 96 -o 720 -``` - -When the benchmark tool is run with `--ipex` enabled, intel-extension-for-pytorch will be used as the accelerator for the trainer. - -If you want to use a quantized model for prediction, just run the benchmark tool with `--quantize` enabled; the quantization framework can be specified by `--quantize_type`. The parameter `--quantize_type` needs to be set to pytorch_ipex when you want to use pytorch_ipex as the quantization type. Otherwise, the default quantization type will be selected according to `--inference_framework`: if pytorch is the inference framework, pytorch_fx will be the default; if ONNXRuntime is the inference framework, onnxrt_qlinearops will be the quantization type; and if OpenVINO is chosen, the openvino quantization type will be selected. -```bash -benchmark-chronos --ipex --quantize --quantize_type pytorch_ipex -l 96 -o 720 -``` - - -Moreover, if you want to benchmark the inference performance of a trained model, run the benchmark tool with `--ckpt` to specify the checkpoint path of the model. By default, the model for inference will be trained first according to the input parameters. - -Running the benchmark tool with `-h/--help` yields the following usage message, which contains all configuration options: -```bash -benchmark-chronos -h -``` -```eval_rst -.. code-block:: python - - usage: benchmark-chronos [-h] [-m] [-s] [-d] [-f] [-c] -l lookback -o horizon - [--training_processes] [--training_batchsize] - [--training_epochs] [--inference_batchsize] - [--quantize] [--inference_framework [...]] [--ipex] - [--quantize_type] [--ckpt] [--metrics [...]] - [--normalization] - - Benchmarking Parameters - - optional arguments: - -h, --help show this help message and exit - -m, --model model name, choose from - tcn/lstm/seq2seq/nbeats/autoformer, default to "tcn". - -s, --stage stage name, choose from - train/latency/throughput/accuracy, default to "train". - -d, --dataset dataset name, choose from - nyc_taxi/tsinghua_electricity/synthetic_dataset, - default to "tsinghua_electricity". - -f, --framework framework name, choose from torch/tensorflow, default - to "torch". - -c, --cores core number, default to all physical cores. - -l lookback, --lookback lookback - required, the history time steps (i.e. lookback). - -o horizon, --horizon horizon - required, the output time steps (i.e. horizon). - --training_processes - number of processes when training, default to 1. - --training_batchsize - batch size when training, default to 32. - --training_epochs number of epochs when training, default to 1. - --inference_batchsize - batch size when infering, default to 1. - --quantize if use the quantized model to predict, default to - False. - --inference_framework [ ...] - predict without/with accelerator, choose from - torch/onnx/openvino/jit, default to "torch" (i.e. predict - without accelerator). - --ipex if use ipex as accelerator for trainer, default to - False. - --quantize_type quantize framework, choose from - pytorch_fx/pytorch_ipex/onnxrt_qlinearops/openvino, - default to "pytorch_fx". - --ckpt checkpoint path of a trained model, e.g. - "checkpoints/tcn", default to "checkpoints/tcn". - --metrics [ ...] evaluation metrics of a trained model, e.g. 
- "mse"/"mae", default to "mse, mae". - --normalization if to use normalization trick to alleviate - distribution shift. -``` - diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_built-in_datasets.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_built-in_datasets.nblink deleted file mode 100755 index cf1456b0..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_built-in_datasets.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_use_built-in_datasets.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_forecaster_to_predict_future_data.nblink b/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_forecaster_to_predict_future_data.nblink deleted file mode 100644 index 486ca63b..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_forecaster_to_predict_future_data.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/chronos/colab-notebook/howto/how_to_use_forecaster_to_predict_future_data.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Howto/index.rst b/docs/readthedocs/source/doc/Chronos/Howto/index.rst deleted file mode 100644 index f93a6941..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/index.rst +++ /dev/null @@ -1,52 +0,0 @@ -Chronos How-to Guides -========================= -How-to guides are bite-sized, executable examples where users could check when meeting with some specific topic during the usage. - -Installation -------------------------- - -* `Install Chronos on Windows `__ -* `Use Chronos in container(docker) `__ - -Data Processing -------------------------- -* `Preprocess my data `__ -* `Built-in dataset `__ - - -Forecasting -------------------------- - -Develop a forecaster -~~~~~~~~~~~~~~~~~~~~~~~~~ -* `Choose a forecaster algorithm `__ -* `Create a forecaster `__ -* `Train forecaster on single node `__ -* `Tune forecaster on single node `__ -* `Evaluate a forecaster `__ -* `Use forecaster to predict future data `__ -* `Generate confidence interval for prediction `__ - -Speed up a forecaster -~~~~~~~~~~~~~~~~~~~~~~~~~ -* `Speed up inference of forecaster through ONNXRuntime `__ -* `Speed up inference of forecaster through OpenVINO `__ -* `Optimize a forecaster by searching the best accelerate method `__ - -Persist a forecaster -~~~~~~~~~~~~~~~~~~~~~~~~~ -* `Save and load a forecaster `__ -* `Export the ONNX model files to disk `__ -* `Export the OpenVINO model files to disk `__ -* `Export the TorchScript model files to disk `__ -* `Preprocess my data `__ -* `Built-in dataset `__ - -Benchmark a forecaster -~~~~~~~~~~~~~~~~~~~~~~~~~ -* `Use Chronos benchmark tool `__ - -Deploy a forecaster -~~~~~~~~~~~~~~~~~~~~~~~~~ -* `A whole workflow in production environment after my forecaster is developed `__ -* `Export data processing pipeline to torchscript for further deployment without Python environment `__ diff --git a/docs/readthedocs/source/doc/Chronos/Howto/windows_guide.md b/docs/readthedocs/source/doc/Chronos/Howto/windows_guide.md deleted file mode 100644 index 400de173..00000000 --- a/docs/readthedocs/source/doc/Chronos/Howto/windows_guide.md +++ /dev/null @@ -1,91 +0,0 @@ -# Install Chronos on Windows - -There are 2 ways to install Chronos on Windows: install using WSL2 and install on native Windows. With WSL2, all the features of Chronos are available, while on native Windows, there are some limitations now. 
- -## Install using WSL2 -### Step 1: Install WSL2 - -Follow [BigDL Windows User guide](../../UserGuide/win.md) to install WSL2. - - -### Step 2: Install Chronos - -Follow the [Chronos Installation guide](../Overview/chronos.md#install) to install Chronos. - -## Install on native Windows - -### Step1: Install conda - -We recommend using conda to manage the Chronos python environment, for more information on install conda on Windows, you can refer to [here](https://docs.conda.io/en/latest/miniconda.html#). - -When conda is successfully installed, open the Anaconda Powershell Prompt, then you can create a conda environment using the following command: - -``` -# create a conda environment for chronos -conda create -n my_env python=3.7 setuptools=58.0.4 # you could change my_env to any name you want -``` - -### Step2: Install Chronos from PyPI -You can simply install Chronos from PyPI using the following command: - -``` -# activate your conda environment -conda activate my_env - -# install Chronos nightly build version (2.1.0 stable release is not supported on native Windows) -pip install --pre --upgrade bigdl-chronos[pytorch] -``` - -You can use the [install panel](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/install.html#install-using-conda) to select the proper install options based on your need, but there are some limitations now: - -- `bigdl-chronos[distributed]` is not supported. - -- `intel_extension_for_pytorch (ipex)` is unavailable for Windows now, so the related feature is not supported. - -### Known Issues on Native Windows - -#### Fail to Install Neural-compressor via pip - -**Problem description** - -Installing neural-compressor via pip may stuck when installing pycocotools. - -**Solution** - -Install pycocotools using conda: - -`conda install pycocotools -c esri` - -Then neural-compressor can be successfully installed using pip, we recommend installing neural-compressor 1.13.1 or higher: - -`pip install neural-compressor==1.13.1` - -#### RuntimeError during Quantization - -**Problem description** - -Calling `forecaster.quantize()` without specifying the `metric` parameter (e.g. `forecaster.quantize(train_data)`) will raise runtime error, it may happen when neural-compressor version is lower than `1.13.1` - -> [ERROR] Unexpected exception AssertionError('please use start() before end()') happened during tuning. -> -> RuntimeError: Found no quantized model satisfying accuracy criterion. - -**Solution** - -Upgrade neural-compressor to 1.13.1 or higher. - -`pip install neural-compressor==1.13.1` - -#### RuntimeError during forecaster.fit - -**Problem description** - -`ProphetForecaster.fit` and `ProphetModel.fit_eval` may raise runtime error on native Windows. - -> RuntimeError: Error during optimization! -> -> [ERROR] Chain [1] error: terminated by signal 3221225657 - -According to our test, this issue only arises on some test machines or environments, you could check it by running `ProphetForecaster.fit` and `ProphetModel.fit_eval` on your own machines or environments. - -There is a similar [issue](https://github.com/facebook/prophet/issues/2227) in prophet repo, we will stay tuned for its progress. 
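If you want to check whether your own machine or environment is affected (as suggested above), a minimal smoke test might look like the sketch below. Note the assumptions, which are not part of this guide: `ProphetForecaster` is importable from `bigdl.chronos.forecaster`, and its `fit()` accepts a pandas dataframe with Prophet-style `ds`/`y` columns; adjust to your installed Chronos version if needed.

```python
import numpy as np
import pandas as pd
from bigdl.chronos.forecaster import ProphetForecaster

if __name__ == "__main__":
    # a tiny synthetic daily series, just enough to exercise the Stan optimization step
    df = pd.DataFrame({
        "ds": pd.date_range("2022-01-01", periods=120, freq="D"),
        "y": np.sin(np.arange(120) / 7.0) + np.random.rand(120) * 0.1,
    })
    forecaster = ProphetForecaster()
    forecaster.fit(df)   # the "terminated by signal 3221225657" error, if any, surfaces here
    print("ProphetForecaster.fit finished without error")
```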
\ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Image/aiops-workflow.png b/docs/readthedocs/source/doc/Chronos/Image/aiops-workflow.png deleted file mode 100644 index ee1589b1..00000000 Binary files a/docs/readthedocs/source/doc/Chronos/Image/aiops-workflow.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/Chronos/Image/anomaly_detection.svg b/docs/readthedocs/source/doc/Chronos/Image/anomaly_detection.svg deleted file mode 100644 index 41c488ca..00000000 --- a/docs/readthedocs/source/doc/Chronos/Image/anomaly_detection.svg +++ /dev/null @@ -1 +0,0 @@ - \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Image/automl_hparams.png b/docs/readthedocs/source/doc/Chronos/Image/automl_hparams.png deleted file mode 100644 index a4f901f2..00000000 Binary files a/docs/readthedocs/source/doc/Chronos/Image/automl_hparams.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/Chronos/Image/automl_monitor.png b/docs/readthedocs/source/doc/Chronos/Image/automl_monitor.png deleted file mode 100644 index 281ed97d..00000000 Binary files a/docs/readthedocs/source/doc/Chronos/Image/automl_monitor.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/Chronos/Image/automl_scalars.png b/docs/readthedocs/source/doc/Chronos/Image/automl_scalars.png deleted file mode 100644 index 55adfe3d..00000000 Binary files a/docs/readthedocs/source/doc/Chronos/Image/automl_scalars.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/Chronos/Image/forecast-RR.png b/docs/readthedocs/source/doc/Chronos/Image/forecast-RR.png deleted file mode 100644 index badbfa31..00000000 Binary files a/docs/readthedocs/source/doc/Chronos/Image/forecast-RR.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/Chronos/Image/forecast-TS.png b/docs/readthedocs/source/doc/Chronos/Image/forecast-TS.png deleted file mode 100644 index 0684d0ca..00000000 Binary files a/docs/readthedocs/source/doc/Chronos/Image/forecast-TS.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/Chronos/Image/forecasting.svg b/docs/readthedocs/source/doc/Chronos/Image/forecasting.svg deleted file mode 100644 index 7d1fc66d..00000000 --- a/docs/readthedocs/source/doc/Chronos/Image/forecasting.svg +++ /dev/null @@ -1 +0,0 @@ - \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Image/simulation.svg b/docs/readthedocs/source/doc/Chronos/Image/simulation.svg deleted file mode 100644 index 71744d3a..00000000 --- a/docs/readthedocs/source/doc/Chronos/Image/simulation.svg +++ /dev/null @@ -1 +0,0 @@ - \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Overview/aiops.md b/docs/readthedocs/source/doc/Chronos/Overview/aiops.md deleted file mode 100644 index 87870226..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/aiops.md +++ /dev/null @@ -1,87 +0,0 @@
-# Artificial Intelligence for IT operations (AIOps)
-
-Chronos provides a template (i.e., `ConfigGenerator`) as an easy-to-use builder for an AIOps decision system, with the help of `Trigger`.
-
-## How does it work
-
-An AIOps application typically relies on a decision system with one or multiple AI models. Generally, this AI system needs to be trained on some training data and saved to a self-defined checkpoint. When using the AI system, we first initialize it from the previously trained checkpoint and then inform the AI system of the current status to get the suggested configuration.
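As a rough, self-contained sketch of this train / initialize / query workflow (the `DemoConfigGenerator` below is hypothetical and only stands in for the real generators described in the following sections):

```python
from bigdl.chronos.aiops import ConfigGenerator

class DemoConfigGenerator(ConfigGenerator):
    """Hypothetical generator, used only to illustrate the workflow above."""
    def __init__(self, path):
        super().__init__()
        self.best_config = {"checkpoint": path}   # stand-in for loading a real checkpoint

    def genConfig(self, current_status=None):
        return self.best_config                   # stand-in for a model prediction

    @staticmethod
    def train(train_data, path):
        pass                                      # stand-in for training and saving a checkpoint

# offline: train the decision system and persist a checkpoint
DemoConfigGenerator.train(train_data=None, path="demo_ckpt")
# online: restore the trained system and query it with the current status
cg = DemoConfigGenerator("demo_ckpt")
print(cg.genConfig(current_status={"cpu_util": 0.7}))
```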
-
-![](../Image/aiops-workflow.png)
-
-Sometimes the AI system needs to be informed of some **timely** information (e.g., some events in a log or some monitoring data every second). Chronos also defines some triggers for this kind of usage.
-
-## Define ConfigGenerator
-
-### Start from a trivial ConfigGenerator
-Chronos provides `bigdl.chronos.aiops.ConfigGenerator` as a template for users to define their own AIOps AI system. Following is a "hello-world" case.
-
-```python
-class MyConfigGenerator(ConfigGenerator):
-    def __init__(self):
-        super().__init__()
-        self.best_config = [3.0, 1.6]
-
-    def genConfig(self):
-        return self.best_config
-```
-
-This self-defined `MyConfigGenerator` keeps generating a fixed best config without considering the current status. This could be a starting point or a smoke-test ConfigGenerator for your system. The whole system does not even need to be trained.
-
-### Add AI Model to ConfigGenerator
-Any model could be used in `ConfigGenerator`; to name a few, sklearn, pytorch or tensorflow models are all valid. Following is a typical flow to add your own model.
-
-```python
-class MyConfigGenerator(ConfigGenerator):
-    def __init__(self, path):
-        super().__init__()
-        self.model = load_model_from_checkpoint(path)
-
-    def genConfig(self, current_status):
-        return self.model(current_status)
-
-    @staticmethod
-    def train(train_data, path):
-        train_model_and_save_checkpoint(train_data, path)
-```
-
-- In `MyConfigGenerator.train`, users define the way to train their model and save it to a specific path.
-- In `MyConfigGenerator.__init__`, users define the way to load the trained checkpoint.
-- In `MyConfigGenerator.genConfig`, users define the way to use the loaded model to do the prediction and get the suggested config.
-
-Please refer to the [ConfigGenerator API doc](../../PythonAPI/Chronos/aiops.html) for detailed information.
-
-#### Use Chronos Forecaster/Anomaly detector
-Chronos also provides some out-of-the-box forecasters and anomaly detectors for time series data, to make building your AIOps use-case easier.
-
-Please refer to the [Forecaster User Guide](./forecasting.html) and [Anomaly Detector User Guide](./anomaly_detection.html) for detailed information.
-
-### Use trigger in ConfigGenerator
-Sometimes the AI system needs to be informed of some **timely** information (e.g., some events in a log or some monitoring data every second). Chronos also defines some triggers for this kind of usage. Following is a trivial case to help users understand what a `Trigger` can do.
-
-```python
-class MyConfigGenerator(ConfigGenerator):
-    def __init__(self):
-        self.sweetpoint = 1
-        super().__init__()
-
-    def genConfig(self):
-        return self.sweetpoint
-
-    @triggerbyclock(2)
-    def update_sweetpoint(self):
-        self.sweetpoint += 1
-```
-
-In this case, once the `MyConfigGenerator` is initialized, `update_sweetpoint` will be called every 2 seconds, so users get an evolving ConfigGenerator.
-
-```python
-import time
-
-mycg = MyConfigGenerator()
-time.sleep(2)
-assert mycg.genConfig() == 2
-time.sleep(2)
-assert mycg.genConfig() == 3
-```
-
-This trivial case may seem useless, but with a dedicated `update_sweetpoint`, such as one that reads the CPU utilization every second, users could bring useful information into their ConfigGenerator and make better decisions with easy programming (see the sketch below).
-
-Please refer to the [Trigger API doc](../../PythonAPI/Chronos/aiops.html) for detailed information.
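Following up on the note above, here is a hedged sketch of such a CPU-aware trigger. It assumes that `triggerbyclock` can be imported next to `ConfigGenerator` from `bigdl.chronos.aiops` and that the optional `psutil` package is installed; both are assumptions, not statements from this page.

```python
import psutil
from bigdl.chronos.aiops import ConfigGenerator, triggerbyclock

class CpuAwareConfigGenerator(ConfigGenerator):
    def __init__(self):
        super().__init__()
        self.cpu_util = 0.0

    def genConfig(self):
        # toy policy: suggest a smaller batch size when the CPU is busy
        return {"batch_size": 32 if self.cpu_util > 80 else 128}

    @triggerbyclock(1)
    def update_cpu_util(self):
        # refreshed every second by the trigger
        self.cpu_util = psutil.cpu_percent(interval=None)
```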
\ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Overview/anomaly_detection.md b/docs/readthedocs/source/doc/Chronos/Overview/anomaly_detection.md deleted file mode 100644 index 8fc00aa2..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/anomaly_detection.md +++ /dev/null @@ -1,34 +0,0 @@
-# Anomaly Detection
-
-Anomaly Detection detects abnormal samples in a given time series. _Chronos_ provides a set of unsupervised anomaly detectors.
-
-View some example notebooks for [Datacenter AIOps][AIOps].
-
-## 1. ThresholdDetector
-
-ThresholdDetector detects anomalies based on a threshold. It can be used to detect anomalies on a given time series ([notebook][AIOps_anomaly_detect_unsupervised]), or used together with [Forecasters](#forecasting) to detect anomalies on newly arriving samples ([notebook][AIOps_anomaly_detect_unsupervised_forecast_based]).
-
-View the [ThresholdDetector API Doc](../../PythonAPI/Chronos/anomaly_detectors.html#chronos-model-anomaly-th-detector) for more details.
-
-
-## 2. AEDetector
-
-AEDetector detects anomalies based on the reconstruction error of an autoencoder network.
-
-View the anomaly detection [notebook][AIOps_anomaly_detect_unsupervised] and [AEDetector API Doc](../../PythonAPI/Chronos/anomaly_detectors.html#chronos-model-anomaly-ae-detector) for more details.
-
-## 3. DBScanDetector
-
-DBScanDetector uses the DBSCAN clustering algorithm for anomaly detection.
-
-```eval_rst
-.. note::
-    Users may install ``scikit-learn-intelex`` to accelerate this detector. Chronos will detect whether ``scikit-learn-intelex`` is installed and decide whether to use it. For more details please refer to: https://intel.github.io/scikit-learn-intelex/installation.html
-```
-
-View the anomaly detection [notebook][AIOps_anomaly_detect_unsupervised] and [DBScanDetector API Doc](../../PythonAPI/Chronos/anomaly_detectors.html#chronos-model-anomaly-dbscan-detector) for more details.
-
-
-[AIOps]: 
-[AIOps_anomaly_detect_unsupervised]: 
-[AIOps_anomaly_detect_unsupervised_forecast_based]: \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Overview/chronos_known_issue.md b/docs/readthedocs/source/doc/Chronos/Overview/chronos_known_issue.md deleted file mode 100644 index e9a720bb..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/chronos_known_issue.md +++ /dev/null @@ -1,71 +0,0 @@
-# Chronos Known Issue
-
-## Version Compatibility Issues
-
-### Numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
-
-**Problem description**
-
-This seems to be a numpy compatibility issue. We do not recommend solving it by downgrading Numpy to 1.19.x;
-when no other issues exist, the solution is given below.
-
-**Solution**
-* `pip uninstall -y pycocotools`
-* `pip install pycocotools --no-cache-dir --no-binary :all:`
-* `conda install -c conda-forge pycocotools`
-
----------------------------
-
-### Cannot convert a symbolic Tensor (encoder_lstm_8/strided_slice:0) to a numpy array
-
-**Problem description**
-
-This is a compatibility issue caused by Tensorflow and Numpy 1.20.x.
-
-**Solution**
-
-* `pip install numpy==1.19.5`
-
----------------------------
-
-### StanModel object has no attribute 'fit_class'
-
-**Problem description**
-
-We recommend reinstalling prophet using conda or miniconda.
-
-**Solution**
-
-* `pip uninstall pystan prophet -y`
-* `conda install -c conda-forge prophet=1.0.1`
-
----------------------------
-
-## Dependency Issues
-
-### RuntimeError: No active RayContext
-
-**Problem description**
-
-Exception: No active RayContext. Please call init_orca_context to create a RayContext.
-> ray_ctx = RayContext.get()
-> ray_ctx = RayContext.get(initialize=False)
-
-**Solution**
-
-* Make sure all operations are done before `stop_orca_context`.
-* Make sure no other `RayContext` exists before `init_orca_context`.
-
----------------------------
-
-### error while loading shared libraries: libunwind.so.8: cannot open shared object file: No such file or directory.
-
-**Problem description**
-
-A dependency is missing from your environment; this only happens when you run `source bigdl-nano-init`.
-
-**Solution**
-
-* `apt-get install libunwind8-dev`
-
----------------------------
diff --git a/docs/readthedocs/source/doc/Chronos/Overview/data_processing_feature_engineering.md b/docs/readthedocs/source/doc/Chronos/Overview/data_processing_feature_engineering.md deleted file mode 100644 index 7218b51c..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/data_processing_feature_engineering.md +++ /dev/null @@ -1,276 +0,0 @@
-# Data Processing and Feature Engineering
-
-Time series data is a special data formulation with its own specific operations. _Chronos_ provides [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) as a time series dataset abstraction for data processing (e.g. impute, deduplicate, resample, scale/unscale, roll sampling) and auto feature engineering (e.g. datetime feature, aggregation feature). Chronos also provides [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) with the same (or similar) API for distributed and parallelized data preprocessing on large data.
-
-Users can create a [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) quickly from many raw data types, including pandas dataframes, parquet files, spark dataframes or xshards objects. A [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) can be directly used in [`AutoTSEstimator`](../../PythonAPI/Chronos/autotsestimator.html#autotsestimator) and [forecasters](../../PythonAPI/Chronos/forecasters). It can also be converted to a pandas dataframe, numpy ndarray, pytorch dataloaders or tensorflow dataset for various usages.
-
-## 1. Basic concepts
-
-A time series can be interpreted as a sequence of real values ordered by timestamp, while a time series dataset can be a combination of one or a huge number of time series. It may contain multiple time series since users may collect different time series in the same/different period of time (e.g. an AIops dataset may have CPU usage ratio and memory usage ratio data for two servers over a period of time; this dataset contains four time series).
-
-In [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) and [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset), we provide **2** possible dimensions to construct a high-dimensional time series dataset (i.e. **feature dimension** and **id dimension**).
-
-* feature dimension: Time series along this dimension might be independent or related. Though they may be related, they are assumed to have **different patterns and distributions** and to be collected over the **same period of time**. For example, the CPU usage ratio and Memory usage ratio for the same server over a period of time.
-* id dimension: Time series along this dimension are assumed to have the **same patterns and distributions** and might be collected over the **same or different period of time**. For example, the CPU usage ratio for two servers over a period of time.
-
-All the preprocessing operations will be done on each independent time series (i.e. on both the feature dimension and the id dimension), while feature scaling will only be carried out on the feature dimension.
- -```eval_rst -.. note:: - - ``XShardsTSDataset`` will perform the data processing in parallel(based on spark) to support large dataset. While the parallelization will only be performed on "id dimension". This means, in previous example, ``XShardsTSDataset`` will only utilize multiple workers to process data for different servers at the same time. If a dataset only has 1 id, ``XShardsTSDataset`` will be even slower than ``TSDataset`` because of the overhead. - -``` - -## 2. Create a TSDataset - -[`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) supports initializing from a pandas dataframe through [`TSDataset.from_pandas`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.from_pandas), from a parquet file through [`TSDataset.from_parquet`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.from_parquet) or from Prometheus data through [`TSDataset.from_prometheus`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.from_prometheus). - -[`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) supports initializing from an [xshards object](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/data-parallel-processing.html#xshards-distributed-data-parallel-python-processing) through [`XShardsTSDataset.from_xshards`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.experimental.xshards_tsdataset.XShardsTSDataset.from_xshards) or from a Spark Dataframe through [`XShardsTSDataset.from_sparkdf`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.experimental.xshards_tsdataset.XShardsTSDataset.from_sparkdf). - -A typical valid time series dataframe `df` is shown below. - -You can initialize a [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) or [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) by simply: -```eval_rst - -.. tabs:: - - .. tab:: TSDataset - - .. code-block:: python - - # Server id Datetime CPU usage Mem usage - # 0 08:39 2021/7/9 93 24 - # 0 08:40 2021/7/9 91 24 - # 0 08:41 2021/7/9 93 25 - # 0 ... ... ... - # 1 08:39 2021/7/9 73 79 - # 1 08:40 2021/7/9 72 80 - # 1 08:41 2021/7/9 79 80 - # 1 ... ... ... - from bigdl.chronos.data import TSDataset - - tsdata = TSDataset.from_pandas(df, - dt_col="Datetime", - id_col="Server id", - target_col=["CPU usage", - "Mem usage"]) - - .. tab:: XShardsTSDataset - - .. code-block:: python - - # Here is a df example: - # id datetime value "extra feature 1" "extra feature 2" - # 00 2019-01-01 1.9 1 2 - # 01 2019-01-01 2.3 0 9 - # 00 2019-01-02 2.4 3 4 - # 01 2019-01-02 2.6 0 2 - from bigdl.orca.data.pandas import read_csv - from bigdl.chronos.data.experimental import XShardsTSDataset - - shards = read_csv(csv_path) - tsdataset = XShardsTSDataset.from_xshards(shards, dt_col="datetime", - target_col="value", id_col="id", - extra_feature_col=["extra feature 1", - "extra feature 2"]) - -``` -`target_col` is a list of all elements along feature dimension, while `id_col` is the identifier that distinguishes the id dimension. `dt_col` is the datetime column. For `extra_feature_col`(not shown in this case), you should list those features that you will use as input features but not as target features (e.g. you will **not** perform forecasting or anomaly detection task on this col). 
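For example, a small sketch (not part of the original page) that reuses the server-monitoring dataframe above and treats memory usage as an input-only feature while forecasting only CPU usage:

```python
from bigdl.chronos.data import TSDataset

# forecast only "CPU usage"; keep "Mem usage" as an extra (input-only) feature
tsdata = TSDataset.from_pandas(df,
                               dt_col="Datetime",
                               id_col="Server id",
                               target_col=["CPU usage"],
                               extra_feature_col=["Mem usage"])
```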
-
-If you are building a prototype for your forecasting/anomaly detection task and you need to split your TSDataset into train/valid/test sets, you can use the `with_split` parameter. [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) or [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) supports splitting with ratios by `val_ratio` and `test_ratio`.
-
-If you are deploying your model in a production environment, you can use the `deploy_mode` parameter and specify it to `True` when calling `TSDataset.from_pandas`, `TSDataset.from_parquet` or `TSDataset.from_prometheus`, which will reduce data processing latency and set the necessary parameters for data processing and feature engineering.
-
-## 3. Time series dataset preprocessing
-[`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) supports [`impute`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.impute), [`deduplicate`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.deduplicate) and [`resample`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.resample). You may fill the missing points by [`impute`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.impute) in different modes. You may remove the records that are totally the same by [`deduplicate`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.deduplicate). You may change the sample frequency by [`resample`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.resample). [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) only supports [`impute`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.experimental.xshards_tsdataset.XShardsTSDataset.impute) for now.
-
-A typical cascade call for preprocessing is:
-```eval_rst
-.. tabs::
-
-     .. tab:: TSDataset
-
-        .. code-block:: python
-
-            tsdata.deduplicate().resample(interval="2s").impute()
-
-     .. tab:: XShardsTSDataset
-
-        .. code-block:: python
-
-            tsdata.impute()
-```
-## 4. Feature scaling
-Scaling all features to one distribution is important, especially when we want to train a machine learning/deep learning system. Scaling will make the training process much more stable. Still, we must always remember to unscale the prediction result at the end.
-
-[`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) and [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) support all the scalers in sklearn through the [`scale`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.scale) and [`unscale`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.unscale) methods.
-
-Since a scaler should only be fit on the training set, a typical call for scaling operations is:
-```eval_rst
-.. tabs::
-
-     .. tab:: TSDataset
-
-        .. code-block:: python
-
-            from sklearn.preprocessing import StandardScaler
-            scaler = StandardScaler()
-
-            # scale
-            for tsdata in [tsdata_train, tsdata_valid, tsdata_test]:
-                tsdata.scale(scaler, fit=tsdata is tsdata_train)
-
-            # unscale
-            for tsdata in [tsdata_train, tsdata_valid, tsdata_test]:
-                tsdata.unscale()
-
-     .. tab:: XShardsTSDataset
-
-        .. code-block:: python
-
-            from sklearn.preprocessing import StandardScaler
-
-            # scale
-            scaler = {"id1": StandardScaler(), "id2": StandardScaler()}
-            for tsdata in [tsdata_train, tsdata_valid, tsdata_test]:
-                tsdata.scale(scaler, fit=tsdata is tsdata_train)
-
-            # unscale
-            for tsdata in [tsdata_train, tsdata_valid, tsdata_test]:
-                tsdata.unscale()
-```
-[`unscale_numpy`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.unscale_numpy) in TSDataset or [`unscale_xshards`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.experimental.xshards_tsdataset.XShardsTSDataset.unscale_xshards) in XShardsTSDataset is specially designed for forecasters. Users may unscale the output of a forecaster by this operation.
-
-A typical call is:
-```eval_rst
-.. tabs::
-
-     .. tab:: TSDataset
-
-        .. code-block:: python
-
-            x, y = tsdata_test.scale(scaler)\
-                              .roll(lookback=..., horizon=...)\
-                              .to_numpy()
-            yhat = forecaster.predict(x)
-            unscaled_yhat = tsdata_test.unscale_numpy(yhat)
-            unscaled_y = tsdata_test.unscale_numpy(y)
-            # calculate metric by unscaled_yhat and unscaled_y
-
-     .. tab:: XShardsTSDataset
-
-        .. code-block:: python
-
-            x, y = tsdata_test.scale(scaler)\
-                              .roll(lookback=..., horizon=...)\
-                              .to_xshards()
-            yhat = forecaster.predict(x)
-            unscaled_yhat = tsdata_test.unscale_xshards(yhat)
-            unscaled_y = tsdata_test.unscale_xshards(y, key="y")
-            # calculate metric by unscaled_yhat and unscaled_y
-```
-## 5. Feature generation
-Other than the historical target data and other extra features provided by users, some additional features can be generated automatically by [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html). [`gen_dt_feature`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.gen_dt_feature) helps users to generate 10 datetime-related features (e.g. MONTH, WEEKDAY, ...). [`gen_global_feature`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.gen_global_feature) and [`gen_rolling_feature`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.gen_rolling_feature) are powered by tsfresh to generate aggregated features (e.g. min, max, ...) for each time series or rolling window respectively.
-
-## 6. Sampling and exporting
-A time series dataset needs to be sampled and exported as a numpy ndarray/dataloader to be used in machine learning and deep learning models (e.g. forecasters, anomaly detectors, auto models, etc.).
-```eval_rst
-.. warning::
-    You don't need to call any sampling or exporting methods introduced in this section when using ``AutoTSEstimator``.
-```
-### 6.1 Roll sampling
-Roll sampling (or sliding window sampling) is useful when you want to train an RR-type supervised deep learning forecasting model. It works as the [diagram](#RR-forecast-image) shows.
-
-
-Please refer to the API doc [`roll`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.roll) for detailed behavior.
Users can simply export the sampling result as numpy ndarray by [`to_numpy`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.to_numpy), pytorch dataloader [`to_torch_data_loader`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.to_torch_data_loader), tensorflow dataset by [`to_tf_dataset`](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.to_tf_dataset) or xshards object by [`to_xshards`](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.experimental.xshards_tsdataset.XShardsTSDataset.to_xshards). - - -```eval_rst -.. note:: - **Difference between** ``roll`` **and** ``to_torch_data_loader``: - - ``.roll(...)`` performs the rolling before RR forecasters/auto models training while ``.to_torch_data_loader(...)`` performs rolling during the training. - - It is fine to use either of them when you have a relatively small dataset (less than 1G). ``.to_torch_data_loader(...)`` is recommended when you have a large dataset (larger than 1G) to save memory usage. -``` - -```eval_rst -.. note:: - **Roll sampling format**: - - As decribed in RR style forecasting concept, the sampling result will have the following shape requirement. - - | x: (sample_num, lookback, input_feature_num) - | y: (sample_num, horizon, output_feature_num) - - Please follow the same shape if you use customized data creator. -``` - -A typical call of [`roll`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.roll) is as following: - -```eval_rst -.. tabs:: - - .. tab:: TSDataset - - .. code-block:: python - - # forecaster - x, y = tsdata.roll(lookback=..., horizon=...).to_numpy() - forecaster.fit((x, y)) - - .. tab:: XShardsTSDataset - - .. code-block:: python - - # forecaster - data = tsdata.roll(lookback=..., horizon=...).to_xshards() - forecaster.fit(data) -``` - -### 6.2 Pandas Exporting -Now we support pandas dataframe exporting through `to_pandas()` for users to carry out their own transformation. Here is an example of using only one time series for anomaly detection. -```python -# anomaly detector on "target" col -x = tsdata.to_pandas()["target"].to_numpy() -anomaly_detector.fit(x) -``` -View [TSDataset API Doc](../../PythonAPI/Chronos/tsdataset.html#) for more details. - -## 7. Built-in Dataset - -Built-in Dataset supports the function of data downloading, preprocessing, and returning to the `TSDataset` object of the public data set. 
- -|Dataset name|Task|Time Series Length|Number of Instances|Feature Number|Information Page|Download Link| -|---|---|---|---|---|---|---| -|network_traffic|forecasting|8760|1|2|[network_traffic](http://mawi.wide.ad.jp/~agurim/about.html)|[network_traffic](http://mawi.wide.ad.jp/~agurim/dataset/)| -|nyc_taxi|forecasting|10320|1|1|[nyc_taxi](https://github.com/numenta/NAB/blob/master/data/README.md)|[nyc_taxi](https://raw.githubusercontent.com/numenta/NAB/v1.0/data/realKnownCause/nyc_taxi.csv)| -|fsi|forecasting|1259|1|1|[fsi](https://github.com/CNuge/kaggle-code/tree/master/stock_data)|[fsi](https://github.com/CNuge/kaggle-code/raw/master/stock_data/individual_stocks_5yr.zip)| -|AIOps|anomaly_detect|61570|1|1|[AIOps](https://github.com/alibaba/clusterdata)|[AIOps](http://clusterdata2018pubcn.oss-cn-beijing.aliyuncs.com/machine_usage.tar.gz)| -|uci_electricity|forecasting|140256|370|1|[uci_electricity](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014)|[uci_electricity](https://archive.ics.uci.edu/ml/machine-learning-databases/00321/LD2011_2014.txt.zip)| -|tsinghua_electricity|forecasting|26304|321|1|[tsinghua_electricity](https://cloud.tsinghua.edu.cn/d/e1ccfff39ad541908bae/?p=%2Felectricity&mode=list)|[tsinghua_electricity](https://cloud.tsinghua.edu.cn/d/e1ccfff39ad541908bae/?p=%2Felectricity&mode=list)| - -Specify the `name`, the raw data file will be saved in the specified `path` (defaults to ~/.chronos/dataset). `redownload` can help you re-download the files you need. - -When `with_split` is set to True, the length of the data set will be divided according to the specified `val_ratio` and `test_ratio`, and three `TSDataset` will be returned. `with_split` defaults to True, `val_ratio` and `test_ratio` defaults to **0.1**. If you need only one `TSDataset`, just specify `with_split` to False. -About `TSDataset`, more details, please refer to [here](../../PythonAPI/Chronos/tsdataset.html). - -```python -# load built-in dataset -from bigdl.chronos.data import get_public_dataset -from sklearn.preprocessing import StandardScaler -tsdata_train, tsdata_val, \ - tsdata_test = get_public_dataset(name='nyc_taxi', - with_split=True, - val_ratio=0.1, - test_ratio=0.1 - ) -# carry out additional customized preprocessing on the dataset. -stand = StandardScaler() -for tsdata in [tsdata_train, tsdata_val, tsdata_test]: - tsdata.gen_dt_feature(one_hot_features=['HOUR'])\ - .impute()\ - .scale(stand, fit=tsdata is tsdata_train) -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Overview/deep_dive.rst b/docs/readthedocs/source/doc/Chronos/Overview/deep_dive.rst deleted file mode 100644 index 9828f800..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/deep_dive.rst +++ /dev/null @@ -1,10 +0,0 @@ -Chronos Deep Dive -========= - -* `Time Series Processing and Feature Engineering `__ introduces how to load a built-in/customized dataset and carry out transformation and feature engineering on it. -* `Time Series Forecasting `__ introduces how to build a time series forecasting application. -* `Time Series Anomaly Detection `__ introduces how to build a anomaly detection application. -* `Generate Synthetic Sequential Data `__ introduces how to build a series data generation application. -* `Artificial Intelligence for IT operations (AIOps)`__ introduces how to build an AI system for AIOps use-cases. 
-* `Speed up Chronos built-in/customized models `__ introduces how to speed up chronos built-in models/customized time-series models -* `Useful Functionalities `__ introduces some functionalities provided by Chronos that can help you improve accuracy/performance or scale the application to a larger data. \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Overview/forecasting.md b/docs/readthedocs/source/doc/Chronos/Overview/forecasting.md deleted file mode 100644 index 52c1cdca..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/forecasting.md +++ /dev/null @@ -1,287 +0,0 @@ -# Time Series Forecasting - -_Chronos_ provides both deep learning/machine learning models and traditional statistical models for forecasting. - -There're three ways to do forecasting: -- Use highly integrated [**AutoTS pipeline**](#use-autots-pipeline) with auto feature generation, data pre/post-processing, hyperparameter optimization. -- Use [**auto forecasting models**](#use-auto-forecasting-model) with auto hyperparameter optimization. -- Use [**standalone forecasters**](#use-standalone-forecaster-pipeline). - -Besides, _Chronos_ also provides **benchmark tool** to benchmark forecasting performance. For more information, please refer to [Use Chronos benchmark tool](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/how_to_use_benchmark_tool.html). - -#### 0. Supported Time Series Forecasting Model - -- `Model`: Model name. -- `Style`: Forecasting model style. Detailed information will be stated in [this section](#time-series-forecasting-concepts). -- `Multi-Variate`: Predict more than one variable at the same time? -- `Multi-Step`: Predict more than one data point in the future? -- `Exogenous Variables`: Take other variables(you don't need to predict) into consideration? -- `Distributed`: Scale the model to a cluster and take data from distributed file system? -- `ONNX`: Export and use `OnnxRuntime` to do the inference. -- `Quantization`: Export and use quantized int8 model to do the inference. -- `Auto Models`: AutoModel API support. -- `AutoTS`: AutoTS API support. -- `Backend`: The DL framework we use to implement this model. - - - -| Model | Style | Multi-Variate | Multi-Step | Exogenous Variables | Distributed | ONNX | Quantization | Auto Models | AutoTS | Backend | -| ----------------- | ----- | ------------- | ---------- | ------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | -| LSTM | RR | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | pytorch/tf2 | -| Seq2Seq | RR | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | pytorch/tf2 | -| TCN | RR | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | pytorch/tf2 | -| Autoformer | RR | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | pytorch | -| NBeats | RR | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | pytorch | -| MTNet | RR | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ✳️\*\* | tf2 | -| TCMF | TS | ✅ | ✅ | ✅ | ✳️\* | ❌ | ❌ | ❌ | ❌ | pytorch | -| Prophet | TS | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | prophet | -| ARIMA | TS | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | pmdarima | -| Customized\*\*\* | RR | Customized | Customized | Customized | ❌ |✅|❌|❌|✅|pytorch - -\* TCMF only partially supports distributed training.
-\*\* Auto tuning of MTNet is only supported in our deprecated AutoTS API.
-\*\*\* Customized model is only supported in `AutoTSEstimator` with pytorch as backend. - - - -#### 1. Time Series Forecasting Concepts -Time series forecasting is one of the most popular tasks on time series data. **In short, forecasing aims at predicting the future by using the knowledge you can learn from the history.** - -##### 1.1 Traditional Statistical(TS) Style -Traditionally, Time series forecasting problem was formulated with rich mathematical fundamentals and statistical models. Typically, one model can only handle one time series and fit on the whole time series before the last observed timestamp and predict the next few steps. Training(fit) is needed every time you change the last observed timestamp. - -![](../Image/forecast-TS.png) - -##### 1.2 Regular Regression(RR) Style -Recent years, common deep learning architectures (e.g. RNN, CNN, Transformer, etc.) are being successfully applied to forecasting problem. Forecasting is transformed to a supervised learning regression problem in this style. A model can predict several time series. Typically, a sampling process based on sliding-window is needed, some terminology is explained as following: - -- `lookback` / `past_seq_len`: the length of historical data along time. This number is tunable. -- `horizon` / `future_seq_len`: the length of predicted data along time. This number is depended on the task definition. If this value larger than 1, then the forecasting task is *Multi-Step*. -- `input_feature_num`: The number of variables the model can observe. This number is tunable since we can select a subset of extra feature to use. -- `output_feature_num`: The number of variables the model to predict. This number is depended on the task definition. If this value larger than 1, then the forecasting task is *Multi-Variate*. - - -![](../Image/forecast-RR.png) - -#### 2. Use AutoTS Pipeline -For AutoTS Pipeline, we will leverage `AutoTSEstimator`, `TSPipeline` and preferably `TSDataset`. A typical usage of AutoTS pipeline basically contains 3 steps. -1. Prepare a `TSDataset` or customized data creator. -2. Init a `AutoTSEstimator` and call `.fit()` on the data. -3. Use the returned `TSPipeline` for further development. -```eval_rst -.. warning:: - ``AutoTSTrainer`` workflow has been deprecated, no feature updates or performance improvement will be carried out. Users of ``AutoTSTrainer`` may refer to `Chronos API doc `_. -``` -```eval_rst -.. note:: - ``AutoTSEstimator`` currently only support pytorch backend. -``` -View [Quick Start](../QuickStart/chronos-autotsest-quickstart.html) for a more detailed example. - -##### 2.1 Prepare dataset -`AutoTSEstimator` support 2 types of data input. - -You can easily prepare your data in `TSDataset` (recommended). You may refer to [here](#TSDataset) for the detailed information to prepare your `TSDataset` with proper data processing and feature generation. Here is a typical `TSDataset` preparation. -```python -from bigdl.chronos.data import TSDataset -from sklearn.preprocessing import StandardScaler - -tsdata_train, tsdata_val, tsdata_test\ - = TSDataset.from_pandas(df, dt_col="timestamp", target_col="value", with_split=True, val_ratio=0.1, test_ratio=0.1) - -standard_scaler = StandardScaler() -for tsdata in [tsdata_train, tsdata_val, tsdata_test]: - tsdata.gen_dt_feature()\ - .impute(mode="last")\ - .scale(standard_scaler, fit=(tsdata is tsdata_train)) -``` -You can also create your own data creator. The data creator takes a dictionary config and returns a pytorch dataloader. 
Users may define their own customized key and add them to the search space. "batch_size" is the only fixed key. -```python -from torch.utils.data import DataLoader -def training_data_creator(config): - return Dataloader(..., batch_size=config['batch_size']) -``` -##### 2.2 Create an AutoTSEstimator -`AutoTSEstimator` depends on the [Distributed Hyper-parameter Tuning](../../Orca/Overview/distributed-tuning.html) supported by Project Orca. It also provides time series only functionalities and optimization. Here is a typical initialization process. -```python -import bigdl.orca.automl.hp as hp -from bigdl.chronos.autots import AutoTSEstimator -auto_estimator = AutoTSEstimator(model='lstm', - search_space='normal', - past_seq_len=hp.randint(1, 10), - future_seq_len=1, - selected_features="auto") -``` -We prebuild three defualt search space for each build-in model, which you can use the by setting `search_space` to "minimal","normal", or "large" or define your own search space in a dictionary. The larger the search space, the better accuracy you will get and the more time will be cost. - -`past_seq_len` can be set as a hp sample function, the proper range is highly related to your data. A range between 0.5 cycle and 2 cycle is reasonable. You may set it to `"auto"`, then a cycle length will be detected automatically and this parameter will be set to a random search between 0.5 cycle and 2 cycle length. - -`selected_features` is set to `"auto"` by default, where the `AutoTSEstimator` will find the best subset of extra features to help the forecasting task. - -##### 2.3 Fit on AutoTSEstimator -Fitting on `AutoTSEstimator` is fairly easy. A `TSPipeline` will be returned once fitting is completed. -```python -ts_pipeline = auto_estimator.fit(data=tsdata_train, - validation_data=tsdata_val, - batch_size=hp.randint(32, 64), - epochs=5) -``` -Detailed information and settings please refer to [AutoTSEstimator API doc](../../PythonAPI/Chronos/autotsestimator.html#id1). -##### 2.4 Development on TSPipeline -You may carry out predict, evaluate, incremental training or save/load for further development. -```python -# predict with the best trial -y_pred = ts_pipeline.predict(tsdata_test) - -# evaluate the result pipeline -mse, smape = ts_pipeline.evaluate(tsdata_test, metrics=["mse", "smape"]) -print("Evaluate: the mean square error is", mse) -print("Evaluate: the smape value is", smape) - -# save the pipeline -my_ppl_file_path = "/tmp/saved_pipeline" -ts_pipeline.save(my_ppl_file_path) - -# restore the pipeline for further deployment -from bigdl.chronos.autots import TSPipeline -loaded_ppl = TSPipeline.load(my_ppl_file_path) -``` -Detailed information please refer to [TSPipeline API doc](../../PythonAPI/Chronos/autotsestimator.html#tspipeline). - -```eval_rst -.. note:: - ``init_orca_context`` is not needed if you just use the trained TSPipeline for inference, evaluation or incremental fitting. -``` -```eval_rst -.. note:: - Incremental fitting on TSPipeline just update the model weights the standard way, which does not involve AutoML. -``` - -#### 3. Use Standalone Forecaster Pipeline - -_Chronos_ provides a set of standalone time series forecasters without AutoML support, including deep learning models as well as traditional statistical models. - -View some examples notebooks for [Network Traffic Prediction][network_traffic] - -The common process of using a Forecaster looks like below. -```python -# set fixed hyperparameters, loss, metric... -f = Forecaster(...) -# input data, batch size, epoch... 
-f.fit(...) -# input test data x, batch size... -f.predict(...) -``` -The input data can be easily get from `TSDataset`. -View [Quick Start](../QuickStart/chronos-tsdataset-forecaster-quickstart.md) for a more detailed example. Refer to [API docs](../../PythonAPI/Chronos/forecasters.html) of each Forecaster for detailed usage instructions and examples. - - -##### 3.1 LSTMForecaster - -LSTMForecaster wraps a vanilla LSTM model, and is suitable for univariate time series forecasting. - -View Network Traffic Prediction [notebook][network_traffic_model_forecasting] and [LSTMForecaster API Doc](../../PythonAPI/Chronos/forecasters.html#lstmforecaster) for more details. - - -##### 3.2 Seq2SeqForecaster - -Seq2SeqForecaster wraps a sequence to sequence model based on LSTM, and is suitable for multivariant & multistep time series forecasting. - -View [Seq2SeqForecaster API Doc](../../PythonAPI/Chronos/forecasters.html#seq2seqforecaster) for more details. - - -##### 3.3 TCNForecaster - -Temporal Convolutional Networks (TCN) is a neural network that use convolutional architecture rather than recurrent networks. It supports multi-step and multi-variant cases. Causal Convolutions enables large scale parallel computing which makes TCN has less inference time than RNN based model such as LSTM. - -View Network Traffic multivariate multistep Prediction [notebook][network_traffic_multivariate_multistep_tcnforecaster] and [TCNForecaster API Doc](../../PythonAPI/Chronos/forecasters.html#tcnforecaster) for more details. - - -##### 3.4 MTNetForecaster - -```eval_rst -.. note:: - **Additional Dependencies**: - You need to install ``bigdl-nano[tensorflow]`` to enable this built-in model. - - ``pip install bigdl-nano[tensorflow]`` -``` - -MTNetForecaster wraps a MTNet model. The model architecture mostly follows the [MTNet paper](https://arxiv.org/abs/1809.02105) with slight modifications, and is suitable for multivariate time series forecasting. - -View Network Traffic Prediction [notebook][network_traffic_model_forecasting] and [MTNetForecaster API Doc](../../PythonAPI/Chronos/forecasters.html#mtnetforecaster) for more details. - - -##### 3.5 TCMFForecaster - -TCMFForecaster wraps a model architecture that follows implementation of the paper [DeepGLO paper](https://arxiv.org/abs/1905.03806) with slight modifications. It is especially suitable for extremely high dimensional (up-to millions) multivariate time series forecasting. - -View High-dimensional Electricity Data Forecasting [example][run_electricity] and [TCMFForecaster API Doc](../../PythonAPI/Chronos/forecasters.html#tcmfforecaster) for more details. - - -##### 3.6 ARIMAForecaster - -```eval_rst -.. note:: - **Additional Dependencies**: - You need to install ``pmdarima`` to enable this built-in model. - - ``pip install pmdarima==1.8.5`` -``` - -ARIMAForecaster wraps a ARIMA model and is suitable for univariate time series forecasting. It works best with data that show evidence of non-stationarity in the sense of mean (and an initial differencing step (corresponding to the "I, integrated" part of the model) can be applied one or more times to eliminate the non-stationarity of the mean function. - -View [ARIMAForecaster API Doc](../../PythonAPI/Chronos/forecasters.html#arimaforecaster) for more details. - - -##### 3.7 ProphetForecaster - -```eval_rst -.. note:: - **Additional Dependencies**: - You need to install `prophet` to enable this built-in model. - - ``pip install prophet==1.1.0`` -``` - -```eval_rst -.. 
note:: - **Acceleration Note**: - Intel® Distribution for Python may improve the speed of prophet's training and inferencing. You may install it by refering to https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html. -``` - -ProphetForecaster wraps the Prophet model ([site](https://github.com/facebook/prophet)) which is an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects and is suitable for univariate time series forecasting. It works best with time series that have strong seasonal effects and several seasons of historical data and is robust to missing data and shifts in the trend, and typically handles outliers well. - -View Stock Prediction [notebook][stock_prediction_prophet] and [ProphetForecaster API Doc](../../PythonAPI/Chronos/forecasters.html#prophetforecaster) for more details. - - -##### 3.8 NBeatsForecaster - -Neural basis expansion analysis for interpretable time series forecasting ([N-BEATS](https://arxiv.org/abs/1905.10437)) is a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. Nbeats can solve univariate time series point forecasting problems, being interpretable, and fast to train. - -[NBeatsForecaster API Doc](../../PythonAPI/Chronos/forecasters.html#nbeatsforecaster) for more details. - - -##### 3.9 AutoformerForecaster - -Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting ([Autoformer](https://arxiv.org/abs/2106.13008)) is a Transformer based neural network that could reach SOTA results on many datasets. - -[AutoformerForecaster API Doc](../../PythonAPI/Chronos/forecasters.html#autoformerforecaster) for more details. - -#### 4. Use Auto forecasting model -Auto forecasting models are designed to be used exactly the same as Forecasters. The only difference is that you can set hp search function to the hyperparameters and the `.fit()` method will search the best hyperparameter setting. -```python -# set hyperparameters in hp search function, loss, metric... -auto_model = AutoModel(...) -# input data, batch size, epoch... -auto_model.fit(...) -# input test data x, batch size... -auto_model.predict(...) -``` -The input data can be easily get from `TSDataset`. Users can refer to detailed [API doc](../../PythonAPI/Chronos/automodels.html). - -[network_traffic]: -[network_traffic_model_forecasting]: -[network_traffic_multivariate_multistep_tcnforecaster]: -[run_electricity]: -[stock_prediction_prophet]: \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Overview/install.md b/docs/readthedocs/source/doc/Chronos/Overview/install.md deleted file mode 100644 index 29ee4860..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/install.md +++ /dev/null @@ -1,151 +0,0 @@ -# Chronos Installation - ---- - -#### OS and Python version requirement - - -```eval_rst -.. note:: - - **Supported OS**: - - Chronos is thoroughly tested on Ubuntu (16.04/18.04/20.04), and should works fine on CentOS. If you are a Windows user, there are 2 ways to use Chronos: - - 1. You could use Chronos on a windows laptop with WSL2 (you may refer to `here `_) or just install a ubuntu virtual machine. - - 2. You could use Chronos on native Windows, but some features are unavailable in this case, the limitations will be shown below. -``` -```eval_rst -.. note:: - - **Supported Python Version**: - - Chronos supports all installation options on Python 3.7 ~ 3.9. 
For details about different installation options, refer to `here <#install-using-conda>`_. -``` - - - -#### Install using Conda - -We recommend using conda to manage the Chronos python environment. For more information about Conda, refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -Select your preferences in the panel below to find the proper install command. Then run the install command as the example shown below. - - -```eval_rst -.. raw:: html - - - -
-    [Interactive install-option panel (HTML widget) omitted here: it lets users pick Functionality, Model, DL framework, OS, Auto Tuning, Inference Opt, Hardware, Package and Version, and then shows the matching install command.]
- - -``` - -
- - -```bash -# create a conda environment for chronos -conda create -n my_env python=3.8 setuptools=58.0.4 -conda activate my_env - -# select your preference in above panel to find the proper command to replace the below command, e.g. -pip install --pre --upgrade bigdl-chronos[pytorch] - -# init bigdl-nano to enable local accelerations -source bigdl-nano-init # accelerate the conda env -``` - -##### Install Chronos on native Windows - -Chronos can be simply installed using pip on native Windows, you could use the same command as Linux to install, but unfortunately, some features are unavailable now: - -1. `bigdl-chronos[distributed]` is not supported. - -2. `intel_extension_for_pytorch (ipex)` is unavailable for Windows now, so the related feature is not supported. - -For some known issues when installing and using Chronos on native Windows, you could refer to [windows_guide](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Howto/windows_guide.html). - -##### Install Chronos along with specific Tensorflow - -Currently, the default Tensorflow version of Chronos is 2.7. But Chronos is also validated on Tensorflow 2.8-2.12. If you want to use specific Tensorflow, please follow the table below to find the extra install command after installing Chronos. - -| TF version | Install CMD | -| ---------------- | --------------------------------------------------------------------------- | -| **2.8** | pip install tensorflow==2.8.0 intel-tensorflow==2.8.0 | -| **2.9** | pip install tensorflow==2.9.0 intel-tensorflow==2.9.1 | -| **2.10** | pip install tensorflow==2.10.0 intel-tensorflow==2.10.0 | -| **2.11** | pip install tensorflow==2.11.0 intel-tensorflow==2.11.0 | -| **2.12** | pip install tensorflow==2.12.0 intel-tensorflow==2.12.0 protobuf==3.20.3 | diff --git a/docs/readthedocs/source/doc/Chronos/Overview/quick-tour.rst b/docs/readthedocs/source/doc/Chronos/Overview/quick-tour.rst deleted file mode 100644 index 58089e6d..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/quick-tour.rst +++ /dev/null @@ -1,289 +0,0 @@ -Chronos Quick Tour -================================= -Welcome to Chronos for building a fast, accurate and scalable time series analysis application🎉! Start with our quick tour to understand some critical concepts and how to use them to tackle your tasks. - -.. grid:: 1 1 1 1 - - .. grid-item-card:: - :text-align: center - - **Data processing** - ^^^ - Time series data processing includes imputing, deduplicating, resampling, scale/unscale, roll sampling, etc to process raw time series data(typically in a table) to a format that is understandable to the models. ``TSDataset`` and ``XShardsTSDataset`` are provided for an abstraction. - +++ - .. button-ref:: TSDataset/XShardsTSDataset - :color: primary - :expand: - :outline: - - Get Started - -.. grid:: 1 3 3 3 - :gutter: 2 - - .. grid-item-card:: - :text-align: center - :class-card: sd-mb-2 - - **Forecasting** - ^^^ - Time series forecasting uses history data to predict future data. ``Forecaster`` and ``AutoTSEstimator`` are provided for built-in algorithms and distributed hyperparameter tunning. - +++ - .. button-ref:: Forecaster - :color: primary - :expand: - :outline: - - Get Started - - .. grid-item-card:: - :text-align: center - :class-card: sd-mb-2 - - **Anomaly Detection** - ^^^ - Time series anomaly detection finds the anomaly point in time series. ``Detector`` is provided for many built-in algorithms. - +++ - .. button-ref:: Detector - :color: primary - :expand: - :outline: - - Get Started - - .. 
grid-item-card:: - :text-align: center - :class-card: sd-mb-2 - - **Simulation** - ^^^ - Time series simulation generates synthetic time series data. ``Simulator`` is provided for many built-in algorithms. - +++ - .. button-ref:: Simulator(experimental) - :color: primary - :expand: - :outline: - - Get Started - - -TSDataset/XShardsTSDataset ---------------------- - -In Chronos, we provide a ``TSDataset`` (and a ``XShardsTSDataset`` to handle large data input in distributed fashion) abstraction to represent a time series dataset. It is responsible for preprocessing raw time series data(typically in a table) to a format that is understandable to the models. Many typical transformation, preprocessing and feature engineering method can be called cascadely on ``TSDataset`` or ``XShardsTSDataset``. - -.. code-block:: python - - # !wget https://raw.githubusercontent.com/numenta/NAB/v1.0/data/realKnownCause/nyc_taxi.csv - import pandas as pd - from sklearn.preprocessing import StandardScaler - from bigdl.chronos.data import TSDataset - - df = pd.read_csv("nyc_taxi.csv", parse_dates=["timestamp"]) - tsdata = TSDataset.from_pandas(df, - dt_col="timestamp", - target_col="value") - scaler = StandardScaler() - tsdata.deduplicate()\ - .impute()\ - .gen_dt_feature()\ - .scale(scaler)\ - .roll(lookback=100, horizon=1) - - -.. grid:: 2 - :gutter: 2 - - .. grid-item-card:: - - .. button-ref:: ./data_processing_feature_engineering - :color: primary - :expand: - :outline: - - Tutorial - - .. grid-item-card:: - - .. button-ref:: ../../PythonAPI/Chronos/tsdataset - :color: primary - :expand: - :outline: - - API Document - -Forecaster ------------------------ -We have implemented quite a few algorithms among traditional statistics to deep learning for time series forecasting in ``bigdl.chronos.forecaster`` package. Users may train these forecasters on history time series and use them to predict future time series. - -To import a specific forecaster, you may use {algorithm name} + "Forecaster", and call ``fit`` to train the forecaster and ``predict`` to predict future data. - -.. code-block:: python - - from bigdl.chronos.forecaster import TCNForecaster # TCN is algorithm name - from bigdl.chronos.data import get_public_dataset - - if __name__ == "__main__": - # use nyc_taxi public dataset - train_data, _, test_data = get_public_dataset("nyc_taxi") - for data in [train_data, test_data]: - # use 100 data point in history to predict 1 data point in future - data.roll(lookback=100, horizon=1) - - # create a forecaster - forecaster = TCNForecaster.from_tsdataset(train_data) - - # train the forecaster - forecaster.fit(train_data) - - # predict with the trained forecaster - pred = forecaster.predict(test_data) - - -AutoTSEstimator ---------------------------- -For time series forecasting, we also provide an ``AutoTSEstimator`` for distributed hyperparameter tunning as an extention to ``Forecaster``. Users only need to create a ``AutoTSEstimator`` and call ``fit`` to train the estimator. A ``TSPipeline`` will be returned for users to predict future data. - -.. 
code-block:: python - - from bigdl.orca.automl import hp - from bigdl.chronos.data import get_public_dataset - from bigdl.chronos.autots import AutoTSEstimator - from bigdl.orca import init_orca_context, stop_orca_context - from sklearn.preprocessing import StandardScaler - - if __name__ == "__main__": - # initial orca context - init_orca_context(cluster_mode="local", cores=4, memory="8g", init_ray_on_spark=True) - - # load dataset - tsdata_train, tsdata_val, tsdata_test = get_public_dataset(name='nyc_taxi') - - # dataset preprocessing - stand = StandardScaler() - for tsdata in [tsdata_train, tsdata_val, tsdata_test]: - tsdata.gen_dt_feature().impute()\ - .scale(stand, fit=tsdata is tsdata_train) - - # AutoTSEstimator initalization - autotsest = AutoTSEstimator(model="tcn", - future_seq_len=10) - - # AutoTSEstimator fitting - tsppl = autotsest.fit(data=tsdata_train, - validation_data=tsdata_val) - - # Prediction - pred = tsppl.predict(tsdata_test) - - # stop orca context - stop_orca_context() - -.. grid:: 3 - :gutter: 2 - - .. grid-item-card:: - - .. button-ref:: ../QuickStart/chronos-tsdataset-forecaster-quickstart - :color: primary - :expand: - :outline: - - Quick Start - - .. grid-item-card:: - - .. button-ref:: ./forecasting - :color: primary - :expand: - :outline: - - Tutorial - - .. grid-item-card:: - - .. button-ref:: ../../PythonAPI/Chronos/forecasters - :color: primary - :expand: - :outline: - - API Document - -Detector --------------------- -We have implemented quite a few algorithms among traditional statistics to deep learning for time series anomaly detection in ``bigdl.chronos.detector.anomaly`` package. - -To import a specific detector, you may use {algorithm name} + "Detector", and call ``fit`` to train the detector and ``anomaly_indexes`` to get anomaly data points' indexs. - -.. code-block:: python - - from bigdl.chronos.detector.anomaly import DBScanDetector # DBScan is algorithm name - from bigdl.chronos.data import get_public_dataset - - if __name__ == "__main__": - # use nyc_taxi public dataset - train_data = get_public_dataset("nyc_taxi", with_split=False) - - # create a detector - detector = DBScanDetector() - - # fit a detector - detector.fit(train_data.to_pandas()['value'].to_numpy()) - - # find the anomaly points - anomaly_indexes = detector.anomaly_indexes() - -.. grid:: 3 - :gutter: 2 - - .. grid-item-card:: - - .. button-ref:: ../QuickStart/chronos-anomaly-detector - :color: primary - :expand: - :outline: - - Quick Start - - .. grid-item-card:: - - .. button-ref:: ./anomaly_detection - :color: primary - :expand: - :outline: - - Tutorial - - .. grid-item-card:: - - .. button-ref:: ../../PythonAPI/Chronos/anomaly_detectors - :color: primary - :expand: - :outline: - - API Document - -Simulator(experimental) ---------------------- -Simulator is still under activate development with unstable API. - -.. grid:: 2 - :gutter: 2 - - .. grid-item-card:: - - .. button-ref:: ./simulation - :color: primary - :expand: - :outline: - - Tutorial - - .. grid-item-card:: - - .. 
button-ref:: ../../PythonAPI/Chronos/simulator - :color: primary - :expand: - :outline: - - API Document diff --git a/docs/readthedocs/source/doc/Chronos/Overview/simulation.md b/docs/readthedocs/source/doc/Chronos/Overview/simulation.md deleted file mode 100644 index 6d2488e4..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/simulation.md +++ /dev/null @@ -1,18 +0,0 @@ -# Synthetic Data Generation - -Chronos provides simulators to generate synthetic time series data for users who want to conquer limited data access in a deep learning/machine learning project or only want to generate some synthetic data to play with. - -```eval_rst -.. note:: - ``DPGANSimulator`` is the only simulator chronos provides at the moment, more simulators are on their way. -``` - -## 1. DPGANSimulator -`DPGANSimulator` adopt DoppelGANger raised in [Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions](http://arxiv.org/abs/1909.13403). The method is data-driven unsupervised method based on deep learning model with GAN (Generative Adversarial Networks) structure. The model features a pair of separate attribute generator and feature generator and their corresponding discriminators `DPGANSimulator` also supports a rich and comprehensive input data (training data) format and outperform other algorithms in many evaluation metrics. - -```eval_rst -.. note:: - We reimplement this model by pytorch(original implementation was based on tf1) for better performance(both speed and memory). -``` - -Users may refer to detailed [API doc](../../PythonAPI/Chronos/simulator.html#module-bigdl.chronos.simulator.doppelganger_simulator). diff --git a/docs/readthedocs/source/doc/Chronos/Overview/speed_up.md b/docs/readthedocs/source/doc/Chronos/Overview/speed_up.md deleted file mode 100644 index 6eada0c4..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/speed_up.md +++ /dev/null @@ -1,143 +0,0 @@ -# Accelerated Training and Inference - -Chronos provides transparent acceleration for Chronos built-in models and customized time-series models. In this deep-dive page, we will introduce how to enable/disable them. - -We will focus on **single node acceleration for forecasting models' training and inferencing** in this page. Other topic such as: - -- Distributed time series data processing - [XShardsTSDataset (based on Spark, powered by `bigdl.orca.data`)](./useful_functionalities.html#xshardstsdataset) -- Distributed training on a cluster - [Distributed training (based on Ray/Spark/Horovod, powered by `bigdl.orca.learn`)](./useful_functionalities.html#distributed-training) -- Non-forecasting models / non-deep-learning models - [Prophet with intel python](./forecasting.html#prophetforecaster), [DBScan Detector with intel Sklearn](./anomaly_detection.html#dbscandetector), [DPGANSimulator pytorch implementation](./simulation.html#dpgansimulator). - -You may refer to other pages listed above. - -### 1. Overview -Time series model, especially those deep learning models, often suffers slow training speed and unsatisfying inference speed. Chronos is adapted to integrate many optimized library and best known methods(BKMs) for performance improvement on built-in models and customized models. - -### 2. Training Acceleration -Training Acceleration is transparent in Chronos's API. Transparentness means that Chronos users will enjoy the acceleration without changing their code(unless some expert users want to set some advanced settings). -```eval_rst -.. 
note:: - **Write your script under** ``if __name__=="__main__":``: - - Chronos will automatically utilize the computation resources on the hardware. This may include multi-process training on a single node. Use this header will prevent many strange behavior. -``` -#### 2.1 `Forecaster` Training Acceleration -Currently, transparent acceleration for `LSTMForecaster`, `Seq2SeqForecaster`, `TCNForecaster` and `NBeatsForecaster` is **automatically enabled** and tested. Chronos will set various environment variables and config multi-processing training according to the hardware paremeters(e.g. cores number, ...). - -Currently, this function is under active development and **some expert users may want to change some config or disable some acceleration tricks**. Here are some instructions. - -Users may unset the environment by: -```bash -source bigdl-nano-unset-env -``` -Users may set the the number of process to use in training by: -```python -print(forecaster.num_processes) # num_processes is automatically optimized by Chronos -forecaster.num_processes = 1 # disable multi-processing training -forecaster.num_processes = 10 # You may set it to any number you want -``` -Users may set the IPEX(Intel Pytorch Extension) availbility to use in training by: -```python -print(forecaster.use_ipex) # use_ipex is automatically optimized by Chronos -forecaster.use_ipex = True # enable ipex during training -forecaster.use_ipex = False # disable ipex during training -``` - -#### 2.2 Customized Model Training Acceleration -We provide an optimized pytorch-lightning Trainer, `TSTrainer`, to accelerate customized time series model defined by pytorch. A typical use-case can be using `pytorch-forecasting`'s built-in models(they are defined in pytorch-lightning LightningModule) and Chronos `TSTrainer` to accelerate the training process. - -`TSTrainer` requires very few code changes to your original code. Here is a quick guide: -```python -# from pytorch-lightning import Trainer -from bigdl.chronos.pytorch import TSTrainer as Trainer - -trainer = Trainer(... - # set number of processes for training - num_processes=8, - # disable GPU training, TSTrainer currently only available for CPU - gpus=0, - ...) -``` - -We have examples adapted from `pytorch-forecasting`'s examples to show the significant speed-up by using `TSTrainer` in our [use-case](https://github.com/intel-analytics/BigDL/tree/main/python/chronos/use-case/pytorch-forecasting). - -#### 2.3 Auto Tuning Acceleration -We are working on the acceleration of `AutoModel` and `AutoTSEstimator`. Please unset the environment by: -```bash -source bigdl-nano-unset-env -``` - -### 3. Inference Acceleration -Inference has become a critical part for time series model's performance. This may be divided to two parts: -- Throughput: how many samples can be predicted in a certain amount of time. -- Latency: how much time is used to predict 1 sample. - -Typically, throughput and latency is a trade-off pair. We have three optimization options for inferencing in Chronos. -- **Default**: Generally useful for both throughput and latency. -- **ONNX Runtime**: Users may export their trained(w/wo auto tuning) model to ONNX file and deploy it on other service. Chronos also provides an internal onnxruntime inference support for those users who pursue low latency and higher throughput during inference on a single node. -- **Quantization**: Quantization refers to processes that enable lower precision inference. 
In Chronos, post-training quantization is supported relied on [Intel® Neural Compressor](https://intel.github.io/neural-compressor/README.html). -```eval_rst -.. note:: - **Additional Dependencies**: - - You need to install ``neural-compressor`` to enable quantization related methods. - - ``pip install neural-compressor==1.8.1`` -``` -#### 3.1 `Forecaster` Inference Acceleration -##### 3.1.1 Default Acceleration -Nothing needs to be done. Chronos has deployed accleration for inferencing. **some expert users may want to change some config or disable some acceleration tricks**. Here are some instructions: - -Users may unset the environment by: -```bash -source bigdl-nano-unset-env -``` -##### 3.1.2 ONNX Runtime -LSTM, TCN, Seq2seq and NBeats has supported onnx in their forecasters. When users use these built-in models, they may call `predict_with_onnx`/`evaluate_with_onnx` for prediction or evaluation. They may also call `export_onnx_file` to export the onnx model file and `build_onnx` to change the onnxruntime's setting(not necessary). -```python -f = Forecaster(...) -f.fit(...) -f.predict_with_onnx(...) -``` -##### 3.1.3 Quantization -LSTM, TCN and NBeats has supported quantization in their forecasters. -```python -# init -f = Forecaster(...) - -# train the forecaster -f.fit(train_data, ...) - -# quantize the forecaster -f.quantize(train_data, ..., framework=...) - -# predict with int8 model with better inference throughput -f.predict/predict_with_onnx(test_data, quantize=True) - -# predict with fp32 -f.predict/predict_with_onnx(test_data, quantize=False) - -# save -f.save(checkpoint_file="fp32.model" - quantize_checkpoint_file="int8.model") - -# load -f.load(checkpoint_file="fp32.model" - quantize_checkpoint_file="int8.model") -``` -Please refer to [Forecaster API Docs](../../PythonAPI/Chronos/forecasters.html) for details. - -#### 3.2 `TSPipeline` Inference Acceleration -Basically same to [`Forecaster`](#31-forecaster-inference-acceleration) -##### 3.2.1 Default Acceleration -Basically same to [`Forecaster`](#31-forecaster-inference-acceleration) -##### 3.2.2 ONNX Runtime -```python -tsppl.predict_with_onnx(...) -``` -##### 3.2.3 Quantization -```python -tsppl.quantize(...) -tsppl.predict/predict_with_onnx(test_data, quantize=True/False) -``` -Please refer to [TSPipeline API doc](../../PythonAPI/Chronos/autotsestimator.html#tspipeline) for details. \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/Overview/useful_functionalities.md b/docs/readthedocs/source/doc/Chronos/Overview/useful_functionalities.md deleted file mode 100644 index 64c270b9..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/useful_functionalities.md +++ /dev/null @@ -1,33 +0,0 @@ -# Distributed Processing - - -#### Distributed training -LSTM, TCN and Seq2seq users can easily train their forecasters in a distributed fashion to **handle extra large dataset and utilize a cluster**. The functionality is powered by Project Orca. -```python -f = Forecaster(..., distributed=True) -f.fit(...) -f.predict(...) -f.to_local() # collect the forecaster to single node -f.predict_with_onnx(...) # onnxruntime only supports single node -``` -#### Distributed Data processing: XShardsTSDataset -```eval_rst -.. warning:: - ``XShardsTSDataset`` is still experimental. -``` -`TSDataset` is a single thread lib with reasonable speed on large datasets(~10G). 
When you handle an extra large dataset or limited memory on a single node, `XShardsTSDataset` can be involved to handle the exact same functionality and usage as `TSDataset` in a distributed fashion. - -```python -# a fully distributed forecaster pipeline -from orca.data.pandas import read_csv -from bigdl.chronos.data.experimental import XShardsTSDataset - -shards = read_csv("hdfs://...") -tsdata, _, test_tsdata = XShardsTSDataset.from_xshards(...) -tsdata_xshards = tsdata.roll(...).to_xshards() -test_tsdata_xshards = test_tsdata.roll(...).to_xshards() - -f = Forecaster(..., distributed=True) -f.fit(tsdata_xshards, ...) -f.predict(test_tsdata_xshards, ...) -``` diff --git a/docs/readthedocs/source/doc/Chronos/Overview/visualization.md b/docs/readthedocs/source/doc/Chronos/Overview/visualization.md deleted file mode 100644 index 146870c9..00000000 --- a/docs/readthedocs/source/doc/Chronos/Overview/visualization.md +++ /dev/null @@ -1,49 +0,0 @@ -# AutoML Visualization - -AutoML visualization provides two kinds of visualization. You may use them while fitting on auto models or AutoTS pipeline. -* During the searching process, the visualizations of each trail are shown and updated every 30 seconds. (Monitor view) -* After the searching process, a leaderboard of each trail's configs and metrics is shown. (Leaderboard view) - -**Note**: AutoML visualization is based on tensorboard and tensorboardx. They should be installed properly before the training starts. - -**Monitor view** - -Before training, start the tensorboard server through - -```python -tensorboard --logdir=/ -``` - -`logs_dir` is the log directory you set for your predictor(e.g. `AutoTSEstimator`, `AutoTCN`, etc.). `name ` is the name parameter you set for your predictor. - -The data in SCALARS tag will be updated every 30 seconds for users to see the training progress. - -![](../Image/automl_monitor.png) - -After training, start the tensorboard server through - -```python -tensorboard --logdir=/_leaderboard/ -``` - -where `logs_dir` and `name` are the same as stated in [Monitor view](#monitor_view). - -A dashboard of each trail's configs and metrics is shown in the SCALARS tag. - -![](../Image/automl_scalars.png) - -A leaderboard of each trail's configs and metrics is shown in the HPARAMS tag. - -![](../Image/automl_hparams.png) - -**Use visualization in Jupyter Notebook** - -You can enable a tensorboard view in jupyter notebook by the following code. - -```python -%load_ext tensorboard -# for scalar view -%tensorboard --logdir // -# for leaderboard view -%tensorboard --logdir /_leaderboard/ -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-anomaly-detector.md b/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-anomaly-detector.md deleted file mode 100644 index 413a5cda..00000000 --- a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-anomaly-detector.md +++ /dev/null @@ -1,50 +0,0 @@ -# Detect Anomaly Point in Real Time Traffic Data - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab][chronos_minn_traffic_anomaly_detector_colab]  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][chronos_minn_traffic_anomaly_detector] - ---- - -**In this guide we will demonstrate how to use _Chronos Anomaly Detector_ for time seires anomaly detection in 3 simple steps.** - -### Step 0: Prepare Environment - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. 
-Please refer to the [install guide](../Overview/chronos.html#install) for more details.
-
-```bash
-conda create -n my_env python=3.7 # "my_env" is the conda environment name; you can use any name you like.
-conda activate my_env
-pip install bigdl-chronos
-```
-
-## Step 1: Prepare dataset
-For demonstration, we use the publicly available real-time traffic data from the Twin Cities Metro area in Minnesota, collected by the Minnesota Department of Transportation. The detailed information can be found [here](https://github.com/numenta/NAB/blob/master/data/realTraffic/speed_7578.csv).
-
-Now we need to do data cleaning and preprocessing on the raw data. Note that this part could vary for different datasets.
-For this dataset, the pre-processing contains 2 parts:
-1. Change the time interval from irregular to 5 minutes.
-2. Check missing values and handle missing data. - -```python -from bigdl.chronos.data import TSDataset - -tsdata = TSDataset.from_pandas(df, dt_col="timestamp", target_col="value") -df = tsdata.resample("5min")\ - .impute(mode="linear")\ - .to_pandas() -``` - -## Step 2: Use Chronos Anomaly Detector -Chronos provides many anomaly detector for anomaly detection, here we use DBScan as an example. More anomaly detector can be found [here](../../PythonAPI/Chronos/anomaly_detectors.html). - -```python -from bigdl.chronos.detector.anomaly import DBScanDetector - -ad = DBScanDetector(eps=0.3, min_samples=6) -ad.fit(df['value'].to_numpy()) -anomaly_indexes = ad.anomaly_indexes() -``` - -[chronos_minn_traffic_anomaly_detector_colab]: -[chronos_minn_traffic_anomaly_detector]: \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-autotsest-quickstart.md b/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-autotsest-quickstart.md deleted file mode 100644 index a8a8766b..00000000 --- a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-autotsest-quickstart.md +++ /dev/null @@ -1,119 +0,0 @@ -# Tune a Forecasting Task Automatically - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab][chronos_autots_nyc_taxi_colab]  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][chronos_autots_nyc_taxi] - ---- - -**In this guide we will demonstrate how to use _Chronos AutoTSEstimator_ and _Chronos TSPipeline_ to auto tune a time seires forecasting task and handle the whole model development process easily.** - -### Introduction - -Chronos provides `AutoTSEstimator` as a highly integrated solution for time series forecasting task with hyperparameter autotuning, auto feature selection and auto preprocessing. Users can prepare a `TSDataset`(recommended, used in this notebook) or their own data creator as input data. By constructing a `AutoTSEstimator` and calling `fit` on the data, a `TSPipeline` contains the best model and pre/post data processing will be returned for further development of deployment. - -`AutoTSEstimator` only support LSTM, TCN, and Seq2seq built-in models and 3rd party models for now. - -### Step 0: Prepare Environment - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/chronos.html#install) for more details. - -```bash -conda create -n my_env python=3.7 -conda activate my_env -pip install --pre --upgrade bigdl-chronos[all] -``` - -### Step 1: Init Orca Context -```python -if args.cluster_mode == "local": - init_orca_context(cluster_mode="local", cores=4) # run in local mode -elif args.cluster_mode == "k8s": - init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2) # run on K8s cluster -elif args.cluster_mode == "yarn": - init_orca_context(cluster_mode="yarn-client", num_nodes=2, cores=2) # run on Hadoop YARN cluster -``` -This is the only place where you need to specify local or distributed mode. View [Orca Context](../../Orca/Overview/orca-context.md) for more details. - -**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](../../UserGuide/hadoop.md) for more details. - -### Step 2: Prepare a TSDataset -Prepare a `TSDataset` and call necessary operations on it. 
-```python -from bigdl.chronos.data import TSDataset -from sklearn.preprocessing import StandardScaler - -tsdata_train, tsdata_val, tsdata_test\ - = TSDataset.from_pandas(df, dt_col="timestamp", target_col="value", with_split=True, val_ratio=0.1, test_ratio=0.1) - -standard_scaler = StandardScaler() -for tsdata in [tsdata_train, tsdata_val, tsdata_test]: - tsdata.gen_dt_feature()\ - .impute(mode="last")\ - .scale(standard_scaler, fit=(tsdata is tsdata_train)) -``` -There is no need to call `.roll()` or `.to_torch_data_loader()` in this step, which is the largest difference between the usage of `AutoTSEstimator` and _Chronos Forecaster_. `AutoTSEstimator` will do that automatically and tune the parameters as well. - -Please call `.gen_dt_feature()`(recommended), `.gen_rolling_feature()`, and `gen_global_feature()` to generate all candidate features to be selected by `AutoTSEstimator` as well as your input extra feature. - -Detailed information please refer to [TSDataset API doc](../../PythonAPI/Chronos/tsdataset.html) and [Time series data basic concepts](../Overview/data_processing_feature_engineering.html). - -### Step 3: Create an AutoTSEstimator - -```python -import bigdl.orca.automl.hp as hp -from bigdl.chronos.autots import AutoTSEstimator -auto_estimator = AutoTSEstimator(model='lstm', # the model name used for training - search_space='normal', # a default hyper parameter search space - past_seq_len=hp.randint(1, 10), # hp sampling function of past_seq_len for auto-tuning -) -``` -We prebuild three defualt search space for each build-in model, which you can use the by setting `search_space` to "minimal","normal", or "large" or define your own search space in a dictionary. The larger the search space, the better accuracy you will get and the more time will be cost. - -`past_seq_len` can be set as a hp sample function, the proper range is highly related to your data. A range between 0.5 cycle and 3 cycle is reasonable. - -Detailed information please refer to [AutoTSEstimator API doc](../../PythonAPI/Chronos/autotsestimator.html#autotsestimator) and basic concepts [here](../Overview/forecasting.html#use-autots-pipeline). - -### Step 4: Fit with AutoTSEstimator -```python -# fit with AutoTSEstimator for a returned TSPipeline -ts_pipeline = auto_estimator.fit(data=tsdata_train, # train dataset - validation_data=tsdata_val, # validation dataset - epochs=5) # number of epochs to train in each trial -``` -Detailed information please refer to [AutoTSEstimator API doc](../../PythonAPI/Chronos/autotsestimator.html#autotsestimator). -### Step 5: Further deployment with TSPipeline -The `TSPipeline` will reply the same preprcessing and corresponding postprocessing operations on the test data. You may carry out predict, evaluate or save/load for further development. -```python -# predict with the best trial -y_pred = ts_pipeline.predict(tsdata_test) -``` - -```python -# evaluate the result pipeline -mse, smape = ts_pipeline.evaluate(tsdata_test, metrics=["mse", "smape"]) -print("Evaluate: the mean square error is", mse) -print("Evaluate: the smape value is", smape) -``` - -```python -# save the pipeline -my_ppl_file_path = "/tmp/saved_pipeline" -ts_pipeline.save(my_ppl_file_path) -# restore the pipeline for further deployment -from bigdl.chronos.autots import TSPipeline -loaded_ppl = TSPipeline.load(my_ppl_file_path) -``` -Detailed information please refer to [TSPipeline API doc](../../PythonAPI/Chronos/tsdataset.html). 
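If you need lower inference latency when deploying the tuned pipeline, `TSPipeline` also exposes ONNX Runtime and quantization entry points (see the accelerated inference notes earlier in this document). Below is a minimal sketch, assuming the `loaded_ppl` object restored above, the `tsdata_train`/`tsdata_test` datasets from Step 2, and that `onnx`/`onnxruntime`/`neural-compressor` are installed; the exact argument lists may differ, so check the TSPipeline API doc.

```python
# Minimal sketch (assumed workflow, not part of the original quickstart):
# accelerate TSPipeline inference with ONNX Runtime and optional quantization.
# `loaded_ppl`, `tsdata_train` and `tsdata_test` come from the steps above.

# default FP32 prediction
y_pred = loaded_ppl.predict(tsdata_test)

# ONNX Runtime backed prediction for lower latency on CPU
y_pred_onnx = loaded_ppl.predict_with_onnx(tsdata_test)

# optional post-training quantization (calibrated on the training data),
# then predict with the INT8 model
loaded_ppl.quantize(tsdata_train)
y_pred_int8 = loaded_ppl.predict_with_onnx(tsdata_test, quantize=True)
```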
- -### Optional: Examine the leaderboard visualization -To view the evaluation result of "not chosen" trails and find some insight or even possibly improve you search space for a new autotuning task. We provide a leaderboard through tensorboard. -```python -# show a tensorboard view -%load_ext tensorboard -%tensorboard --logdir /tmp/autots_estimator/autots_estimator_leaderboard/ -``` -Detailed information please refer to [Visualization](../Overview/useful_functionalities.html#automl-visualization). - -[chronos_autots_nyc_taxi_colab]: -[chronos_autots_nyc_taxi]: \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart.md b/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart.md deleted file mode 100644 index cde45dce..00000000 --- a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart.md +++ /dev/null @@ -1,92 +0,0 @@ -# Predict Number of Taxi Passengers with Chronos Forecaster - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab][chronos_nyc_taxi_tsdataset_forecaster_colab]  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][chronos_nyc_taxi_tsdataset_forecaster] - ---- - -**In this guide we will demonstrate how to use _Chronos TSDataset_ and _Chronos Forecaster_ for time seires processing and forecasting in 4 simple steps.** - -### Step 0: Prepare Environment - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/chronos.html#install) for more details. - -```bash -conda create -n my_env python=3.7 # "my_env" is conda environment name, you can use any name you like. -conda activate my_env -pip install bigdl-chronos[all] -``` - -### Step 1: Data transformation and feature engineering using Chronos TSDataset - -[TSDataset](../Overview/data_processing_feature_engineering.html) is our abstract of time series dataset for data transformation and feature engineering. Here we use it to preprocess the data. - -Initialize train, valid and test tsdataset from raw pandas dataframe. - -```python -from bigdl.chronos.data import TSDataset -from sklearn.preprocessing import StandardScaler - -tsdata_train, tsdata_valid, tsdata_test = TSDataset.from_pandas(df, dt_col="timestamp", target_col="value", - with_split=True, val_ratio=0.1, test_ratio=0.1) -``` -Preprocess the datasets. Here we perform: - -- deduplicate: remove those identical data records -- impute: fill the missing values -- gen_dt_feature: generate feature from datetime (e.g. month, day...) -- scale: scale each feature to standard distribution. -- roll: sample the data with sliding window. -- For forecasting task, we will look back 3 hours' historical data (6 records) and predict the value of next 30 miniutes (1 records). - -We perform the same transformation processes on train, valid and test set. - -```python -lookback, horizon = 6, 1 - -scaler = StandardScaler() -for tsdata in [tsdata_train, tsdata_valid, tsdata_test]: - tsdata.deduplicate().impute().gen_dt_feature()\ - .scale(scaler, fit=(tsdata is tsdata_train))\ - .roll(lookback=lookback, horizon=horizon) -``` - -### Step 2: Time series forecasting using Chronos Forecaster - -After preprocessing the datasets. We can use [Chronos Forecaster](../Overview/forecasting.html#use-standalone-forecaster-pipeline) to handle the forecasting tasks. 
- -Transform TSDataset to sampled numpy ndarray and feed them to forecaster. - -```python -x, y = tsdata_train.to_numpy() -x_val, y_val = tsdata_valid.to_numpy() -# x.shape = (num of sample, lookback, num of input feature) -# y.shape = (num of sample, horizon, num of output feature) - -forecaster = TCNForecaster(past_seq_len=lookback, # number of steps to look back - future_seq_len=horizon, # number of steps to predict - input_feature_num=x.shape[-1], # number of feature to use - output_feature_num=y.shape[-1]) # number of feature to predict -res = forecaster.fit(data=(x, y), epochs=3) -``` - -### Step 3: Further deployment with fitted forecaster - -Use fitted forecaster to predict test data - -```python -x_test, y_test = tsdata_test.to_numpy() -pred = forecaster.predict(x_test) -pred_unscale, groundtruth_unscale = tsdata_test.unscale_numpy(pred), tsdata_test.unscale_numpy(y_test) -``` - -Save & restore the forecaster. - -```python -forecaster.save("nyc_taxi.fxt") -forecaster.restore("nyc_taxi.fxt") -``` - -[chronos_nyc_taxi_tsdataset_forecaster_colab]: -[chronos_nyc_taxi_tsdataset_forecaster]: \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/QuickStart/index.md b/docs/readthedocs/source/doc/Chronos/QuickStart/index.md deleted file mode 100644 index c207b645..00000000 --- a/docs/readthedocs/source/doc/Chronos/QuickStart/index.md +++ /dev/null @@ -1,372 +0,0 @@ -# Chronos Examples - -```eval_rst -.. raw:: html - - - -
-The examples below cover forecasting, anomaly detection, simulation and AutoML use cases, and involve components such as TCNForecaster, AutoTSEstimator, DBScanDetector, LSTMForecaster, AutoProphet, MTNetForecaster, DeepAR, AutoLSTM, Seq2SeqForecaster, DPGANSimulator, TCMFForecaster and the TFT model, as well as features like onnxruntime, quantization, distributed training and customized models.
-
-- **Predict Number of Taxi Passengers with Chronos Forecaster**: demonstrates how to use Chronos TSDataset and Chronos Forecaster for time series processing and to predict the number of taxi passengers.
-- **Tune a Forecasting Task Automatically**: demonstrates how to use Chronos AutoTSEstimator and Chronos TSPipeline to auto tune a time series forecasting task and handle the whole model development process easily.
-- **Detect Anomaly Point in Real Time Traffic Data**: demonstrates how to use Chronos Anomaly Detector to detect anomalies in real-time traffic data from the Twin Cities Metro area in Minnesota.
-- **Tune a Customized Time Series Forecasting Model with AutoTSEstimator**: a reference use case where past network traffic KPIs are used to predict future KPIs; shows how to use AutoTSEstimator to adjust the parameters of a customized model.
-- **Auto Tune the Prediction of Network Traffic at the Transit Link of WIDE**: a reference use case where past network traffic KPIs are used to predict future KPIs; shows how to use AutoTS in project Chronos to do time series forecasting in an automated and distributed way.
-- **Multivariate Forecasting of Network Traffic at the Transit Link of WIDE**: shows how to do univariate forecasting (predict only 1 series) and multivariate forecasting (predict more than 1 series at the same time) using Project Chronos.
-- **Multistep Forecasting of Network Traffic at the Transit Link of WIDE**: shows how to do multivariate multistep forecasting using Project Chronos.
-- **Stock Price Prediction with LSTMForecaster**: uses historical stock price data (the daily prices of S&P500 stocks during 2013-2018) to do univariate forecasting, using the past 80% of the total days' MMM price to predict the daily price of the remaining 20% of days. Reference: https://github.com/jwkanggist/tf-keras-stock-pred
-- **Stock Price Prediction with ProphetForecaster and AutoProphet**: uses the same historical stock price data to predict the future price with ProphetForecaster and AutoProphet. References: https://facebook.github.io/prophet, https://github.com/jwkanggist/tf-keras-stock-pred
-- **Unsupervised Anomaly Detection for CPU Usage**: demonstrates how to perform anomaly detection based on Chronos's built-in DBScanDetector, AEDetector and ThresholdDetector.
-- **Anomaly Detection for CPU Usage Based on Forecasters**: demonstrates how to leverage Chronos's built-in models (e.g. MTNet) to do time series forecasting, then perform anomaly detection on the predicted values with ThresholdDetector.
-- **Help pytorch-forecasting improve the training speed of the DeepAR model**: Chronos can help a 3rd-party time series library improve performance (both training and inferencing) and accuracy; this use case shows how Chronos speeds up the training of pytorch-forecasting's DeepAR model.
-- **Help pytorch-forecasting improve the training speed of the TFT model**: this use case shows how Chronos speeds up the training of pytorch-forecasting's TFT model.
-- **Tune a Time Series Forecasting Model with multi-objective hyperparameter optimization**: demonstrates how to use multi-objective hyperparameter optimization with the built-in latency metric to achieve a good trade-off between accuracy and latency.
-- **Auto tuning prophet on the nyc_taxi dataset**: demonstrates Chronos auto models (i.e. AutoLSTM and AutoProphet) performing automatic time series forecasting on the nyc_taxi dataset; the auto models search the best hyperparameters automatically.
-- **Use Chronos forecasters in a distributed fashion**: users can easily train their forecasters in a distributed fashion to handle extra large datasets and speed up training and data processing by utilizing a cluster, or pseudo-distribution on a single node; the functionality is powered by Project Orca.
-- **Use ONNXRuntime to speed up forecasters' inferencing**: demonstrates how to use ONNX to speed up inferencing (prediction/evaluation) on forecasters and AutoTSEstimator; in this example, ONNX speeds up inferencing by ~4X.
-- **Quantize Chronos forecasters to speed up inference**: users can easily quantize their forecasters to low precision and speed up inference (both throughput and latency) on a single node; the functionality is powered by Project Nano.
-- **Simulate time series data with a similar pattern as example data**: shows how to generate synthetic data with a similar distribution as the training data with the fast and easy DPGANSimulator API provided by Chronos.
-- **High dimension time series forecasting with Chronos TCMFForecaster**: demonstrates how to use BigDL Chronos TCMFForecaster to run distributed training and inference for a high dimension time series forecasting task.
-- **Penalize underestimation with LinexLoss**: demonstrates how to use TCNForecaster to penalize underestimation based on the built-in loss function LinexLoss.
-- **Accelerate the inference speed of a model trained on another platform**: shows how to train a model on GPU and then accelerate it with onnxruntime on CPU.
-- **Serve Chronos forecaster and predict through TorchServe**: shows how to serve a Chronos forecaster and predict through TorchServe.
-
- - -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Chronos/index.rst b/docs/readthedocs/source/doc/Chronos/index.rst deleted file mode 100644 index 24c9c519..00000000 --- a/docs/readthedocs/source/doc/Chronos/index.rst +++ /dev/null @@ -1,89 +0,0 @@ -BigDL-Chronos -======================== - -**BigDL-Chronos** (**Chronos** for short) is an application framework for building a fast, accurate and scalable time series analysis application. - -You can use **Chronos** for: - -.. grid:: 1 3 3 3 - - .. grid-item:: - - .. image:: ./Image/forecasting.svg - :alt: Forcasting example diagram - - **Forecasting:** Predict future using history data. - - .. grid-item:: - - .. image:: ./Image/anomaly_detection.svg - :alt: Anomaly Detection example diagram - - **Anomaly Detection:** Discover unexpected items in data. - - .. grid-item:: - - .. image:: ./Image/simulation.svg - :alt: Simulation example diagram - - **Simulation:** Generate similar data as history data. - -------- - - -.. grid:: 1 2 2 2 - :gutter: 2 - - .. grid-item-card:: - - **Get Started** - ^^^ - - You may understand the basic usage of Chronos' components and learn to write the first runnable application in this quick tour page. - - +++ - :bdg-link:`Chronos in 5 minutes <./Overview/quick-tour.html>` | - :bdg-link:`Installation <./Overview/install.html>` - - .. grid-item-card:: - - **Key Features Guide** - ^^^ - - Our user guides provide you with in-depth information, concepts and knowledges about Chronos. - - +++ - - :bdg-link:`Data <./Overview/data_processing_feature_engineering.html>` | - :bdg-link:`Forecast <./Overview/forecasting.html>` | - :bdg-link:`Detect <./Overview/anomaly_detection.html>` | - :bdg-link:`Simulate <./Overview/simulation.html>` - - .. grid-item-card:: - - **How-to-Guide** / **Tutorials** - ^^^ - - If you are meeting with some specific problems during the usage, how-to guides are good place to be checked. - Examples provides short, high quality use case that users can emulated in their own works. - - +++ - - :bdg-link:`How-to-Guide <./Howto/index.html>` | :bdg-link:`Example <./QuickStart/index.html>` - - .. grid-item-card:: - - **API Document** - ^^^ - - API Document provides you with a detailed description of the Chronos APIs. - - +++ - - :bdg-link:`API Document <../PythonAPI/Chronos/index.html>` - - -.. 
toctree:: - :hidden: - - BigDL-Chronos Document \ No newline at end of file diff --git a/docs/readthedocs/source/doc/DLlib/Image/tensorboard-histo1.png b/docs/readthedocs/source/doc/DLlib/Image/tensorboard-histo1.png deleted file mode 100644 index 68e716d6..00000000 Binary files a/docs/readthedocs/source/doc/DLlib/Image/tensorboard-histo1.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/DLlib/Image/tensorboard-histo2.png b/docs/readthedocs/source/doc/DLlib/Image/tensorboard-histo2.png deleted file mode 100644 index bb1db0a7..00000000 Binary files a/docs/readthedocs/source/doc/DLlib/Image/tensorboard-histo2.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/DLlib/Image/tensorboard-scalar.png b/docs/readthedocs/source/doc/DLlib/Image/tensorboard-scalar.png deleted file mode 100644 index 0f593b71..00000000 Binary files a/docs/readthedocs/source/doc/DLlib/Image/tensorboard-scalar.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/DLlib/Overview/dllib.md b/docs/readthedocs/source/doc/DLlib/Overview/dllib.md deleted file mode 100644 index 204ab135..00000000 --- a/docs/readthedocs/source/doc/DLlib/Overview/dllib.md +++ /dev/null @@ -1,139 +0,0 @@ -# DLlib in 5 minutes - -## Overview - -DLlib is a distributed deep learning library for Apache Spark; with DLlib, users can write their deep learning applications as standard Spark programs (using either Scala or Python APIs). - -It includes the functionalities of the [original BigDL](https://github.com/intel-analytics/BigDL/tree/branch-0.14) project, and provides following high-level APIs for distributed deep learning on Spark: - -* [Keras-like API](keras-api.md) -* [Spark ML pipeline support](nnframes.md) - - ---- - -## Scala Example - -This section show a single example of how to use dllib to build a deep learning application on Spark, using Keras APIs - -#### LeNet Model on MNIST using Keras-Style API - -This tutorial is an explanation of what is happening in the [lenet](https://github.com/intel-analytics/BigDL/tree/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/keras) example - -A bigdl-dllib program starts with initialize as follows. -````scala -val conf = Engine.createSparkConf() - .setAppName("Train Lenet on MNIST") - .set("spark.task.maxFailures", "1") -val sc = new SparkContext(conf) -Engine.init -```` - -After the initialization, we need to: - -1. Load train and validation data by _**creating the [```DataSet```](https://github.com/intel-analytics/BigDL/blob/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/feature/dataset/DataSet.scala)**_ (e.g., ````SampleToGreyImg````, ````GreyImgNormalizer```` and ````GreyImgToBatch````): - ````scala - val trainSet = (if (sc.isDefined) { - DataSet.array(load(trainData, trainLabel), sc.get, param.nodeNumber) - } else { - DataSet.array(load(trainData, trainLabel)) - }) -> SampleToGreyImg(28, 28) -> GreyImgNormalizer(trainMean, trainStd) -> GreyImgToBatch( - param.batchSize) - - val validationSet = DataSet.array(load(validationData, validationLabel), sc) -> - BytesToGreyImg(28, 28) -> GreyImgNormalizer(testMean, testStd) -> GreyImgToBatch( - param.batchSize) - ```` - -2. 
We then define Lenet model using Keras-style api - ````scala - val input = Input(inputShape = Shape(28, 28, 1)) - val reshape = Reshape(Array(1, 28, 28)).inputs(input) - val conv1 = Convolution2D(6, 5, 5, activation = "tanh").setName("conv1_5x5").inputs(reshape) - val pool1 = MaxPooling2D().inputs(conv1) - val conv2 = Convolution2D(12, 5, 5, activation = "tanh").setName("conv2_5x5").inputs(pool1) - val pool2 = MaxPooling2D().inputs(conv2) - val flatten = Flatten().inputs(pool2) - val fc1 = Dense(100, activation = "tanh").setName("fc1").inputs(flatten) - val fc2 = Dense(classNum, activation = "softmax").setName("fc2").inputs(fc1) - Model(input, fc2) - ```` - -3. After that, we configure the learning process. Set the ````optimization method```` and the ````Criterion```` (which, given input and target, computes gradient per given loss function): - ````scala - model.compile(optimizer = optimMethod, - loss = ClassNLLCriterion[Float](logProbAsInput = false), - metrics = Array(new Top1Accuracy[Float](), new Top5Accuracy[Float](), new Loss[Float])) - ```` - -Finally we _**train the model**_ by calling ````model.fit````: -````scala -model.fit(trainSet, nbEpoch = param.maxEpoch, validationData = validationSet) -```` - ---- - -## Python Example - -#### Initialize NN Context - -`NNContext` is the main entry for provisioning the dllib program on the underlying cluster (such as K8s or Hadoop cluster), or just on a single laptop. - -An dlllib program usually starts with the initialization of `NNContext` as follows: - -```python -from bigdl.dllib.nncontext import * -init_nncontext() -``` - -In `init_nncontext`, the user may specify cluster mode for the dllib program: - -- *Cluster mode=*: "local", "yarn-client", "yarn-cluster", "k8s-client", "standalone" and "spark-submit". Default to be "local". - -The dllib program simply runs `init_nncontext` on the local machine, which will automatically provision the runtime Python environment and distributed execution engine on the underlying computing environment (such as a single laptop, a large K8s or Hadoop cluster, etc.). - - -#### Autograd Examples using bigdl-dllb keras Python API - -This tutorial describes the [Autograd](https://github.com/intel-analytics/BigDL/tree/branch-2.0/python/dllib/examples/autograd). 
- -The example first do the initializton using `init_nncontext()`: -```python -sc = init_nncontext() -``` - -It then generate the input data X_, Y_ - -```python -data_len = 1000 -X_ = np.random.uniform(0, 1, (1000, 2)) -Y_ = ((2 * X_).sum(1) + 0.4).reshape([data_len, 1]) -``` - -It then define the custom loss - -```python -def mean_absolute_error(y_true, y_pred): - result = mean(abs(y_true - y_pred), axis=1) - return result -``` - -After that, the example creates the model as follows and set the criterion as the custom loss: -```python -a = Input(shape=(2,)) -b = Dense(1)(a) -c = Lambda(function=add_one_func)(b) -model = Model(input=a, output=c) - -model.compile(optimizer=SGD(learningrate=1e-2), - loss=mean_absolute_error) -``` -Finally the example trains the model by calling `model.fit`: - -```python -model.fit(x=X_, - y=Y_, - batch_size=32, - nb_epoch=int(options.nb_epoch), - distributed=False) -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/DLlib/Overview/index.rst b/docs/readthedocs/source/doc/DLlib/Overview/index.rst deleted file mode 100644 index e2fd61e5..00000000 --- a/docs/readthedocs/source/doc/DLlib/Overview/index.rst +++ /dev/null @@ -1,6 +0,0 @@ -DLLib Key Features -================================ - -* `Keras-like API `_ -* `Spark ML Pipeline Support `_ -* `Visualization `_ \ No newline at end of file diff --git a/docs/readthedocs/source/doc/DLlib/Overview/install.md b/docs/readthedocs/source/doc/DLlib/Overview/install.md deleted file mode 100644 index a64f3892..00000000 --- a/docs/readthedocs/source/doc/DLlib/Overview/install.md +++ /dev/null @@ -1,41 +0,0 @@ -# Installation - - -## Scala - -Refer to [BigDl Install guide for Scala](../../UserGuide/scala.md). - - -## Python - - -### Install a Stable Release - -Run below command to install _bigdl-dllib_. - -```bash -conda create -n my_env python=3.7 -conda activate my_env -pip install bigdl-dllib -``` - -### Install Nightly build version - -You can install the latest nightly build of bigdl-dllib as follows: -```bash -pip install --pre --upgrade bigdl-dllib -``` - -### Verify your install - -You may verify if the installation is successful using the interactive Python shell as follows: - -* Type `python` in the command line to start a REPL. -* Try to run the example code below to verify the installation: - - ```python - from bigdl.dllib.utils.nncontext import * - - sc = init_nncontext() # Initiation of bigdl-dllib on the underlying cluster. - ``` - diff --git a/docs/readthedocs/source/doc/DLlib/Overview/keras-api.md b/docs/readthedocs/source/doc/DLlib/Overview/keras-api.md deleted file mode 100644 index 71504d0b..00000000 --- a/docs/readthedocs/source/doc/DLlib/Overview/keras-api.md +++ /dev/null @@ -1,187 +0,0 @@ -# Keras-Like API - -## 1. Introduction -[DLlib](dllib.md) provides __Keras-like API__ based on [__Keras 1.2.2__](https://faroit.github.io/keras-docs/1.2.2/) for distributed deep learning on Apache Spark. Users can easily use the Keras-like API to create a neural network model, and train, evaluate or tune it in a distributed fashion on Spark. - -To define a model in Scala using the Keras-like API, one just needs to import the following packages: - -```scala -import com.intel.analytics.bigdl.dllib.keras.layers._ -import com.intel.analytics.bigdl.dllib.keras.models._ -import com.intel.analytics.bigdl.dllib.utils.Shape -``` - -One of the highlighted features with regard to the new API is __shape inference__. 
Users only need to specify the input shape (a `Shape` object __excluding__ batch dimension, for example, `inputShape=Shape(3, 4)` for 3D input) for the first layer of a model and for the remaining layers, the input dimension will be automatically inferred. - ---- -## 2. LeNet Example -Here we use the Keras-like API to define a LeNet CNN model and train it on the MNIST dataset: - -```scala -import com.intel.analytics.bigdl.numeric.NumericFloat -import com.intel.analytics.bigdl.dllib.keras.layers._ -import com.intel.analytics.bigdl.dllib.keras.models._ -import com.intel.analytics.bigdl.dllib.utils.Shape - -val model = Sequential() -model.add(Reshape(Array(1, 28, 28), inputShape = Shape(28, 28, 1))) -model.add(Convolution2D(6, 5, 5, activation = "tanh").setName("conv1_5x5")) -model.add(MaxPooling2D()) -model.add(Convolution2D(12, 5, 5, activation = "tanh").setName("conv2_5x5")) -model.add(MaxPooling2D()) -model.add(Flatten()) -model.add(Dense(100, activation = "tanh").setName("fc1")) -model.add(Dense(10, activation = "softmax").setName("fc2")) - -model.getInputShape().toSingle().toArray // Array(-1, 28, 28, 1) -model.getOutputShape().toSingle().toArray // Array(-1, 10) -``` ---- -## 3. Shape -Input and output shapes of a model in the Keras-like API are described by the `Shape` object in Scala, which can be classified into `SingleShape` and `MultiShape`. - -`SingleShape` is just a list of Int indicating shape dimensions while `MultiShape` is essentially a list of `Shape`. - -Example code to create a shape: -```scala -// create a SingleShape -val shape1 = Shape(3, 4) -// create a MultiShape consisting of two SingleShape -val shape2 = Shape(List(Shape(1, 2, 3), Shape(4, 5, 6))) -``` -You can use method `toSingle()` to cast a `Shape` to a `SingleShape`. Similarly, use `toMulti()` to cast a `Shape` to a `MultiShape`. - ---- -## 4. Define a model -You can define a model either using [Sequential API](#sequential-api) or [Functional API](#functional-api). Remember to specify the input shape for the first layer. - -After creating a model, you can call the following __methods__: - -```scala -getInputShape() -``` -```scala -getOutputShape() -``` -* Return the input or output shape of a model, which is a [`Shape`](#2-shape) object. For `SingleShape`, the first entry is `-1` representing the batch dimension. For a model with multiple inputs or outputs, it will return a `MultiShape`. - -```scala -setName(name) -``` -* Set the name of the model. - ---- -## 5. Sequential API -The model is described as a linear stack of layers in the Sequential API. Layers can be added into the `Sequential` container one by one and the order of the layers in the model will be the same as the insertion order. - -To create a sequential container: -```scala -Sequential() -``` - -Example code to create a sequential model: -```scala -import com.intel.analytics.bigdl.dllib.keras.layers.{Dense, Activation} -import com.intel.analytics.bigdl.dllib.keras.models.Sequential -import com.intel.analytics.bigdl.dllib.utils.Shape - -val model = Sequential[Float]() -model.add(Dense[Float](32, inputShape = Shape(128))) -model.add(Activation[Float]("relu")) -``` - ---- -## 6. Functional API -The model is described as a graph in the Functional API. It is more convenient than the Sequential API when defining some complex model (for example, a model with multiple outputs). 
- -To create an input node: -```scala -Input(inputShape = null, name = null) -``` -Parameters: - -* `inputShape`: A [`Shape`](#shape) object indicating the shape of the input node, not including batch. -* `name`: String to set the name of the input node. If not specified, its name will by default to be a generated string. - -To create a graph container: -```scala -Model(input, output) -``` -Parameters: - -* `input`: An input node or an array of input nodes. -* `output`: An output node or an array of output nodes. - -To merge a list of input __nodes__ (__NOT__ layers), following some merge mode in the Functional API: -```scala -import com.intel.analytics.bigdl.dllib.keras.layers.Merge.merge - -merge(inputs, mode = "sum", concatAxis = -1) // This will return an output NODE. -``` - -Parameters: - -* `inputs`: A list of node instances. Must be more than one node. -* `mode`: Merge mode. String, must be one of: 'sum', 'mul', 'concat', 'ave', 'cos', 'dot', 'max'. Default is 'sum'. -* `concatAxis`: Int, axis to use when concatenating nodes. Only specify this when merge mode is 'concat'. Default is -1, meaning the last axis of the input. - -Example code to create a graph model: -```scala -import com.intel.analytics.bigdl.dllib.keras.layers.{Dense, Input} -import com.intel.analytics.bigdl.dllib.keras.layers.Merge.merge -import com.intel.analytics.bigdl.dllib.keras.models.Model -import com.intel.analytics.bigdl.dllib.utils.Shape - -// instantiate input nodes -val input1 = Input[Float](inputShape = Shape(8)) -val input2 = Input[Float](inputShape = Shape(6)) -// call inputs() with an input node and get an output node -val dense1 = Dense[Float](10).inputs(input1) -val dense2 = Dense[Float](10).inputs(input2) -// merge two nodes following some merge mode -val output = merge(inputs = List(dense1, dense2), mode = "sum") -// create a graph container -val model = Model[Float](Array(input1, input2), output) -``` - -## 7. Persistence -This section describes how to save and load the Keras-like API. - -### 7.1 save -To save a Keras model, you call the method `saveModel(path)`. - -**Scala:** -```scala -import com.intel.analytics.bigdl.dllib.keras.layers.{Dense, Activation} -import com.intel.analytics.bigdl.dllib.keras.models.Sequential - -val model = Sequential[Float]() -model.add(Dense[Float](32, inputShape = Shape(128))) -model.add(Activation[Float]("relu")) -model.saveModel("/tmp/seq.model") -``` -**Python:** -```python -import bigdl.dllib.keras.Sequential -from bigdl.dllib.keras.layer import Dense - -model = Sequential() -model.add(Dense(input_shape=(32, ))) -model.saveModel("/tmp/seq.model") -``` - -### 7.2 load -To load a saved Keras model, you call the method `load_model(path)`. - -**Scala:** -```scala -import com.intel.analytics.bigdl.dllib.keras.Models - -val model = Models.loadModel[Float]("/tmp/seq.model") -``` - -**Python:** -```python -from bigdl.dllib.keras.models -model = load_model("/tmp/seq.model") -``` diff --git a/docs/readthedocs/source/doc/DLlib/Overview/nnframes.md b/docs/readthedocs/source/doc/DLlib/Overview/nnframes.md deleted file mode 100644 index 8d70be13..00000000 --- a/docs/readthedocs/source/doc/DLlib/Overview/nnframes.md +++ /dev/null @@ -1,441 +0,0 @@ -# Spark ML Pipeline Support - -## 1. NNFrames Overview - -`NNFrames` in [DLlib](dllib.md) provides Spark DataFrame and ML Pipeline support of distributed deep learning on Apache Spark. It includes both Python and Scala interfaces, and is compatible with both Spark 2.x and Spark 3.x. 
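As a quick illustration of the ML Pipeline integration described above, the sketch below chains an MLlib `VectorAssembler` with an `NNEstimator` inside a standard `pyspark.ml.Pipeline`. It is a hypothetical example: the DataFrame `df`, its columns (`x1`, `x2` and an array-typed `label`) and the toy two-unit model are assumptions, and the real `NNEstimator` constructor options are covered in section 2.1 below.

```python
# Hypothetical sketch: NNEstimator used as one stage of a standard Spark ML Pipeline.
# `df` is assumed to be a DataFrame with numeric columns "x1", "x2" and an
# array-typed "label" column of length 2.
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from bigdl.dllib.nn.layer import Sequential, Linear
from bigdl.dllib.nn.criterion import MSECriterion
from bigdl.dllib.nnframes.nn_classifier import NNEstimator
from bigdl.dllib.feature.common import SeqToTensor, ArrayToTensor

# assemble the raw columns into the "features" vector expected by NNEstimator
assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")

# a toy 2-in/2-out model trained with MSE loss
estimator = NNEstimator(Sequential().add(Linear(2, 2)), MSECriterion(),
                        SeqToTensor([2]), ArrayToTensor([2])) \
    .setBatchSize(4).setLearningRate(0.2).setMaxEpoch(10)

pipeline = Pipeline(stages=[assembler, estimator])
pipeline_model = pipeline.fit(df)          # fitting produces a PipelineModel containing an NNModel
predictions = pipeline_model.transform(df)
```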
- -**Examples** - -The examples are included in the DLlib source code. - -- image classification: model inference using pre-trained Inception v1 model. (See [Python version](https://github.com/intel-analytics/BigDL/tree/branch-2.0/python/dllib/examples/nnframes/imageInference)) -- image classification: transfer learning from pre-trained Inception v1 model. (See [Python version](https://github.com/intel-analytics/BigDL/tree/branch-2.0/python/dllib/examples/nnframes/imageTransferLearning)) - -## 2. Primary APIs - -- **NNEstimator and NNModel** - - BigDL DLLib provides `NNEstimator` for model training with Spark DataFrame, which provides high level API for training a BigDL Model with the Apache Spark [Estimator](https://spark.apache.org/docs/2.1.1/ml-pipeline.html#estimators) and [Transfomer](https://spark.apache.org/docs/2.1.1/ml-pipeline.html#transformers) pattern, thus users can conveniently fit BigDL DLLib into a ML pipeline. The fit result of `NNEstimator` is a NNModel, which is a Spark ML Transformer. - -- **NNClassifier and NNClassifierModel** - - `NNClassifier` and `NNClassifierModel`extends `NNEstimator` and `NNModel` and focus on classification tasks, where both label column and prediction column are of Double type. - -- **NNImageReader** - - NNImageReader loads image into Spark DataFrame. - ---- -### 2.1 NNEstimator - -**Scala:** - -```scala -val estimator = NNEstimator(model, criterion) -``` - -**Python:** - -```python -estimator = NNEstimator(model, criterion) -``` - -`NNEstimator` extends `org.apache.spark.ml.Estimator` and supports training a BigDL model with Spark DataFrame data. It can be integrated into a standard Spark ML Pipeline -to allow users to combine the components of BigDL and Spark MLlib. - -`NNEstimator` supports different feature and label data types through `Preprocessing`. During fit (training), NNEstimator will extract feature and label data from input DataFrame and use the `Preprocessing` to convert data for the model, typically converts the feature and label to Tensors or converts the (feature, option[Label]) tuple to a BigDL `Sample`. - -Each`Preprocessing` conducts a data conversion step in the preprocessing phase, multiple `Preprocessing` can be combined into a `ChainedPreprocessing`. Some pre-defined -`Preprocessing` for popular data types like Image, Array or Vector are provided in package `com.intel.analytics.bigdl.dllib.feature`, while user can also develop customized `Preprocessing`. - -NNEstimator and NNClassifier also supports setting the caching level for the training data. Options are "DRAM", "PMEM" or "DISK_AND_DRAM". If DISK_AND_DRAM(numSlice) is used, only 1/numSlice data will be loaded into memory during training time. By default, DRAM mode is used and all data are cached in memory. - -By default, `SeqToTensor` is used to convert an array or Vector to a 1-dimension Tensor. Using the `Preprocessing` allows `NNEstimator` to cache only the raw data and decrease the memory consumption during feature conversion and training, it also enables the model to digest extra data types that DataFrame does not support currently. - -More concrete examples are available in package `com.intel.analytics.bigdl.dllib.examples.nnframes` - -`NNEstimator` can be created with various parameters for different scenarios. - -- `NNEstimator(model, criterion)` - - Takes only model and criterion and use `SeqToTensor` as feature and label `Preprocessing`. 
`NNEstimator` will extract the data from feature and label columns (only Scalar, Array[_] or Vector data type are supported) and convert each feature/label to 1-dimension Tensor. The tensors will be combined into BigDL `Sample` and send to model for training. - -- `NNEstimator(model, criterion, featureSize: Array[Int], labelSize: Array[Int])` - - Takes model, criterion, featureSize(Array of Int) and labelSize(Array of Int). `NNEstimator` will extract the data from feature and label columns (only Scalar, Array[_] or Vector data type are supported) and convert each feature/label to Tensor according to the specified Tensor size. - -- `NNEstimator(model, criterion, featureSize: Array[Array[Int]], labelSize: Array[Int])` - - This is the interface for multi-input model. It takes model, criterion, featureSize(Array of Int Array) and labelSize(Array of Int). `NNEstimator` will extract the data from feature and label columns (only Scalar, Array[_] or Vector data type are supported) and convert each feature/label to Tensor according to the specified Tensor size. - -- `NNEstimator(model, criterion, featurePreprocessing: Preprocessing[F, Tensor[T]], -labelPreprocessing: Preprocessing[F, Tensor[T]])` - - Takes model, criterion, featurePreprocessing and labelPreprocessing. `NNEstimator` will extract the data from feature and label columns and convert each feature/label to Tensor with the featurePreprocessing and labelPreprocessing. This constructor provides more flexibility in supporting extra data types. - -Meanwhile, for advanced use cases (e.g. model with multiple input tensor), `NNEstimator` supports: `setSamplePreprocessing(value: Preprocessing[(Any, Option[Any]), Sample[T]])` to directly compose Sample according to user-specified Preprocessing. - - -**Scala Example:** -```scala -import com.intel.analytics.bigdl.dllib.nn._ -import com.intel.analytics.bigdl.dllib.nnframes.NNEstimator -import com.intel.analytics.bigdl.dllib.tensor.TensorNumericMath.TensorNumeric.NumericFloat - -val model = Sequential().add(Linear(2, 2)) -val criterion = MSECriterion() -val estimator = NNEstimator(model, criterion) - .setLearningRate(0.2) - .setMaxEpoch(40) -val data = sc.parallelize(Seq( - (Array(2.0, 1.0), Array(1.0, 2.0)), - (Array(1.0, 2.0), Array(2.0, 1.0)), - (Array(2.0, 1.0), Array(1.0, 2.0)), - (Array(1.0, 2.0), Array(2.0, 1.0)))) -val df = sqlContext.createDataFrame(data).toDF("features", "label") -val nnModel = estimator.fit(df) -nnModel.transform(df).show(false) -``` - -**Python Example:** -```python -from bigdl.dllib.nn.layer import * -from bigdl.dllib.nn.criterion import * -from bigdl.dllib.utils.common import * -from bigdl.dllib.nnframes.nn_classifier import * -from bigdl.dllib.feature.common import * - -data = self.sc.parallelize([ - ((2.0, 1.0), (1.0, 2.0)), - ((1.0, 2.0), (2.0, 1.0)), - ((2.0, 1.0), (1.0, 2.0)), - ((1.0, 2.0), (2.0, 1.0))]) - -schema = StructType([ - StructField("features", ArrayType(DoubleType(), False), False), - StructField("label", ArrayType(DoubleType(), False), False)]) -df = self.sqlContext.createDataFrame(data, schema) -model = Sequential().add(Linear(2, 2)) -criterion = MSECriterion() -estimator = NNEstimator(model, criterion, SeqToTensor([2]), ArrayToTensor([2]))\ - .setBatchSize(4).setLearningRate(0.2).setMaxEpoch(40) \ -nnModel = estimator.fit(df) -res = nnModel.transform(df) -``` - -***Example with multi-inputs Model.*** -This example trains a model with 3 inputs. And users can use VectorAssembler from Spark MLlib to combine different fields. 
With the specified sizes for each model input, NNEstiamtor and NNClassifer will split the input features data and send tensors to corresponding inputs. - -```python - -from bigdl.dllib.utils.common import * -from bigdl.dllib.nnframes.nn_classifier import * -from bigdl.dllib.feature.common import * -from bigdl.dllib.keras.objectives import SparseCategoricalCrossEntropy -from bigdl.dllib.keras.optimizers import Adam -from bigdl.dllib.keras.layers import * -from bigdl.dllib.nncontext import * - -from pyspark.ml.linalg import Vectors -from pyspark.ml.feature import VectorAssembler -from pyspark.sql import SparkSession - -sparkConf = init_spark_conf().setAppName("testNNEstimator").setMaster('local[1]') -sc = init_nncontext(sparkConf) -spark = SparkSession\ - .builder\ - .getOrCreate() - -df = spark.createDataFrame( - [(1, 35, 109.0, Vectors.dense([2.0, 5.0, 0.5, 0.5]), 0.0), - (2, 58, 2998.0, Vectors.dense([4.0, 10.0, 0.5, 0.5]), 1.0), - (3, 18, 123.0, Vectors.dense([3.0, 15.0, 0.5, 0.5]), 0.0)], - ["user", "age", "income", "history", "label"]) - -assembler = VectorAssembler( - inputCols=["user", "age", "income", "history"], - outputCol="features") - -df = assembler.transform(df) - -x1 = Input(shape=(1,)) -x2 = Input(shape=(2,)) -x3 = Input(shape=(2, 2,)) - -user_embedding = Embedding(5, 10)(x1) -flatten = Flatten()(user_embedding) -dense1 = Dense(2)(x2) -gru = LSTM(4, input_shape=(2, 2))(x3) - -merged = merge([flatten, dense1, gru], mode="concat") -zy = Dense(2)(merged) - -zmodel = Model([x1, x2, x3], zy) -criterion = SparseCategoricalCrossEntropy() -classifier = NNEstimator(zmodel, criterion, [[1], [2], [2, 2]]) \ - .setOptimMethod(Adam()) \ - .setLearningRate(0.1)\ - .setBatchSize(2) \ - .setMaxEpoch(10) - -nnClassifierModel = classifier.fit(df) -print(nnClassifierModel.getBatchSize()) -res = nnClassifierModel.transform(df).collect() - -``` - ---- - -### 2.2 NNModel -**Scala:** -```scala -val nnModel = NNModel(bigDLModel) -``` - -**Python:** -```python -nn_model = NNModel(bigDLModel) -``` - -`NNModel` extends Spark's ML -[Transformer](https://spark.apache.org/docs/2.1.1/ml-pipeline.html#transformers). User can invoke `fit` in `NNEstimator` to get a `NNModel`, or directly compose a `NNModel` from BigDLModel. It enables users to wrap a pre-trained BigDL Model into a NNModel, and use it as a transformer in your Spark ML pipeline to predict the results for `DataFrame (DataSet)`. - -`NNModel` can be created with various parameters for different scenarios. - -- `NNModel(model)` - - Takes only model and use `SeqToTensor` as feature Preprocessing. `NNModel` will extract the data from feature column (only Scalar, Array[_] or Vector data type are supported) and convert each feature to 1-dimension Tensor. The tensors will be sent to model for inference. - -- `NNModel(model, featureSize: Array[Int])` - - Takes model and featureSize(Array of Int). `NNModel` will extract the data from feature column (only Scalar, Array[_] or Vector data type are supported) and convert each feature to Tensor according to the specified Tensor size. User can also set featureSize as Array[Array[Int]] for multi-inputs model. - -- `NNModel(model, featurePreprocessing: Preprocessing[F, Tensor[T]])` - - Takes model and featurePreprocessing. `NNModel` will extract the data from feature column and convert each feature to Tensor with the featurePreprocessing. This constructor provides more flexibility in supporting extra data types. - -Meanwhile, for advanced use cases (e.g. 
models with multiple input tensors), `NNModel` supports `setSamplePreprocessing(value: Preprocessing[Any, Sample[T]])` to directly compose the Sample with a user-specified Preprocessing.

We can get the underlying BigDL model from an `NNModel` by:

**Scala:**
```scala
val model = nnModel.getModel()
```

**Python:**
```python
model = nn_model.getModel()
```

---

### 2.3 NNClassifier
**Scala:**
```scala
val classifier = NNClassifier(model, criterion)
```

**Python:**
```python
classifier = NNClassifier(model, criterion)
```

`NNClassifier` is a specialized `NNEstimator` that simplifies the data format for classification tasks where the label space is discrete. It only supports a label column of DoubleType, and the fitted `NNClassifierModel` will have a prediction column of DoubleType.

* `model`: the BigDL module to be optimized in the `fit()` method
* `criterion`: the criterion used to compute the loss and the gradient

`NNClassifier` can be created with various parameters for different scenarios.

- `NNClassifier(model, criterion)`

  Takes only the model and criterion and uses `SeqToTensor` as the feature and label Preprocessing. `NNClassifier` will extract the data from the feature and label columns (only Scalar, Array[_] or Vector data types are supported) and convert each feature/label to a 1-dimension Tensor. The tensors will be combined into BigDL samples and sent to the model for training.

- `NNClassifier(model, criterion, featureSize: Array[Int])`

  Takes the model, criterion and featureSize (Array of Int). `NNClassifier` will extract the data from the feature and label columns and convert each feature to a Tensor according to the specified Tensor size. `ScalarToTensor` is used to convert the label column. Users can also set featureSize as Array[Array[Int]] for a multi-input model.

- `NNClassifier(model, criterion, featurePreprocessing: Preprocessing[F, Tensor[T]])`

  Takes the model, criterion and featurePreprocessing. `NNClassifier` will extract the data from the feature and label columns and convert each feature to a Tensor with the featurePreprocessing. This constructor provides more flexibility in supporting extra data types.

Meanwhile, for advanced use cases (e.g. models with multiple input tensors), `NNClassifier` supports `setSamplePreprocessing(value: Preprocessing[(Any, Option[Any]), Sample[T]])` to directly compose the Sample with a user-specified Preprocessing.
- -**Scala example:** -```scala -import com.intel.analytics.bigdl.dllib.nn._ -import com.intel.analytics.bigdl.dllib.nnframes.NNClassifier -import com.intel.analytics.bigdl.dllib.tensor.TensorNumericMath.TensorNumeric.NumericFloat - -val model = Sequential().add(Linear(2, 2)) -val criterion = MSECriterion() -val estimator = NNClassifier(model, criterion) - .setLearningRate(0.2) - .setMaxEpoch(40) -val data = sc.parallelize(Seq( - (Array(0.0, 1.0), 1.0), - (Array(1.0, 0.0), 2.0), - (Array(0.0, 1.0), 1.0), - (Array(1.0, 0.0), 2.0))) -val df = sqlContext.createDataFrame(data).toDF("features", "label") -val dlModel = estimator.fit(df) -dlModel.transform(df).show(false) -``` - -**Python Example:** - -```python -from bigdl.dllib.nn.layer import * -from bigdl.dllib.nn.criterion import * -from bigdl.dllib.utils.common import * -from bigdl.dllib.nnframes.nn_classifier import * -from pyspark.sql.types import * - -#Logistic Regression with BigDL layers and NNClassifier -model = Sequential().add(Linear(2, 2)).add(LogSoftMax()) -criterion = ClassNLLCriterion() -estimator = NNClassifier(model, criterion, [2]).setBatchSize(4).setMaxEpoch(10) -data = sc.parallelize([ - ((0.0, 1.0), 1.0), - ((1.0, 0.0), 2.0), - ((0.0, 1.0), 1.0), - ((1.0, 0.0), 2.0)]) - -schema = StructType([ - StructField("features", ArrayType(DoubleType(), False), False), - StructField("label", DoubleType(), False)]) -df = spark.createDataFrame(data, schema) -dlModel = estimator.fit(df) -res = dlModel.transform(df).collect() -``` - -### 2.4 NNClassifierModel ## - -**Scala:** -```scala -val nnClassifierModel = NNClassifierModel(model, featureSize) -``` - -**Python:** -```python -nn_classifier_model = NNClassifierModel(model) -``` - -NNClassifierModel is a specialized `NNModel` for classification tasks. Both label and prediction column will have the datatype of Double. - -`NNClassifierModel` can be created with various parameters for different scenarios. - -- `NNClassifierModel(model)` - - Takes only model and use `SeqToTensor` as feature Preprocessing. `NNClassifierModel` will extract the data from feature column (only Scalar, Array[_] or Vector data type are supported) and convert each feature to 1-dimension Tensor. The tensors will be sent to model for inference. - -- `NNClassifierModel(model, featureSize: Array[Int])` - - Takes model and featureSize(Array of Int). `NNClassifierModel` will extract the data from feature column (only Scalar, Array[_] or Vector data type are supported) and convert each feature to Tensor according to the specified Tensor size. User can also set featureSize as Array[Array[Int]] for multi-inputs model. - -- `NNClassifierModel(model, featurePreprocessing: Preprocessing[F, Tensor[T]])` - - Takes model and featurePreprocessing. `NNClassifierModel` will extract the data from feature column and convert each feature to Tensor with the featurePreprocessing. This constructor provides more flexibility in supporting extra data types. - -Meanwhile, for advanced use cases (e.g. model with multiple input tensor), `NNClassifierModel` supports: `setSamplePreprocessing(value: Preprocessing[Any, Sample[T]])`to directly compose Sample according to user-specified Preprocessing. - ---- - -### 2.5 Hyperparameter Setting - -Prior to the commencement of the training process, you can modify the optimization algorithm, batch size, the epoch number of your training, and learning rate to meet your goal or `NNEstimator`/`NNClassifier` will use the default value. 
Continuing the code above, NNEstimator and NNClassifier can be configured in the same way.

**Scala:**

```scala
// for estimator
estimator.setBatchSize(4).setMaxEpoch(10).setLearningRate(0.01).setOptimMethod(new Adam())
// for classifier
classifier.setBatchSize(4).setMaxEpoch(10).setLearningRate(0.01).setOptimMethod(new Adam())
```
**Python:**

```python
# for estimator
estimator.setBatchSize(4).setMaxEpoch(10).setLearningRate(0.01).setOptimMethod(Adam())
# for classifier
classifier.setBatchSize(4).setMaxEpoch(10).setLearningRate(0.01).setOptimMethod(Adam())
```

### 2.6 Training

NNEstimator/NNClassifier supports training with Spark's [DataFrame/DataSet](https://spark.apache.org/docs/latest/sql-programming-guide.html#datasets-and-dataframes).

Suppose `df` is the training data; simply call the `fit` method and let BigDL DLlib train the model for you.

**Scala:**

```scala
// get a NNClassifierModel
val nnClassifierModel = classifier.fit(df)
```

**Python:**

```python
# get a NNClassifierModel
nnClassifierModel = classifier.fit(df)
```
Users may also set a validation DataFrame and the validation frequency through the `setValidation` method. Training and validation summaries can also be configured to log the training process for visualization in TensorBoard.


### 2.7 Prediction

Since `NNModel`/`NNClassifierModel` inherit from Spark's `Transformer` abstract class, simply call the `transform` method on `NNModel`/`NNClassifierModel` to make predictions.

**Scala:**

```scala
nnModel.transform(df).show(false)
```

**Python:**

```python
nnModel.transform(df).show(truncate=False)
```

For the complete examples of NNFrames, please refer to:
[Scala examples](https://github.com/intel-analytics/BigDL/tree/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/nnframes)
[Python examples](https://github.com/intel-analytics/BigDL/tree/branch-2.0/python/dllib/examples/nnframes)


### 2.8 NNImageReader

`NNImageReader` is the primary DataFrame-based image loading interface, defining the API to read images into a DataFrame.

Scala:
```scala
val imageDF = NNImageReader.readImages(imageDirectory, sc)
```

Python:
```python
image_frame = NNImageReader.readImages(image_path, sc)
```

The output DataFrame contains a single column named "image". The schema of the "image" column can be accessed from `com.intel.analytics.bigdl.dllib.nnframes.DLImageSchema.byteSchema`. Each record in the "image" column represents one image, in the format of Row(origin, height, width, nChannels, mode, data), where `origin` contains the URI of the image file and `data` holds the original file bytes. `mode` represents the OpenCV-compatible type: CV_8UC3 or CV_8UC1 in most cases.

```scala
val byteSchema = StructType(
  StructField("origin", StringType, true) ::
  StructField("height", IntegerType, false) ::
  StructField("width", IntegerType, false) ::
  StructField("nChannels", IntegerType, false) ::
  // OpenCV-compatible type: CV_8UC3, CV_32FC3 in most cases
  StructField("mode", IntegerType, false) ::
  // Bytes in OpenCV-compatible order: row-wise BGR in most cases
  StructField("data", BinaryType, false) :: Nil)
```

After loading the images, users can compose preprocessing steps with the `Preprocessing` transformers defined in `com.intel.analytics.bigdl.dllib.feature.image`.
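As a short illustration of that last point, the sketch below reads a directory of images and chains two preprocessing steps. It is only a sketch: the image directory is a placeholder, and the Python import paths for `NNImageReader` and the image transformers are assumptions (the DLlib Python getting-started guide uses `bigdl.dllib.feature.image` with transformers such as `ImageResize(50, 50)` and `ImageMirror()`); the composed transformer is then typically passed to training or prediction through the `transform` argument.

```python
from bigdl.dllib.nncontext import init_nncontext
from bigdl.dllib.feature.image import transforms
# Assumed import paths; ImageResize/ImageMirror and NNImageReader may live in
# different modules depending on the BigDL version.
from bigdl.dllib.feature.image import ImageResize, ImageMirror
from bigdl.dllib.nnframes.nn_image_reader import NNImageReader

sc = init_nncontext()

# read images from a (placeholder) directory into a DataFrame with an "image" column
image_df = NNImageReader.readImages("path/to/images", sc)

# compose preprocessing steps: resize every image, then randomly mirror it
transformers = transforms.Compose([ImageResize(50, 50), ImageMirror()])

# the composed transformer can then be passed to fit/predict via the `transform`
# argument, e.g. (assuming a Keras-style DLlib model named `model`):
# model.fit(image_df, label_cols=["label"], batch_size=4, nb_epoch=1,
#           transform=transformers)
```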
diff --git a/docs/readthedocs/source/doc/DLlib/Overview/visualization.md b/docs/readthedocs/source/doc/DLlib/Overview/visualization.md deleted file mode 100644 index 61867302..00000000 --- a/docs/readthedocs/source/doc/DLlib/Overview/visualization.md +++ /dev/null @@ -1,40 +0,0 @@ -## Visualizing training with TensorBoard -With the summary info generated, we can then use [TensorBoard](https://pypi.python.org/pypi/tensorboard) to visualize the behaviors of the BigDL program. - -* **Installing TensorBoard** - - Prerequisites: - - 1. Python verison: 2.7, 3.4, 3.5, or 3.6 - 2. Pip version >= 9.0.1 - - To install TensorBoard using Python 2, you may run the command: - ```bash - pip install tensorboard==1.0.0a4 - ``` - - To install TensorBoard using Python 3, you may run the command: - ```bash - pip3 install tensorboard==1.0.0a4 - ``` - - Please refer to [this page](https://github.com/intel-analytics/BigDL/tree/master/spark/dl/src/main/scala/com/intel/analytics/bigdl/visualization#known-issues) for possible issues when installing TensorBoard. - -* **Launching TensorBoard** - - You can launch TensorBoard using the command below: - ```bash - tensorboard --logdir=/tmp/bigdl_summaries - ``` - After that, navigate to the TensorBoard dashboard using a browser. You can find the URL in the console output after TensorBoard is successfully launched; by default the URL is http://your_node:6006 - -* **Visualizations in TensorBoard** - - Within the TensorBoard dashboard, you will be able to read the visualizations of each run, including the “Loss” and “Throughput” curves under the SCALARS tab (as illustrated below): - ![](../Image/tensorboard-scalar.png) - - And “weights”, “bias”, “gradientWeights” and “gradientBias” under the DISTRIBUTIONS and HISTOGRAMS tabs (as illustrated below): - ![](../Image/tensorboard-histo1.png) - ![](../Image/tensorboard-histo2.png) - ---- \ No newline at end of file diff --git a/docs/readthedocs/source/doc/DLlib/QuickStart/dllib-quickstart.md b/docs/readthedocs/source/doc/DLlib/QuickStart/dllib-quickstart.md deleted file mode 100644 index 4526fb24..00000000 --- a/docs/readthedocs/source/doc/DLlib/QuickStart/dllib-quickstart.md +++ /dev/null @@ -1,70 +0,0 @@ -# DLlib Quickstarts - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/dllib/colab-notebook/dllib_keras_api.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/dllib/colab-notebook/dllib_keras_api.ipynb) - ---- - -**In this guide we will demonstrate how to use _DLlib keras style api_ and _DLlib NNClassifier_ for classification.** - -### **Step 0: Prepare Environment** - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/chronos.html#install) for more details. - -```bash -conda create -n my_env python=3.7 # "my_env" is conda environment name, you can use any name you like. -conda activate my_env -pip install bigdl-dllib -``` - -### Step 1: Data loading and processing using Spark DataFrame - -```python -df = spark.read.csv(path, sep=',', inferSchema=True).toDF("num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age", "class") -``` - -We process the data using Spark API and split the data into train and test set. 
- -```python -vecAssembler = VectorAssembler(outputCol="features") -vecAssembler.setInputCols(["num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age"]) -train_df = vecAssembler.transform(df) - -changedTypedf = train_df.withColumn("label", train_df["class"].cast(DoubleType())+lit(1))\ - .select("features", "label") -(trainingDF, validationDF) = changedTypedf.randomSplit([0.9, 0.1]) -``` - -### Step 3: Define classification model using DLlib keras style api - -```python -x1 = Input(shape=(8,)) -dense1 = Dense(12, activation='relu')(x1) -dense2 = Dense(8, activation='relu')(dense1) -dense3 = Dense(2)(dense2) -model = Model(x1, dense3) -``` - -### Step 4: Create NNClassifier and Fit NNClassifier - -```python -classifier = NNClassifier(model, CrossEntropyCriterion(), [8]) \ - .setOptimMethod(Adam()) \ - .setBatchSize(32) \ - .setMaxEpoch(150) - -nnModel = classifier.fit(trainingDF) -``` - -### Step 5: Evaluate the trained model - -```python -predictionDF = nnModel.transform(validationDF).cache() -predictionDF.sample(False, 0.1).show() - - -evaluator = MulticlassClassificationEvaluator( - labelCol="label", predictionCol="prediction", metricName="accuracy") -accuracy = evaluator.evaluate(predictionDF) -``` diff --git a/docs/readthedocs/source/doc/DLlib/QuickStart/index.md b/docs/readthedocs/source/doc/DLlib/QuickStart/index.md deleted file mode 100644 index 06a16e3a..00000000 --- a/docs/readthedocs/source/doc/DLlib/QuickStart/index.md +++ /dev/null @@ -1,9 +0,0 @@ -# DLlib Tutorial - - -- [**Python Quickstart Notebook**](./python-getting-started.html) - - > ![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/dllib/colab-notebook/dllib_keras_api.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/dllib/colab-notebook/dllib_keras_api.ipynb) - - In this guide we will demonstrate how to use _DLlib keras style api_ and _DLlib NNClassifier_ for classification. - diff --git a/docs/readthedocs/source/doc/DLlib/QuickStart/python-getting-started.md b/docs/readthedocs/source/doc/DLlib/QuickStart/python-getting-started.md deleted file mode 100644 index b6f6c736..00000000 --- a/docs/readthedocs/source/doc/DLlib/QuickStart/python-getting-started.md +++ /dev/null @@ -1,218 +0,0 @@ -# DLLib Python Getting Start Guide - -## 1. Code initialization -```nncontext``` is the main entry for provisioning the dllib program on the underlying cluster (such as K8s or Hadoop cluster), or just on a single laptop. - -It is recommended to initialize `nncontext` at the beginning of your program: -``` -from bigdl.dllib.nncontext import * -sc = init_nncontext() -``` -For more information about ```nncontext```, please refer to [nncontext](../Overview/dllib.md#initialize-nn-context) - -## 2. Distributed Data Loading - -#### Using Spark Dataframe APIs -DLlib supports Spark Dataframes as the input to the distributed training, and as -the input/output of the distributed inference. Consequently, the user can easily -process large-scale dataset using Apache Spark, and directly apply AI models on -the distributed (and possibly in-memory) Dataframes without data conversion or serialization - -We create Spark session so we can use Spark API to load and process the data -``` -spark = SQLContext(sc) -``` - -1. 
We can use Spark API to load the data into Spark DataFrame, eg. read csv file into Spark DataFrame - ``` - path = "pima-indians-diabetes.data.csv" - spark.read.csv(path) - ``` - - If the feature column for the model is a Spark ML Vector. Please assemble related columns into a Vector and pass it to the model. eg. - ``` - from pyspark.ml.feature import VectorAssembler - vecAssembler = VectorAssembler(outputCol="features") - vecAssembler.setInputCols(["num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age"]) - assemble_df = vecAssembler.transform(df) - assemble_df.withColumn("label", col("class").cast(DoubleType) + lit(1)) - ``` - -2. If the training data is image, we can use DLLib api to load image into Spark DataFrame. Eg. - ``` - imgPath = "cats_dogs/" - imageDF = NNImageReader.readImages(imgPath, sc) - ``` - - It will load the images and generate feature tensors automatically. Also we need generate labels ourselves. eg: - ``` - labelDF = imageDF.withColumn("name", getName(col("image"))) \ - .withColumn("label", getLabel(col('name'))) - ``` - - Then split the Spark DataFrame into traing part and validation part - ``` - (trainingDF, validationDF) = labelDF.randomSplit([0.9, 0.1]) - ``` - -## 3. Model Definition - -#### Using Keras-like APIs - -To define a model, you can use the [Keras Style API](../Overview/keras-api.md). -``` -x1 = Input(shape=[8]) -dense1 = Dense(12, activation="relu")(x1) -dense2 = Dense(8, activation="relu")(dense1) -dense3 = Dense(2)(dense2) -dmodel = Model(input=x1, output=dense3) -``` - -After creating the model, you will have to decide which loss function to use in training. - -Now you can use `compile` function of the model to set the loss function, optimization method. -``` -dmodel.compile(optimizer = "adam", loss = "sparse_categorical_crossentropy") -``` - -Now the model is built and ready to train. - -## 4. Distributed Model Training -Now you can use 'fit' begin the training, please set the label columns. Model Evaluation can be performed periodically during a training. -1. If the dataframe is generated using Spark apis, you also need set the feature columns. eg. - ``` - model.fit(df, feature_cols=["features"], label_cols=["label"], batch_size=4, nb_epoch=1) - ``` - Note: Above model accepts single input(column `features`) and single output(column `label`). - - If your model accepts multiple inputs(eg. column `f1`, `f2`, `f3`), please set the features as below: - ``` - model.fit(df, feature_cols=["f1", "f2"], label_cols=["label"], batch_size=4, nb_epoch=1) - ``` - - Similarly, if the model accepts multiple outputs(eg. column `label1`, `label2`), please set the label columns as below: - ``` - model.fit(df, feature_cols=["features"], label_cols=["l1", "l2"], batch_size=4, nb_epoch=1) - ``` - -2. If the dataframe is generated using DLLib `NNImageReader`, we don't need set `feature_cols`, we can set `transform` to config how to process the images before training. Eg. - ``` - from bigdl.dllib.feature.image import transforms - transformers = transforms.Compose([ImageResize(50, 50), ImageMirror()]) - model.fit(image_df, label_cols=["label"], batch_size=1, nb_epoch=1, transform=transformers) - ``` - For more details about how to use DLLib keras api to train image data, you may want to refer [ImageClassification](https://github.com/intel-analytics/BigDL/tree/main/python/dllib/examples/keras/image_classification.py) - -## 5. 
Model saving and loading -When training is finished, you may need to save the final model for later use. - -BigDL allows you to save your BigDL model on local filesystem, HDFS, or Amazon s3. -- **save** - ``` - modelPath = "/tmp/demo/keras.model" - dmodel.saveModel(modelPath) - ``` - -- **load** - ``` - loadModel = Model.loadModel(modelPath) - preDF = loadModel.predict(df, feature_cols=["features"], prediction_col="predict") - ``` - -You may want to refer [Save/Load](../Overview/keras-api.html#save) - -## 6. Distributed evaluation and inference -After training finishes, you can then use the trained model for prediction or evaluation. - -- **inference** - 1. For dataframe generated by Spark API, please set `feature_cols` and `prediction_col` - ``` - dmodel.predict(df, feature_cols=["features"], prediction_col="predict") - ``` - 2. For dataframe generated by `NNImageReader`, please set `prediction_col` and you can set `transform` if needed - ``` - model.predict(df, prediction_col="predict", transform=transformers) - ``` - -- **evaluation** - - Similary for dataframe generated by Spark API, the code is as below: - ``` - dmodel.evaluate(df, batch_size=4, feature_cols=["features"], label_cols=["label"]) - ``` - - For dataframe generated by `NNImageReader`: - ``` - model.evaluate(image_df, batch_size=1, label_cols=["label"], transform=transformers) - ``` - -## 7. Checkpointing and resuming training -You can configure periodically taking snapshots of the model. -``` -cpPath = "/tmp/demo/cp" -dmodel.set_checkpoint(cpPath) -``` -You can also set ```over_write``` to ```true``` to enable overwriting any existing snapshot files - -After training stops, you can resume from any saved point. Choose one of the model snapshots to resume (saved in checkpoint path, details see Checkpointing). Use Models.loadModel to load the model snapshot into an model object. -``` -loadModel = Model.loadModel(path) -``` - -## 8. Monitor your training - -- **Tensorboard** - - BigDL provides a convenient way to monitor/visualize your training progress. It writes the statistics collected during training/validation. Saved summary can be viewed via TensorBoard. - - In order to take effect, it needs to be called before fit. - ``` - dmodel.set_tensorboard("./", "dllib_demo") - ``` - For more details, please refer [visulization](../Overview/visualization.md) - -## 9. Transfer learning and finetuning - -- **freeze and trainable** - - BigDL DLLib supports exclude some layers of model from training. - ``` - dmodel.freeze(layer_names) - ``` - Layers that match the given names will be freezed. If a layer is freezed, its parameters(weight/bias, if exists) are not changed in training process. - - BigDL DLLib also support unFreeze operations. The parameters for the layers that match the given names will be trained(updated) in training process - ``` - dmodel.unFreeze(layer_names) - ``` - For more information, you may refer [freeze](../../PythonAPI/DLlib/freeze.md) - -## 10. Hyperparameter tuning -- **optimizer** - - DLLib supports a list of optimization methods. - For more details, please refer [optimization](../../PythonAPI/DLlib/optim-Methods.md) - -- **learning rate scheduler** - - DLLib supports a list of learning rate scheduler. - For more details, please refer [lr_scheduler](../../PythonAPI/DLlib/learningrate-Scheduler.md) - -- **batch size** - - DLLib supports set batch size during training and prediction. We can adjust the batch size to tune the model's accuracy. - -- **regularizer** - - DLLib supports a list of regularizers. 
- For more details, please refer [regularizer](../../PythonAPI/DLlib/regularizers.md) - -- **clipping** - - DLLib supports gradient clipping operations. - For more details, please refer [gradient_clip](../../PythonAPI/DLlib/clipping.md) - -## 11. Running program -``` -python you_app_code.py -``` diff --git a/docs/readthedocs/source/doc/DLlib/QuickStart/scala-getting-started.md b/docs/readthedocs/source/doc/DLlib/QuickStart/scala-getting-started.md deleted file mode 100644 index c08bc722..00000000 --- a/docs/readthedocs/source/doc/DLlib/QuickStart/scala-getting-started.md +++ /dev/null @@ -1,303 +0,0 @@ -# DLLib Scala Getting Start Guide - -## 1. Creating dev environment - -#### Scala project (maven & sbt) - -- **Maven** - - To use BigDL DLLib to build your own deep learning application, you can use maven to create your project and add bigdl-dllib to your dependency. Please add below code to your pom.xml to add BigDL DLLib as your dependency: - ``` - - com.intel.analytics.bigdl - bigdl-dllib-spark_2.4.6 - 0.14.0 - - ``` - -- **SBT** - ``` - libraryDependencies += "com.intel.analytics.bigdl" % "bigdl-dllib-spark_2.4.6" % "0.14.0" - ``` - For more information about how to add BigDL dependency, please refer [scala docs](../../UserGuide/scala.md#build-a-scala-project) - -#### IDE (Intelij) -Open up IntelliJ and click File => Open - -Navigate to your project. If you have add BigDL DLLib as dependency in your pom.xml. -The IDE will automatically download it from maven and you are able to run your application. - -For more details about how to setup IDE for BigDL project, please refer [IDE Setup Guide](../../UserGuide/develop.html#id2) - - -## 2. Code initialization -```NNContext``` is the main entry for provisioning the dllib program on the underlying cluster (such as K8s or Hadoop cluster), or just on a single laptop. - -It is recommended to initialize `NNContext` at the beginning of your program: -``` -import com.intel.analytics.bigdl.dllib.NNContext -import com.intel.analytics.bigdl.dllib.keras.Model -import com.intel.analytics.bigdl.dllib.keras.models.Models -import com.intel.analytics.bigdl.dllib.keras.optimizers.Adam -import com.intel.analytics.bigdl.dllib.nn.ClassNLLCriterion -import com.intel.analytics.bigdl.dllib.utils.Shape -import com.intel.analytics.bigdl.dllib.keras.layers._ -import com.intel.analytics.bigdl.numeric.NumericFloat -import org.apache.spark.ml.feature.VectorAssembler -import org.apache.spark.sql.SQLContext -import org.apache.spark.sql.functions._ -import org.apache.spark.sql.types.DoubleType - -val sc = NNContext.initNNContext("dllib_demo") -``` -For more information about ```NNContext```, please refer to [NNContext](../Overview/dllib.md#initialize-nn-context) - -## 3. Distributed Data Loading - -#### Using Spark Dataframe APIs -DLlib supports Spark Dataframes as the input to the distributed training, and as -the input/output of the distributed inference. Consequently, the user can easily -process large-scale dataset using Apache Spark, and directly apply AI models on -the distributed (and possibly in-memory) Dataframes without data conversion or serialization - -We create Spark session so we can use Spark API to load and process the data -``` -val spark = new SQLContext(sc) -``` - -1. We can use Spark API to load the data into Spark DataFrame, eg. 
read csv file into Spark DataFrame - ``` - val path = "pima-indians-diabetes.data.csv" - val df = spark.read.options(Map("inferSchema"->"true","delimiter"->",")).csv(path) - .toDF("num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age", "class") - ``` - - If the feature column for the model is a Spark ML Vector. Please assemble related columns into a Vector and pass it to the model. eg. - ``` - val assembler = new VectorAssembler() - .setInputCols(Array("num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age")) - .setOutputCol("features") - val assembleredDF = assembler.transform(df) - val df2 = assembleredDF.withColumn("label",col("class").cast(DoubleType) + lit(1)) - ``` - -2. If the training data is image, we can use DLLib api to load image into Spark DataFrame. Eg. - ``` - val createLabel = udf { row: Row => - if (new Path(row.getString(0)).getName.contains("cat")) 1 else 2 - } - val imagePath = "cats_dogs/" - val imgDF = NNImageReader.readImages(imagePath, sc) - ``` - - It will load the images and generate feature tensors automatically. Also we need generate labels ourselves. eg: - ``` - val df = imgDF.withColumn("label", createLabel(col("image"))) - ``` - - Then split the Spark DataFrame into traing part and validation part - ``` - val Array(trainDF, valDF) = df.randomSplit(Array(0.8, 0.2)) - ``` - -## 4. Model Definition - -#### Using Keras-like APIs - -To define a model, you can use the [Keras Style API](../Overview/keras-api.md). -``` -val x1 = Input(Shape(8)) -val dense1 = Dense(12, activation="relu").inputs(x1) -val dense2 = Dense(8, activation="relu").inputs(dense1) -val dense3 = Dense(2).inputs(dense2) -val dmodel = Model(x1, dense3) -``` - -After creating the model, you will have to decide which loss function to use in training. - -Now you can use `compile` function of the model to set the loss function, optimization method. -``` -dmodel.compile(optimizer = new Adam(), loss = ClassNLLCriterion()) -``` - -Now the model is built and ready to train. - -## 5. Distributed Model Training -Now you can use 'fit' begin the training, please set the label columns. Model Evaluation can be performed periodically during a training. -1. If the dataframe is generated using Spark apis, you also need set the feature columns. eg. - ``` - model.fit(x=trainDF, batchSize=4, nbEpoch = 2, - featureCols = Array("feature1"), labelCols = Array("label"), valX=valDF) - ``` - Note: Above model accepts single input(column `feature1`) and single output(column `label`). - - If your model accepts multiple inputs(eg. column `f1`, `f2`, `f3`), please set the features as below: - ``` - model.fit(x=dataframe, batchSize=4, nbEpoch = 2, - featureCols = Array("f1", "f2", "f3"), labelCols = Array("label")) - ``` - - Similarly, if the model accepts multiple outputs(eg. column `label1`, `label2`), please set the label columns as below: - ``` - model.fit(x=dataframe, batchSize=4, nbEpoch = 2, - featureCols = Array("f1", "f2", "f3"), labelCols = Array("label1", "label2")) - ``` - -2. If the dataframe is generated using DLLib `NNImageReader`, we don't need set `featureCols`, we can set `transform` to config how to process the images before training. Eg. 
- ``` - val transformers = transforms.Compose(Array(ImageResize(50, 50), - ImageMirror())) - model.fit(x=dataframe, batchSize=4, nbEpoch = 2, - labelCols = Array("label"), transform = transformers) - ``` - For more details about how to use DLLib keras api to train image data, you may want to refer [ImageClassification](https://github.com/intel-analytics/BigDL/blob/main/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/keras/ImageClassification.scala) - -## 6. Model saving and loading -When training is finished, you may need to save the final model for later use. - -BigDL allows you to save your BigDL model on local filesystem, HDFS, or Amazon s3. -- **save** - ``` - val modelPath = "/tmp/demo/keras.model" - dmodel.saveModel(modelPath) - ``` - -- **load** - ``` - val loadModel = Models.loadModel(modelPath) - - val preDF2 = loadModel.predict(valDF, featureCols = Array("features"), predictionCol = "predict") - ``` - -You may want to refer [Save/Load](../Overview/keras-api.html#save) - -## 7. Distributed evaluation and inference -After training finishes, you can then use the trained model for prediction or evaluation. - -- **inference** - 1. For dataframe generated by Spark API, please set `featureCols` - ``` - dmodel.predict(trainDF, featureCols = Array("features"), predictionCol = "predict") - ``` - 2. For dataframe generated by `NNImageReader`, no need to set `featureCols` and you can set `transform` if needed - ``` - model.predict(imgDF, predictionCol = "predict", transform = transformers) - ``` - -- **evaluation** - - Similary for dataframe generated by Spark API, the code is as below: - ``` - dmodel.evaluate(trainDF, batchSize = 4, featureCols = Array("features"), - labelCols = Array("label")) - ``` - - For dataframe generated by `NNImageReader`: - ``` - model.evaluate(imgDF, batchSize = 1, labelCols = Array("label"), transform = transformers) - ``` - -## 8. Checkpointing and resuming training -You can configure periodically taking snapshots of the model. -``` -val cpPath = "/tmp/demo/cp" -dmodel.setCheckpoint(cpPath, overWrite=false) -``` -You can also set ```overWrite``` to ```true``` to enable overwriting any existing snapshot files - -After training stops, you can resume from any saved point. Choose one of the model snapshots to resume (saved in checkpoint path, details see Checkpointing). Use Models.loadModel to load the model snapshot into an model object. -``` -val loadModel = Models.loadModel(path) -``` - -## 9. Monitor your training - -- **Tensorboard** - - BigDL provides a convenient way to monitor/visualize your training progress. It writes the statistics collected during training/validation. Saved summary can be viewed via TensorBoard. - - In order to take effect, it needs to be called before fit. - ``` - dmodel.setTensorBoard("./", "dllib_demo") - ``` - For more details, please refer [visulization](../Overview/visualization.md)` - -## 10. Transfer learning and finetuning - -- **freeze and trainable** - - BigDL DLLib supports exclude some layers of model from training. - ``` - dmodel.freeze(layer_names) - ``` - Layers that match the given names will be freezed. If a layer is freezed, its parameters(weight/bias, if exists) are not changed in training process. - - BigDL DLLib also support unFreeze operations. The parameters for the layers that match the given names will be trained(updated) in training process - ``` - dmodel.unFreeze(layer_names) - ``` - For more information, you may refer [freeze](../../PythonAPI/DLlib/freeze.md) - -## 11. 
Hyperparameter tuning -- **optimizer** - - DLLib supports a list of optimization methods. - For more details, please refer [optimization](../../PythonAPI/DLlib/optim-Methods.md) - -- **learning rate scheduler** - - DLLib supports a list of learning rate scheduler. - For more details, please refer [lr_scheduler](../../PythonAPI/DLlib/learningrate-Scheduler.md) - -- **batch size** - - DLLib supports set batch size during training and prediction. We can adjust the batch size to tune the model's accuracy. - -- **regularizer** - - DLLib supports a list of regularizers. - For more details, please refer [regularizer](../../PythonAPI/DLlib/regularizers.md) - -- **clipping** - - DLLib supports gradient clipping operations. - For more details, please refer [gradient_clip](../../PythonAPI/DLlib/clipping.md) - -## 12. Running program -You can run a bigdl-dllib program as a standard Spark program (running on either a local machine or a distributed cluster) as follows: -``` -# Spark local mode -${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \ - --master local[2] \ - --class class_name \ - jar_path - -# Spark standalone mode -## ${SPARK_HOME}/sbin/start-master.sh -## check master URL from http://localhost:8080 -${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \ - --master spark://... \ - --executor-cores cores_per_executor \ - --total-executor-cores total_cores_for_the_job \ - --class class_name \ - jar_path - -# Spark yarn client mode -${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \ - --master yarn \ - --deploy-mode client \ - --executor-cores cores_per_executor \ - --num-executors executors_number \ - --class class_name \ - jar_path - -# Spark yarn cluster mode -${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \ - --master yarn \ - --deploy-mode cluster \ - --executor-cores cores_per_executor \ - --num-executors executors_number \ - --class class_name - jar_path -``` -For more detail about how to run BigDL scala application, please refer to [Scala UserGuide](../../UserGuide/scala.md) diff --git a/docs/readthedocs/source/doc/DLlib/index.rst b/docs/readthedocs/source/doc/DLlib/index.rst deleted file mode 100644 index e6876eaf..00000000 --- a/docs/readthedocs/source/doc/DLlib/index.rst +++ /dev/null @@ -1,62 +0,0 @@ -BigDL-DLlib -========================= - -**BigDL-DLlib** (or **DLlib** for short) is a distributed deep learning library for Apache Spark; with DLlib, users can write their deep learning applications as standard Spark programs (using either Scala or Python APIs). - -------- - - -.. grid:: 1 2 2 2 - :gutter: 2 - - .. grid-item-card:: - - **Get Started** - ^^^ - - Documents in these sections helps you getting started quickly with DLLib. - - +++ - :bdg-link:`DLlib in 5 minutes <./Overview/dllib.html>` | - :bdg-link:`Installation <./Overview/install.html>` - - .. grid-item-card:: - - **Key Features Guide** - ^^^ - - Each guide in this section provides you with in-depth information, concepts and knowledges about DLLib key features. - - +++ - - :bdg-link:`Keras-Like API <./Overview/keras-api.html>` | - :bdg-link:`Spark ML Pipeline <./Overview/nnframes.html>` - - .. grid-item-card:: - - **Examples** - ^^^ - - DLLib Examples and Tutorials. - - +++ - - :bdg-link:`Tutorials <./QuickStart/index.html>` - - .. grid-item-card:: - - **API Document** - ^^^ - - API Document provides detailed description of DLLib APIs. - - +++ - - :bdg-link:`API Document <../PythonAPI/DLlib/index.html>` - - -.. 
toctree:: - :hidden: - - BigDL-DLlib Document - diff --git a/docs/readthedocs/source/doc/Friesian/examples.md b/docs/readthedocs/source/doc/Friesian/examples.md deleted file mode 100644 index 111db0e9..00000000 --- a/docs/readthedocs/source/doc/Friesian/examples.md +++ /dev/null @@ -1,70 +0,0 @@ -### Use Cases - - -- **Train a DeepFM model using recsys data** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/deep_fm) - ---------------------------- - -- **Run DeepRec with BigDL** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/deeprec) - ---------------------------- - -- **Train DIEN using the Amazon Book Reviews dataset** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/dien) - ---------------------------- - -- **Preprocess the Criteo dataset for DLRM Model** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/dlrm) - ---------------------------- - -- **Train an LightGBM model using Twitter dataset** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/lightGBM) - ---------------------------- - -- **Running Friesian listwise example** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/listwise_ranking) - ---------------------------- - -- **Multi-task Recommendation with BigDL** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/multi_task) - ---------------------------- - -- **Train an NCF model on MovieLens** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/ncf) - - ---------------------------- - -- **Offline Recall with Faiss on Spark** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/recall) - - ---------------------------- - -- **Recommend items using Friesian-Serving Framework** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/serving) - - ---------------------------- - -- **Train a two tower model using recsys data** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/two_tower) - ---------------------------- - -- **Preprocess the Criteo dataset for WideAndDeep Model** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/wnd) - - ---------------------------- - -- **Train an XGBoost model using Twitter dataset** ->![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/xgb) - diff --git a/docs/readthedocs/source/doc/Friesian/index.rst b/docs/readthedocs/source/doc/Friesian/index.rst deleted file mode 100644 index 16676873..00000000 --- a/docs/readthedocs/source/doc/Friesian/index.rst +++ /dev/null @@ -1,66 +0,0 @@ -BigDL-Friesian 
-========================= - - - -BigDL Friesian is an application framework for building optimized large-scale recommender solutions. The recommending workflows built on top of Friesian can seamlessly scale out to distributed big data clusters in the production environment. - -Friesian provides end-to-end support for three typical stages in a modern recommendation system: - -- Offline stage: distributed feature engineering and model training. -- Nearline stage: Feature and model updates. -- Online stage: Recall and ranking. - -------- - -.. grid:: 1 2 2 2 - :gutter: 2 - - .. grid-item-card:: - - **Get Started** - ^^^ - - Documents in these sections helps you getting started quickly with Friesian. - - +++ - - :bdg-link:`Introduction <./intro.html>` - - .. grid-item-card:: - - **Key Features Guide** - ^^^ - - Each guide in this section provides you with in-depth information, concepts and knowledges about Friesian key features. - - +++ - - :bdg-link:`Serving <./serving.html>` - - .. grid-item-card:: - - **Use Cases** - ^^^ - - Use Cases and Examples. - - +++ - - :bdg-link:`Use Cases <./examples.html>` - - .. grid-item-card:: - - **API Document** - ^^^ - - API Document provides detailed description of Nano APIs. - - +++ - - :bdg-link:`API Document <../PythonAPI/Friesian/index.html>` - -.. toctree:: - :hidden: - - BigDL-Friesian Document \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Friesian/intro.rst b/docs/readthedocs/source/doc/Friesian/intro.rst deleted file mode 100644 index 5468c0e9..00000000 --- a/docs/readthedocs/source/doc/Friesian/intro.rst +++ /dev/null @@ -1,17 +0,0 @@ -Friesian Introduction -========================== - -BigDL Friesian is an application framework for building optimized large-scale recommender solutions. The recommending workflows built on top of Friesian can seamlessly scale out to distributed big data clusters in the production environment. - -Friesian provides end-to-end support for three typical stages in a modern recommendation system: - -- Offline stage: distributed feature engineering and model training. -- Nearline stage: Feature and model updates. -- Online stage: Recall and ranking. - -The overall architecture of Friesian is shown in the following diagram: - - -.. image:: ../../../image/friesian_architecture.png - - diff --git a/docs/readthedocs/source/doc/Friesian/serving.md b/docs/readthedocs/source/doc/Friesian/serving.md deleted file mode 100644 index 405c8b94..00000000 --- a/docs/readthedocs/source/doc/Friesian/serving.md +++ /dev/null @@ -1,600 +0,0 @@ -## Serving Recommendation Framework - -### Architecture of the serving pipelines - -The diagram below demonstrates the components of the friesian serving system, which typically consists of three stages: - -- Offline: Preprocess the data to get user/item DNN features and user/item Embedding features. Then use the embedding features and embedding model to get embedding vectors. -- Nearline: Retrieve user/item profiles and keep them in the Key-Value store. Retrieve item embedding vectors and build the faiss index. Make updates to the profiles from time to time. -- Online: Trigger the recommendation process whenever a user comes. Recall service generate candidates from millions of items based on embeddings and the deep learning model ranks the candidates for the final recommendation results. 
- -![](../../../image/friesian_architecture.png) - - -### Services and APIs -The friesian serving system consists of 4 types of services: -- Ranking Service: performs model inference and returns the results. - - `rpc doPredict(Content) returns (Prediction) {}` - - Input: The `encodeStr` is a Base64 string encoded from a bigdl [Activity](https://github.com/intel-analytics/BigDL/blob/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/nn/abstractnn/Activity.scala) serialized byte array. - ```bash - message Content { - string encodedStr = 1; - } - ``` - - Output: The `predictStr` is a Base64 string encoded from a bigdl [Activity](https://github.com/intel-analytics/BigDL/blob/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/nn/abstractnn/Activity.scala) (the inference result) serialized byte array. - ```bash - message Prediction { - string predictStr = 1; - } - ``` -- Feature Service: searches user embeddings, user features or item features in Redis, and returns the features. - - `rpc getUserFeatures(IDs) returns (Features) {}` and `rpc getItemFeatures(IDs) returns (Features) {}` - - Input: The user/item id list for searching. - ```bash - message IDs { - repeated int32 ID = 1; - } - ``` - - Output: `colNames` is a string list of the column names. `b64Feature` is a list of Base64 string, each string is encoded from java serialized array of objects. `ID` is a list of ids corresponding `b64Feature`. - ```bash - message Features { - repeated string colNames = 1; - repeated string b64Feature = 2; - repeated int32 ID = 3; - } - ``` -- Recall Service: searches item candidates in the built faiss index and returns candidates id list. - - `rpc searchCandidates(Query) returns (Candidates) {}` - - Input: `userID` is the id of the user to search similar item candidates. `k` is the number of candidates. - ```bash - message Query { - int32 userID = 1; - int32 k = 2; - } - ``` - - Output: `candidate` is the list of ids of item candidates. - ```bash - message Candidates { - repeated int32 candidate = 1; - } - ``` -- Recommender Service: gets candidates from the recall service, calls the feature service to get the user and item candidate's features, then sorts the inference results from ranking service and returns the top recommendNum items. - - `rpc getRecommendIDs(RecommendRequest) returns (RecommendIDProbs) {}` - - Input: `ID` is a list of user ids to recommend. `recommendNum` is the number of items to recommend. `candidateNum` is the number of generated candidates to inference in ranking service. - ```bash - message RecommendRequest { - int32 recommendNum = 1; - int32 candidateNum = 2; - repeated int32 ID = 3; - } - ``` - - Output: `IDProbList` is a list of results corresponding to user `ID` in input. Each `IDProbs` consists of `ID` and `prob`, `ID` is the list of item ids, and `prob` is the corresponding probability. - ```bash - message RecommendIDProbs { - repeated IDProbs IDProbList = 1; - } - message IDProbs { - repeated int32 ID = 1; - repeated float prob = 2; - } - ``` - -### Quick Start -You can run Friesian Serving Recommendation Framework using the official Docker images. - -You can follow the following steps to run the WnD demo. - -1. Pull docker image from dockerhub - ```bash - docker pull intelanalytics/friesian-grpc:0.0.2 - ``` - -2. Run & enter docker container - ```bash - docker run -itd --name friesian --net=host intelanalytics/friesian-grpc:0.0.2 - docker exec -it friesian bash - ``` - -3. 
Add vec_feature_user_prediction.parquet, vec_feature_item_prediction.parquet, wnd model, - wnd_item.parquet and wnd_user.parquet (You can check [the schema of the parquet files](#schema-of-the-parquet-files)) - -4. Start ranking service - ```bash - export OMP_NUM_THREADS=1 - java -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.ranking.RankingServer -c config_ranking.yaml > logs/inf.log 2>&1 & - ``` - -5. Start feature service for recommender service - ```bash - ./redis-5.0.5/src/redis-server & - java -Dspark.master=local[*] -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.feature.FeatureServer -c config_feature.yaml > logs/feature.log 2>&1 & - ``` - -6. Start feature service for recall service - ```bash - java -Dspark.master=local[*] -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.feature.FeatureServer -c config_feature_vec.yaml > logs/fea_recall.log 2>&1 & - ``` - -7. Start recall service - ```bash - java -Dspark.master=local[*] -Dspark.driver.maxResultSize=2G -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.recall.RecallServer -c config_recall.yaml > logs/vec.log 2>&1 & - ``` - -8. Start recommender service - ```bash - java -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.recommender.RecommenderServer -c config_recommender.yaml > logs/rec.log 2>&1 & - ``` - -9. Check if the services are running - ```bash - ps aux|grep friesian - ``` - You will see 5 processes start with 'java' - -10. Run client to test - ```bash - java -Dspark.master=local[*] -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.recommender.RecommenderMultiThreadClient -target localhost:8980 -dataDir wnd_user.parquet -k 50 -clientNum 4 -testNum 2 - ``` -11. Close services - ```bash - ps aux|grep friesian (find the service pid) - kill xxx (pid of the service which should be closed) - ``` - -### Schema of the parquet files - -#### The schema of the user and item embedding files -The embedding parquet files should contain at least 2 columns, id column and prediction column. -The id column should be IntegerType and the column name should be specified in the config files. -The prediction column should be DenseVector type, and you can transfer your existing embedding vectors using pyspark: -```python -from pyspark.sql import SparkSession -from pyspark.sql.functions import udf, col -from pyspark.ml.linalg import VectorUDT, DenseVector - -spark = SparkSession.builder \ - .master("local[*]") \ - .config("spark.driver.memory", "2g") \ - .getOrCreate() - -df = spark.read.parquet("data_path") - -def trans_densevector(data): - return DenseVector(data) - -vector_udf = udf(lambda x: trans_densevector(x), VectorUDT()) -# suppose the embedding column (ArrayType(FloatType,true)) is the existing user/item embedding. -df = df.withColumn("prediction", vector_udf(col("embedding"))) -df.write.parquet("output_file_path", mode="overwrite") -``` - -#### The schema of the recommendation model feature files -The feature parquet files should contain at least 2 columns, the id column and other feature columns. -The feature columns can be int, float, double, long and array of int, float, double and long. -Here is an example of the WideAndDeep model feature. 
-```bash -+-------------+--------+--------+----------+--------------------------------+---------------------------------+------------+-----------+---------+----------------------+-----------------------------+ -|present_media|language|tweet_id|tweet_type|engaged_with_user_follower_count|engaged_with_user_following_count|len_hashtags|len_domains|len_links|present_media_language|engaged_with_user_is_verified| -+-------------+--------+--------+----------+--------------------------------+---------------------------------+------------+-----------+---------+----------------------+-----------------------------+ -| 9| 43| 924| 2| 6| 3| 0.0| 0.1| 0.1| 45| 1| -| 0| 6| 4741724| 2| 3| 3| 0.0| 0.0| 0.0| 527| 0| -+-------------+--------+--------+----------+--------------------------------+---------------------------------+------------+-----------+---------+----------------------+-----------------------------+ -``` - -### The data schema in Redis -The user features, item features and user embedding vectors are saved in Redis. -The data saved in Redis is a key-value set. - -#### Key in Redis -The key in Redis consists of 3 parts: key prefix, data type, and data id. -- Key prefix is `redisKeyPrefix` specified in the feature service config file. -- Data type is one of `user` or `item`. -- Data id is the value of `userIDColumn` or `itemIDColumn`. -Here is an example of key: `2tower_user:29` - -#### Value in Redis -A row in the input parquet file will be converted to java array of object, then serialized into byte array, and encoded into Base64 string. - -#### Data schema entry -Every key prefix and data type combination has its data schema entry to save the corresponding column names. The key of the schema entry is `keyPrefix + dataType`, such as `2tower_user`. The value of the schema entry is a string of column names separated by `,`, such as `enaging_user_follower_count,enaging_user_following_count,enaging_user_is_verified`. - -### Config for different service -You can pass some important information to services using `-c config.yaml` -```bash -java -Dspark.master=local[*] -Dspark.driver.maxResultSize=2G -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.recall.RecallServer -c config_recall.yaml -``` - -#### Ranking Service Config -Config with example: -```yaml -# Default: 8980, which port to create the server -servicePort: 8083 - -# Default: 0, open a port for prometheus monitoring tool, if set, user can check the -# performance using prometheus -monitorPort: 1234 - -# model path must be provided -modelPath: /home/yina/Documents/model/recys2021/wnd_813/recsys_wnd - -# default: null, savedmodel input list if the model is tf savedmodel. If not provided, the inputs -# of the savedmodel will be arranged in alphabetical order -savedModelInputs: serving_default_input_1:0, serving_default_input_2:0, serving_default_input_3:0, serving_default_input_4:0, serving_default_input_5:0, serving_default_input_6:0, serving_default_input_7:0, serving_default_input_8:0, serving_default_input_9:0, serving_default_input_10:0, serving_default_input_11:0, serving_default_input_12:0, serving_default_input_13:0 - -# default: 1, number of models used in inference service -modelParallelism: 4 -``` - -##### Feature Service Config -Config with example: -1. load data into redis. 
Search data from redis - ```yaml - ### Basic setting - # Default: 8980, which port to create the server - servicePort: 8082 - - # Default: null, open a port for prometheus monitoring tool, if set, user can check the - # performance using prometheus - monitorPort: 1235 - - # 'kv' or 'inference' default: kv - serviceType: kv - - # default: false, if need to load initial data to redis, set true - loadInitialData: true - - # default: "", prefix for redis key - redisKeyPrefix: - - # default: 0, item slot type on redis cluster. 0 means slot number use the default value 16384, 1 means all keys save to same slot, 2 means use the last character of id as hash tag. - redisClusterItemSlotType: 2 - - # default: null, if loadInitialData=true, initialUserDataPath or initialItemDataPath must be - # provided. Only support parquet file - initialUserDataPath: /home/yina/Documents/data/recsys/preprocess_output/wnd_user.parquet - initialItemDataPath: /home/yina/Documents/data/recsys/preprocess_output/wnd_exp1/wnd_item.parquet - - # default: null, if loadInitialData=true and initialUserDataPath != null, userIDColumn and - # userFeatureColumns must be provided - userIDColumn: enaging_user_id - userFeatureColumns: enaging_user_follower_count,enaging_user_following_count - - # default: null, if loadInitialData=true and initialItemDataPath != null, userIDColumn and - # userFeatureColumns must be provided - itemIDColumn: tweet_id - itemFeatureColumns: present_media, language, tweet_id, hashtags, present_links, present_domains, tweet_type, engaged_with_user_follower_count,engaged_with_user_following_count, len_hashtags, len_domains, len_links, present_media_language, tweet_id_engaged_with_user_id - - # default: null, user model path or item model path must be provided if serviceType - # contains 'inference'. If serviceType=kv, usermodelPath, itemModelPath and modelParallelism will - # be ignored - # userModelPath: - - # default: null, user model path or item model path must be provided if serviceType - # contains 'inference'. If serviceType=kv, usermodelPath, itemModelPath and modelParallelism will - # be ignored - # itemModelPath: - - # default: 1, number of models used for inference - # modelParallelism: - - ### Redis Configuration - # default: localhost:6379 - # redisUrl: - - # default: 256, JedisPoolMaxTotal - # redisPoolMaxTotal: - ``` - -2. load user features into redis. Get features from redis, use model at 'userModelPath' to do - inference and get the user embedding - ```yaml - ### Basic setting - # Default: 8980, which port to create the server - servicePort: 8085 - - # Default: null, open a port for prometheus monitoring tool, if set, user can check the - # performance using prometheus - monitorPort: 1236 - - # 'kv' or 'inference' default: kv - serviceType: kv, inference - - # default: false, if need to load initial data to redis, set true - loadInitialData: true - - # default: "" - redisKeyPrefix: 2tower_ - - # default: 0, item slot type on redis cluster. 0 means slot number use the default value 16384, 1 means all keys save to same slot, 2 means use the last character of id as hash tag. - redisClusterItemSlotType: 2 - - # default: null, if loadInitialData=true, initialDataPath must be provided. 
Only support parquet - # file - initialUserDataPath: /home/yina/Documents/data/recsys/preprocess_output/guoqiong/vec_feature_user.parquet - # initialItemDataPath: - - # default: null, if loadInitialData=true and initialUserDataPath != null, userIDColumn and - # userFeatureColumns must be provided - #userIDColumn: user - userIDColumn: enaging_user_id - userFeatureColumns: user - - # default: null, if loadInitialData=true and initialItemDataPath != null, userIDColumn and - # userFeatureColumns must be provided - # itemIDColumn: - # itemFeatureColumns: - - # default: null, user model path or item model path must be provided if serviceType - # includes 'inference'. If serviceType=kv, usermodelPath, itemModelPath and modelParallelism will - # be ignored - userModelPath: /home/yina/Documents/model/recys2021/2tower/guoqiong/user-model - - # default: null, user model path or item model path must be provided if serviceType - # contains 'inference'. If serviceType=kv, usermodelPath, itemModelPath and modelParallelism will - # be ignored - # itemModelPath: - - # default: 1, number of models used for inference - # modelParallelism: - - ### Redis Configuration - # default: localhost:6379 - # redisUrl: - - # default: 256, JedisPoolMaxTotal - # redisPoolMaxTotal: - ``` - -#### Recall Service Config -Config with example: - -1. load initial item vector from vec_feature_item.parquet and item-model to build faiss index. - ```yaml - # Default: 8980, which port to create the server - servicePort: 8084 - - # Default: null, open a port for prometheus monitoring tool, if set, user can check the - # performance using prometheus - monitorPort: 1238 - - # default: 128, the dimensionality of the embedding vectors - indexDim: 50 - - # default: false, if load saved index, set true - # loadSavedIndex: true - - # default: false, if true, the built index will be saved to indexPath. Ignored when - # loadSavedIndex=true - saveBuiltIndex: true - - # default: null, path to saved index path, must be provided if loadSavedIndex=true - indexPath: ./2tower_item_full.idx - - # default: false - getFeatureFromFeatureService: true - - # default: localhost:8980, feature service target - featureServiceURL: localhost:8085 - - itemIDColumn: tweet_id - itemFeatureColumns: item - - # default: null, user model path must be provided if getFeatureFromFeatureService=false - # userModelPath: - - # default: null, item model path must be provided if loadSavedIndex=false and initialDataPath is - # not orca predict result - itemModelPath: /home/yina/Documents/model/recys2021/2tower/guoqiong/item-model - - # default: null, Only support parquet file - initialDataPath: /home/yina/Documents/data/recsys/preprocess_output/guoqiong/vec_feature_item.parquet - - # default: 1, number of models used in inference service - modelParallelism: 1 - ``` - -2. 
load existing faiss index - ```yaml - # Default: 8980, which port to create the server - servicePort: 8084 - - # Default: null, open a port for prometheus monitoring tool, if set, user can check the - # performance using prometheus - monitorPort: 1238 - - # default: 128, the dimensionality of the embedding vectors - # indexDim: - - # default: false, if load saved index, set true - loadSavedIndex: true - - # default: null, path to saved index path, must be provided if loadSavedIndex=true - indexPath: ./2tower_item_full.idx - - # default: false - getFeatureFromFeatureService: true - - # default: localhost:8980, feature service target - featureServiceURL: localhost:8085 - - # itemIDColumn: - # itemFeatureColumns: - - # default: null, user model path must be provided if getFeatureFromFeatureService=false - # userModelPath: - - # default: null, item model path must be provided if loadSavedIndex=false and initialDataPath is - # not orca predict result - # itemModelPath: - - # default: null, Only support parquet file - # initialDataPath: - - # default: 1, number of models used in inference service - # modelParallelism: - ``` -#### Recommender Service Config -Config with example: - -```yaml - Default: 8980, which port to create the server - servicePort: 8980 - - # Default: null, open a port for prometheus monitoring tool, if set, user can check the - # performance using prometheus - monitorPort: 1237 - - # default: null, must be provided, item column name - itemIDColumn: tweet_id - -# default: null, must be provided, column names for inference, order related. -inferenceColumns: present_media_language, present_media, tweet_type, language, hashtags, present_links, present_domains, tweet_id_engaged_with_user_id, engaged_with_user_follower_count, engaged_with_user_following_count, enaging_user_follower_count, enaging_user_following_count, len_hashtags, len_domains, len_links - - # default: 0, if set, ranking service request will be divided -inferenceBatch: 0 - -# default: localhost:8980, recall service target -recallServiceURL: localhost:8084 - -# default: localhost:8980, feature service target -featureServiceURL: localhost:8082 - -# default: localhost:8980, inference service target -rankingServiceURL: localhost:8083 -``` - -### Run Java Client - -#### Generate proto java files -You should init a maven project and use proto files in [friesian gRPC project](https://github.com/analytics-zoo/friesian/tree/recsys-grpc/src/main/proto) -Make sure to add the following extensions and plugins in your pom.xml, and replace -*protocExecutable* with your own protoc executable. 
-```xml - - - - kr.motd.maven - os-maven-plugin - 1.6.2 - - - - - org.apache.maven.plugins - maven-compiler-plugin - 3.8.0 - - 8 - 8 - - - - org.xolstice.maven.plugins - protobuf-maven-plugin - 0.6.1 - - com.google.protobuf:protoc:3.12.0:exe:${os.detected.classifier} - grpc-java - io.grpc:protoc-gen-grpc-java:1.37.0:exe:${os.detected.classifier} - /home/yina/Documents/protoc/bin/protoc - - - - - compile - compile-custom - - - - - - -``` -Then you can generate the gRPC files with -```bash -mvn clean install -``` -#### Call recommend service function using blocking stub -You can check the [Recommend service client example](https://github.com/analytics-zoo/friesian/blob/recsys-grpc/src/main/java/grpc/recommend/RecommendClient.java) on Github - -```java -import com.intel.analytics.bigdl.friesian.serving.grpc.generated.recommender.RecommenderGrpc; -import com.intel.analytics.bigdl.friesian.serving.grpc.generated.recommender.RecommenderProto.*; - -public class RecommendClient { - public static void main(String[] args) { - // Create a channel - ManagedChannel channel = ManagedChannelBuilder.forTarget(targetURL).usePlaintext().build(); - // Init a recommend service blocking stub - RecommenderGrpc.RecommenderBlockingStub blockingStub = RecommenderGrpc.newBlockingStub(channel); - // Construct a request - int[] userIds = new int[]{1}; - int candidateNum = 50; - int recommendNum = 10; - RecommendRequest.Builder request = RecommendRequest.newBuilder(); - for (int id : userIds) { - request.addID(id); - } - request.setCandidateNum(candidateNum); - request.setRecommendNum(recommendNum); - RecommendIDProbs recommendIDProbs = null; - try { - recommendIDProbs = blockingStub.getRecommendIDs(request.build()); - logger.info(recommendIDProbs.getIDProbListList()); - } catch (StatusRuntimeException e) { - logger.warn("RPC failed: " + e.getStatus().toString()); - } - } -} -``` - -### Run Python Client -Install the python packages listed below (you may encounter [pyspark error](https://stackoverflow.com/questions/58700384/how-to-fix-typeerror-an-integer-is-required-got-type-bytes-error-when-tryin) if you have python>=3.8 installed, try to downgrade to python<=3.7 and try again). -```bash -pip install jupyter notebook==6.1.4 grpcio grpcio-tools pandas fastparquet pyarrow -``` -After you activate your server successfully, you can - -#### Generate proto python files -Generate the files with -```bash -python -m grpc_tools.protoc -I../../protos --python_out= --grpc_python_out= /src/main/proto/*.proto -``` - -#### Call recommend service function using blocking stub -You can check the [Recommend service client example](https://github.com/analytics-zoo/friesian/blob/recsys-grpc/Serving/WideDeep/recommend_client.ipynb) on Github -```python -# create a channel -channel = grpc.insecure_channel('localhost:8980') -# create a recommend service stub -stub = recommender_pb2_grpc.RecommenderStub(channel) -request = recommender_pb2.RecommendRequest(recommendNum=10, candidateNum=50, ID=[36407]) -results = stub.getRecommendIDs(request) -print(results.IDProbList) - -``` -### Scale-out for Big Data -#### Redis Cluster -For large data set, Redis standalone has no enough memory to store whole data set, data sharding and Redis cluster are supported to handle it. You only need to set up a Redis Cluster to get it work. - -First, start N Redis instance on N machines. 
-``` -redis-server --cluster-enabled yes --cluster-config-file nodes-0.conf --cluster-node-timeout 50000 --appendonly no --save "" --logfile 0.log --daemonize yes --protected-mode no --port 6379 -``` -on each machine, choose a different port and start another M instances(M>=1), as the slave nodes of above N instances. - -Then, call initialization command on one machine, if you choose M=1 above, use `--cluster-replicas 1` -``` -redis-cli --cluster create 172.168.3.115:6379 172.168.3.115:6380 172.168.3.116:6379 172.168.3.116:6380 172.168.3.117:6379 172.168.3.117:6380 --cluster-replicas 1 -``` -and the Redis cluster would be ready. - -#### Scale Service with Envoy -Each of the services could be scaled out. It is recommended to use the same resource, e.g. single machine with same CPU and memory, to test which service is bottleneck. From empirical observations, vector search and inference usually be. - -##### How to run envoy: -1. [download](https://www.envoyproxy.io/docs/envoy/latest/start/install) and deploy envoy(below use docker as example): - * download: `docker pull envoyproxy/envoy-dev:21df5e8676a0f705709f0b3ed90fc2dbbd63cfc5` -2. run command: `docker run --rm -it -p 9082:9082 -p 9090:9090 envoyproxy/envoy-dev:79ade4aebd02cf15bd934d6d58e90aa03ef6909e --config-yaml "$(cat path/to/service-specific-envoy.yaml)" --parent-shutdown-time-s 1000000` -3. validate: run `netstat -tnlp` to see if the envoy process is listening to the corresponding port in the envoy config file. -4. For details on envoy and sample procedure, read [envoy](envoy.md). diff --git a/docs/readthedocs/source/doc/GetStarted/index.rst b/docs/readthedocs/source/doc/GetStarted/index.rst deleted file mode 100644 index 44ff4eb3..00000000 --- a/docs/readthedocs/source/doc/GetStarted/index.rst +++ /dev/null @@ -1,6 +0,0 @@ -User Guide -========================= - - -Getting Started -=========================================== \ No newline at end of file diff --git a/docs/readthedocs/source/doc/GetStarted/install.rst b/docs/readthedocs/source/doc/GetStarted/install.rst deleted file mode 100644 index 2ecc9bab..00000000 --- a/docs/readthedocs/source/doc/GetStarted/install.rst +++ /dev/null @@ -1,2 +0,0 @@ -Install Locally -========================= \ No newline at end of file diff --git a/docs/readthedocs/source/doc/GetStarted/paper.md b/docs/readthedocs/source/doc/GetStarted/paper.md deleted file mode 100644 index cfd2fb4d..00000000 --- a/docs/readthedocs/source/doc/GetStarted/paper.md +++ /dev/null @@ -1,28 +0,0 @@ -# Paper - - -## Paper - -* Dai, Jason Jinquan, et al. "BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. [paper](https://arxiv.org/ftp/arxiv/papers/2204/2204.01715.pdf) [video]() [demo]() - -* Dai, Jason Jinquan, et al. "BigDL: A distributed deep learning framework for big data." Proceedings of the ACM Symposium on Cloud Computing. 2019. 
[paper](https://arxiv.org/abs/1804.05839) - - - - -## Citing -If you've found BigDL useful for your project, you may cite the [paper](https://arxiv.org/abs/1804.05839) as follows: - -``` -@inproceedings{SOCC2019_BIGDL, - title={BigDL: A Distributed Deep Learning Framework for Big Data}, - author={Dai, Jason (Jinquan) and Wang, Yiheng and Qiu, Xin and Ding, Ding and Zhang, Yao and Wang, Yanzhang and Jia, Xianyan and Zhang, Li (Cherry) and Wan, Yan and Li, Zhichao and Wang, Jiao and Huang, Shengsheng and Wu, Zhongyuan and Wang, Yang and Yang, Yuhao and She, Bowen and Shi, Dongjie and Lu, Qi and Huang, Kai and Song, Guoqiong}, - booktitle={Proceedings of the ACM Symposium on Cloud Computing}, - publisher={Association for Computing Machinery}, - pages={50--60}, - year={2019}, - series={SoCC'19}, - doi={10.1145/3357223.3362707}, - url={https://arxiv.org/pdf/1804.05839.pdf} -} -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/GetStarted/usecase.rst b/docs/readthedocs/source/doc/GetStarted/usecase.rst deleted file mode 100644 index b405af0e..00000000 --- a/docs/readthedocs/source/doc/GetStarted/usecase.rst +++ /dev/null @@ -1,2 +0,0 @@ -Use Cases -============================ \ No newline at end of file diff --git a/docs/readthedocs/source/doc/GetStarted/videos.md b/docs/readthedocs/source/doc/GetStarted/videos.md deleted file mode 100644 index e69de29b..00000000 diff --git a/docs/readthedocs/source/doc/LLM/Inference/Self_Speculative_Decoding.md b/docs/readthedocs/source/doc/LLM/Inference/Self_Speculative_Decoding.md index 90463dde..99179194 100644 --- a/docs/readthedocs/source/doc/LLM/Inference/Self_Speculative_Decoding.md +++ b/docs/readthedocs/source/doc/LLM/Inference/Self_Speculative_Decoding.md @@ -4,10 +4,10 @@ In [speculative](https://arxiv.org/abs/2302.01318) [decoding](https://arxiv.org/abs/2211.17192), a small (draft) model quickly generates multiple draft tokens, which are then verified in parallel by the large (target) model. While speculative decoding can effectively speed up the target model, ***in practice it is difficult to maintain or even obtain a proper draft model***, especially when the target model is finetuned with customized data. ### Self-Speculative Decoding -Built on top of the concept of “[self-speculative decoding](https://arxiv.org/abs/2309.08168)”, BigDL-LLM can now accelerate the original FP16 or BF16 model ***without the need of a separate draft model or model finetuning***; instead, it automatically converts the original model to INT4, and uses the INT4 model as the draft model behind the scene. In practice, this brings ***~30% speedup*** for FP16 and BF16 LLM inference latency on Intel GPU and CPU respectively. +Built on top of the concept of “[self-speculative decoding](https://arxiv.org/abs/2309.08168)”, IPEX-LLM can now accelerate the original FP16 or BF16 model ***without the need of a separate draft model or model finetuning***; instead, it automatically converts the original model to INT4, and uses the INT4 model as the draft model behind the scene. In practice, this brings ***~30% speedup*** for FP16 and BF16 LLM inference latency on Intel GPU and CPU respectively. 
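To make the draft-and-verify idea above concrete, here is a small illustrative sketch in plain Python. The `draft_next`/`target_next` functions are toy stand-ins invented for this illustration (they are not IPEX-LLM or `transformers` APIs); the loop only shows how draft proposals are checked against the target model and accepted up to the first mismatch, which is the same mechanism self-speculative decoding reuses with the INT4 copy as the draft.

```python
# Toy illustration of speculative decoding: a cheap draft model proposes k tokens,
# the target model verifies them, and the longest matching prefix is accepted.
# The "models" are deterministic toy functions over integer token lists.

def draft_next(tokens):                 # toy draft model: fast but approximate
    return (tokens[-1] * 2 + 1) % 11

def target_next(tokens):                # toy target model: defines the reference output
    return (tokens[-1] * 2 + 1) % 13

def speculative_generate(prompt, num_new_tokens, k=4):
    tokens = list(prompt)
    target_len = len(prompt) + num_new_tokens
    while len(tokens) < target_len:
        # 1) the draft model proposes k tokens autoregressively (cheap)
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2) the target model checks each proposal (a single batched forward pass in practice)
        accepted = []
        for i in range(k):
            expected = target_next(tokens + draft[:i])
            if draft[i] == expected:
                accepted.append(draft[i])      # proposal agrees with the target: keep it
            else:
                accepted.append(expected)      # disagreement: take the target's token and stop
                break
        tokens.extend(accepted)
    return tokens[:target_len]

# The result is identical to decoding with target_next alone, token by token.
print(speculative_generate([3], 8))
```

Because at least one target-approved token is appended per verification round, the output matches plain greedy decoding with the target model, while fewer expensive target steps are needed whenever the draft guesses well.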
-### Using BigDL-LLM Self-Speculative Decoding -Please refer to BigDL-LLM self-speculative decoding code snippets below, and the detailed [GPU](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/Speculative-Decoding) and [CPU](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Speculative-Decoding) examples in the project repo. +### Using IPEX-LLM Self-Speculative Decoding +Please refer to IPEX-LLM self-speculative decoding code snippets below, and the detailed [GPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/Speculative-Decoding) and [CPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Speculative-Decoding) examples in the project repo. ```python model = AutoModelForCausalLM.from_pretrained(model_path, diff --git a/docs/readthedocs/source/doc/LLM/Overview/FAQ/faq.md b/docs/readthedocs/source/doc/LLM/Overview/FAQ/faq.md index 62812e8d..a0c80050 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/FAQ/faq.md +++ b/docs/readthedocs/source/doc/LLM/Overview/FAQ/faq.md @@ -2,44 +2,44 @@ ## General Info & Concepts -### GGUF format usage with BigDL-LLM? +### GGUF format usage with IPEX-LLM? -BigDL-LLM supports running GGUF/AWQ/GPTQ models on both [CPU](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations) and [GPU](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations). -Please also refer to [here](https://github.com/intel-analytics/BigDL?tab=readme-ov-file#latest-update-) for our latest support. +IPEX-LLM supports running GGUF/AWQ/GPTQ models on both [CPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations) and [GPU](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations). +Please also refer to [here](https://github.com/intel-analytics/ipex-llm?tab=readme-ov-file#latest-update-) for our latest support. ## How to Resolve Errors -### Fail to install `bigdl-llm` through `pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu` +### Fail to install `ipex-llm` through `pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu` -You could try to install BigDL-LLM dependencies for Intel XPU from source archives: -- For Windows system, refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#install-bigdl-llm-from-wheel) for the steps. -- For Linux system, refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#id3) for the steps. +You could try to install IPEX-LLM dependencies for Intel XPU from source archives: +- For Windows system, refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#install-ipex-llm-from-wheel) for the steps. +- For Linux system, refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#id3) for the steps. ### PyTorch is not linked with support for xpu devices -1. Before running on Intel GPUs, please make sure you've prepared environment follwing [installation instruction](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html). -2. 
If you are using an older version of `bigdl-llm` (specifically, older than 2.5.0b20240104), you need to manually add `import intel_extension_for_pytorch as ipex` at the beginning of your code.
-3. After optimizing the model with BigDL-LLM, you need to move model to GPU through `model = model.to('xpu')`.
-4. If you have mutil GPUs, you could refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/KeyFeatures/multi_gpus_selection.html) for details about GPU selection.
+1. Before running on Intel GPUs, please make sure you've prepared the environment following the [installation instructions](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html).
+2. If you are using an older version of `ipex-llm` (specifically, older than 2.5.0b20240104), you need to manually add `import intel_extension_for_pytorch as ipex` at the beginning of your code.
+3. After optimizing the model with IPEX-LLM, you need to move the model to GPU through `model = model.to('xpu')`.
+4. If you have multiple GPUs, you could refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/KeyFeatures/multi_gpus_selection.html) for details about GPU selection.
 5. If you do inference using the optimized model on Intel GPUs, you also need to set `to('xpu')` for input tensors.
 
 ### Import `intel_extension_for_pytorch` error on Windows GPU
 
-Please refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#error-loading-intel-extension-for-pytorch) for detailed guide. We list the possible missing requirements in environment which could lead to this error.
+Please refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#error-loading-intel-extension-for-pytorch) for a detailed guide. We list the possible missing requirements in the environment which could lead to this error.
 
 ### XPU device count is zero
 
 It's recommended to reinstall driver:
-- For Windows system, refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#prerequisites) for the steps.
-- For Linux system, refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#id1) for the steps.
+- For Windows system, refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#prerequisites) for the steps.
+- For Linux system, refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#id1) for the steps.
 
 ### Error such as `The size of tensor a (33) must match the size of tensor b (17) at non-singleton dimension 2` during attention forward function
 
-If you are using BigDL-LLM PyTorch API, please try to set `optimize_llm=False` manually when call `optimize_model` function to work around. As for BigDL-LLM `transformers`-style API, you could try to set `optimize_model=False` manually when call `from_pretrained` function to work around.
+If you are using the IPEX-LLM PyTorch API, please try setting `optimize_llm=False` manually when calling the `optimize_model` function as a workaround. As for the IPEX-LLM `transformers`-style API, you could try setting `optimize_model=False` manually when calling the `from_pretrained` function as a workaround.
 
 ### ValueError: Unrecognized configuration class
 
-This error is not quite relevant to BigDL-LLM. It could be that you're using the incorrect AutoClass, or the transformers version is not updated, or transformers does not support using AutoClasses to load this model. You need to refer to the model card in huggingface to confirm these information.
Besides, if you load the model from local path, please also make sure you download the complete model files. +This error is not quite relevant to IPEX-LLM. It could be that you're using the incorrect AutoClass, or the transformers version is not updated, or transformers does not support using AutoClasses to load this model. You need to refer to the model card in huggingface to confirm these information. Besides, if you load the model from local path, please also make sure you download the complete model files. ### `mixed dtype (CPU): expect input to have scalar type of BFloat16` during inference @@ -62,7 +62,7 @@ You may encounter this error during finetuning on multi GPUs. Please try `sudo a ### Random and unreadable output of Gemma-7b-it on Arc770 ubuntu 22.04 due to driver and OneAPI missmatching. -If driver and OneAPI missmatching, it will lead to some error when BigDL use XMX(short prompts) for speeding up. +If driver and OneAPI missmatching, it will lead to some error when IPEX-LLM uses XMX(short prompts) for speeding up. The output of `What's AI?` may like below: ``` wiedzy Artificial Intelligence meliti: Artificial Intelligence undenti beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/cli.md b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/cli.md index 7c21f2f8..ab162594 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/cli.md +++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/cli.md @@ -4,7 +4,7 @@ .. note:: - Currently ``bigdl-llm`` CLI supports *LLaMA* (e.g., vicuna), *GPT-NeoX* (e.g., redpajama), *BLOOM* (e.g., pheonix) and *GPT2* (e.g., starcoder) model architecture; for other models, you may use the ``transformers``-style or LangChain APIs. + Currently ``ipex-llm`` CLI supports *LLaMA* (e.g., vicuna), *GPT-NeoX* (e.g., redpajama), *BLOOM* (e.g., pheonix) and *GPT2* (e.g., starcoder) model architecture; for other models, you may use the ``transformers``-style or LangChain APIs. ``` ## Convert Model diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/finetune.md b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/finetune.md index e4cf8700..b895b89f 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/finetune.md +++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/finetune.md @@ -1,6 +1,6 @@ # Finetune (QLoRA) -We also support finetuning LLMs (large language models) using QLoRA with BigDL-LLM 4bit optimizations on Intel GPUs. +We also support finetuning LLMs (large language models) using QLoRA with IPEX-LLM 4bit optimizations on Intel GPUs. ```eval_rst .. note:: @@ -15,7 +15,7 @@ To help you better understand the finetuning process, here we use model [Llama-2 ```eval_rst .. note:: - If you are using an older version of ``bigdl-llm`` (specifically, older than 2.5.0b20240104), you need to manually add ``import intel_extension_for_pytorch as ipex`` at the beginning of your code. + If you are using an older version of ``ipex-llm`` (specifically, older than 2.5.0b20240104), you need to manually add ``import intel_extension_for_pytorch as ipex`` at the beginning of your code. ``` First, load model using `transformers`-style API and **set it to `to('xpu')`**. We specify `load_in_low_bit="nf4"` here to apply 4-bit NormalFloat optimization. According to the [QLoRA paper](https://arxiv.org/pdf/2305.14314.pdf), using `"nf4"` could yield better model quality than `"int4"`. 
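A minimal sketch of just this loading step is shown below; the model id is a placeholder (a local path works as well), and the complete QLoRA example in these docs may pass additional arguments:

```python
from ipex_llm.transformers import AutoModelForCausalLM

# Load Llama 2 with IPEX-LLM 4-bit NormalFloat (NF4) weights applied during loading
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder id; replace with your model path
    load_in_low_bit="nf4",            # 4-bit NormalFloat optimization, as described above
)
model = model.to("xpu")               # important: move the optimized model to the Intel GPU
```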
@@ -54,11 +54,11 @@ model = get_peft_model(model, config) ```eval_rst .. important:: - Instead of ``from peft import prepare_model_for_kbit_training, get_peft_model`` as we did for regular QLoRA using bitandbytes and cuda, we import them from ``ipex_llm.transformers.qlora`` here to get a BigDL-LLM compatible Peft model. And the rest is just the same as regular LoRA finetuning process using ``peft``. + Instead of ``from peft import prepare_model_for_kbit_training, get_peft_model`` as we did for regular QLoRA using bitandbytes and cuda, we import them from ``ipex_llm.transformers.qlora`` here to get a IPEX-LLM compatible Peft model. And the rest is just the same as regular LoRA finetuning process using ``peft``. ``` ```eval_rst .. seealso:: - See the complete examples `here `_ + See the complete examples `here `_ ``` diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/gpu_supports.rst b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/gpu_supports.rst index 4d908c13..6828cb05 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/gpu_supports.rst +++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/gpu_supports.rst @@ -1,7 +1,7 @@ GPU Supports ================================ -BigDL-LLM not only supports running large language models for inference, but also supports QLoRA finetuning on Intel GPUs. +IPEX-LLM not only supports running large language models for inference, but also supports QLoRA finetuning on Intel GPUs. * |inference_on_gpu|_ * `Finetune (QLoRA) <./finetune.html>`_ diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/hugging_face_format.md b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/hugging_face_format.md index 387d14d0..0eee498f 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/hugging_face_format.md +++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/hugging_face_format.md @@ -25,7 +25,7 @@ output = tokenizer.batch_decode(output_ids) ```eval_rst .. seealso:: - See the complete CPU examples `here `_ and GPU examples `here `_. + See the complete CPU examples `here `_ and GPU examples `here `_. .. note:: @@ -35,7 +35,7 @@ output = tokenizer.batch_decode(output_ids) model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5") - See the CPU example `here `_ and GPU example `here `_. + See the CPU example `here `_ and GPU example `here `_. ``` ## Save & Load @@ -50,5 +50,5 @@ new_model = AutoModelForCausalLM.load_low_bit(model_path) ```eval_rst .. 
seealso:: - See the CPU example `here `_ and GPU example `here `_ + See the CPU example `here `_ and GPU example `here `_ ``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/index.rst b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/index.rst index 62891dfa..8611f9bd 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/index.rst +++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/index.rst @@ -1,7 +1,7 @@ -BigDL-LLM Key Features +IPEX-LLM Key Features ================================ -You may run the LLMs using ``bigdl-llm`` through one of the following APIs: +You may run the LLMs using ``ipex-llm`` through one of the following APIs: * `PyTorch API <./optimize_model.html>`_ * |transformers_style_api|_ diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/inference_on_gpu.md b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/inference_on_gpu.md index 332a5c1f..1a9638e9 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/inference_on_gpu.md +++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/inference_on_gpu.md @@ -1,6 +1,6 @@ # Inference on GPU -Apart from the significant acceleration capabilites on Intel CPUs, BigDL-LLM also supports optimizations and acceleration for running LLMs (large language models) on Intel GPUs. With BigDL-LLM, PyTorch models (in FP16/BF16/FP32) can be optimized with low-bit quantizations (supported precisions include INT4, INT5, INT8, etc). +Apart from the significant acceleration capabilites on Intel CPUs, IPEX-LLM also supports optimizations and acceleration for running LLMs (large language models) on Intel GPUs. With IPEX-LLM, PyTorch models (in FP16/BF16/FP32) can be optimized with low-bit quantizations (supported precisions include INT4, INT5, INT8, etc). Compared with running on Intel CPUs, some additional operations are required on Intel GPUs. To help you better understand the process, here we use a popular model [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) as an example. @@ -9,14 +9,14 @@ Compared with running on Intel CPUs, some additional operations are required on ```eval_rst .. note:: - If you are using an older version of ``bigdl-llm`` (specifically, older than 2.5.0b20240104), you need to manually add ``import intel_extension_for_pytorch as ipex`` at the beginning of your code. + If you are using an older version of ``ipex-llm`` (specifically, older than 2.5.0b20240104), you need to manually add ``import intel_extension_for_pytorch as ipex`` at the beginning of your code. ``` ## Load and Optimize Model You could choose to use [PyTorch API](./optimize_model.html) or [`transformers`-style API](./transformers_style_api.html) on Intel GPUs according to your preference. -**Once you have the model with BigDL-LLM low bit optimization, set it to `to('xpu')`**. +**Once you have the model with IPEX-LLM low bit optimization, set it to `to('xpu')`**. ```eval_rst .. 
tabs:: @@ -32,7 +32,7 @@ You could choose to use [PyTorch API](./optimize_model.html) or [`transformers`- from ipex_llm import optimize_model model = LlamaForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf', torch_dtype='auto', low_cpu_mem_usage=True) - model = optimize_model(model) # With only one line to enable BigDL-LLM INT4 optimization + model = optimize_model(model) # With only one line to enable IPEX-LLM INT4 optimization model = model.to('xpu') # Important after obtaining the optimized model @@ -49,7 +49,7 @@ You could choose to use [PyTorch API](./optimize_model.html) or [`transformers`- from transformers import LlamaForCausalLM from ipex_llm.optimize import low_memory_init, load_low_bit - saved_dir='./llama-2-bigdl-llm-4-bit' + saved_dir='./llama-2-ipex-llm-4-bit' with low_memory_init(): # Fast and low cost by loading model on meta device model = LlamaForCausalLM.from_pretrained(saved_dir, torch_dtype="auto", @@ -84,7 +84,7 @@ You could choose to use [PyTorch API](./optimize_model.html) or [`transformers`- from ipex_llm.transformers import AutoModelForCausalLM - saved_dir='./llama-2-bigdl-llm-4-bit' + saved_dir='./llama-2-ipex-llm-4-bit' model = AutoModelForCausalLM.load_low_bit(saved_dir) # Load the optimized model model = model.to('xpu') # Important after obtaining the optimized model @@ -124,5 +124,5 @@ with torch.inference_mode(): ```eval_rst .. seealso:: - See the complete examples `here `_ + See the complete examples `here `_ ``` diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/langchain_api.md b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/langchain_api.md index 962f4b2a..e51d029a 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/langchain_api.md +++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/langchain_api.md @@ -1,6 +1,6 @@ # LangChain API -You may run the models using the LangChain API in `bigdl-llm`. +You may run the models using the LangChain API in `ipex-llm`. ## Using Hugging Face `transformers` INT4 Format @@ -12,16 +12,16 @@ from ipex_llm.langchain.embeddings import TransformersEmbeddings from langchain.chains.question_answering import load_qa_chain embeddings = TransformersEmbeddings.from_model_id(model_id=model_path) -bigdl_llm = TransformersLLM.from_model_id(model_id=model_path, ...) +ipex_llm = TransformersLLM.from_model_id(model_id=model_path, ...) -doc_chain = load_qa_chain(bigdl_llm, ...) +doc_chain = load_qa_chain(ipex_llm, ...) output = doc_chain.run(...) ``` ```eval_rst .. seealso:: - See the examples `here `_. + See the examples `here `_. ``` ## Using Native INT4 Format @@ -44,14 +44,14 @@ from langchain.chains.question_answering import load_qa_chain # switch to ChatGLMEmbeddings/GptneoxEmbeddings/BloomEmbeddings/StarcoderEmbeddings to load other models embeddings = LlamaEmbeddings(model_path='/path/to/converted/model.bin') # switch to ChatGLMLLM/GptneoxLLM/BloomLLM/StarcoderLLM to load other models -bigdl_llm = LlamaLLM(model_path='/path/to/converted/model.bin') +ipex_llm = LlamaLLM(model_path='/path/to/converted/model.bin') -doc_chain = load_qa_chain(bigdl_llm, ...) +doc_chain = load_qa_chain(ipex_llm, ...) doc_chain.run(...) ``` ```eval_rst .. seealso:: - See the examples `here `_. + See the examples `here `_. 
``` diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/native_format.md b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/native_format.md index 49184835..6a0847c0 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/native_format.md +++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/native_format.md @@ -11,7 +11,7 @@ You may also convert Hugging Face *Transformers* models into native INT4 format ```python # convert the model from ipex_llm import llm_convert -bigdl_llm_path = llm_convert(model='/path/to/model/', +ipex_llm_path = llm_convert(model='/path/to/model/', outfile='/path/to/output/', outtype='int4', model_family="llama") # load the converted model @@ -28,5 +28,5 @@ output = llm.batch_decode(output_ids) ```eval_rst .. seealso:: - See the complete example `here `_ + See the complete example `here `_ ``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/optimize_model.md b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/optimize_model.md index 9c640e8d..f6d3c02b 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/optimize_model.md +++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/optimize_model.md @@ -1,6 +1,6 @@ ## PyTorch API -In general, you just need one-line `optimize_model` to easily optimize any loaded PyTorch model, regardless of the library or API you are using. With BigDL-LLM, PyTorch models (in FP16/BF16/FP32) can be optimized with low-bit quantizations (supported precisions include INT4, INT5, INT8, etc). +In general, you just need one-line `optimize_model` to easily optimize any loaded PyTorch model, regardless of the library or API you are using. With IPEX-LLM, PyTorch models (in FP16/BF16/FP32) can be optimized with low-bit quantizations (supported precisions include INT4, INT5, INT8, etc). ### Optimize model @@ -16,11 +16,11 @@ Then, just need to call `optimize_model` to optimize the loaded model and INT4 o ```python from ipex_llm import optimize_model -# With only one line to enable BigDL-LLM INT4 optimization +# With only one line to enable IPEX-LLM INT4 optimization model = optimize_model(model) ``` -After optimizing the model, BigDL-LLM does not require any change in the inference code. You can use any libraries to run the optimized model with very low latency. +After optimizing the model, IPEX-LLM does not require any change in the inference code. You can use any libraries to run the optimized model with very low latency. ### More Precisions @@ -44,7 +44,7 @@ The loading process of the original model may be time-consuming and memory-inten Continuing with the [example of Llama-2-7b-chat-hf](#optimize-model), we can save the previously optimized model as follows: ```python -saved_dir='./llama-2-bigdl-llm-4-bit' +saved_dir='./llama-2-ipex-llm-4-bit' model.save_low_bit(saved_dir) ``` #### Load @@ -63,7 +63,7 @@ model = load_low_bit(model, saved_dir) # Load the optimized model ```eval_rst .. seealso:: - * Please refer to the `API documentation `_ for more details. + * Please refer to the `API documentation `_ for more details. - * We also provide detailed examples on how to run PyTorch models (e.g., Openai Whisper, LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using BigDL-LLM. See the complete CPU examples `here `_ and GPU examples `here `_. + * We also provide detailed examples on how to run PyTorch models (e.g., Openai Whisper, LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using IPEX-LLM. 
See the complete CPU examples `here `_ and GPU examples `here `_. ``` diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/transformers_style_api.rst b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/transformers_style_api.rst index 2e4723a2..07fad70b 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/transformers_style_api.rst +++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/transformers_style_api.rst @@ -1,7 +1,7 @@ ``transformers``-style API ================================ -You may run the LLMs using ``transformers``-style API in ``bigdl-llm``. +You may run the LLMs using ``transformers``-style API in ``ipex-llm``. * |hugging_face_transformers_format|_ * `Native Format <./native_format.html>`_ diff --git a/docs/readthedocs/source/doc/LLM/Overview/examples.rst b/docs/readthedocs/source/doc/LLM/Overview/examples.rst index c531d8b7..89e9a8dd 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/examples.rst +++ b/docs/readthedocs/source/doc/LLM/Overview/examples.rst @@ -1,9 +1,9 @@ -BigDL-LLM Examples +IPEX-LLM Examples ================================ -You can use BigDL-LLM to run any PyTorch model with INT4 optimizations on Intel XPU (from Laptop to GPU to Cloud). +You can use IPEX-LLM to run any PyTorch model with INT4 optimizations on Intel XPU (from Laptop to GPU to Cloud). -Here, we provide examples to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Please refer to the appropriate guide based on your device: +Here, we provide examples to help you quickly get started using IPEX-LLM to run some popular open-source models in the community. Please refer to the appropriate guide based on your device: * `CPU <./examples_cpu.html>`_ * `GPU <./examples_gpu.html>`_ diff --git a/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md b/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md index 462231fb..f715e638 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md +++ b/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md @@ -1,8 +1,8 @@ -# BigDL-LLM Examples: CPU +# IPEX-LLM Examples: CPU -Here, we provide some examples on how you could apply BigDL-LLM INT4 optimizations on popular open-source models in the community. +Here, we provide some examples on how you could apply IPEX-LLM INT4 optimizations on popular open-source models in the community. -To run these examples, please first refer to [here](./install_cpu.html) for more information about how to install ``bigdl-llm``, requirements and best practices for setting up your environment. +To run these examples, please first refer to [here](./install_cpu.html) for more information about how to install ``ipex-llm``, requirements and best practices for setting up your environment. The following models have been verified on either servers or laptops with Intel CPUs. 
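Before the model-by-model links, here is a hedged sketch of the common pattern these CPU examples follow; the model id and prompt are placeholders, and each linked example shows the exact arguments for its model:

```python
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"   # placeholder; any verified model below works

# load_in_4bit=True applies IPEX-LLM INT4 optimization while the model is loaded
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)  # required by some models, e.g. ChatGLM
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.batch_decode(output_ids)[0])
```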
@@ -10,17 +10,17 @@ The following models have been verified on either servers or laptops with Intel | Model | Example of PyTorch API | |------------|-------------------------------------------------------| -| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/llama2) | -| ChatGLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/chatglm) | -| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/mistral) | -| Bark | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/bark) | -| BERT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/bert) | -| Openai Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/openai-whisper) | +| LLaMA 2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/llama2) | +| ChatGLM | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/chatglm) | +| Mistral | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/mistral) | +| Bark | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/bark) | +| BERT | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/bert) | +| Openai Whisper | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/openai-whisper) | ```eval_rst .. important:: - In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through PyTorch API as `example `_. + In addition to INT4 optimization, IPEX-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through PyTorch API as `example `_. 
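   For instance, a minimal sketch of picking a different precision through the PyTorch API (this assumes the ``low_bit`` argument of ``optimize_model`` described on the PyTorch API page; ``"sym_int8"`` is just one possible value):

   .. code-block:: python

      from transformers import LlamaForCausalLM
      from ipex_llm import optimize_model

      model = LlamaForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf')  # placeholder model
      # choose the precision at optimization time, e.g. "sym_int8", "sym_int5" or "nf4"
      model = optimize_model(model, low_bit="sym_int8")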
``` @@ -28,37 +28,37 @@ The following models have been verified on either servers or laptops with Intel | Model | Example of `transformers`-style API | |------------|-------------------------------------------------------| -| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna) | -| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/llama2) | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2) | -| ChatGLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/chatglm) | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm) | -| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2) | -| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mistral) | -| Falcon | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon) | -| MPT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt) | -| Dolly-v1 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1) | -| Dolly-v2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2) | -| Replit Code| [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit) | -| RedPajama | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama) | -| Phoenix | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix) | -| StarCoder | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder) | -| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan) | -| Baichuan2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2) | -| InternLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm) | -| Qwen | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen) | -| Aquila | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila) | -| MOSS | 
[link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss) | -| Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper) | +| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link1](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna) | +| LLaMA 2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/llama2) | [link1](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2) | +| ChatGLM | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/chatglm) | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm) | +| ChatGLM2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2) | +| Mistral | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mistral) | +| Falcon | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon) | +| MPT | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt) | +| Dolly-v1 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1) | +| Dolly-v2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2) | +| Replit Code| [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit) | +| RedPajama | [link1](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama) | +| Phoenix | [link1](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix) | +| StarCoder | [link1](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder) | +| Baichuan | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan) | +| Baichuan2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2) | +| InternLM | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm) | +| Qwen | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen) | +| Aquila | 
[link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila) | +| MOSS | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss) | +| Whisper | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper) | ```eval_rst .. important:: - In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through ``transformers``-style API as `example `_. + In addition to INT4 optimization, IPEX-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through ``transformers``-style API as `example `_. ``` ```eval_rst .. seealso:: - See the complete examples `here `_. + See the complete examples `here `_. ``` diff --git a/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md b/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md index b5504cbb..8eea9f9f 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md +++ b/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md @@ -1,8 +1,8 @@ -# BigDL-LLM Examples: GPU +# IPEX-LLM Examples: GPU -Here, we provide some examples on how you could apply BigDL-LLM INT4 optimizations on popular open-source models in the community. +Here, we provide some examples on how you could apply IPEX-LLM INT4 optimizations on popular open-source models in the community. -To run these examples, please first refer to [here](./install_gpu.html) for more information about how to install ``bigdl-llm``, requirements and best practices for setting up your environment. +To run these examples, please first refer to [here](./install_gpu.html) for more information about how to install ``ipex-llm``, requirements and best practices for setting up your environment. ```eval_rst .. 
important:: @@ -16,20 +16,20 @@ The following models have been verified on either servers or laptops with Intel | Model | Example of PyTorch API | |------------|-------------------------------------------------------| -| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/llama2) | -| ChatGLM 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/chatglm2) | -| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/mistral) | -| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/baichuan) | -| Baichuan2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/baichuan2) | -| Replit | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/replit) | -| StarCoder | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/starcoder) | -| Dolly-v1 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1) | -| Dolly-v2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2) | +| LLaMA 2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models/Model/llama2) | +| ChatGLM 2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models/Model/chatglm2) | +| Mistral | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models/Model/mistral) | +| Baichuan | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models/Model/baichuan) | +| Baichuan2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models/Model/baichuan2) | +| Replit | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models/Model/replit) | +| StarCoder | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models/Model/starcoder) | +| Dolly-v1 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1) | +| Dolly-v2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2) | ```eval_rst .. important:: - In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through PyTorch API as `example `_. + In addition to INT4 optimization, IPEX-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through PyTorch API as `example `_. 
``` @@ -37,34 +37,34 @@ The following models have been verified on either servers or laptops with Intel | Model | Example of `transformers`-style API | |------------|-------------------------------------------------------| -| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* |[link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna)| -| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2) | -| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2) | -| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral) | -| Falcon | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon) | -| MPT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt) | -| Dolly-v1 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1) | -| Dolly-v2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2) | -| Replit | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit) | -| StarCoder | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder) | -| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan) | -| Baichuan2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2) | -| InternLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm) | -| Qwen | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen) | -| Aquila | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila) | -| Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper) | -| Chinese Llama2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2) | -| GPT-J | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j) | +| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* |[link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna)| +| LLaMA 2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2) | +| ChatGLM2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2) | +| Mistral | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral) | +| Falcon | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon) | +| MPT | 
[link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt) | +| Dolly-v1 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1) | +| Dolly-v2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2) | +| Replit | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit) | +| StarCoder | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder) | +| Baichuan | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan) | +| Baichuan2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2) | +| InternLM | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm) | +| Qwen | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen) | +| Aquila | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila) | +| Whisper | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper) | +| Chinese Llama2 | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2) | +| GPT-J | [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j) | ```eval_rst .. important:: - In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through ``transformers``-style API as `example `_. + In addition to INT4 optimization, IPEX-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through ``transformers``-style API as `example `_. ``` ```eval_rst .. seealso:: - See the complete examples `here `_. + See the complete examples `here `_. ``` diff --git a/docs/readthedocs/source/doc/LLM/Overview/install.rst b/docs/readthedocs/source/doc/LLM/Overview/install.rst index 1ec5ea2d..ff2d94e1 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/install.rst +++ b/docs/readthedocs/source/doc/LLM/Overview/install.rst @@ -1,7 +1,7 @@ -BigDL-LLM Installation +IPEX-LLM Installation ================================ -Here, we provide instructions on how to install ``bigdl-llm`` and best practices for setting up your environment. Please refer to the appropriate guide based on your device: +Here, we provide instructions on how to install ``ipex-llm`` and best practices for setting up your environment. 
Please refer to the appropriate guide based on your device:
* `CPU <./install_cpu.html>`_
* `GPU <./install_gpu.html>`_
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/LLM/Overview/install_cpu.md b/docs/readthedocs/source/doc/LLM/Overview/install_cpu.md
index 4285327d..bb2b952c 100644
--- a/docs/readthedocs/source/doc/LLM/Overview/install_cpu.md
+++ b/docs/readthedocs/source/doc/LLM/Overview/install_cpu.md
@@ -1,11 +1,11 @@
-# BigDL-LLM Installation: CPU
+# IPEX-LLM Installation: CPU
## Quick Installation
-Install BigDL-LLM for CPU supports using pip through:
+Install IPEX-LLM for CPU support using pip:
```bash
-pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option
+pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option
```
Please refer to [Environment Setup](#environment-setup) for more information.
@@ -17,12 +17,12 @@ Please refer to [Environment Setup](#environment-setup) for more information.
.. important::
-   ``bigdl-llm`` is tested with Python 3.9, 3.10 and 3.11; Python 3.9 is recommended for best practices.
+   ``ipex-llm`` is tested with Python 3.9, 3.10 and 3.11; Python 3.9 is recommended for best practices.
```
## Recommended Requirements
-Here list the recommended hardware and OS for smooth BigDL-LLM optimization experiences on CPU:
+Here are the recommended hardware and OS for a smooth IPEX-LLM optimization experience on CPU:
* Hardware
@@ -37,7 +37,7 @@ Here list the recommended hardware and OS for smooth BigDL-LLM optimization expe
## Environment Setup
-For optimal performance with LLM models using BigDL-LLM optimizations on Intel CPUs, here are some best practices for setting up environment:
+For optimal performance with LLM models using IPEX-LLM optimizations on Intel CPUs, here are some best practices for setting up your environment:
First we recommend using [Conda](https://docs.conda.io/en/latest/miniconda.html) to create a python 3.9 enviroment:
@@ -45,10 +45,10 @@ First we recommend using [Conda](https://docs.conda.io/en/latest/miniconda.html)
conda create -n llm python=3.9
conda activate llm
-pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option
+pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option
```
-Then for running a LLM model with BigDL-LLM optimizations (taking an `example.py` an example):
+Then, to run an LLM model with IPEX-LLM optimizations (taking `example.py` as an example):
```eval_rst
.. tabs::
diff --git a/docs/readthedocs/source/doc/LLM/Overview/install_gpu.md b/docs/readthedocs/source/doc/LLM/Overview/install_gpu.md
index 7b3a901d..1f1de081 100644
--- a/docs/readthedocs/source/doc/LLM/Overview/install_gpu.md
+++ b/docs/readthedocs/source/doc/LLM/Overview/install_gpu.md
@@ -1,15 +1,15 @@
-# BigDL-LLM Installation: GPU
+# IPEX-LLM Installation: GPU
## Windows
### Prerequisites
-BigDL-LLM on Windows supports Intel iGPU and dGPU.
+IPEX-LLM on Windows supports Intel iGPU and dGPU.
```eval_rst
.. important::
-   BigDL-LLM on Windows only supports PyTorch 2.1.
+   IPEX-LLM on Windows only supports PyTorch 2.1.
```
To apply Intel GPU acceleration, there're several prerequisite steps for tools installation and environment preparation:
@@ -40,28 +40,28 @@ Intel® oneAPI Base Toolkit 2024.0 installation methods:
Activating your working conda environment will automatically configure oneAPI environment variables.
```
-### Install BigDL-LLM From PyPI
+### Install IPEX-LLM From PyPI
We recommend using [miniconda](https://docs.conda.io/en/latest/miniconda.html) to create a python 3.9 enviroment:
```eval_rst
.. important::
-   ``bigdl-llm`` is tested with Python 3.9, 3.10 and 3.11. Python 3.9 is recommended for best practices.
+   ``ipex-llm`` is tested with Python 3.9, 3.10 and 3.11. Python 3.9 is recommended for best practices.
```
-The easiest ways to install `bigdl-llm` is the following commands:
+The easiest way to install `ipex-llm` is with the following commands:
```
conda create -n llm python=3.9 libuv
conda activate llm
-pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
+pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```
-### Install BigDL-LLM From Wheel
+### Install IPEX-LLM From Wheel
-If you encounter network issues when installing IPEX, you can also install BigDL-LLM dependencies for Intel XPU from source archives. First you need to download and install torch/torchvision/ipex from wheels listed below before installing `bigdl-llm`.
+If you encounter network issues when installing IPEX, you can also install IPEX-LLM dependencies for Intel XPU from source archives. First you need to download and install torch/torchvision/ipex from the wheels listed below before installing `ipex-llm`.
Download the wheels on Windows system:
@@ -71,14 +71,14 @@ wget https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/torchv
wget https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/intel_extension_for_pytorch-2.1.10%2Bxpu-cp39-cp39-win_amd64.whl
```
-You may install dependencies directly from the wheel archives and then install `bigdl-llm` using following commands:
+You may install dependencies directly from the wheel archives and then install `ipex-llm` using the following commands:
```
pip install torch-2.1.0a0+cxx11.abi-cp39-cp39-win_amd64.whl
pip install torchvision-0.16.0a0+cxx11.abi-cp39-cp39-win_amd64.whl
pip install intel_extension_for_pytorch-2.1.10+xpu-cp39-cp39-win_amd64.whl
-pip install --pre --upgrade bigdl-llm[xpu]
+pip install --pre --upgrade ipex-llm[xpu]
```
```eval_rst
@@ -154,7 +154,7 @@ If you met error when importing `intel_extension_for_pytorch`, please ensure tha
### Prerequisites
-BigDL-LLM GPU support on Linux has been verified on:
+IPEX-LLM GPU support on Linux has been verified on:
* Intel Arc™ A-Series Graphics
* Intel Data Center GPU Flex Series
@@ -163,7 +163,7 @@ BigDL-LLM GPU support on Linux has been verified on:
```eval_rst
.. important::
-   BigDL-LLM on Linux supports PyTorch 2.0 and PyTorch 2.1.
+   IPEX-LLM on Linux supports PyTorch 2.0 and PyTorch 2.1.
```
```eval_rst
@@ -176,7 +176,7 @@ BigDL-LLM GPU support on Linux has been verified on:
.. tabs::
   .. tab:: PyTorch 2.1
-      To enable BigDL-LLM for Intel GPUs with PyTorch 2.1, here are several prerequisite steps for tools installation and environment preparation:
+      To enable IPEX-LLM for Intel GPUs with PyTorch 2.1, here are several prerequisite steps for tools installation and environment preparation:
      * Step 1: Install Intel GPU Driver version >= stable_775_20_20231219. We highly recommend installing the latest version of intel-i915-dkms using apt.
@@ -213,7 +213,7 @@ BigDL-LLM GPU support on Linux has been verified on:
   .. note::
      You can view the configured environment variables for your environment (e.g. with name ``llm``) by running ``conda env config vars list -n llm``.
- You can continue with your working conda environment and install ``bigdl-llm`` as guided in the next section. + You can continue with your working conda environment and install ``ipex-llm`` as guided in the next section. .. note:: @@ -269,7 +269,7 @@ BigDL-LLM GPU support on Linux has been verified on: .. tab:: PyTorch 2.0 - To enable BigDL-LLM for Intel GPUs with PyTorch 2.0, here're several prerequisite steps for tools installation and environment preparation: + To enable IPEX-LLM for Intel GPUs with PyTorch 2.0, here're several prerequisite steps for tools installation and environment preparation: * Step 1: Install Intel GPU Driver version >= stable_775_20_20231219. Highly recommend installing the latest version of intel-i915-dkms using apt. @@ -306,7 +306,7 @@ BigDL-LLM GPU support on Linux has been verified on: .. note:: You can view the configured environment variables for your environment (e.g. with name ``llm``) by running ``conda env config vars list -n llm``. - You can continue with your working conda environment and install ``bigdl-llm`` as guided in the next section. + You can continue with your working conda environment and install ``ipex-llm`` as guided in the next section. .. note:: @@ -369,19 +369,19 @@ BigDL-LLM GPU support on Linux has been verified on: sudo ./installer ``` -### Install BigDL-LLM From PyPI +### Install IPEX-LLM From PyPI We recommend using [miniconda](https://docs.conda.io/en/latest/miniconda.html) to create a python 3.9 enviroment: ```eval_rst .. important:: - ``bigdl-llm`` is tested with Python 3.9, 3.10 and 3.11. Python 3.9 is recommended for best practices. + ``ipex-llm`` is tested with Python 3.9, 3.10 and 3.11. Python 3.9 is recommended for best practices. ``` ```eval_rst .. important:: - Make sure you install matching versions of BigDL-LLM/pytorch/IPEX and oneAPI Base Toolkit. BigDL-LLM with Pytorch 2.1 should be used with oneAPI Base Toolkit version 2024.0. BigDL-LLM with Pytorch 2.0 should be used with oneAPI Base Toolkit version 2023.2. + Make sure you install matching versions of ipex-llm/pytorch/IPEX and oneAPI Base Toolkit. IPEX-LLM with Pytorch 2.1 should be used with oneAPI Base Toolkit version 2024.0. IPEX-LLM with Pytorch 2.0 should be used with oneAPI Base Toolkit version 2023.2. ``` ```eval_rst @@ -393,15 +393,15 @@ We recommend using [miniconda](https://docs.conda.io/en/latest/miniconda.html) t conda create -n llm python=3.9 conda activate llm - pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu + pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu .. note:: - The ``xpu`` option will install BigDL-LLM with PyTorch 2.1 by default, which is equivalent to + The ``xpu`` option will install IPEX-LLM with PyTorch 2.1 by default, which is equivalent to .. code-block:: bash - pip install --pre --upgrade bigdl-llm[xpu_2.1] -f https://developer.intel.com/ipex-whl-stable-xpu + pip install --pre --upgrade ipex-llm[xpu_2.1] -f https://developer.intel.com/ipex-whl-stable-xpu .. 
tab:: PyTorch 2.0 @@ -411,13 +411,13 @@ We recommend using [miniconda](https://docs.conda.io/en/latest/miniconda.html) t conda create -n llm python=3.9 conda activate llm - pip install --pre --upgrade bigdl-llm[xpu_2.0] -f https://developer.intel.com/ipex-whl-stable-xpu + pip install --pre --upgrade ipex-llm[xpu_2.0] -f https://developer.intel.com/ipex-whl-stable-xpu ``` -### Install BigDL-LLM From Wheel +### Install IPEX-LLM From Wheel -If you encounter network issues when installing IPEX, you can also install BigDL-LLM dependencies for Intel XPU from source archives. First you need to download and install torch/torchvision/ipex from wheels listed below before installing `bigdl-llm`. +If you encounter network issues when installing IPEX, you can also install IPEX-LLM dependencies for Intel XPU from source archives. First you need to download and install torch/torchvision/ipex from wheels listed below before installing `ipex-llm`. ```eval_rst .. tabs:: @@ -439,8 +439,8 @@ If you encounter network issues when installing IPEX, you can also install BigDL pip install torchvision-0.16.0a0+cxx11.abi-cp39-cp39-linux_x86_64.whl pip install intel_extension_for_pytorch-2.1.10+xpu-cp39-cp39-linux_x86_64.whl - # install bigdl-llm for Intel GPU - pip install --pre --upgrade bigdl-llm[xpu] + # install ipex-llm for Intel GPU + pip install --pre --upgrade ipex-llm[xpu] .. tab:: PyTorch 2.0 @@ -460,8 +460,8 @@ If you encounter network issues when installing IPEX, you can also install BigDL pip install torchvision-0.15.2a0+cxx11.abi-cp39-cp39-linux_x86_64.whl pip install intel_extension_for_pytorch-2.0.110+xpu-cp39-cp39-linux_x86_64.whl - # install bigdl-llm for Intel GPU - pip install --pre --upgrade bigdl-llm[xpu_2.0] + # install ipex-llm for Intel GPU + pip install --pre --upgrade ipex-llm[xpu_2.0] ``` @@ -543,8 +543,8 @@ OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or Error: libmkl_sycl_blas.so.4: cannot open shared object file: No such file or directory ``` -The reason for such errors is that oneAPI has not been initialized properly before running BigDL-LLM code or before importing IPEX package. +The reason for such errors is that oneAPI has not been initialized properly before running IPEX-LLM code or before importing IPEX package. -* For oneAPI installed using APT or Offline Installer, make sure you execute `setvars.sh` of oneAPI Base Toolkit before running BigDL-LLM. +* For oneAPI installed using APT or Offline Installer, make sure you execute `setvars.sh` of oneAPI Base Toolkit before running IPEX-LLM. * For PIP-installed oneAPI, activate your working environment and run ``echo $LD_LIBRARY_PATH`` to check if the installation path is properly configured for the environment. If the output does not contain oneAPI path (e.g. ``~/intel/oneapi/lib``), check [Prerequisites](#id1) to re-install oneAPI with PIP installer. -* Make sure you install matching versions of BigDL-LLM/pytorch/IPEX and oneAPI Base Toolkit. BigDL-LLM with PyTorch 2.1 should be used with oneAPI Base Toolkit version 2024.0. BigDL-LLM with PyTorch 2.0 should be used with oneAPI Base Toolkit version 2023.2. +* Make sure you install matching versions of ipex-llm/pytorch/IPEX and oneAPI Base Toolkit. IPEX-LLM with PyTorch 2.1 should be used with oneAPI Base Toolkit version 2024.0. IPEX-LLM with PyTorch 2.0 should be used with oneAPI Base Toolkit version 2023.2. 
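A quick way to sanity-check the GPU installation and troubleshooting steps above is to query the XPU backend from Python inside the `llm` environment, after the oneAPI variables are configured. The short sketch below is illustrative only; it assumes the `torch.xpu.*` helpers exposed by the XPU builds of `intel_extension_for_pytorch`, and the exact output depends on your driver and toolkit versions.

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the XPU backend with PyTorch

# Mismatched versions are the most common source of trouble, so print them first.
print(f"torch {torch.__version__}, intel_extension_for_pytorch {ipex.__version__}")

# This only succeeds once the oneAPI environment (e.g. setvars.sh) is set up.
if torch.xpu.is_available():
    for i in range(torch.xpu.device_count()):
        print(f"XPU {i}: {torch.xpu.get_device_name(i)}")
else:
    print("No XPU device detected; re-check the GPU driver and oneAPI setup above.")
```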
diff --git a/docs/readthedocs/source/doc/LLM/Overview/known_issues.md b/docs/readthedocs/source/doc/LLM/Overview/known_issues.md
index 23a521a5..5b2621db 100644
--- a/docs/readthedocs/source/doc/LLM/Overview/known_issues.md
+++ b/docs/readthedocs/source/doc/LLM/Overview/known_issues.md
@@ -1 +1 @@
-# BigDL-LLM Known Issues
\ No newline at end of file
+# IPEX-LLM Known Issues
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/LLM/Overview/llm.md b/docs/readthedocs/source/doc/LLM/Overview/llm.md
index 7f7d4194..ef0cba3a 100644
--- a/docs/readthedocs/source/doc/LLM/Overview/llm.md
+++ b/docs/readthedocs/source/doc/LLM/Overview/llm.md
@@ -1,14 +1,14 @@
-# BigDL-LLM in 5 minutes
+# IPEX-LLM in 5 minutes
-You can use BigDL-LLM to run any [*Hugging Face Transformers*](https://huggingface.co/docs/transformers/index) PyTorch model. It automatically optimizes and accelerates LLMs using low-precision (INT4/INT5/INT8) techniques, modern hardware accelerations and latest software optimizations.
+You can use IPEX-LLM to run any [*Hugging Face Transformers*](https://huggingface.co/docs/transformers/index) PyTorch model. It automatically optimizes and accelerates LLMs using low-precision (INT4/INT5/INT8) techniques, modern hardware accelerations and the latest software optimizations.
-Hugging Face transformers-based applications can run on BigDL-LLM with one-line code change, and you'll immediately observe significant speedup[1].
+Hugging Face transformers-based applications can run on IPEX-LLM with a one-line code change, and you'll immediately observe significant speedup[1].
-Here, let's take a relatively small LLM model, i.e [open_llama_3b_v2](https://huggingface.co/openlm-research/open_llama_3b_v2), and BigDL-LLM INT4 optimizations as an example.
+Here, let's take a relatively small LLM model, i.e. [open_llama_3b_v2](https://huggingface.co/openlm-research/open_llama_3b_v2), and IPEX-LLM INT4 optimizations as an example.
## Load a Pretrained Model
-Simply use one-line `transformers`-style API in `bigdl-llm` to load `open_llama_3b_v2` with INT4 optimization (by specifying `load_in_4bit=True`) as follows:
+Simply use the one-line `transformers`-style API in `ipex-llm` to load `open_llama_3b_v2` with INT4 optimization (by specifying `load_in_4bit=True`) as follows:
```python
from ipex_llm.transformers import AutoModelForCausalLM
@@ -20,7 +20,7 @@ model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path="open
```eval_rst
.. tip::
-   `open_llama_3b_v2 `_ is a pretrained large language model hosted on Hugging Face. ``openlm-research/open_llama_3b_v2`` is its Hugging Face model id. ``from_pretrained`` will automatically download the model from Hugging Face to a local cache path (e.g. ``~/.cache/huggingface``), load the model, and converted it to ``bigdl-llm`` INT4 format.
+   `open_llama_3b_v2 `_ is a pretrained large language model hosted on Hugging Face. ``openlm-research/open_llama_3b_v2`` is its Hugging Face model id. ``from_pretrained`` will automatically download the model from Hugging Face to a local cache path (e.g. ``~/.cache/huggingface``), load the model, and convert it to ``ipex-llm`` INT4 format.
   It may take a long time to download the model using API. You can also download the model yourself, and set ``pretrained_model_name_or_path`` to the local path of the downloaded model. This way, ``from_pretrained`` will load and convert directly from local path without download.
```
@@ -62,7 +62,7 @@ with torch.inference_mode():

[1] - Performance varies by use, configuration and other factors. bigdl-llm may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex. + Performance varies by use, configuration and other factors. ipex-llm may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.
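For completeness, here is a minimal end-to-end sketch in the spirit of the "5 minutes" walkthrough above: it loads `open_llama_3b_v2` with the INT4 option shown in the tutorial and generates a short continuation. The tokenizer class, prompt and generation parameters are illustrative assumptions rather than part of the original tutorial.

```python
import torch
from transformers import LlamaTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# Load with IPEX-LLM INT4 optimization, as in the tutorial above.
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="openlm-research/open_llama_3b_v2",
    load_in_4bit=True)

# Tokenizer choice is an assumption; any tokenizer matching the model works.
tokenizer = LlamaTokenizer.from_pretrained("openlm-research/open_llama_3b_v2")

prompt = "Q: What is CPU?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```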

diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md
index f2f0d4f3..9165778e 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md
@@ -1,10 +1,10 @@
-# BigDL-LLM Benchmarking
+# IPEX-LLM Benchmarking
-We can do benchmarking for BigDL-LLM on Intel CPUs and GPUs using the benchmark scripts we provide.
+We can do benchmarking for IPEX-LLM on Intel CPUs and GPUs using the benchmark scripts we provide.
## Prepare The Environment
-You can refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install.html) to install BigDL-LLM in your environment. The following dependencies are also needed to run the benchmark scripts.
+You can refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html) to install IPEX-LLM in your environment. The following dependencies are also needed to run the benchmark scripts.
```
pip install pandas
@@ -13,12 +13,12 @@ pip install omegaconf
## Prepare The Scripts
-Navigate to your local workspace and then download BigDL from GitHub. Modify the `config.yaml` under `all-in-one` folder for your own benchmark configurations.
+Navigate to your local workspace and then download IPEX-LLM from GitHub. Modify the `config.yaml` under the `all-in-one` folder for your own benchmark configurations.
```
cd your/local/workspace
-git clone https://github.com/intel-analytics/BigDL.git
-cd BigDL/python/llm/dev/benchmark/all-in-one/
+git clone https://github.com/intel-analytics/ipex-llm.git
+cd ipex-llm/python/llm/dev/benchmark/all-in-one/
```
## Configure YAML File
@@ -55,7 +55,7 @@ Some parameters in the yaml file that you can configure:
## Run on Windows
-Please refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration) to configure oneAPI environment variables.
+Please refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration) to configure oneAPI environment variables.
```eval_rst
.. tabs::
@@ -144,4 +144,4 @@ Please refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/i
## Result
-After the script runnning is completed, you can obtain a CSV result file under the current folder. You can mainly look at the results of columns `1st token avg latency (ms)` and `2+ avg latency (ms/token)` for performance results. You can also check whether the column `actual input/output tokens` is consistent with the column `input/output tokens` and whether the parameters you specified in config.yaml have been successfully applied in the benchmarking.
\ No newline at end of file
+After the script finishes running, you can obtain a CSV result file under the current folder. You can mainly look at the results of the columns `1st token avg latency (ms)` and `2+ avg latency (ms/token)` for performance results. You can also check whether the column `actual input/output tokens` is consistent with the column `input/output tokens` and whether the parameters you specified in `config.yaml` have been successfully applied in the benchmarking.
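As a small illustration of the result format described above, the sketch below loads the generated CSV with pandas and prints the latency columns the guide names. The file-name pattern and the extra columns are assumptions; adjust them to match your actual benchmark output.

```python
import glob

import pandas as pd

# The all-in-one benchmark writes its results as a CSV in the current folder;
# the exact file name is assumed here, so adjust the pattern if needed.
files = sorted(glob.glob("*.csv"))
if not files:
    raise SystemExit("No CSV result file found in the current folder.")
df = pd.read_csv(files[-1])

# Columns named in the guide above; keep only the ones that actually exist.
wanted = ["1st token avg latency (ms)", "2+ avg latency (ms/token)",
          "actual input/output tokens", "input/output tokens"]
print(df[[c for c in wanted if c in df.columns]])
```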
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/docker_windows_gpu.md b/docs/readthedocs/source/doc/LLM/Quickstart/docker_windows_gpu.md index 3b581920..f19283ef 100644 --- a/docs/readthedocs/source/doc/LLM/Quickstart/docker_windows_gpu.md +++ b/docs/readthedocs/source/doc/LLM/Quickstart/docker_windows_gpu.md @@ -1,6 +1,6 @@ -# Install BigDL-LLM in Docker on Windows with Intel GPU +# Install IPEX-LLM in Docker on Windows with Intel GPU -This guide demonstrates how to install BigDL-LLM in Docker on Windows with Intel GPUs. +This guide demonstrates how to install IPEX-LLM in Docker on Windows with Intel GPUs. It applies to Intel Core Core 12 - 14 gen integrated GPUs (iGPUs) and Intel Arc Series GPU. @@ -51,20 +51,20 @@ It applies to Intel Core Core 12 - 14 gen integrated GPUs (iGPUs) and Intel Arc >Note: During the use of Docker in WSL, Docker Desktop needs to be kept open all the time. -## BigDL LLM Inference with XPU on Windows -### 1. Prepare bigdl-llm-xpu Docker Image +## IPEX LLM Inference with XPU on Windows +### 1. Prepare ipex-llm-xpu Docker Image Run the following command in WSL: ```bash -docker pull intelanalytics/bigdl-llm-xpu:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-xpu:2.5.0-SNAPSHOT ``` This step will take around 20 minutes depending on your network. -### 2. Start bigdl-llm-xpu Docker Container +### 2. Start ipex-llm-xpu Docker Container To map the xpu into the container, an example (docker_setup.sh) could be: ```bash #/bin/bash -export DOCKER_IMAGE=intelanalytics/bigdl-llm-xpu:2.5.0-SNAPSHOT +export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:2.5.0-SNAPSHOT export CONTAINER_NAME=my_container export MODEL_PATH=/llm/models[change to your model path] @@ -115,7 +115,7 @@ root@docker-desktop:/# sycl-ls The output is similar like this: ```bash Human: What is AI? -BigDL-LLM: +IPEX-LLM: AI, or Artificial Intelligence, refers to the development of computer systems or machines that can perform tasks that typically require human intelligence. These systems are designed to learn from data and make decisions, or take actions, based on that data. ``` diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/index.rst b/docs/readthedocs/source/doc/LLM/Quickstart/index.rst index 5eb8328f..675526b6 100644 --- a/docs/readthedocs/source/doc/LLM/Quickstart/index.rst +++ b/docs/readthedocs/source/doc/LLM/Quickstart/index.rst @@ -1,4 +1,4 @@ -BigDL-LLM Quickstart +IPEX-LLM Quickstart ================================ .. 
note:: @@ -7,9 +7,9 @@ BigDL-LLM Quickstart This section includes efficient guide to show you how to: -* `Install BigDL-LLM on Linux with Intel GPU <./install_linux_gpu.html>`_ -* `Install BigDL-LLM on Windows with Intel GPU <./install_windows_gpu.html>`_ -* `Install BigDL-LLM in Docker on Windows with Intel GPU <./docker_windows_gpu.html>`_ +* `Install IPEX-LLM on Linux with Intel GPU <./install_linux_gpu.html>`_ +* `Install IPEX-LLM on Windows with Intel GPU <./install_windows_gpu.html>`_ +* `Install IPEX-LLM in Docker on Windows with Intel GPU <./docker_windows_gpu.html>`_ * `Use Text Generation WebUI on Windows with Intel GPU <./webui_quickstart.html>`_ -* `Conduct Performance Benchmarking with BigDL-LLM <./benchmark_quickstart.html>`_ -* `Use llama.cpp with BigDL-LLM on Intel GPU <./llama_cpp_quickstart.html>`_ +* `Conduct Performance Benchmarking with IPEX-LLM <./benchmark_quickstart.html>`_ +* `Use llama.cpp with IPEX-LLM on Intel GPU <./llama_cpp_quickstart.html>`_ diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md b/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md index efdf7d10..a800f127 100644 --- a/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md +++ b/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md @@ -1,8 +1,8 @@ -# Install BigDL-LLM on Linux with Intel GPU +# Install IPEX-LLM on Linux with Intel GPU -This guide demonstrates how to install BigDL-LLM on Linux with Intel GPUs. It applies to Intel Data Center GPU Flex Series and Max Series, as well as Intel Arc Series GPU. +This guide demonstrates how to install IPEX-LLM on Linux with Intel GPUs. It applies to Intel Data Center GPU Flex Series and Max Series, as well as Intel Arc Series GPU. -BigDL-LLM currently supports the Ubuntu 20.04 operating system and later, and supports PyTorch 2.0 and PyTorch 2.1 on Linux. This page demonstrates BigDL-LLM with PyTorch 2.1. Check the [Installation](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux) page for more details. +IPEX-LLM currently supports the Ubuntu 20.04 operating system and later, and supports PyTorch 2.0 and PyTorch 2.1 on Linux. This page demonstrates IPEX-LLM with PyTorch 2.1. Check the [Installation](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux) page for more details. ## Install Intel GPU Driver @@ -91,14 +91,14 @@ Install the Miniconda as follows if you don't have conda installed on your machi > image-20240221102252565 -## Install `bigdl-llm` +## Install `ipex-llm` -* With the `llm` environment active, use `pip` to install `bigdl-llm` for GPU: +* With the `llm` environment active, use `pip` to install `ipex-llm` for GPU: ``` conda create -n llm python=3.9 conda activate llm - pip install --pre --upgrade bigdl-llm[xpu] --extra-index-url https://developer.intel.com/ipex-whl-stable-xpu + pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://developer.intel.com/ipex-whl-stable-xpu ``` > image-20240221102252564 @@ -106,7 +106,7 @@ Install the Miniconda as follows if you don't have conda installed on your machi > image-20240221102252564 -* You can verify if bigdl-llm is successfully installed by simply importing a few classes from the library. For example, execute the following import command in the terminal: +* You can verify if ipex-llm is successfully installed by simply importing a few classes from the library. 
For example, execute the following import command in the terminal: ```bash source /opt/intel/oneapi/setvars.sh @@ -115,7 +115,7 @@ Install the Miniconda as follows if you don't have conda installed on your machi > from ipex_llm.transformers import AutoModel, AutoModelForCausalLM ``` - > image-20240221102252562 + > image-20240221102252562 ## Runtime Configurations @@ -157,7 +157,7 @@ Now let's play with a real LLM. We'll be using the [phi-1.5](https://huggingface conda activate llm ``` * Step 2: If you're running on iGPU, set some environment variables by running below commands: - > For more details about runtime configurations, refer to [this guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration): + > For more details about runtime configurations, refer to [this guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration): ```bash # Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH. source /opt/intel/oneapi/setvars.sh @@ -175,7 +175,7 @@ Now let's play with a real LLM. We'll be using the [phi-1.5](https://huggingface generation_config = GenerationConfig(use_cache = True) tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True) - # load Model using bigdl-llm and load it to GPU + # load Model using ipex-llm and load it to GPU model = AutoModelForCausalLM.from_pretrained( "tiiuae/falcon-7b", load_in_4bit=True, cpu_embedding=True, trust_remote_code=True) model = model.to('xpu') diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/install_windows_gpu.md b/docs/readthedocs/source/doc/LLM/Quickstart/install_windows_gpu.md index 370422d1..7997d740 100644 --- a/docs/readthedocs/source/doc/LLM/Quickstart/install_windows_gpu.md +++ b/docs/readthedocs/source/doc/LLM/Quickstart/install_windows_gpu.md @@ -1,6 +1,6 @@ -# Install BigDL-LLM on Windows with Intel GPU +# Install IPEX-LLM on Windows with Intel GPU -This guide demonstrates how to install BigDL-LLM on Windows with Intel GPUs. +This guide demonstrates how to install IPEX-LLM on Windows with Intel GPUs. It applies to Intel Core Ultra and Core 12 - 14 gen integrated GPUs (iGPUs), as well as Intel Arc Series GPU. @@ -66,9 +66,9 @@ Activate the newly created environment `llm`: conda activate llm ``` -## Install `bigdl-llm` +## Install `ipex-llm` -With the `llm` environment active, use `pip` to install `bigdl-llm` for GPU: +With the `llm` environment active, use `pip` to install `ipex-llm` for GPU: Choose either US or CN website for `extra-index-url`: ```eval_rst @@ -77,23 +77,23 @@ Choose either US or CN website for `extra-index-url`: .. code-block:: cmd - pip install --pre --upgrade bigdl-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ + pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ .. tab:: CN .. code-block:: cmd - pip install --pre --upgrade bigdl-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/ + pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/ ``` ```eval_rst .. note:: - If you encounter network issues while installing IPEX, refer to `this guide `_ for troubleshooting advice. + If you encounter network issues while installing IPEX, refer to `this guide `_ for troubleshooting advice. 
``` ## Verify Installation -You can verify if `bigdl-llm` is successfully installed by simply running a few lines of code: +You can verify if `ipex-llm` is successfully installed by simply running a few lines of code: * Step 1: Open the **Anaconda Prompt** and activate the Python environment `llm` you previously created: ```cmd @@ -123,7 +123,7 @@ You can verify if `bigdl-llm` is successfully installed by simply running a few ```eval_rst .. seealso:: - For other Intel dGPU Series, please refer to `this guide `_ for more details regarding runtime configuration. + For other Intel dGPU Series, please refer to `this guide `_ for more details regarding runtime configuration. ``` * Step 4: Launch the Python interactive shell by typing `python` in the Anaconda prompt window and then press Enter. @@ -143,7 +143,7 @@ You can verify if `bigdl-llm` is successfully installed by simply running a few ```eval_rst .. seealso:: - If you encounter any problem, please refer to `here `_ for help. + If you encounter any problem, please refer to `here `_ for help. ``` * To exit the Python interactive shell, simply press Ctrl+Z then press Enter (or input `exit()` then press Enter). @@ -184,17 +184,17 @@ Now let's play with a real LLM. We'll be using the [Qwen-1.8B-Chat](https://hugg ```eval_rst .. seealso:: - For other Intel dGPU Series, please refer to `this guide `_ for more details regarding runtime configuration. + For other Intel dGPU Series, please refer to `this guide `_ for more details regarding runtime configuration. ``` * Step 4: Install additional package required for Qwen-1.8B-Chat to conduct: ```cmd pip install tiktoken transformers_stream_generator einops ``` -* Step 5: Create code file. BigDL-LLM supports loading model from Hugging Face or ModelScope. Please choose according to your requirements. +* Step 5: Create code file. IPEX-LLM supports loading model from Hugging Face or ModelScope. Please choose according to your requirements. ```eval_rst .. tabs:: .. tab:: Hugging Face - Create a new file named ``demo.py`` and insert the code snippet below to run `Qwen-1.8B-Chat `_ model with BigDL-LLM optimizations. + Create a new file named ``demo.py`` and insert the code snippet below to run `Qwen-1.8B-Chat `_ model with IPEX-LLM optimizations. .. code-block:: python @@ -208,7 +208,7 @@ Now let's play with a real LLM. We'll be using the [Qwen-1.8B-Chat](https://hugg tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True) - # Load Model using bigdl-llm and load it to GPU + # Load Model using ipex-llm and load it to GPU model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B-Chat", load_in_4bit=True, cpu_embedding=True, @@ -254,7 +254,7 @@ Now let's play with a real LLM. We'll be using the [Qwen-1.8B-Chat](https://hugg pip install modelscope==1.11.0 - Create a new file named ``demo.py`` and insert the code snippet below to run `Qwen-1.8B-Chat `_ model with BigDL-LLM optimizations. + Create a new file named ``demo.py`` and insert the code snippet below to run `Qwen-1.8B-Chat `_ model with IPEX-LLM optimizations. .. code-block:: python @@ -269,7 +269,7 @@ Now let's play with a real LLM. 
We'll be using the [Qwen-1.8B-Chat](https://hugg
         tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat",
                                                   trust_remote_code=True)
-         # Load Model using bigdl-llm and load it to GPU
+         # Load Model using ipex-llm and load it to GPU
         model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B-Chat",
                                                      load_in_4bit=True,
                                                      cpu_embedding=True,
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md
index 4cc0fc21..092ac85e 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md
@@ -1,34 +1,34 @@
-# Use llama.cpp with BigDL-LLM on Intel GPU
+# Use llama.cpp with IPEX-LLM on Intel GPU
-Now you can use BigDL-LLM as an Intel GPU accelerated backend of [llama.cpp](https://github.com/ggerganov/llama.cpp). This quickstart guide walks you through setting up and using [llama.cpp](https://github.com/ggerganov/llama.cpp) with `bigdl-llm` on Intel GPU (both iGPU and dGPU).
+Now you can use IPEX-LLM as an Intel GPU accelerated backend of [llama.cpp](https://github.com/ggerganov/llama.cpp). This quickstart guide walks you through setting up and using [llama.cpp](https://github.com/ggerganov/llama.cpp) with `ipex-llm` on Intel GPU (both iGPU and dGPU).
```eval_rst
.. note::
-  ``llama.cpp`` now provides Q-series (Q4_0 / Q4_1 / Q8_0 / Q4_K / Q5_K / Q6_K /...) and IQ-series(IQ2 / IQ3 / IQ4 /...) quantization types. Only Q-series GGUF models are supported in BigDL-LLM now, support for IQ-series is still work in progress.
+  ``llama.cpp`` now provides Q-series (Q4_0 / Q4_1 / Q8_0 / Q4_K / Q5_K / Q6_K /...) and IQ-series (IQ2 / IQ3 / IQ4 /...) quantization types. Only Q-series GGUF models are supported in IPEX-LLM for now; support for the IQ-series is still a work in progress.
```
## 0 Prerequisites
-BigDL-LLM's support for `llama.cpp` now is avaliable for Linux system and Windows system.
+IPEX-LLM's support for `llama.cpp` is now available on both Linux and Windows systems.
### Linux
For Linux system, we recommend Ubuntu 20.04 or later (Ubuntu 22.04 is preferred).
-Visit the [Install BigDL-LLM on Linux with Intel GPU](https://bigdl.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html), follow [Install Intel GPU Driver](https://bigdl.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-intel-gpu-driver) and [Install oneAPI](https://bigdl.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-oneapi) to install GPU driver and [Intel® oneAPI Base Toolkit 2024.0](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html).
+Visit the [Install IPEX-LLM on Linux with Intel GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html), follow [Install Intel GPU Driver](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-intel-gpu-driver) and [Install oneAPI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-oneapi) to install the GPU driver and [Intel® oneAPI Base Toolkit 2024.0](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html).
### Windows -Visit the [Install BigDL-LLM on Windows with Intel GPU Guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html), and follow [Install Prerequisites](https://bigdl.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html#install-prerequisites) to install [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) Community Edition, latest [GPU driver](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html) and [Intel® oneAPI Base Toolkit 2024.0](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). +Visit the [Install IPEX-LLM on Windows with Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html), and follow [Install Prerequisites](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html#install-prerequisites) to install [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) Community Edition, latest [GPU driver](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html) and [Intel® oneAPI Base Toolkit 2024.0](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). -## 1 Install BigDL-LLM for llama.cpp +## 1 Install IPEX-LLM for llama.cpp -To use `llama.cpp` with BigDL-LLM, first ensure that `bigdl-llm[cpp]` is installed. +To use `llama.cpp` with IPEX-LLM, first ensure that `ipex-llm[cpp]` is installed. ```cmd conda create -n llm-cpp python=3.9 conda activate llm-cpp -pip install --pre --upgrade bigdl-llm[cpp] +pip install --pre --upgrade ipex-llm[cpp] ``` -**After the installation, you should have created a conda environment, named `llm-cpp` for instance, for running `llama.cpp` commands with BigDL-LLM.** +**After the installation, you should have created a conda environment, named `llm-cpp` for instance, for running `llama.cpp` commands with IPEX-LLM.** ## 2 Setup for running llama.cpp @@ -38,9 +38,9 @@ mkdir llama-cpp cd llama-cpp ``` -### Initialize llama.cpp with BigDL-LLM +### Initialize llama.cpp with IPEX-LLM -Then you can use following command to initialize `llama.cpp` with BigDL-LLM: +Then you can use following command to initialize `llama.cpp` with IPEX-LLM: ```eval_rst .. tabs:: .. tab:: Linux @@ -75,9 +75,9 @@ Then you can use following command to initialize `llama.cpp` with BigDL-LLM: **Now you can use these executable files by standard llama.cpp's usage.** -## 3 Example: Running community GGUF models with BigDL-LLM +## 3 Example: Running community GGUF models with IPEX-LLM -Here we provide a simple example to show how to run a community GGUF model with BigDL-LLM. +Here we provide a simple example to show how to run a community GGUF model with IPEX-LLM. ### Set Environment Variables Configure oneAPI variables by running the following command: diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/webui_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/webui_quickstart.md index 29d82e25..2f51ea3b 100644 --- a/docs/readthedocs/source/doc/LLM/Quickstart/webui_quickstart.md +++ b/docs/readthedocs/source/doc/LLM/Quickstart/webui_quickstart.md @@ -1,7 +1,7 @@ # Use Text Generation WebUI on Windows with Intel GPU -This quickstart guide walks you through setting up and using the [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui) (a Gradio WebUI for running Large Language Models) with `bigdl-llm`. 
+This quickstart guide walks you through setting up and using the [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui) (a Gradio WebUI for running Large Language Models) with `ipex-llm`. A preview of the WebUI in action is shown below: @@ -11,21 +11,21 @@ A preview of the WebUI in action is shown below: -## 1 Install BigDL-LLM +## 1 Install IPEX-LLM -To use the WebUI, first ensure that BigDL-LLM is installed. Follow the instructions on the [BigDL-LLM Installation Quickstart for Windows with Intel GPU](install_windows_gpu.html). +To use the WebUI, first ensure that IPEX-LLM is installed. Follow the instructions on the [IPEX-LLM Installation Quickstart for Windows with Intel GPU](install_windows_gpu.html). -**After the installation, you should have created a conda environment, named `llm` for instance, for running `bigdl-llm` applications.** +**After the installation, you should have created a conda environment, named `llm` for instance, for running `ipex-llm` applications.** ## 2 Install the WebUI ### Download the WebUI -Download the `text-generation-webui` with BigDL-LLM integrations from [this link](https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/bigdl-llm.zip). Unzip the content into a directory, e.g.,`C:\text-generation-webui`. +Download the `text-generation-webui` with IPEX-LLM integrations from [this link](https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/ipex-llm.zip). Unzip the content into a directory, e.g.,`C:\text-generation-webui`. ### Install Dependencies -Open **Anaconda Prompt** and activate the conda environment you have created in [section 1](#1-install-bigdl-llm), e.g., `llm`. +Open **Anaconda Prompt** and activate the conda environment you have created in [section 1](#1-install-ipex-llm), e.g., `llm`. ``` conda activate llm ``` @@ -43,7 +43,7 @@ Configure oneAPI variables by running the following command in **Anaconda Prompt ```eval_rst .. note:: - For more details about runtime configurations, `refer to this guide `_ + For more details about runtime configurations, `refer to this guide `_ ``` ```cmd @@ -145,11 +145,11 @@ In this case, go to `Parameters` tab and then `Instruction template` tab. You can verify and edit the loaded instruction template in the `Instruction template` field. You can also manually select an instruction template from `Saved instruction templates` and click `load` to load it into `Instruction template`. -You can add custom template files to this list in `/instruction-templates/` [folder](https://github.com/intel-analytics/text-generation-webui/tree/bigdl-llm/instruction-templates). +You can add custom template files to this list in `/instruction-templates/` [folder](https://github.com/intel-analytics/text-generation-webui/tree/ipex-llm/instruction-templates). ### Tested models -We have tested the following models with `bigdl-llm` using Text Generation WebUI. +We have tested the following models with `ipex-llm` using Text Generation WebUI. | Model | Notes | |-------|-------| diff --git a/docs/readthedocs/source/doc/LLM/index.rst b/docs/readthedocs/source/doc/LLM/index.rst index e13cb0aa..26fa9f22 100644 --- a/docs/readthedocs/source/doc/LLM/index.rst +++ b/docs/readthedocs/source/doc/LLM/index.rst @@ -1,10 +1,10 @@ -BigDL-LLM +IPEX-LLM ========================= .. raw:: html

- bigdl-llm is a library for running LLM (large language model) on Intel XPU (from Laptop to GPU to Cloud) using INT4 with very low latency [1] (for any PyTorch model). + ipex-llm is a library for running LLM (large language model) on Intel XPU (from Laptop to GPU to Cloud) using INT4 with very low latency [1] (for any PyTorch model).

-------
@@ -17,10 +17,10 @@ BigDL-LLM
        **Get Started**
        ^^^
-        Documents in these sections helps you getting started quickly with BigDL-LLM.
+        Documents in these sections help you get started quickly with IPEX-LLM.
        +++
-        :bdg-link:`BigDL-LLM in 5 minutes <./Overview/llm.html>` |
+        :bdg-link:`IPEX-LLM in 5 minutes <./Overview/llm.html>` |
        :bdg-link:`Installation <./Overview/install.html>`
    .. grid-item-card::
@@ -28,7 +28,7 @@ BigDL-LLM
        **Key Features Guide**
        ^^^
-        Each guide in this section provides you with in-depth information, concepts and knowledges about BigDL-LLM key features.
+        Each guide in this section provides you with in-depth information, concepts and knowledge about IPEX-LLM key features.
        +++
@@ -42,7 +42,7 @@ BigDL-LLM
        **Examples & Tutorials**
        ^^^
-        Examples contain scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community.
+        Examples contain scripts to help you quickly get started using IPEX-LLM to run some popular open-source models in the community.
        +++
@@ -53,7 +53,7 @@ BigDL-LLM
        **API Document**
        ^^^
-        API Document provides detailed description of BigDL-LLM APIs.
+        API Document provides detailed descriptions of IPEX-LLM APIs.
        +++
@@ -66,7 +66,7 @@ BigDL-LLM

[1] - Performance varies by use, configuration and other factors. bigdl-llm may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex. + Performance varies by use, configuration and other factors. ipex-llm may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.

@@ -74,4 +74,4 @@ BigDL-LLM .. toctree:: :hidden: - BigDL-LLM Document + IPEX-LLM Document diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/accelerate_inference_openvino_gpu.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/accelerate_inference_openvino_gpu.nblink deleted file mode 100644 index 42e9dd99..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/accelerate_inference_openvino_gpu.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/openvino/accelerate_inference_openvino_gpu.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/index.rst deleted file mode 100644 index 1111bd72..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/index.rst +++ /dev/null @@ -1,6 +0,0 @@ -Inference Optimization: For OpenVINO Users -============================================= - -* `How to run inference on OpenVINO model `_ -* `How to run asynchronous inference on OpenVINO model `_ -* `How to accelerate a PyTorch / TensorFlow inference pipeline on Intel GPUs through OpenVINO `_ diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/openvino_inference.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/openvino_inference.nblink deleted file mode 100644 index c3d9c786..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/openvino_inference.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/openvino/openvino_inference.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/openvino_inference_async.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/openvino_inference_async.nblink deleted file mode 100644 index 883bb4f8..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/OpenVINO/openvino_inference_async.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/openvino/openvino_inference_async.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_async_pipeline.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_async_pipeline.nblink deleted file mode 100644 index 4dd6ac54..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_async_pipeline.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/accelerate_pytorch_inference_async_pipeline.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_gpu.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_gpu.nblink deleted file mode 100644 index 9c152362..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_gpu.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/accelerate_pytorch_inference_gpu.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_jit_ipex.nblink 
b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_jit_ipex.nblink deleted file mode 100644 index a242e2c4..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_jit_ipex.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/accelerate_pytorch_inference_jit_ipex.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_onnx.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_onnx.nblink deleted file mode 100644 index 58811988..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_onnx.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/accelerate_pytorch_inference_onnx.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_openvino.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_openvino.nblink deleted file mode 100644 index 3d8752d1..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_openvino.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/accelerate_pytorch_inference_openvino.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/index.rst deleted file mode 100644 index 3de72aa1..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/index.rst +++ /dev/null @@ -1,17 +0,0 @@ -Inference Optimization: For PyTorch Users -============================================= - -* `How to find accelerated method with minimal latency using InferenceOptimizer `_ -* `How to accelerate a PyTorch inference pipeline through ONNXRuntime `_ -* `How to accelerate a PyTorch inference pipeline through OpenVINO `_ -* `How to accelerate a PyTorch inference pipeline through JIT/IPEX `_ -* `How to quantize your PyTorch model in INT8 for inference using Intel Neural Compressor `_ -* `How to quantize your PyTorch model in INT8 for inference using OpenVINO Post-training Optimization Tools `_ -* `How to enable automatic context management for PyTorch inference on Nano optimized models `_ -* `How to save and load optimized ONNXRuntime model `_ -* `How to save and load optimized OpenVINO model `_ -* `How to save and load optimized JIT model `_ -* `How to save and load optimized IPEX model `_ -* `How to accelerate a PyTorch inference pipeline through multiple instances `_ -* `How to accelerate a PyTorch inference pipeline using Intel ARC series dGPU `_ -* `How to accelerate PyTorch inference using async multi-stage pipeline `_ \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/inference_optimizer_optimize.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/inference_optimizer_optimize.nblink deleted file mode 100644 index 46ff598e..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/inference_optimizer_optimize.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/inference_optimizer_optimize.ipynb" -} \ 
No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference.nblink deleted file mode 100644 index 878428b7..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/multi_instance_pytorch_inference.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager.nblink deleted file mode 100644 index 57788ed6..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/pytorch_context_manager.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex.nblink deleted file mode 100644 index 7f88781a..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/pytorch_save_and_load_ipex.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_jit.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_jit.nblink deleted file mode 100644 index 0e47915a..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_jit.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/pytorch_save_and_load_jit.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_onnx.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_onnx.nblink deleted file mode 100644 index 1777bb80..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_onnx.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/pytorch_save_and_load_onnx.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_openvino.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_openvino.nblink deleted file mode 100644 index 9b8b481d..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_openvino.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/pytorch_save_and_load_openvino.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc.nblink deleted file mode 100644 index aa657bce..00000000 --- 
a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/quantize_pytorch_inference_inc.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot.nblink deleted file mode 100644 index f56e834f..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/quantize_pytorch_inference_pot.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/accelerate_tensorflow_inference_onnx.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/accelerate_tensorflow_inference_onnx.nblink deleted file mode 100644 index 443efe0d..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/accelerate_tensorflow_inference_onnx.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/tensorflow/accelerate_tensorflow_inference_onnx.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/accelerate_tensorflow_inference_openvino.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/accelerate_tensorflow_inference_openvino.nblink deleted file mode 100644 index 275ffe1b..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/accelerate_tensorflow_inference_openvino.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/tensorflow/accelerate_tensorflow_inference_openvino.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/index.rst deleted file mode 100644 index 7a4cf3d5..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/index.rst +++ /dev/null @@ -1,8 +0,0 @@ -Inference Optimization: For TensorFlow Users -============================================= - -* `How to accelerate a TensorFlow inference pipeline through ONNXRuntime `_ -* `How to accelerate a TensorFlow inference pipeline through OpenVINO `_ -* `How to conduct BFloat16 Mixed Precision inference in a TensorFlow Keras application `_ -* `How to save and load optimized ONNXRuntime model in TensorFlow `_ -* `How to save and load optimized OpenVINO model in TensorFlow `_ diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/tensorflow_inference_bf16.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/tensorflow_inference_bf16.nblink deleted file mode 100644 index fa8a1ae6..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/tensorflow_inference_bf16.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/tensorflow/tensorflow_inference_bf16.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/tensorflow_save_and_load_onnx.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/tensorflow_save_and_load_onnx.nblink deleted file mode 100644 index 
e5f7702d..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/tensorflow_save_and_load_onnx.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/tensorflow/tensorflow_save_and_load_onnx.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/tensorflow_save_and_load_openvino.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/tensorflow_save_and_load_openvino.nblink deleted file mode 100644 index 7cc4c0d9..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/TensorFlow/tensorflow_save_and_load_openvino.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/inference/tensorflow/tensorflow_save_and_load_openvino.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Inference/index.rst deleted file mode 100644 index cd5865fb..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Inference/index.rst +++ /dev/null @@ -1,33 +0,0 @@ -Inference Optimization -========================= - -Here you could find detailed guides on how to apply BigDL-Nano to optimize your inference workloads. Select your desired use case below for further navigation: - -.. grid:: 1 2 2 2 - - .. grid-item:: - - .. button-link:: OpenVINO/index.html - :color: primary - :expand: - :outline: - - I use **OpenVINO** toolkit. - - .. grid-item:: - - .. button-link:: PyTorch/index.html - :color: primary - :expand: - :outline: - - I am a **PyTorch** user. - - .. grid-item:: - - .. button-link:: TensorFlow/index.html - :color: primary - :expand: - :outline: - - I am a **TensorFlow** user. \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Install/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Install/index.rst deleted file mode 100644 index d326cd91..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Install/index.rst +++ /dev/null @@ -1,7 +0,0 @@ -Install -========================= - -Here you could find detailed guides on how to install BigDL-Nano for different use cases: - -* `How to install BigDL-Nano in Google Colab `_ -* `How to install BigDL-Nano on Windows `_ \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Install/install_in_colab.md b/docs/readthedocs/source/doc/Nano/Howto/Install/install_in_colab.md deleted file mode 100644 index caf59cbd..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Install/install_in_colab.md +++ /dev/null @@ -1,84 +0,0 @@ -# Install BigDL-Nano in Google Colab - -```eval_rst -.. note:: - This page is still a work in progress. -``` - -In this guide, we will show you how to install BigDL-Nano in Google Colab, and the solutions to possible version conflicts caused by pre-installed packages in Colab hosted runtime. - -Please select the corresponding section to follow for your specific usage. - -## PyTorch -For PyTorch users, you need to install BigDL-Nano for PyTorch first: - -```eval_rst -.. tabs:: - - .. tab:: Latest - - .. code-block:: python - - !pip install bigdl-nano[pytorch] - - .. tab:: Nightly-Built - - .. code-block:: python - - !pip install --pre --upgrade bigdl-nano[pytorch] -``` - -```eval_rst -.. warning:: - For Google Colab hosted runtime, ``source bigdl-nano-init`` is hardly to take effect as environment variables need to be set before jupyter kernel is started. 
-``` - -To avoid version conflicts caused by `torchtext`, you should uninstall it: - -```python -!pip uninstall -y torchtext -``` - -### ONNXRuntime -To enable ONNXRuntime acceleration, you need to install corresponding onnx packages: - -```python -!pip install onnx onnxruntime -``` - -### OpenVINO / Post-training Optimization Tools (POT) -To enable OpenVINO acceleration, or use POT for quantization, you need to install the OpenVINO toolkit: - -```python -!pip install openvino-dev -# Please remember to restart runtime to use packages with newly-installed version -``` - -```eval_rst -.. note:: - If you meet ``ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject`` when using ``InferenceOptimizer.trace`` or ``InferenceOptimizer.quantize`` function, you could try to solve it by upgrading ``numpy`` through: - - .. code-block:: python - - !pip install --upgrade numpy - # Please remember to restart runtime to use numpy with newly-installed version -``` - -### Intel Neural Compressor (INC) -To use INC as your quantization backend, you need to install it: - -```eval_rst -.. tabs:: - - .. tab:: With no Extra Runtime Acceleration - - .. code-block:: python - - !pip install neural-compressor==1.11.0 - - .. tab:: With Extra ONNXRuntime Acceleration - - .. code-block:: python - - !pip install neural-compressor==1.11.0 onnx onnxruntime onnxruntime_extensions -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Install/windows_guide.md b/docs/readthedocs/source/doc/Nano/Howto/Install/windows_guide.md deleted file mode 100644 index 9837e3b5..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Install/windows_guide.md +++ /dev/null @@ -1,37 +0,0 @@ -# Install BigDL-Nano on Windows - -## Step 1: Install WSL2 - - -Follow [BigDL Windows User guide](../../../UserGuide/win.md) to install WSL2. - - -## Step 2: Install conda in WSL2 - -It is highly recommended to use conda to manage the python environment for BigDL-Nano. Follow [BigDL Windows User Guide/Conda Install](../../../UserGuide/win.md#install-conda) to install conda. - -## Step 3: Create a BigDL-Nano env - -Use conda to create a new environment. For example, use `bigdl-nano` as the new environment name: - -```bash -conda create -n bigdl-nano -conda activate bigdl-nano -``` - - -## Step 4: Install BigDL-Nano from Pypi - -You can install BigDL-Nano from Pypi with `pip`. 
Specifically, for PyTorch extensions, please run: - -``` -pip install bigdl-nano[pytorch] -source bigdl-nano-init -``` - -For Tensorflow: - -``` -pip install bigdl-nano[tensorflow] -source bigdl-nano-init -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Preprocessing/PyTorch/accelerate_pytorch_cv_data_pipeline.nblink b/docs/readthedocs/source/doc/Nano/Howto/Preprocessing/PyTorch/accelerate_pytorch_cv_data_pipeline.nblink deleted file mode 100644 index ea9e7529..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Preprocessing/PyTorch/accelerate_pytorch_cv_data_pipeline.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/preprocessing/pytorch/accelerate_pytorch_cv_data_pipeline.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Preprocessing/PyTorch/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Preprocessing/PyTorch/index.rst deleted file mode 100644 index a0a6e53d..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Preprocessing/PyTorch/index.rst +++ /dev/null @@ -1,4 +0,0 @@ -Preprocessing Optimization: For PyTorch Users -============================================== - -* `How to accelerate a computer vision data processing pipeline `_ \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Preprocessing/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Preprocessing/index.rst deleted file mode 100644 index caf9ff03..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Preprocessing/index.rst +++ /dev/null @@ -1,15 +0,0 @@ -Preprocessing Optimization -=========================== - -Here you could find detailed guides on how to apply BigDL-Nano to accelerate your data preprocess pipeline. Select your desired use case below for further navigation: - -.. grid:: 1 2 2 2 - - .. grid-item:: - - .. button-link:: PyTorch/index.html - :color: primary - :expand: - :outline: - - I am a **PyTorch** user. \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/General/choose_num_processes_training.md b/docs/readthedocs/source/doc/Nano/Howto/Training/General/choose_num_processes_training.md deleted file mode 100644 index 8a979551..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/General/choose_num_processes_training.md +++ /dev/null @@ -1,42 +0,0 @@ -# Choose the Number of Processes for Multi-Instance Training - -BigDL-Nano supports multi-instance training on a server with multiple CPU cores or sockets. With Nano, you could launch a self-defined number of processes to perform data-parallel training. When choosing the number of processes, there are 3 empirical recommendations for better training performance: - -1. There should be at least 7 CPU cores assigned to each process. -2. For multiple sockets, the CPU cores assiged to each process should belong to the same socket (due to NUMA issue). That is, the number of CPU cores per process should be a divisor of the number of CPU cores placed in each sockets. -3. Only physical CPU cores should be considered (do not count in CPU cores for hyperthreading). - -```eval_rst -.. note:: - By default, Nano will distribute CPU cores evenly among processes. -``` - -Here is an example. Suppose we have a sever with 2 sockets. Each socket has 28 physical CPU cores. For this case, the number of CPU cores per process c should satisfiy: - -```eval_rst -.. 
math:: - \begin{cases} - c \text{ is divisor of } 28 \\ - c \ge 7 \\ - \end{cases} \Rightarrow - c \in \{7, 14, 28\} -``` - -Based on that, the number of processes np can be calculated as: - -```eval_rst -.. math:: - \begin{cases} - np = \frac{28+28}{c}\ , c \in \{7, 14, 28\} \\ - np > 1 \\ - \end{cases} \Rightarrow np = \text{8 or 4 or 2} -``` - -That is, empirically, we could set the number of processes to 2, 4 or 8 here for good training performance. - -```eval_rst -.. seealso:: - - * `How to accelerate a PyTorch Lightning application on training workloads through multiple instances <../PyTorchLightning/accelerate_pytorch_lightning_training_multi_instance.html>`_ - * `How to accelerate a TensorFlow Keras application on training workloads through multiple instances <../TensorFlow/accelerate_tensorflow_training_multi_instance.html>`_ -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/General/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Training/General/index.rst deleted file mode 100644 index 39cf6c5a..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/General/index.rst +++ /dev/null @@ -1,4 +0,0 @@ -Training Optimization: General Tips -==================================== - -* `How to choose the number of processes for multi-instance training `_ \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_bf16.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_bf16.nblink deleted file mode 100644 index 38883226..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_bf16.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/pytorch/accelerate_pytorch_training_bf16.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_ipex.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_ipex.nblink deleted file mode 100644 index c27ec972..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_ipex.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/pytorch/accelerate_pytorch_training_ipex.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_multi_instance.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_multi_instance.nblink deleted file mode 100644 index f5ad8c48..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_multi_instance.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/pytorch/accelerate_pytorch_training_multi_instance.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/convert_pytorch_training_torchnano.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/convert_pytorch_training_torchnano.nblink deleted file mode 100644 index c78f3c3f..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/convert_pytorch_training_torchnano.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/pytorch/convert_pytorch_training_torchnano.ipynb" 
-} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/index.rst deleted file mode 100644 index da2b7b5d..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/index.rst +++ /dev/null @@ -1,14 +0,0 @@ -Training Optimization: For PyTorch Users -========================================= - -* |convert_pytorch_training_torchnano|_ -* |use_nano_decorator_pytorch_training|_ -* `How to accelerate a PyTorch application on training workloads through Intel® Extension for PyTorch* `_ -* `How to accelerate a PyTorch application on training workloads through multiple instances `_ -* `How to use the channels last memory format in your PyTorch application for training `_ -* `How to conduct BFloat16 Mixed Precision training in your PyTorch application `_ - -.. |use_nano_decorator_pytorch_training| replace:: How to accelerate your PyTorch training loop with ``@nano`` decorator -.. _use_nano_decorator_pytorch_training: use_nano_decorator_pytorch_training.html -.. |convert_pytorch_training_torchnano| replace:: How to convert your PyTorch training loop to use ``TorchNano`` for acceleration -.. _convert_pytorch_training_torchnano: convert_pytorch_training_torchnano.html diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/pytorch_training_channels_last.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/pytorch_training_channels_last.nblink deleted file mode 100644 index 271e0fbf..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/pytorch_training_channels_last.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/pytorch/pytorch_training_channels_last.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/use_nano_decorator_pytorch_training.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/use_nano_decorator_pytorch_training.nblink deleted file mode 100644 index a77da85e..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorch/use_nano_decorator_pytorch_training.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/pytorch/use_nano_decorator_pytorch_training.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/accelerate_pytorch_lightning_training_ipex.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/accelerate_pytorch_lightning_training_ipex.nblink deleted file mode 100644 index 867b5524..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/accelerate_pytorch_lightning_training_ipex.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/pytorch-lightning/accelerate_pytorch_lightning_training_ipex.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/accelerate_pytorch_lightning_training_multi_instance.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/accelerate_pytorch_lightning_training_multi_instance.nblink deleted file mode 100644 index d8e6e1ff..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/accelerate_pytorch_lightning_training_multi_instance.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": 
"../../../../../../../../python/nano/tutorial/notebook/training/pytorch-lightning/accelerate_pytorch_lightning_training_multi_instance.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/index.rst deleted file mode 100644 index 23824489..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/index.rst +++ /dev/null @@ -1,7 +0,0 @@ -Training Optimization: For PyTorch Lightning Users -=================================================== - -* `How to accelerate a PyTorch Lightning application on training workloads through Intel® Extension for PyTorch* `_ -* `How to accelerate a PyTorch Lightning application on training workloads through multiple instances `_ -* `How to use the channels last memory format in your PyTorch Lightning application for training `_ -* `How to conduct BFloat16 Mixed Precision training in your PyTorch Lightning application `_ diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/pytorch_lightning_training_bf16.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/pytorch_lightning_training_bf16.nblink deleted file mode 100644 index 8db7c389..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/pytorch_lightning_training_bf16.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/pytorch-lightning/pytorch_lightning_training_bf16.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/pytorch_lightning_training_channels_last.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/pytorch_lightning_training_channels_last.nblink deleted file mode 100644 index 7e5b4448..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/PyTorchLightning/pytorch_lightning_training_channels_last.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/pytorch-lightning/pytorch_lightning_training_channels_last.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/accelerate_tensorflow_training_multi_instance.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/accelerate_tensorflow_training_multi_instance.nblink deleted file mode 100644 index 3a5a783e..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/accelerate_tensorflow_training_multi_instance.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/tensorflow/accelerate_tensorflow_training_multi_instance.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/index.rst deleted file mode 100644 index f5b4be21..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/index.rst +++ /dev/null @@ -1,10 +0,0 @@ -Training Optimization: For TensorFlow Users -============================================ - -* `How to accelerate a TensorFlow Keras application on training workloads through multiple instances `_ -* |tensorflow_training_embedding_sparseadam_link|_ -* `How to conduct BFloat16 Mixed Precision training in your TensorFlow application `_ -* `How to accelerate TensorFlow Keras customized training loop through multiple 
instances `_ - -.. |tensorflow_training_embedding_sparseadam_link| replace:: How to optimize your model with a sparse ``Embedding`` layer and ``SparseAdam`` optimizer -.. _tensorflow_training_embedding_sparseadam_link: tensorflow_training_embedding_sparseadam.html diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/tensorflow_custom_training_multi_instance.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/tensorflow_custom_training_multi_instance.nblink deleted file mode 100644 index 70153501..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/tensorflow_custom_training_multi_instance.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/tensorflow/tensorflow_custom_training_multi_instance.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/tensorflow_training_bf16.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/tensorflow_training_bf16.nblink deleted file mode 100644 index f1f97e74..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/tensorflow_training_bf16.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/tensorflow/tensorflow_training_bf16.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/tensorflow_training_embedding_sparseadam.nblink b/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/tensorflow_training_embedding_sparseadam.nblink deleted file mode 100644 index 3f3c663f..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/TensorFlow/tensorflow_training_embedding_sparseadam.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../../../python/nano/tutorial/notebook/training/tensorflow/tensorflow_training_embedding_sparseadam.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/Training/index.rst b/docs/readthedocs/source/doc/Nano/Howto/Training/index.rst deleted file mode 100644 index b0720e3d..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/Training/index.rst +++ /dev/null @@ -1,42 +0,0 @@ -Training Optimization -========================= - -Here you could find detailed guides on how to apply BigDL-Nano to optimize your training workloads. Select your desired use case below for further navigation: - -.. grid:: 1 2 2 2 - - .. grid-item:: - - .. button-link:: PyTorchLightning/index.html - :color: primary - :expand: - :outline: - - I am a **PyTorch Lightning** user. - - .. grid-item:: - - .. button-link:: PyTorch/index.html - :color: primary - :expand: - :outline: - - I am a **PyTorch** user. - - .. grid-item:: - - .. button-link:: TensorFlow/index.html - :color: primary - :expand: - :outline: - - I am a **TensorFlow** user. - - .. grid-item:: - - .. button-link:: General/index.html - :color: primary - :expand: - :outline: - - I want to know general optimization tips. \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Howto/index.rst b/docs/readthedocs/source/doc/Nano/Howto/index.rst deleted file mode 100644 index a33f128e..00000000 --- a/docs/readthedocs/source/doc/Nano/Howto/index.rst +++ /dev/null @@ -1,93 +0,0 @@ -Nano How-to Guides -========================= -.. note:: - This page is still a work in progress. We are adding more guides. - -In Nano How-to Guides, you could expect to find multiple task-oriented, bite-sized, and executable examples. 
These examples will show you various tasks that BigDL-Nano could help you accomplish smoothly. - -Preprocessing Optimization ---------------------------- - -PyTorch -~~~~~~~~~~~~~~~~~~~~~~~~~ -* `How to accelerate a computer vision data processing pipeline `_ - - -Training Optimization -------------------------- - -PyTorch Lightning -~~~~~~~~~~~~~~~~~~~~~~~~~ -* `How to accelerate a PyTorch Lightning application on training workloads through Intel® Extension for PyTorch* `_ -* `How to accelerate a PyTorch Lightning application on training workloads through multiple instances `_ -* `How to use the channels last memory format in your PyTorch Lightning application for training `_ -* `How to conduct BFloat16 Mixed Precision training in your PyTorch Lightning application `_ - -PyTorch -~~~~~~~~~~~~~~~~~~~~~~~~~ -* |convert_pytorch_training_torchnano|_ -* |use_nano_decorator_pytorch_training|_ -* `How to accelerate a PyTorch application on training workloads through Intel® Extension for PyTorch* `_ -* `How to accelerate a PyTorch application on training workloads through multiple instances `_ -* `How to use the channels last memory format in your PyTorch application for training `_ -* `How to conduct BFloat16 Mixed Precision training in your PyTorch application `_ - -.. |use_nano_decorator_pytorch_training| replace:: How to accelerate your PyTorch training loop with ``@nano`` decorator -.. _use_nano_decorator_pytorch_training: Training/PyTorch/use_nano_decorator_pytorch_training.html -.. |convert_pytorch_training_torchnano| replace:: How to convert your PyTorch training loop to use ``TorchNano`` for acceleration -.. _convert_pytorch_training_torchnano: Training/PyTorch/convert_pytorch_training_torchnano.html - -TensorFlow -~~~~~~~~~~~~~~~~~~~~~~~~~ -* `How to accelerate a TensorFlow Keras application on training workloads through multiple instances `_ -* |tensorflow_training_embedding_sparseadam_link|_ -* `How to conduct BFloat16 Mixed Precision training in your TensorFlow Keras application `_ -* `How to accelerate TensorFlow Keras customized training loop through multiple instances `_ - -.. |tensorflow_training_embedding_sparseadam_link| replace:: How to optimize your model with a sparse ``Embedding`` layer and ``SparseAdam`` optimizer -.. 
_tensorflow_training_embedding_sparseadam_link: Training/TensorFlow/tensorflow_training_embedding_sparseadam.html - -General -~~~~~~~~~~~~~~~~~~~~~~~~~ -* `How to choose the number of processes for multi-instance training `_ - -Inference Optimization -------------------------- - -OpenVINO -~~~~~~~~~~~~~~~~~~~~~~~~~ - -* `How to run inference on OpenVINO model `_ -* `How to run asynchronous inference on OpenVINO model `_ -* `How to accelerate a PyTorch / TensorFlow inference pipeline on Intel GPUs through OpenVINO `_ - -PyTorch -~~~~~~~~~~~~~~~~~~~~~~~~~ - -* `How to find accelerated method with minimal latency using InferenceOptimizer `_ -* `How to accelerate a PyTorch inference pipeline through ONNXRuntime `_ -* `How to accelerate a PyTorch inference pipeline through OpenVINO `_ -* `How to accelerate a PyTorch inference pipeline through JIT/IPEX `_ -* `How to quantize your PyTorch model in INT8 for inference using Intel Neural Compressor `_ -* `How to quantize your PyTorch model in INT8 for inference using OpenVINO Post-training Optimization Tools `_ -* `How to enable automatic context management for PyTorch inference on Nano optimized models `_ -* `How to save and load optimized ONNXRuntime model `_ -* `How to save and load optimized OpenVINO model `_ -* `How to save and load optimized JIT model `_ -* `How to save and load optimized IPEX model `_ -* `How to accelerate a PyTorch inference pipeline through multiple instances `_ -* `How to accelerate a PyTorch inference pipeline using Intel ARC series dGPU `_ -* `How to accelerate PyTorch inference using async multi-stage pipeline `_ - -TensorFlow -~~~~~~~~~~~~~~~~~~~~~~~~~ -* `How to accelerate a TensorFlow inference pipeline through ONNXRuntime `_ -* `How to accelerate a TensorFlow inference pipeline through OpenVINO `_ -* `How to conduct BFloat16 Mixed Precision inference in a TensorFlow Keras application `_ -* `How to save and load optimized ONNXRuntime model in TensorFlow `_ -* `How to save and load optimized OpenVINO model in TensorFlow `_ - -Install -------------------------- -* `How to install BigDL-Nano in Google Colab `_ -* `How to install BigDL-Nano on Windows `_ \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Image/learning_rate.png b/docs/readthedocs/source/doc/Nano/Image/learning_rate.png deleted file mode 100644 index 19467db8..00000000 Binary files a/docs/readthedocs/source/doc/Nano/Image/learning_rate.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/Nano/Overview/hpo.rst b/docs/readthedocs/source/doc/Nano/Overview/hpo.rst deleted file mode 100644 index 3b0a0f69..00000000 --- a/docs/readthedocs/source/doc/Nano/Overview/hpo.rst +++ /dev/null @@ -1,708 +0,0 @@ -AutoML -*************** - -Nano provides built-in AutoML support through hyperparameter optimization. - -By simply changing imports, you are able to search the model architecture (e.g. by specifying search spaces in layer/activation/function arguments when defining the model), or the training procedure (e.g. by specifying search spaces in ``learning_rate`` or ``batch_size``). You can simply use ``search`` on Model (for tensorflow) or on Trainier (for pytorch) to launch search trials, and ``search_summary`` to review the search results. - -Under the hood, the objects (layers, activations, model, etc.) are implicitly turned into searchable objects at creation, which allows search spaces to be specified in their init arguments. Nano HPO collects those search spaces and passes them to the underlying HPO engine (i.e. 
Optuna) which generates hyperparameter suggestions accordingly. The instantiation and execution of the corresponding objects are delayed until the hyperparameter values are available in each trial. - - -Install -======= - -If you have not installed BigDL-Nano, follow :doc:`Nano Install Guide <../Overview/nano.md#2-install>` to install it according to your system and framework (i.e. tensorflow or pytorch). - -Next, install a few dependencies required for Nano HPO using below commands. - -.. code-block:: console - - pip install ConfigSpace - pip install optuna<=3.1.1 - - - -Search Spaces -============= - -Search spaces are value range specifications that the search engine uses for sampling hyperparameters. The available search spaces in Nano HPO is defined in ``bigdl.nano.automl.hpo.space``. Refer to [Search Space API doc]() for more details. - - - -For Tensorflow Users -==================== - - -Enable/Disable HPO for tensorflow ---------------------------------- - -For tensorflow training, you should call ``hpo_config.enable_hpo_tf`` before using Nano HPO. - -``hpo_config.enable_hpo_tf`` will dynamically add searchable layers, activations, functions, optimizers, etc into the ``bigdl.nano.tf`` module. When importing layers, you need to change the imports from ``tf.keras.layers`` to ``bigdl.nano.tf.keras.layers``, so that you can specify search spaces in their init arguments. Note even if you don't need to search the model architecture, you still need to change the imports to use HPO. - -.. code-block:: python - :linenos: - - import bigdl.nano.automl as nano_automl - nano_automl.hpo_config.enable_hpo_tf() - - -To disable HPO, use ``hpo_config.disable_hpo_tf``. This will remove the searchable objects from ``bigdl.nano.tf`` module. - -.. code-block:: python - :linenos: - - import bigdl.nano.automl as nano_automl - nano_automl.hpo_config.disable_hpo_tf() - - -Search the Model Architecture ------------------------------ - -To search different versions of your model, you can specify search spaces when defining the model using either sequential API, functional API or by subclassing ``tf.keras.Model``. - -using Sequential API -^^^^^^^^^^^^^^^^^^^^ - -You can specify search spaces in layer arguments. Note that search spaces can only be specified in key-word argument (which means ``Dense(space.Int(...))`` should be changed to ``Dense(units=space.Int(...))``). Remember to import ``Sequential`` from ``bigdl.nano.automl.tf.keras`` instead of ``tensorflow.keras`` - -.. code-block:: python - :linenos: - - from bigdl.nano.tf.keras.layers import Dense, Conv2D, Flatten - from bigdl.nano.automl.tf.keras import Sequential - model = Sequential() - model.add(Conv2D( - filters=space.Categorical(32, 64), - kernel_size=space.Categorical(3, 5), - strides=space.Categorical(1, 2), - activation=space.Categorical("relu", "linear"), - input_shape=input_shape)) - model.add(Flatten()) - model.add(Dense(10, activation="softmax")) - - -using Functional API -^^^^^^^^^^^^^^^^^^^^ - -You can specify search spaces in layer arguments. Note that if a layer is used more than once in the model, we strongly suggest you specify a ``prefix`` for each search space in such layers to distinguish them, or they will share the same search space (the last space will override all previous definition), as shown in the below example. Remember to import ``Model`` from ``bigdl.nano.automl.tf.keras`` instead of ``tensorflow.keras``. - -.. 
code-block:: python - :linenos: - - import bigdl.nano.automl.hpo.space as space - from bigdl.nano.tf.keras import Input - from bigdl.nano.tf.keras.layers import Dense, Dropout - from bigdl.nano.automl.tf.keras import Model - - inputs = Input(shape=(784,)) - x = Dense(units=space.Categorical(8,16,prefix='dense_1'), activation="linear")(inputs) - x = Dense(units=space.Categorical(32,64,prefix='dense_2'), activation="tanh")(x) - x = Dropout(rate=space.Real(0.1,0.5, prefix='dropout'))(x) - outputs = Dense(units=10)(x) - model = Model(inputs=inputs, outputs=outputs, name="mnist_model") - - -by Subclassing tf.keras.Model -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -For models defined by subclassing tf.keras.Model, use the decorator ``@hpo.tfmodel`` to turn the model into a searchable object. Then you will able to specify either search spaces or normal values in the model init arguments. - -.. code-block:: python - :linenos: - - import bigdl.nano.automl.hpo.space as space - import bigdl.nano.automl.hpo as hpo - @hpo.tfmodel() - class MyModel(tf.keras.Model): - def __init__(self, filters, kernel_size, strides, num_classes=10): - super().__init__() - self.conv1 = tf.keras.layers.Conv2D(filters=filters, - kernel_size=kernel_size, - strides=strides, - activation="relu") - self.max1 = tf.keras.layers.MaxPooling2D(3) - self.bn1 = tf.keras.layers.BatchNormalization() - - self.gap = tf.keras.layers.GlobalAveragePooling2D() - self.dense = tf.keras.layers.Dense(num_classes) - - def call(self, inputs, training=False): - x = self.conv1(inputs) - x = self.max1(x) - x = self.bn1(x) - x = self.gap(x) - return self.dense(x) - - model = MyModel( - filters=hpo.space.Categorical(32, 64), - kernel_size=hpo.space.Categorical(3, 5), - strides=hpo.space.Categorical(1, 2) - ) - - - -Search the Learning Rate ------------------------- - -To search the learning rate, specify search space in ``learning_rate`` argument in the optimizer argument in ``model.compile``. Remember to import the optimizer from ``bigdl.nano.tf.optimizers`` instead of ``tf.keras.optimizers``. - -.. code-block:: python - :linenos: - - import bigdl.nano.automl.hpo.space as space - from bigdl.nano.tf.optimizers import RMSprop - model.compile( - loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), - optimizer=RMSprop(learning_rate=space.Real(0.0001, 0.01, log=True)), - metrics=["accuracy"], - ) - - -Search the Batch Size ----------------------- - -To search the batch size, specify search space in ``batch_size`` argument in ``model.search``. - -.. code-block:: python - :linenos: - - import bigdl.nano.automl.hpo.space as space - model.search(n_trials=2, target_metric='accuracy', direction="maximize", - x=x_train, y=y_train,validation_data=(x_valid, y_valid), - batch_size=space.Categorical(128,64)) - - -Launch Hyperparameter Search and Review the Results ----------------------------------------------------- - -To launch hyperparameter search, call ``model.search`` after compile, as shown below. ``model.search`` runs the ``n_trials`` number of trials (meaning ``n_trials`` set of hyperparameter combinations are searched), and optimizes the ``target_metric`` in the specified ``direction``. Besides search arguments, you also need to specify fit arguments in ``model.search`` which will be used in the fitting process in each trial. Refer to [API docs]() for details. - -Call ``model.search_summary`` to retrieve the search results, which you can use to get all trial statistics in pandas dataframe format, pick the best trial, or do visualizations. 
Examples of search results analysis and visualization can be found [here](#analysis-and-visualization). - -Finally, ``model.fit`` will automatically fit the model using the best set of hyper parameters found in the search. You can also use the hyperparameters from a particular trial other than the best one. Refer to [API docs]() for details. - -.. code-block:: python - :linenos: - - model = ... # define the model - model.compile(...) - model.search(n_trials=100, target_metric='accuracy', direction="maximize", - x=x_train, y=y_train, batch_size=32, epochs=20, validation_split=0.2) - study = model.search_summary() - model.fit(...) - - - - -For PyTorch Users -================== - - -Nano-HPO now only supports hyperparameter search for [pytorch-lightning]() modules. - - -Search the Model Architecture ------------------------------ - -To search the model architecture, use the decorator ``@hpo.plmodel()`` to turn the model into a searchable object. Put the arguments that you want to search in the init arguments and use the arguments to construct the model. The arguments can be either space or non-space values, as shown below. - -.. code-block:: python - :linenos: - - import bigdl.nano.automl.hpo.space as space - import bigdl.nano.automl.hpo as hpo - - @hpo.plmodel() - class MyModel(pl.LightningModule): - """Customized Model.""" - def __init__(self,out_dim1,out_dim2,dropout_1,dropout_2): - super().__init__() - layers = [] - input_dim = 32 - for out_dim, dropout in [(out_dim1, dropout_1),(out_dim2,dropout_2)]: - layers.append(torch.nn.Linear(input_dim, out_dim)) - layers.append(torch.nn.Tanh()) - layers.append(torch.nn.Dropout(dropout)) - input_dim = out_dim - layers.append(torch.nn.Linear(input_dim, 2)) - self.layers: torch.nn.Module = torch.nn.Sequential(*layers) - self.save_hyperparameters() - def forward(self, x): - return self.layers(x) - - model = MyModel( - out_dim1=space.Categorical(16,32), - out_dim2=space.Categorical(16,32), - dropout_1=space.Categorical(0.1, 0.2, 0.3, 0.4, 0.5), - dropout_2 = 0.5) - - -Search the Learning Rate -------------------------- - -``learning_rate`` can be specified in the init arguments of your model. You can use ``learning_rate`` to construct the optimizer in ``configure_optimizers()``, as shown below. - -.. code-block:: python - :linenos: - - import bigdl.nano.automl.hpo.space as space - import bigdl.nano.automl.hpo as hpo - - @hpo.plmodel() - class MyModel(pl.LightningModule): - def __init__(self, ..., learning_rate=0.1): - ... - self.save_hyperparameters() - def configure_optimizers(self): - # set learning rate in the optimizer - self.optimizer = torch.optim.Adam(self.layers.parameters(), - lr=self.hparams.learning_rate) - return [self.optimizer], [] - model = MyModel(..., learning_rate=space.Real(0.001,0.01,log=True)) - - -Search the Batch Size -------------------------- - -``batch_size`` can be specified in the init arguments of your model. You can use the ``batch_size`` to construct the ``DataLoader`` in ``train_dataloader()``, as shown below. - -.. code-block:: python - :linenos: - - import bigdl.nano.automl.hpo.space as space - import bigdl.nano.automl.hpo as hpo - @hpo.plmodel() - class MyModel(pl.LightningModule): - def __init__(self, ..., batch_size=16): - ... 
- self.save_hyperparameters() - def train_dataloader(self): - # set the batch size in train dataloader - return DataLoader(RandomDataset(32, 64), - batch_size=self.hparams.batch_size) - model = MyModel(..., batch_size = space.Categorical(32,64)) - - -Launch Hyperparameter Search and Review the Results ----------------------------------------------------- - -First of all, import ``Trainer`` from ``bigdl.nano.pytorch`` instead of ``pytorch_lightning``. Remember to set ``use_hpo=True`` when initializing the ``Trainer``. - -To launch hyperparameter search, call ``Trainer.search`` after model is defined. ``Trainer.search`` takes the decorated model as input. Similar to tensorflow, ``trainer.search`` runs the ``n_trials`` number of trials (meaning ``n_trials`` set of hyperparameter combinations are searched), and optimizes the ``target_metric`` in the specified ``direction``. There's an extra argument ``max_epochs`` which is used only in the fitting process in search trials without affecting ``Trainer.fit``. ``Trainer.search`` returns a model configured with the best set of hyper parameters. - -Call ``Trainer.search_summary`` to retrieve the search results, which you can use to get all trial statistics in pandas dataframe format, pick the best trial, or do visualizations. Examples of search results analysis and visualization can be found [here](#analysis-and-visualization). - -Finally you can use ``Trainer.fit()`` to fit the best model. You can also get a model constructed with hyperparameters from a particular trial other than the best one. Refer to [Trainer.search API doc]() for more details. - -.. code-block:: python - :linenos: - - from bigdl.nano.pytorch import Trainer - model = MyModel(...) - trainer = Trainer(...,use_hpo=True) - best_model = trainer.search( - model, - target_metric='val_loss', - direction='minimize', - n_trials=100, - max_epochs=20, - ) - study = trainer.search_summary() - trainer.fit(best_model) - - -Resume Search -================= - - -You can call ``search`` more than once with flag ``resume=True`` to resume from a previous search instead of starting a new one. - -The _resumed_ search will take into consideration all trials in the previous search when sampling hyperparameters. The trials in the resumed search will be stored in the same repo as the first search, and all trials will be retrieved as a whole by ``search_summary``. - -Note that the flag ``resume`` is by default set to ``False``, which means each search will by default start from scratch and any previous search results will be overridden and can no longer be retrieved. - - -Use a Persistent Storage -------------------------- - -By default, the storage used for storing trial info is created in-memory, so once the process is stopped the trial statistics can not be retrieved anymore. If you are expecting to run search for a long time and may resume search several times, it is highly recommended to use a persistent storage instead of the default in-memory storage. - -To use a persistent storage, specify ``storage`` with an RDB url (e.g SQLlite, MySQL, etc.) in ``search``. The simplest way is to specify a sqllite url, as shown in the example below. It will automatically create a db file in the specified path. Also specify ``study_name`` so that all the search with the same name will be gathered into the same repo. - -Example --------- - -.. tabs:: - - .. tab:: Tensorflow - - .. 
code-block:: python - - name = "resume-example" - storage = "sqlite:///example.db" - #the first search from scratch - model.search(study_name=name, storage=storage,...) - # the resumed search - model.search(study_name=name, storage=storage, resume=True,...) - - .. tab:: PyTorch - - .. code-block:: python - - name = "resume-example" - storage = "sqlite:///example.db" - #the first search from scratch - trainer.search(study_name=name, storage=storage,...) - # the resumed search - trainer.search(study_name=name, storage=storage, resume=True,...) - - -If the model/trainer object is still accessible along the searches (e.g. in a running jupyter notebook), the specification of ``storage`` and ``study_name`` can be omitted. Simply call ``search`` with ``resume=True`` to resume search. - - - -Parallel Search -================ - -Parallel search allows trials to be run in multiple processes simultaneously. To use parallel search, you need to prepare an RDB database as storage. Then in ``search``, specify the database url for ``storage``, specify ``study_name``, and set ``n_parallels`` to the number of parallel processes you want to run. - -We do not recommend SQLite as storage for parallel search as it may cause deadlocks and performance issues. Here we provide an example using MySQL. - - -Setup MySQL database ---------------------- - - -If you already know how to create a database in MySQL, you can skip this step. We assume MySQL service is already installed and started in your local machine. - -Create a new file with name ``setup_db.sql``, paste the below contents. - -.. code-block:: sql - :linenos: - - CREATE DATABASE IF NOT EXISTS example; - CREATE USER IF NOT EXISTS bigdlhpo ; - GRANT ALL PRIVILEGEs ON example.* TO bigdlhpo; - FLUSH PRIVILEGES; - - -Run below command - -.. code-block:: console - - $ sudo mysql -u root < setup_db.sql - - -The above command creates a new user ``bigdlhpo`` and a new database ``example``, and grants all access privileges on the ``example`` database to ``bigdlhpo``. - - -Install MySQL client for python -------------------------------- - -Install ``mysqlclient`` so that search can access MySQL databases from python. - -.. code-block:: console - - pip install mysqlclient - - - -Example --------- - -In search, specify ``storage`` to the MySQL database ``example`` we just created as user ``bigdlhpo``, specify ``study_name`` and also set ``n_parallels=8``. - -.. tabs:: - - .. tab:: Tensorflow - - .. code-block:: python - - name = "parallel-example-tf" - storage = "mysql://bigdlhpo@localhost/example" - # the first search from scratch - model.search(study_name=name, - storage=storage, - n_parallels=8, - ...) - - .. tab:: PyTorch - - .. code-block:: python - - name = "parallel-example-torch" - storage = "mysql://bigdlhpo@localhost/example" - #the first search from scratch - trainer.search(study_name=name, - storage=storage, - n_parallels=8, - ...) - - - - -Analysis and Visualization -============================ - -The result of ``search_summary`` can be used for further analysis and visualization. - -Get trial statistics as dataframe ---------------------------------- - -You can export the trial statistics as pandas dataframe, as shown below. - -.. tabs:: - - .. tab:: Tensorflow - - .. code-block:: python - - ... - study = model.search_summary() - trials_df = study.trials_dataframe(attrs=("number", "value", "params", "state")) - - .. tab:: PyTorch - - .. code-block:: python - - ... 
- study = trainer.search_summary() - trials_df = study.trials_dataframe(attrs=("number", "value", "params", "state")) - - -Below an example of the trials history we have exported as below. - -.. image:: ../../../../image/trial_dataframe.png - :width: 600 - - -Plot Hyperparamter Optimization History --------------------------------------------------------- - -You can also plot the optimization history as shown below. - -.. tabs:: - - .. tab:: Tensorflow - - .. code-block:: python - - ... - study = model.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_optimization_history - plot1=plot_optimization_history(study) - - .. tab:: PyTorch - - .. code-block:: python - - ... - study = trainer.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_optimization_history - plot_optimization_history(study) - -Example plot as below. It is an interactive chart which you can zoom-in and zoom-out and select data points. - -.. only:: html - - .. raw:: html - - - - -Plot Intermediate Values --------------------------------------------------------- - -You can also plot the intermediate values as shown below. This plot shows the metric result on each epoch/step of each trial, including pruned trials. - -.. tabs:: - - .. tab:: Tensorflow - - .. code-block:: python - - ... - study = model.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_intermediate_values - plot_intermediate_values(study) - - .. tab:: PyTorch - - .. code-block:: python - - ... - study = trainer.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_intermediate_values - plot_intermediate_values(study) - -Example plot as below. It is an interactive chart which you can zoom-in and zoom-out and select data points. - -.. only:: html - - .. raw:: html - - - - -Plot the Hyperparameters in Parallel Coordinates ------------------------------------------------- - -You can plot the hyperparamters in parallel coordinates chart. - - -.. tabs:: - - .. tab:: Tensorflow - - .. code-block:: python - - ... - study = model.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_parallel_coordinate - plot_parallel_coordinate(study) - - .. tab:: PyTorch - - .. code-block:: python - - ... - study = trainer.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_parallel_coordinate - plot_parallel_coordinate(study) - - -Example plot as below. It is an interactive chart which you can zoom-in and zoom-out and select data points. - - -.. only:: html - - .. raw:: html - - - - -Plot the Hyperparameter Contour ------------------------------------------------- - -You can plot the hyperparameter contour chart. - - -.. tabs:: - - .. tab:: Tensorflow - - .. code-block:: python - - ... - study = model.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_contour - plot_contour(study) - - .. tab:: PyTorch - - .. code-block:: python - - ... - study = trainer.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_contour - plot_contour(study) - - -Example plot as below. It is an interactive chart which you can zoom-in and zoom-out and select data points. - - -.. only:: html - - .. raw:: html - - - - - - -Inspect Hyperparameter Importance by accuracy ---------------------------------------------- - -You can plot the hyperparameter importance according to their relationship to accuracy. - - -.. tabs:: - - .. tab:: Tensorflow - - .. code-block:: python - - ... 
- study = model.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_param_importances - plot_param_importances(study) - - .. tab:: PyTorch - - .. code-block:: python - - ... - study = trainer.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_param_importances - plot_param_importances(study) - - -Example plot as below. It is an interactive chart which you can zoom-in and zoom-out and select data points. - - -.. only:: html - - .. raw:: html - - - - -Inspect Hyperparameter Importance by latency --------------------------------------------- - - -You can plot the hyperparameter importance according to their relationship to latency. - -.. tabs:: - - .. tab:: Tensorflow - - .. code-block:: python - - ... - study = model.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_param_importances - plot_param_importances(study, target=lambda t: t.duration.total_seconds(), target_name="duration") - - .. tab:: PyTorch - - .. code-block:: python - - ... - study = trainer.search_summary() - - from bigdl.nano.automl.hpo.visualization import plot_param_importances - plot_param_importances(study, target=lambda t: t.duration.total_seconds(), target_name="duration") - - -Example plot as below. It is an interactive chart which you can zoom-in and zoom-out and select data points. - - -.. only:: html - - .. raw:: html - - - diff --git a/docs/readthedocs/source/doc/Nano/Overview/index.rst b/docs/readthedocs/source/doc/Nano/Overview/index.rst deleted file mode 100644 index e4d8a29c..00000000 --- a/docs/readthedocs/source/doc/Nano/Overview/index.rst +++ /dev/null @@ -1,9 +0,0 @@ -Nano Key Features -================================ - -* `PyTorch Training `_ -* `PyTorch Inference `_ -* `PyTorch CUDA patch `_ -* `Tensorflow Training `_ -* `Tensorflow Inference `_ -* `AutoML `_ diff --git a/docs/readthedocs/source/doc/Nano/Overview/install.md b/docs/readthedocs/source/doc/Nano/Overview/install.md deleted file mode 100644 index 4c5aba5d..00000000 --- a/docs/readthedocs/source/doc/Nano/Overview/install.md +++ /dev/null @@ -1,105 +0,0 @@ -# Nano Installation - -You can select bigdl-nano along with some dependencies specific to PyTorch or Tensorflow using the following panel. - -```eval_rst -.. raw:: html - - - -
- [Interactive installation panel omitted: choose the FrameWork, Version, Inference Opt and Release options to get the matching Install CMD.]
- - -``` - -```eval_rst -.. note:: - Since bigdl-nano is still in the process of rapid iteration, we highly recommend that you install nightly build version through the above command to facilitate your use of the latest features. - - For stable version, please refer to the document and installation guide `here `_ . -``` - -## Environment Management -### Install in conda environment (Recommended) - -```bash -conda create -n env -conda activate env - -# select your preference in above panel to find the proper command to replace the below command, e.g. -pip install --pre --upgrade bigdl-nano[pytorch] - -# after installing bigdl-nano, you can run the following command to setup a few environment variables. -source bigdl-nano-init -``` - -The `bigdl-nano-init` scripts will export a few environment variable according to your hardware to maximize performance. - -In a conda environment, when you run `source bigdl-nano-init` manually, this command will also be added to `$CONDA_PREFIX/etc/conda/activate.d/`, which will automaticly run when you activate your current environment. - - -### Install in pure pip environment - -In a pure pip environment, you need to run `source bigdl-nano-init` every time you open a new shell to get optimal performance and run `source bigdl-nano-unset-env` if you want to unset these environment variables. - -## Other PyTorch/Tensorflow Version Support -We support a wide range of PyTorch and Tensorflow. We only care the MAJOR.MINOR in [Semantic Versioning](https://semver.org/). If you have a specific PyTorch/Tensorflow version want to use, e.g. PyTorch 1.11.0+cpu, you may select corresponding MAJOR.MINOR (i.e., PyTorch 1.11 in this case) and install PyTorch again after installing nano. - -## Python Version -`bigdl-nano` is validated on Python 3.8-3.10. - - -## Operating System -Some specific note should be awared of when installing `bigdl-nano`.` - -### Install on Linux -For Linux, Ubuntu (22.04/20.04) is recommended. - -### Install on Windows (experimental support) - -For Windows OS, users could only run `bigdl-nano-init` every time they open a new cmd terminal. - -We recommend using Windows Subsystem for Linux 2 (WSL2) to run BigDL-Nano. Please refer to [Nano Windows install guide](../Howto/Install/windows_guide.md) for instructions. - -### Install on MacOS (experimental support) -#### MacOS with Intel Chip -Same usage as Linux, while some of the funcions now rely on lower version dependencies. - -#### MacOS with M-series chip -Currently, only tensorflow is supported for M-series chip Mac. -```bash -# any way to install tensorflow on macos - -pip install --pre --upgrade bigdl-nano[tensorflow] -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Overview/known_issues.md b/docs/readthedocs/source/doc/Nano/Overview/known_issues.md deleted file mode 100644 index c0828b1e..00000000 --- a/docs/readthedocs/source/doc/Nano/Overview/known_issues.md +++ /dev/null @@ -1,71 +0,0 @@ -# Nano Known Issues - -## PyTorch Issues - -### AttributeError: module 'distutils' has no attribute 'version' - -This usually is because the latest setuptools does not compatible with PyTorch 1.9. - -You can downgrade setuptools to 58.0.4 to solve this problem. 
- -For example, if your `setuptools` is installed by conda, you can run: - -```bash -conda install setuptools==58.0.4 -``` - -### error while loading shared libraries: libunwind.so.8 - -You may see this error message when running `source bigdl-nano-init` -``` - Sed: error while loading shared libraries: libunwind.so.8: cannot open shared object file: No such file or directory. -``` -You can use the following command to fix this issue. - -* `apt-get install libunwind8-dev` - -### Bus error (core dumped) in multi-instance training with spawn distributed backend - -This usually is because you did not set enough shared memory size in your docker container. - -You can increase `--shm-size` to a larger value, e.g. a few GB, to your `docker run` command, or use `--ipc=host`. - -If you are running in k8s, you can mount larger storage in `/dev/shm`. For example, you can add the following `volume` and `volumeMount` in your pod and container definition. - -```yaml -spec: - containers: - ... - volumeMounts: - - mountPath: /dev/shm - name: cache-volume - volumes: - - emptyDir: - medium: Memory - sizeLimit: 8Gi - name: cache-volume -``` - -## TensorFlow Issues - -### ValueError: Calling `Model.xxx` in graph mode is not supported when the `Model` instance was constructed with eager mode enabled. - -Nano keras only supports running in eager mode, if you are using graph mode, please make sure not to import anything from `bigdl.nano.tf`. - -### Nano keras multi-instance training currently does not suport tensorflow dataset.from_generators, numpy_function, py_function - -Nano keras multi-instance training will serialize TensorFlow dataset object into a `graph.pb` file, which does not work with `dataset.from_generators`, `dataset.numpy_function`, `dataset.py_function` due to limitations in TensorFlow. - -### RuntimeError: A keras.Model for quantization must include Input layers. - -You may meet this error when running quantization, INC quantization doesn't support model without `Input` layer, you can use OpenVINO or ONNXRuntime in this case, i.e. `InferenceOptimizer.quantize(model, accelerator="openvino", ...)` or `InferenceOptimizer.quantize(model, accelerator="onnxruntime", ...)` - -### RuntimeError: Inter op parallelism cannot be modified after initialization - -If you meet this error when import `bigdl.nano.tf`, it could be that you have already run some TensorFlow code that set the inter/intra op parallelism, such as `tfds.load`. You can try to workaround this issue by trying to import `bigdl.nano.tf` first before running TensorFlow code. See https://github.com/tensorflow/tensorflow/issues/57812 for more information. - -## Ray Issues - -### protobuf version error - -Now `pip install ray[default]==1.11.0` will install `google-api-core>=2.10.0`, which depends on `protobuf>=3.20.1`. However, nano depends on `protobuf==3.19.4`, so you will meet this error if you install `ray` after `bigdl-nano`. The solution is `pip install google-api-core==2.8.2` before installing `ray`. diff --git a/docs/readthedocs/source/doc/Nano/Overview/nano.md b/docs/readthedocs/source/doc/Nano/Overview/nano.md deleted file mode 100644 index 9d6c29e7..00000000 --- a/docs/readthedocs/source/doc/Nano/Overview/nano.md +++ /dev/null @@ -1,70 +0,0 @@ -# Nano in 5 minutes - -BigDL-Nano is a Python package to transparently accelerate PyTorch and TensorFlow applications on Intel hardware. 
It provides a unified and easy-to-use API for several optimization techniques and tools, so that users can only apply a few lines of code changes to make their PyTorch or TensorFlow code run faster. - ----- - - -### PyTorch Bite-sized Example - -BigDL-Nano supports both PyTorch and PyTorch Lightning models and most optimizations require only changing a few "import" lines in your code and adding a few flags. - -BigDL-Nano uses a extended version of PyTorch Lightning trainer for integrating our optimizations. - -For example, if you are using a LightningModule, you can use the following code snippet to enable intel-extension-for-pytorch and multi-instance training. - -```python -from bigdl.nano.pytorch import Trainer -net = create_lightning_model() -train_loader = create_training_loader() -trainer = Trainer(max_epochs=1, use_ipex=True, num_processes=4) -trainer.fit(net, train_loader) -``` - -If you are using custom training loop, you can use the following code to enable intel-extension-for-pytorch, multi-instance training and other nano's optimizations. - -```python -from bigdl.nano.pytorch import TorchNano - -class MyNano(TorchNano): - def train(...): - # copy your train loop here and make a few changes - ... - -MyNano(use_ipex=True, num_processes=2).train() -``` - -For more details on the BigDL-Nano's PyTorch usage, please refer to the [PyTorch Training](./pytorch_train.md) and [PyTorch Inference](./pytorch_inference.md) page. - - -### TensorFlow Bite-sized Example - -BigDL-Nano supports `tensorflow.keras` API and most optimizations require only changing a few "import" lines in your code and adding a few flags. - -BigDL-Nano uses a extended version of `tf.keras.Model` or `tf.keras.Sequential` for integrating our optimizations. - -For example, you can conduct a multi-instance training using the following lines of code: - -```python -import tensorflow as tf -from bigdl.nano.tf.keras import Sequential -mnist = tf.keras.datasets.mnist - -(x_train, y_train),(x_test, y_test) = mnist.load_data() -x_train, x_test = x_train / 255.0, x_test / 255.0 - -model = Sequential([ - tf.keras.layers.Flatten(input_shape=(28, 28)), - tf.keras.layers.Dense(128, activation='relu'), - tf.keras.layers.Dropout(0.2), - tf.keras.layers.Dense(10, activation='softmax') -]) - -model.compile(optimizer='adam', - loss='sparse_categorical_crossentropy', - metrics=['accuracy']) - -model.fit(x_train, y_train, epochs=5, num_processes=4) -``` - -For more details on the BigDL-Nano's Tensorflow usage, please refer to the [TensorFlow Training](./tensorflow_train.md) and [TensorFlow Inference](./tensorflow_inference.md) page. diff --git a/docs/readthedocs/source/doc/Nano/Overview/pytorch_cuda_patch.md b/docs/readthedocs/source/doc/Nano/Overview/pytorch_cuda_patch.md deleted file mode 100644 index b20d3c86..00000000 --- a/docs/readthedocs/source/doc/Nano/Overview/pytorch_cuda_patch.md +++ /dev/null @@ -1,29 +0,0 @@ -# PyTorch CUDA Patch - -BigDL-Nano also provides CUDA patch (`bigdl.nano.pytorch.patching.patch_cuda`) to help you run CUDA code without GPU. This patch will replace CUDA operations with equivalent CPU operations, so after applying it, you can run CUDA code on your CPU without changing any code. - -```eval_rst -.. tip:: - There is also ``bigdl.nano.pytorch.patching.unpatch_cuda`` to unpatch it. 
-``` - -You can use it as following: -```python -from bigdl.nano.pytorch.patching import patch_cuda, unpatch_cuda -patch_cuda() - -# Then you can run CUDA code directly even without GPU -model = torchvision.models.resnet50(pretrained=True).cuda() -inputs = torch.rand((1, 3, 128, 128)).cuda() -with torch.no_grad(): - outputs = model(inputs) - -unpatch_cuda() -``` - -```eval_rst -.. note:: - - You should apply this patch at the beginning of your code, because it can only affect the code after calling it. - - This CUDA patch is incompatible with JIT, applying it will disable JIT automatically. -``` - diff --git a/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md b/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md deleted file mode 100644 index ade82638..00000000 --- a/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md +++ /dev/null @@ -1,443 +0,0 @@ -# PyTorch Inference - -BigDL-Nano provides several APIs which can help users easily apply optimizations on inference pipelines to improve latency and throughput. Currently, performance accelerations are achieved by integrating extra runtimes as inference backend engines or using quantization methods on full-precision trained models to reduce computation during inference. InferenceOptimizer (`bigdl.nano.pytorch.InferenceOptimizer`) provides the APIs for all optimizations that you need for inference. - - -## Automatically Choose the Best Optimization - -If you have no idea about which one optimization to choose or you just want to compare them and choose the best one, you can use `InferenceOptimizer.optimize`. - -Let's take mobilenetv3 as an example model, you can use it as following: - -```python -from torchvision.models.mobilenetv3 import mobilenet_v3_small -import torch -from torch.utils.data.dataset import TensorDataset -from torch.utils.data.dataloader import DataLoader -from bigdl.nano.pytorch import InferenceOptimizer, Trainer - -# step 1: create your model -model = mobilenet_v3_small(num_classes=10) - -# step 2: prepare your data and dataloader -x = torch.rand((10, 3, 256, 256)) -y = torch.ones((10, ), dtype=torch.long) -ds = TensorDataset(x, y) -dataloader = DataLoader(ds, batch_size=2) - -# (Optional) step 3: Something else, like training ... - -# try all supproted optimizations -opt = InferenceOptimizer() -opt.optimize(model, training_data=dataloader, thread_num=4) - -# get the best optimization -best_model, option = opt.get_best_model() - -# use the quantized model as before -with InferenceOptimizer.get_context(best_model): - y_hat = best_model(x) -``` - -`InferenceOptimizer.optimize()` will try all supported optimizations and choose the best one by `get_best_model()`. -The output table of `optimize()` looks like: -```bash - -------------------------------- ---------------------- -------------- -| method | status | latency(ms) | - -------------------------------- ---------------------- -------------- -| original | successful | 9.337 | -| bf16 | successful | 8.974 | -| static_int8 | successful | 8.934 | -| jit_fp32_ipex | successful | 10.013 | -| jit_fp32_ipex_channels_last | successful | 4.955 | -| jit_bf16_ipex | successful | 2.563 | -| jit_bf16_ipex_channels_last | successful | 3.135 | -| openvino_fp32 | successful | 1.727 | -| openvino_int8 | successful | 1.635 | -| onnxruntime_fp32 | successful | 3.801 | -| onnxruntime_int8_qlinear | successful | 4.727 | - -------------------------------- ---------------------- -------------- -Optimization cost 58.3s in total. 
-``` - -For more details, you can refer [How-to guide](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/Inference/PyTorch/inference_optimizer_optimize.html) and [API Doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Nano/pytorch.html#bigdl-nano-pytorch-inferenceoptimizer). - - -Before you go ahead with these APIs, you have to make sure BigDL-Nano is correctly installed for PyTorch. If not, please follow [this](../Overview/nano.md) to set up your environment. - -```eval_rst -.. note:: - You can install all required dependencies by - - .. code-block:: bash - - pip install --pre --upgrade bigdl-nano[pytorch,inference] - - This will install all dependencies required by BigDL-Nano PyTorch inference. It's recommanded since it will install all dependencies required by BigDL-Nano PyTorch inference with no version conflict issue. - - Or if you just want to use one of supported optimizations, you could install BigDL-Nano for PyTorch with manually installed dependencies: - - .. code-block:: bash - - pip install --pre --upgrade bigdl-nano[pytorch] - - with - - - `INC (Intel Neural Compressor) `_: ``pip install neural-compressor`` - - - `OpenVINO `_: ``pip install openvino-dev`` - - - `ONNXRuntime `_: ``pip install onnx onnxruntime onnxruntime-extensions onnxsim neural-compressor`` -``` - - -## Runtime Acceleration - -For runtime acceleration, BigDL-Nano has enabled three kinds of graph mode format and corresponding runtime in `InferenceOptimizer.trace()`: ONNXRuntime, OpenVINO and TorchScript. - -```eval_rst -.. warning:: - ``bigdl.nano.pytorch.Trainer.trace`` will be deprecated in future release. - - Please use ``bigdl.nano.pytorch.InferenceOptimizer.trace`` instead. -``` - -All available runtime accelerations are integrated in `InferenceOptimizer.trace(accelerator='onnxruntime'/'openvino'/'jit')` with different accelerator values. - -### ONNXRuntime Acceleration -You can simply append the following part to enable your [ONNXRuntime](https://onnxruntime.ai/) acceleration. -```python -# step 4: trace your model as an ONNXRuntime model -# if you have run `trainer.fit` before trace, then argument `input_sample` is not required. -ort_model = InferenceOptimizer.trace(model, accelerator='onnxruntime', input_sample=x) - -# step 5: use returned model for transparent acceleration -# The usage is almost the same with any PyTorch module, -# except for the change to wrap the inference process with Nano context manager -with InferenceOptimizer.get_context(ort_model): - y_hat = ort_model(x) -``` - -```eval_rst -.. note:: - For all Nano optimized models, you need to wrap the inference process with the automatic context manager provided by Nano through the API ``InferenceOptimizer.get_context(model=...)``. - - Please note that the context manager is not needed for `multi-instance inference <#multi-instance-acceleration>`_. - - For more details about the context manager, you could refer to section `Automatic Context Management <#automatic-context-management>`_. -``` -### OpenVINO Acceleration -The [OpenVINO](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html) usage is quite similar to ONNXRuntime, the following usage is for OpenVINO: -```python -# step 4: trace your model as a openvino model -# if you have run `trainer.fit` before trace, then argument `input_sample` is not required. 
-ov_model = InferenceOptimizer.trace(model, accelerator='openvino', input_sample=x) - -# step 5: use returned model for transparent acceleration -# The usage is almost the same with any PyTorch module, -# except for the change to wrap the inference process with Nano context manager -with InferenceOptimizer.get_context(ov_model): - y_hat = ov_model(x) -``` - -### TorchScript Acceleration -The [TorchScript](https://pytorch.org/docs/stable/jit.html) usage is a little different from above two cases. In addition to specifying `accelerator=jit`, you can also set `use_ipex=True` to enable the additional acceleration provided by [IPEX (Intel® Extension for PyTorch*)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/extension-for-pytorch.html), we generally recommend the combination of `jit` and `ipex`.The following usage is for TorchScript: -```python -# step 4: trace your model as a JIT model -jit_model = InferenceOptimizer.trace(model, accelerator='jit', input_sample=x) - -# or you can combine jit with ipex -jit_model = InferenceOptimizer.trace(model, accelerator='jit', - use_ipex=True, input_sample=x) - -# step 5: use returned model for transparent acceleration -# The usage is almost the same with any PyTorch module, -# except for the change to wrap the inference process with Nano context manager -with InferenceOptimizer.get_context(jit_model): - y_hat = jit_model(x) -``` - -## Quantization -Quantization is widely used to compress models to a lower precision, which not only reduces the model size but also accelerates inference. - -For quantization, BigDL-Nano provides only post-training quantization in `InferenceOptimizer.quantize()` for users to infer with models of 8-bit precision or 16-bit precision. Quantization-aware training is not available for now. - -```eval_rst -.. warning:: - ``bigdl.nano.pytorch.Trainer.quantize`` will be deprecated in future release. - - Please use ``bigdl.nano.pytorch.InferenceOptimizer.quantize`` instead. -``` - -### Int8 Quantization -BigDL-Nano provides `InferenceOptimizer.quantize()` API for users to quickly obtain a int8 quantized model with accuracy control by specifying a few arguments. Intel Neural Compressor (INC) and Post-training Optimization Tools (POT) from OpenVINO toolkit are enabled as options. - -To use INC as your quantization engine, you can choose accelerator as `None` or `'onnxruntime'`. Otherwise, `accelerator='openvino'` means using OpenVINO POT to do quantization. - -By default, `InferenceOptimizer.quantize()` doesn't search the tuning space and returns the fully-quantized model without considering the accuracy drop. If you need to search quantization tuning space for a model with accuracy control, you'll have to specify a few arguments to define the tuning space. More instructions in [Quantization with Accuracy Control](#quantization-with-accuracy-control) - -#### Quantization using Intel Neural Compressor -**Quantization without extra accelerator** - -Without extra accelerator, `InferenceOptimizer.quantize()` returns a PyTorch module with desired precision and accuracy. Following the example in [Runtime Acceleration](#runtime-acceleration), you can add quantization as below: -```python -q_model = InferenceOptimizer.quantize(model, calib_data=dataloader) -# run simple prediction with transparent acceleration -with InferenceOptimizer.get_context(q_model): - y_hat = q_model(x) -``` -This is a most basic usage to quantize a model with defaults, INT8 precision, and without search tuning space to control accuracy drop. 
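To get a quick feel for what the INT8 conversion buys you, here is a minimal sketch that times the quantized module against the original FP32 one. It reuses the `model`, `x`, `dataloader` and `q_model` objects from the snippets above; the warm-up and iteration counts are arbitrary illustration values, not part of the Nano API.

```python
import time

import torch

from bigdl.nano.pytorch import InferenceOptimizer


def avg_latency_ms(module, sample, n_iter=50):
    # a few warm-up runs first, then report the average forward-pass time in milliseconds
    for _ in range(5):
        module(sample)
    start = time.perf_counter()
    for _ in range(n_iter):
        module(sample)
    return (time.perf_counter() - start) / n_iter * 1000


with torch.inference_mode():                    # baseline FP32 module
    fp32_ms = avg_latency_ms(model, x)

with InferenceOptimizer.get_context(q_model):   # Nano context manager for the quantized module
    int8_ms = avg_latency_ms(q_model, x)

print(f"FP32: {fp32_ms:.2f} ms/batch, INT8: {int8_ms:.2f} ms/batch")
```

Treat the printed latencies only as a relative comparison on your own machine; the actual speedup depends on the model and hardware.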
- -**Quantization with ONNXRuntime accelerator** - -With the ONNXRuntime accelerator, `InferenceOptimizer.quantize()` will return a model with compressed precision that runs inference in the ONNXRuntime engine. -Still taking the example in [Runtime Acceleration](pytorch_inference.md#runtime-acceleration), you can add quantization as below: -```python -ort_q_model = InferenceOptimizer.quantize(model, accelerator='onnxruntime', calib_data=dataloader) -# run simple prediction with transparent acceleration -with InferenceOptimizer.get_context(ort_q_model): - y_hat = ort_q_model(x) -``` - -#### Quantization using Post-training Optimization Tools -The POT (Post-Training Optimization Tool) is provided by the OpenVINO toolkit. -Take the example in [Runtime Acceleration](#runtime-acceleration) and add quantization: -```python -ov_q_model = InferenceOptimizer.quantize(model, accelerator='openvino', calib_data=dataloader) -# run simple prediction with transparent acceleration -with InferenceOptimizer.get_context(ov_q_model): - y_hat = ov_q_model(x) -``` - -#### Quantization with Accuracy Control -The following arguments help to tune the results for both INC and POT quantization: - -- `calib_data`: A calibration dataloader is required for static post-training quantization. For POT, it is also used for evaluation. -- `metric`: A `torchmetrics` metric used to run evaluation and compare with the baseline. - -- `accuracy_criterion`: A dictionary to specify the acceptable accuracy drop, e.g. `{'relative': 0.01, 'higher_is_better': True}` - - - `relative` / `absolute`: Drop type, the accuracy drop should be relative or absolute to baseline - - `higher_is_better`: Indicate if a larger value of metric means better accuracy -- `max_trials`: Maximum trials of the search; if the algorithm can't find a satisfying model, it will exit and raise an error. - -**Accuracy Control with INC** -There are a few arguments required only by INC, and you should not specify or modify any of them if you use `accelerator='openvino'`. -- `tuning_strategy` (optional): it specifies the algorithm to search the tuning space. In most cases, you don't need to change it. -- `timeout`: Timeout of your tuning. Defaults to `0`, which means unlimited time for tuning. - -Here is an example of using INC with accuracy control. It will search for a model within a 1% accuracy drop in at most 10 trials. -```python -from torchmetrics.classification import MulticlassAccuracy -InferenceOptimizer.quantize(model, - precision='int8', - accelerator=None, - calib_data=dataloader, - metric=MulticlassAccuracy(num_classes=10), - accuracy_criterion={'relative': 0.01, 'higher_is_better': True}, - approach='static', - method='fx', - tuning_strategy='bayesian', - timeout=0, - max_trials=10, - ) -``` -**Accuracy Control with POT** -Similar to INC, we can run quantization as follows: -```python -from torchmetrics.classification import MulticlassAccuracy -InferenceOptimizer.quantize(model, - precision='int8', - accelerator='openvino', - calib_data=dataloader, - metric=MulticlassAccuracy(num_classes=10), - accuracy_criterion={'relative': 0.01, 'higher_is_better': True}, - approach='static', - max_trials=10, - ) -``` - -### BFloat16 Quantization - -BigDL-Nano supports [mixed precision inference](https://pytorch.org/docs/stable/amp.html?highlight=mixed+precision) with BFloat16 and a series of additional performance tricks. BFloat16 Mixed Precision inference combines BFloat16 and FP32 during inference, which could lead to increased performance and reduced memory usage.
Compared to FP16 mixed precison, BFloat16 mixed precision has better numerical stability. -It's quite easy for you use BFloat16 Quantization as below: -```python -bf16_model = InferenceOptimizer.quantize(model, - precision='bf16') -# run simple prediction with transparent acceleration -with InferenceOptimizer.get_context(bf16_model): - y_hat = bf16_model(x) -``` - -```eval_rst -.. note:: - For BFloat16 quantization, make sure your inference is under ``with InferenceOptimizer.get_context(bf16_model):``. Otherwise, the whole inference process is actually FP32 precision. -``` - -#### Channels Last Memory Format -You could experience Bfloat16 Quantization with `channels_last=True` to use the channels last memory format, i.e. NHWC (batch size, height, width, channels), as an alternative way to store tensors in classic/contiguous NCHW order. -The usage for this is as below: -```python -bf16_model = InferenceOptimizer.quantize(model, - precision='bf16', - channels_last=True) -# run simple prediction with transparent acceleration -with InferenceOptimizer.get_context(bf16_model): - y_hat = bf16_model(x) -``` - -#### Intel® Extension for PyTorch -[Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) (a.k.a. IPEX) extends PyTorch with optimizations for an extra performance boost on Intel hardware. - -BigDL-Nano integrates IPEX through `InferenceOptimizer.quantize()`. Users can turn on IPEX by setting `use_ipex=True`: -```python -bf16_model = InferenceOptimizer.quantize(model, - precision='bf16', - use_ipex=True, - channels_last=True) -# run simple prediction with transparent acceleration -with InferenceOptimizer.get_context(bf16_model): - y_hat = bf16_model(x) -``` - -#### TorchScript Acceleration -The [TorchScript](https://pytorch.org/docs/stable/jit.html) can also be used for Bfloat16 quantization. We recommend you take advantage of IPEX with TorchScript for further optimizations. The following usage is for TorchScript: -```python -bf16_model = InferenceOptimizer.quantize(model, - precision='bf16', - accelerator='jit', - input_sample=x, - use_ipex=True, - channels_last=True) -# run simple prediction with transparent acceleration -with InferenceOptimizer.get_context(bf16_model): - y_hat = bf16_model(x) -``` - -## Multi-instance Acceleration - -BigDL-Nano also provides multi-instance inference. To use it, you should call `multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=n)` first, where `num_processes` specifies the number of processes you want to use. - -After calling it, `multi_model` will receive a `DataLoader` or a list of batches instead of a batch, and produce a list of inference result instead of a single result. You can use it as following: - -```python -multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4) - -# predict a DataLoader -y_hat_list = multi_model(dataloader) - -# or predict a list of batches instead of entire DataLoader -it = iter(dataloader) -batch_list = [] -for i in range(10): - batch = next(it) - batch_list.append(batch) -y_hat_list = multi_model(batch_list) - -# y_hat_list is a list of inference result, you can use it like this -for y_hat in y_hat_list: - do_something(y_hat) -``` - -`InferenceOptimizer.to_multi_instance` also has a parameter named `cores_per_process` to specify the number of CPU cores used by each process, and a parameter named `cpu_for_each_process` to specify the CPU cores used by each process. 
Normally you don't need to set them manually, BigDL-Nano will find the best configuration automatically. But if you want, you can use them as following: -```python -# Use 4 processes to run inference, -# each process will use 2 CPU cores -multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores_per_process=2) - -# Use 4 processes to run inference, -# the first process will use core 0, the second process will use core 1, -# the third process will use core 2 and 3, the fourth process will use core 4 and 5 -multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2,3], [4,5]]) -``` - -```eval_rst -.. note:: - During multi-instance infernece, the context manager ``InferenceOptimizer.get_context(model=...)`` is not needed to be maunally added. -``` - -## Automatic Context Management -BigDL-Nano provides ``InferenceOptimizer.get_context(model=...)`` API to enable automatic context management for PyTorch inference. With only one line of code change, BigDL-Nano will automatically provide suitable context management for each accelerated model optimized by ``InferenceOptimizer.trace``/``quantize``/``optimize``, it usually contains part of or all of following four types of context managers: - -1. ``torch.inference_mode(True)`` to disable gradients, which will be used for all models. For the case when ``torch <= 1.12``, ``torch.no_grad()`` will be used for PyTorch mixed precision inference as a replacement of ``torch.inference_mode(True)`` - -2. ``torch.cpu.amp.autocast(dtype=torch.bfloat16)`` to run in mixed precision, which will be provided for bf16 related models - -3. ``torch.set_num_threads()`` to control thread number, which will be used only if you specify thread_num when applying ``InferenceOptimizer.trace``/``quantize``/``optimize`` - -4. ``torch.jit.enable_onednn_fusion(True)`` to support ONEDNN fusion for jit when using jit as accelerator - -For model accelerated by ``InferenceOptimizer.trace``, usage now looks like below codes, here we just take ``ipex`` for example: -```python -from bigdl.nano.pytorch import InferenceOptimizer -ipex_model = InferenceOptimizer.trace(model, - use_ipex=True, - thread_num=4) - -with InferenceOptimizer.get_context(ipex_model): - output = ipex_model(x) - assert torch.get_num_threads() == 4 # this line just to let you know Nano has provided thread control automatically : ) -``` - -For ``InferenceOptimizer.quantize`` and ``InferenceOptimizer.optimize``, usage is the same. - -``InferenceOptimizer.get_context(model=...)`` can be used for muitiple models. If you have a model pipeline, you can also get a common context manager by passing multiple models to `get_context`. -```python -from torch import nn -class Classifier(nn.Module): - def __init__(self): - super().__init__() - self.linear = nn.Linear(1000, 1) - - def forward(self, x): - return self.linear(x) - -classifer = Classifier() - -with InferenceOptimizer.get_context(ipex_model, classifer): - # a pipeline consists of backbone and classifier - x = ipex_model(input_sample) - output = classifer(x) - assert torch.get_num_threads() == 4 # this line just to let you know Nano has provided thread control automatically : ) -``` - -```eval_rst -.. seealso:: - You could refer to the related `how-to guide <../Howto/Inference/PyTorch/pytorch_context_manager.nblink>`_ for more detailed usage of the context manager. -``` -## One-click Accleration Without Code Change -```eval_rst -.. note:: - Neural Compressor >= 2.0 is needed for this function. 
You may call ``pip install --upgrade neural-compressor`` before using this functionality. -``` - -We also provides a no-code method for users to accelerate their pytorch inferencing workflow through Neural Coder. Neural Coder is a novel component under Intel® Neural Compressor to further simplify the deployment of deep learning models via one-click. BigDL-Nano is now a backend in Neural Coder. Users could call - -```bash -python -m neural_coder -o example.py -``` - -For `example.py`, it could be a common pytorch inference script without any code changes needed. For ``, please check following table. - -| Optimization Set | `` | -| ------------- | ------------- | -| BF16 + Channels Last | `nano_bf16_channels_last` | -| BF16 + IPEX + Channels Last | `nano_bf16_ipex_channels_last` | -| BF16 + IPEX | `nano_bf16_ipex` | -| BF16 | `nano_bf16` | -| Channels Last | `nano_fp32_channels_last` | -| IPEX + Channels Last | `nano_fp32_ipex_channels_last` | -| IPEX | `nano_fp32_ipex` | -| INT8 | `nano_int8` | -| JIT + BF16 + Channels Last | `nano_jit_bf16_channels_last` | -| JIT + BF16 + IPEX + Channels Last | `nano_jit_bf16_ipex_channels_last` | -| JIT + BF16 + IPEX | `nano_jit_bf16_ipex` | -| JIT + BF16 | `nano_jit_bf16` | -| JIT + Channels Last | `nano_jit_fp32_channels_last` | -| JIT + IPEX + Channels Last | `nano_jit_fp32_ipex_channels_last` | -| JIT + IPEX | `nano_jit_fp32_ipex` | -| JIT | `nano_jit_fp32` | -| ONNX Runtime | `nano_onnxruntime_fp32` | -| ONNX Runtime + INT8 | `nano_onnxruntime_int8_qlinear` | -| OpenVINO | `nano_openvino_fp32` | -| OpenVINO + INT8 | `nano_openvino_int8` | \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Overview/pytorch_train.md b/docs/readthedocs/source/doc/Nano/Overview/pytorch_train.md deleted file mode 100644 index c98b2768..00000000 --- a/docs/readthedocs/source/doc/Nano/Overview/pytorch_train.md +++ /dev/null @@ -1,315 +0,0 @@ -# PyTorch Training - -BigDL-Nano can be used to accelerate **PyTorch** or **PyTorch-Lightning** applications on training workloads. These optimizations are either enabled by default or can be easily turned on by setting a parameter or calling a method. - -The optimizations in BigDL-Nano are delivered through - -1) An extended version of PyTorch-Lightning `Trainer` (`bigdl.nano.pytorch.Trainer`) for `LightningModule` and easy `nn.Module`. - -2) An abstract `TorchNano` (`bigdl.nano.pytorch.TorchNano`) or a decorator `@nano` (`bigdl.nano.pytorch.nano`) to accelerate raw or complex `nn.Module`. - -We will briefly describe here the major features in BigDL-Nano for PyTorch training. You can find complete how to guides for acceleration of [PyTorch-Lightning](../Howto/index.html#pytorch-lightning) and [PyTorch](../Howto/index.html#pytorch). - -## Best Known Environment Variables -When you successfully installed `bigdl-nano` (please refer to [installation guide](./install.html)) in a conda environment. You are **highly recommeneded** to run following command **once**. -```bash -source bigdl-nano-init -``` -BigDL-Nano will export a few environment variables, such as `OMP_NUM_THREADS` and `KMP_AFFINITY`, according to your current hardware. Empirically, these environment variables work best for most PyTorch applications. After setting these environment variables, you can just run your applications as usual (e.g., `python app.py` or `jupyter notebook`). - -## Accelerate `nn.Module`'s training -`nn.Module` is the abstraction used in PyTorch for AI Model. 
It's common that users' model is easy enough to be handled by a regular training loop. In other cases, users may have highly customized training loop. Nano could support the acceleration for both cases. - -### `nn.Module` with regular training loop -Most of the AI model defined in `nn.Module` could be trained in a similar regular training loop. Any `nn.Module` that -- Have only one output -- Need only 1 loss function and 1 optimizer (e.g., GAN might not applied) -- Have no special customized checkpoint/evaluation logic - -could use `Trainer.compile` that takes in a PyTorch module, a loss, an optimizer, and other PyTorch objects and "compiles" them into a `LightningModule`. And then a `Trainer` instance could be used to train this compiled model. - -For example, - -```python -from bigdl.nano.pytorch import Trainer - -lightning_module = Trainer.compile(pytorch_module, loss, optimizer) -trainer = Trainer(max_epochs=10) -trainer.fit(lightning_module, train_loader) -``` - -`trainer.fit` will apply all the acceleration methods that could generally be applied to any models. While there are some optional acceleration method for which you could easily enable. - -### `nn.Module` with customized training loop - -The `TorchNano` class is what we use to accelerate raw PyTorch code. By using it, we only need to make very few changes to accelerate custom training loop. For example, - -```python -from bigdl.nano.pytorch import TorchNano - -class MyNano(TorchNano) : - def train(self, ...): - # copy your train loop here and make a few changes - ... - -MyNano().train(...) -``` - -```eval_rst -.. important:: - - Please refer to `here <../Howto/Training/PyTorch/convert_pytorch_training_torchnano.html#Convert-to-TorchNano>`_ for a detailed guide about how to make changes to your custom PyTorch training loop so that you could use ``TorchNano``. - - Note that the most important change is to use ``self.setup`` method to set up model, optimizer(s), and dataloader(s) for acceleration inside the training loop. -``` - -If you have already defined a PyTorch training loop function with a model, optimizers, and dataloaders as parameters, you could use `@nano` decorator to gain acceleration in a simpler way. For example: - -```python -from bigdl.nano.pytorch import nano - -@nano() -def train(model, optimizer, train_loader, ...): - ... - -train(model, optimizer, train_loader, ...) -``` - -```eval_rst -.. seealso:: - - Please refer to `here <../Howto/Training/PyTorch/use_nano_decorator_pytorch_training.html>`_ for a detailed usage of ``@nano`` decorator. -``` - -## Accelerate `LightningModule`'s training - -The PyTorch `Trainer` extends PyTorch Lightning's `Trainer` and has a few more parameters and methods specific to BigDL-Nano. The Trainer can be directly used to train a `LightningModule`. - -For example, - -```python -from pytorch_lightning import LightningModule -from bigdl.nano.pytorch import Trainer - -class MyModule(LightningModule): - # LightningModule definition - -lightning_module = MyModule() -trainer = Trainer(max_epochs=10) -trainer.fit(lightning_module, train_loader) -``` - -## Optional Acceleration Methods -### Intel® Extension for PyTorch* - -[Intel Extension for PyTorch*](https://github.com/intel/intel-extension-for-pytorch) (a.k.a. IPEX) extends PyTorch with optimizations for an extra performance boost on Intel hardware. - -BigDL-Nano integrates IPEX in `Trainer`, `TorchNano` and `@nano` decorator. Users can turn on IPEX by setting `use_ipex=True`. - -```eval_rst - -.. tabs:: - - .. tab:: Trainer - - .. 
code-block:: python - - from bigdl.nano.pytorch import Trainer - - trainer = Trainer(max_epochs=10, use_ipex=True) - trainer.fit(...) - - .. tab:: TorchNano - - .. code-block:: python - - from bigdl.nano.pytorch import TorchNano - - class MyNano(TorchNano) : - def train(self, ...): - # copy your train loop here and make a few changes - - MyNano(use_ipex=True).train(...) - - .. tab:: @nano - - .. code-block:: python - - from bigdl.nano.pytorch import nano - - @nano(use_ipex=True) - def train(model, optimizer, train_loader, ...): - ... - - train(model, optimizer, train_loader, ...) - -``` - -### Multi-instance Training - -When training on a server with dozens of CPU cores, it is often beneficial to use multiple training instances in a data-parallel fashion to make full use of the CPU cores. However, using PyTorch's DDP API is a little cumbersome and error-prone, and if not configured correctly, it will make the training even slow. - -You can just set the `num_processes` parameter in the `Trainer` or `TorchNano` constructor, or the `@nano` decorator, so that BigDL-Nano will launch the specific number of processes to perform data-parallel training. Each process will be automatically pinned to a different subset of CPU cores to avoid conflict and maximize training throughput. - -```eval_rst - -.. tabs:: - - .. tab:: Trainer - - .. code-block:: python - - from bigdl.nano.pytorch import Trainer - - trainer = Trainer(max_epochs=10, num_processes=4) - trainer.fit(...) - - .. tab:: TorchNano - - .. code-block:: python - - from bigdl.nano.pytorch import TorchNano - - class MyNano(TorchNano) : - def train(self, ...): - # copy your train loop here and make a few changes - - MyNano(num_processes=4).train(...) - - .. tab:: @nano - - .. code-block:: python - - from bigdl.nano.pytorch import nano - - @nano(num_processes=4) - def train(model, optimizer, train_loader, ...): - ... - - train(model, optimizer, train_loader, ...) - -``` -```eval_rst -.. note:: - - The effective batch size in multi-instance training is the ``batch_size`` in your ``dataloader`` times ``num_processes``. So, the number of iterations of each epoch will be reduced ``num_processes`` fold. To achieve the same effect as single instance training, a common practice to compensate is to gradually increase the learning rate to ``num_processes`` times. - - BigDL-Nano supports this practice by default through ``auto_lr=Ture``, which will scale the learning rate linearly by ``num_processes`` times. - - To get more details about this 'learning rate warmup' trick, you could refer to `this paper `_ published by Facebook. -``` - -### BFloat16 Mixed Precision -BFloat16 Mixed Precison combines BFloat16 and FP32 during training, which could lead to increased performance and reduced memory usage. Compared to FP16 mixed precison, BFloat16 mixed precision has better numerical stability. - -You could instantiate a BigDL-Nano `Trainer` or `TorchNano` with `precision='bf16'`, or set `precision='bf16'` in the `@nano` decorator to use BFloat16 mixed precision for training. - - -```eval_rst - -.. tabs:: - - .. tab:: Trainer - - .. code-block:: python - - from bigdl.nano.pytorch import Trainer - - trainer = Trainer(max_epochs=5, precision='bf16') - trainer.fit(...) - - .. tab:: TorchNano - - .. code-block:: python - - from bigdl.nano.pytorch import TorchNano - - class MyNano(TorchNano) : - def train(self, ...): - # copy your train loop here and make a few changes - - MyNano(precision='bf16').train(...) - - .. tab:: @nano - - .. 
code-block:: python - - from bigdl.nano.pytorch import nano - - @nano(precision='bf16') - def train(model, optimizer, train_loader, ...): - ... - - train(model, optimizer, train_loader, ...) - -``` - -### Channels Last Memory Format -You could instantiate a BigDL-Nano `Trainer` or `TorchNano` with `channels_last=True`, or set `channels_last=True` in the `@nano` decorator to use the channels last memory format, i.e. NHWC (batch size, height, width, channels), as an alternative way to store tensors in classic/contiguous NCHW order. - -```eval_rst - -.. tabs:: - - .. tab:: Trainer - - .. code-block:: python - - from bigdl.nano.pytorch import Trainer - - trainer = Trainer(max_epochs=5, channels_last=True) - trainer.fit(...) - - .. tab:: TorchNano - - .. code-block:: python - - from bigdl.nano.pytorch import TorchNano - - class MyNano(TorchNano) : - def train(self, ...): - # copy your train loop here and make a few changes - - MyNano(channels_last=True).train(...) - - .. tab:: @nano - - .. code-block:: python - - from bigdl.nano.pytorch import nano - - @nano(channels_last=True) - def train(model, optimizer, train_loader, ...): - ... - - train(model, optimizer, train_loader, ...) - -``` - - -## Accelerate `torchvision` data processing - -Computer Vision task often needs a data processing pipeline that sometimes constitutes a non-trivial part of the whole training pipeline. - -Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing a drop-in replacement of `torchvision`'s components such as `datasets` and `transforms`. Nano provides a patch API `patch_torch` to accelerate these functions. - - -```python -from bigdl.nano.pytorch import patch_torch -patch_torch() - -from torchvision.datasets import ImageFolder -from torchvision import transforms - -data_transform = transforms.Compose([ - transforms.Resize(256), - transforms.ColorJitter(), - transforms.RandomCrop(224), - transforms.RandomHorizontalFlip(), - transforms.Resize(128), - transforms.ToTensor() -]) - -train_set = ImageFolder(train_path, data_transform) -train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True) -``` diff --git a/docs/readthedocs/source/doc/Nano/Overview/support.md b/docs/readthedocs/source/doc/Nano/Overview/support.md deleted file mode 100644 index 3fb03315..00000000 --- a/docs/readthedocs/source/doc/Nano/Overview/support.md +++ /dev/null @@ -1,60 +0,0 @@ -# BigDL-Nano Features - -| Feature | Meaning | -| --------------------- | ----------------------------------------------------------------------- | -| **Intel-openmp** | Use Intel-openmp library to improve performance of multithread programs | -| **Jemalloc** | Use jemalloc as allocator | -| **Tcmalloc** | Use tcmalloc as allocator | -| **Neural-Compressor** | Neural-Compressor int8 quantization | -| **OpenVINO** | OpenVINO fp32/bf16/fp16/int8 acceleration on CPU/GPU/VPU | -| **ONNXRuntime** | ONNXRuntime fp32/int8 acceleration | -| **CUDA patch** | Run CUDA code even without GPU | -| **JIT** | PyTorch JIT optimization | -| **Channel last** | Channel last memory format | -| **BF16** | BFloat16 mixed precision training and inference | -| **IPEX** | Intel-extension-for-pytorch optimization | -| **Multi-instance** | Multi-process training and inference | -| **ray** | Use ray as multi-process backend | - -## Common Feature Support (Can be used in both PyTorch and TensorFlow) - -| Feature | Ubuntu (20.04/22.04) | CentOS7 | MacOS (Intel chip) | MacOS (M-series chip) | Windows | -| --------------------- | 
-------------------- | ------- | ------------------ | --------------------- | ------- | -| **Intel-openmp** | ✅ | ✅ | ✅ | ② | ✅ | -| **Jemalloc** | ✅ | ✅ | ✅ | ❌ | ❌ | -| **Tcmalloc** | ✅ | ❌ | ❌ | ❌ | ❌ | -| **Neural-Compressor** | ✅ | ✅ | ❌ | ❌ | ? | -| **OpenVINO** | ✅ | ① | ❌ | ❌ | ④ | -| **ONNXRuntime** | ✅ | ① | ✅ | ❌ | ✅ | -| **ray** | ✅ | ? | ? | ? | ④ | - -## PyTorch Feature Support - -| Feature | Ubuntu (20.04/22.04) | CentOS7 | MacOS (Intel chip) | MacOS (M-series chip) | Windows | -| ------------------ | -------------------- | ------- | ------------------ | --------------------- | ------- | -| **CUDA patch** | ✅ | ✅ | ✅ | ? | ✅ | -| **JIT** | ✅ | ✅ | ✅ | ? | ✅ | -| **Channel last** | ✅ | ✅ | ✅ | ? | ✅ | -| **BF16** | ✅ | ✅ | ⭕ | ⭕ | ✅ | -| **IPEX** | ✅ | ✅ | ❌ | ❌ | ❌ | -| **Multi-instance** | ✅ | ✅ | ② | ② | ② | - -## TensorFlow Feature Support - -| Feature | Ubuntu (20.04/22.04) | CentOS7 | MacOS (Intel chip) | MacOS (M-series chip) | Windows | -| ------------------ | -------------------- | ------- | ------------------ | --------------------- | ------- | -| **BF16** | ✅ | ✅ | ⭕ | ⭕ | ✅ | -| **Multi-instance** | ③ | ③ | ②③ | ②③ | ❌ | - -## Symbol Meaning - -| Symbol | Meaning | -| ------ | -------------------------------------------------------------------------------------------------------- | -| ✅ | Supported | -| ❌ | Not supported | -| ⭕ | All Mac machines (Intel/M-series chip) do not support bf16 instruction set, so this feature is pointless | -| ① | This feature is only supported when used together with jemalloc | -| ② | This feature is supported but without any performance guarantee | -| ③ | Only Multi-instance training is supported for now | -| ④ | This feature is only supported when using PyTorch | -| ? | Not tested | diff --git a/docs/readthedocs/source/doc/Nano/Overview/tensorflow_inference.md b/docs/readthedocs/source/doc/Nano/Overview/tensorflow_inference.md deleted file mode 100644 index 9ee4172a..00000000 --- a/docs/readthedocs/source/doc/Nano/Overview/tensorflow_inference.md +++ /dev/null @@ -1,215 +0,0 @@ -# TensorFlow Inference - -BigDL-Nano provides several APIs which can help users easily apply optimizations on inference pipelines to improve latency and throughput. Currently, performance accelerations are achieved by integrating extra runtimes as inference backend engines or using quantization methods on full-precision trained models to reduce computation during inference. InferenceOptimizer(`bigdl.nano.tf.keras.InferenceOptimizer`) provides the APIs for all optimizations that you need for inference. - - -## Automatically Choose the Best Optimization - -We recommend you to use `InferenceOptimizer.optimize` to compare different optimization methods and choose the best one. - -Taking MobileNetV2 as an example, you can use runtime acceleration as below: - -```python -import tensorflow as tf -from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2 -import numpy as np -from bigdl.nano.tf.keras import InferenceOptimizer - -# step 1: create your model -model = MobileNetV2(weights=None, input_shape=[40, 40, 3], classes=10) - -# step 2: prepare your data and dataset -train_examples = np.random.random((100, 40, 40, 3)) -train_labels = np.random.randint(0, 10, size=(100,)) -train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels)) - -# (Optional) step 3: Something else, like training ... 
-model.fit(train_dataset) - -# step 4: try all supproted optimizations -opt = InferenceOptimizer() -opt.optimize(model, x=train_dataset) - -# get the best optimization -best_model, _option = opt.get_best_model() - -# use the quantized model as before -y_hat = best_model(train_examples) -best_model.predict(train_dataset) -``` - -`InferenceOptimizer.optimize` will try all supported optimizations and choose the best one. e.g. the output may like this - -``` -==========================Optimization Results========================== - -------------------------------- ---------------------- -------------- -| method | status | latency(ms) | - -------------------------------- ---------------------- -------------- -| original | successful | 82.109 | -| int8 | successful | 4.398 | -| openvino_fp32 | successful | 3.847 | -| openvino_int8 | successful | 2.177 | -| onnxruntime_fp32 | successful | 3.28 | -| onnxruntime_int8_qlinear | successful | 3.071 | -| onnxruntime_int8_integer | fail to convert | None | - -------------------------------- ---------------------- -------------- -``` - -```eval_rst -.. tip:: - It also uses parameter ``x`` and ``y`` to receive calibration data like ``InferenceOptimizer.quantize``. - - There are some other useful parameters - - - ``includes``: A str list. If set, ``optimize`` will only try optimizations in this parameter. - - ``excludes``: A str list. If set, ``optimize`` will try all optimizations (or optimizations specified by ``includes``) except for those in this parameter. - - See its `API document <../../PythonAPI/Nano/tensorflow.html#bigdl.nano.tf.keras.InferenceOptimizer.optimize>`_ for more advanced usage. -``` - -Before you go ahead with these APIs, you have to make sure BigDL-Nano is correctly installed for TensorFlow. If not, please follow [this](./install.md) to set up your environment. - -```eval_rst -.. note:: - You can install all required dependencies by - - :: - - pip install bigdl-nano[tensorflow,inference] - - This will install all dependencies required by BigDL-Nano TensorFlow inference. - - Or if you just want to use one of supported optimizations: - - - INC (Intel Neural Compressor): ``pip install neural-compressor`` - - OpenVINO: ``pip install openvino-dev`` - - ONNXRuntime: ``pip install onnx onnxruntime onnxruntime-extensions tf2onnx neural-compressor`` - - We recommand installing all dependencies by ``pip install bigdl-nano[tensorflow,inference]``, because you may run into version issues if you install dependencies manually. -``` - -## Manually Chose Optimizations - -### Runtime Acceleration - -For runtime acceleration, BigDL-Nano has enabled two kinds of runtime (OpenVINO and ONNXRuntime) for users in `InferenceOptimizer.trace()`. - -```eval_rst -.. warning:: - ``model.trace`` will be deprecated in future release. - - Please use ``bigdl.nano.tf.keras.InferenceOptimizer.trace`` instead. -``` - -All available runtime accelerations are integrated in `InferenceOptimizer.trace(accelerator='onnxruntime'/'openvino')` with different accelerator values. 
-
-Taking the example in [Automatically Choose the Best Optimization](#automatically-choose-the-best-optimization), you can use runtime acceleration as follows:
-
-```python
-# trace the model with `OpenVINO` acceleration
-traced_model = InferenceOptimizer.trace(model, accelerator="openvino")
-# trace the model with `ONNXRuntime` acceleration
-traced_model = InferenceOptimizer.trace(model, accelerator="onnxruntime")
-
-# run simple prediction
-y_hat = traced_model(train_examples)
-
-# predict also supports acceleration
-traced_model.predict(train_dataset)
-```
-
-### Quantization
-
-Quantization is widely used to compress models to a lower precision, which not only reduces the model size but also accelerates inference. BigDL-Nano provides the `InferenceOptimizer.quantize()` API for users to quickly obtain a quantized model with accuracy control by specifying a few arguments.
-
-BigDL-Nano currently provides only post-training quantization in `InferenceOptimizer.quantize()` for users to infer with models of 8-bit precision. Quantization-Aware Training is not available for now. Model conversion to 16-bit formats like BF16 and FP16 is coming soon.
-
-```eval_rst
-.. warning::
-   ``model.quantize`` will be deprecated in a future release.
-
-   Please use ``bigdl.nano.tf.keras.InferenceOptimizer.quantize`` instead.
-```
-
-To use INC as your quantization engine, you can choose `accelerator=None/'onnxruntime'`. Otherwise, `accelerator='openvino'` means using OpenVINO POT (Post-training Optimization) to do quantization.
-
-#### Quantization without Accuracy Control
-
-Taking the example in [Runtime Acceleration](#runtime-acceleration), you can use quantization as follows:
-
-```python
-# use Intel Neural Compressor quantization
-q_model = InferenceOptimizer.quantize(model, x=train_dataset)
-
-# or use ONNXRuntime quantization
-q_model = InferenceOptimizer.quantize(model, x=train_dataset, accelerator="onnxruntime")
-
-# or use OpenVINO quantization
-q_model = InferenceOptimizer.quantize(model, x=train_dataset, accelerator="openvino")
-
-# you can also use features and labels instead of a dataset for quantization
-q_model = InferenceOptimizer.quantize(model, x=train_examples, y=train_labels)
-
-# you can use the quantized model as before
-y_hat = q_model(train_examples)
-q_model.predict(train_dataset)
-```
-
-This is the most basic usage: it quantizes the model to INT8 precision with default settings, without searching the tuning space to control the accuracy drop.
-
-```eval_rst
-.. note::
-   BigDL-Nano currently only supports static quantization, which needs training data for calibration. Parameters ``x`` and ``y`` are used to receive calibration data.
-
-   - ``x``: Input data which is used for training. It could be
-
-     - A Numpy array (or array-like), or a list of arrays (in case the model has multiple inputs).
-     - A TensorFlow tensor, or a list of tensors (in case the model has multiple inputs).
-     - An unbatched ``tf.data.Dataset``. Should return tuples of (inputs, targets). (In this case there is no need to pass the parameter ``y``.)
-
-   - ``y``: Target data. Like the input data ``x``, it could be either Numpy array(s) or TensorFlow tensor(s). Its length should be consistent with ``x``. If ``x`` is a ``Dataset``, ``y`` will be ignored (since targets will be obtained from ``x``).
-```
-
-#### Quantization with Accuracy Control
-
-By default, `InferenceOptimizer.quantize()` doesn't search the tuning space and returns the fully-quantized model without considering the accuracy drop.
If you need to search the quantization tuning space for a model with accuracy control, you may need to specify a few parameters.
-
-The following parameters can help you tune the results for both INC and POT quantization:
-
-- `metric`: A `tensorflow.keras.metrics.Metric` object for evaluation.
-- `accuracy_criterion`: A dictionary to specify the acceptable accuracy drop, e.g. `{'relative': 0.01, 'higher_is_better': True}`
-
-  - `relative` / `absolute`: Drop type; whether the accuracy drop should be relative or absolute to the baseline
-  - `higher_is_better`: Indicates whether a larger value of the metric means better accuracy
-
-- `max_trials`: Maximum number of trials in the search; if the algorithm can't find a satisfying model, it will exit and raise an error.
-- `batch`: Specify the batch size of the dataset. This only takes effect on evaluation. If it's not set, `batch=1` is used for evaluation.
-
-**Accuracy Control with INC**
-
-There are a few arguments that only take effect when using INC:
-- `tuning_strategy` (optional): specifies the algorithm used to search the tuning space. In most cases, you don't need to change it.
-- `timeout`: Timeout of your tuning. Default: 0, which means unlimited tuning time.
-- `inputs`: A list of input names. Default: None, automatically get names from the graph.
-- `outputs`: A list of output names. Default: None, automatically get names from the graph.
-
-Here is an example of using INC with accuracy control. It will search for a model within a 1% accuracy drop with at most 10 trials.
-```python
-from tensorflow.keras.metrics import SparseCategoricalAccuracy
-
-q_model = InferenceOptimizer.quantize(model,
-                                      x=train_dataset,
-                                      accelerator=None,
-                                      metric=SparseCategoricalAccuracy(),
-                                      accuracy_criterion={'relative': 0.01,
-                                                          'higher_is_better': True},
-                                      approach='static',
-                                      tuning_strategy='bayesian',
-                                      timeout=0,
-                                      max_trials=10)
-
-# use the quantized model as before
-y_hat = q_model(train_examples)
-q_model.predict(train_dataset)
-```
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/Nano/Overview/tensorflow_train.md b/docs/readthedocs/source/doc/Nano/Overview/tensorflow_train.md
deleted file mode 100644
index 24cd58e4..00000000
--- a/docs/readthedocs/source/doc/Nano/Overview/tensorflow_train.md
+++ /dev/null
@@ -1,90 +0,0 @@
-# TensorFlow Training
-
-BigDL-Nano can be used to accelerate TensorFlow Keras applications on training workloads. The optimizations in BigDL-Nano are delivered through
-
-- BigDL-Nano's `Model` and `Sequential` classes, which have APIs identical to `tf.keras.Model` and `tf.keras.Sequential` plus an enhanced `fit` method.
-- BigDL-Nano's decorator `nano` (potentially with the help of `nano_multiprocessing` and `nano_multiprocessing_loss`) to handle Keras models with customized training loops.
-
-We will briefly describe here the major features in BigDL-Nano for TensorFlow training.
-
-### Best Known Configurations
-When you install BigDL-Nano by `pip install bigdl-nano[tensorflow]`, `intel-tensorflow` will be installed in your environment, which has Intel's oneDNN optimizations enabled by default; and when you run `source bigdl-nano-init`, it will export a few environment variables, such as `OMP_NUM_THREADS` and `KMP_AFFINITY`, according to your current hardware. Empirically, these environment variables work best for most TensorFlow applications.
After setting these environment variables, you can just run your applications as usual (`python app.py`) and no additional changes are required. - -### Multi-Instance Training -When training on a server with dozens of CPU cores, it is often beneficial to use multiple training instances in a data-parallel fashion to make full use of the CPU cores. However - -- Naively using TensorFlow's `MultiWorkerMirroredStrategy` can cause conflict in CPU cores and often cannot provide performance benefits. -- Customized training loop could be hard to use together with `MultiWorkerMirroredStrategy` - -BigDL-Nano makes it very easy to conduct multi-instance training correctly for default/customized training loop models. - -#### Keras Model with default training loop - You can just set the `num_processes` parameter in the `fit` method in your `Model` or `Sequential` object and BigDL-Nano will launch the specific number of processes to perform data-parallel training. Each process will be automatically pinned to a different subset of CPU cores to avoid conflict and maximize training throughput. - -```python -import tensorflow as tf -from tensorflow.keras import layers -from bigdl.nano.tf.keras import Sequential - -model = Sequential([ - layers.Rescaling(1. / 255, input_shape=(img_height, img_width, 3)), - layers.Conv2D(16, 3, padding='same', activation='relu'), - layers.MaxPooling2D(), - layers.Conv2D(32, 3, padding='same', activation='relu'), - layers.MaxPooling2D(), - layers.Conv2D(64, 3, padding='same', activation='relu'), - layers.MaxPooling2D(), - layers.Flatten(), - layers.Dense(128, activation='relu'), - layers.Dense(num_classes) -]) - -model.compile(optimizer='adam', - loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), - metrics=['accuracy']) - -model.fit(train_ds, epochs=3, validation_data=val_ds, num_processes=2) -``` - -#### Keras Model with customized training loop - -To make them run in a multi-process way, you may only add 2 lines of code. - -- add `nano_multiprocessing` to the `train_step` function with gradient calculation and applying process. -- add `@nano(num_processes=...)` to the training loop function with iteration over full dataset. 
- -```python -from bigdl.nano.tf.keras import nano_multiprocessing, nano -import tensorflow as tf - -tf.random.set_seed(0) -global_batch_size = 32 - -model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))]) -optimizer = tf.keras.optimizers.SGD() -loss_object = tf.keras.losses.BinaryCrossentropy(from_logits=True) - -dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(128).batch( - global_batch_size) - -@nano_multiprocessing # <-- Just remove this line to run on 1 process -@tf.function -def train_step(inputs, model, loss_object, optimizer): - features, labels = inputs - with tf.GradientTape() as tape: - predictions = model(features, training=True) - loss = loss_object(labels, predictions) - gradients = tape.gradient(loss, model.trainable_variables) - optimizer.apply_gradients(zip(gradients, model.trainable_variables)) - return loss - -@nano(num_processes=4) # <-- Just remove this line to run on 1 process -def train_whole_data(model, dataset, loss_object, optimizer, train_step): - for inputs in dataset: - print(train_step(inputs, model, loss_object, optimizer)) -``` - - -Note that, different from the conventions in [BigDL-Nano PyTorch multi-instance training](./pytorch_train.html#multi-instance-training), the effective batch size will not change in TensorFlow multi-instance training, which means it is still the batch size you specify in your dataset. This is because TensorFlow's `MultiWorkerMirroredStrategy` will try to split the batch into multiple sub-batches for different workers. We chose this behavior to match the semantics of TensorFlow distributed training. - -When you do want to increase your effective `batch_size`, you can do so by directly changing it in your dataset definition and you may also want to gradually increase the learning rate linearly to the `batch_size`, as described in this [paper](https://arxiv.org/abs/1706.02677) published by Facebook. \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Overview/troubshooting.md b/docs/readthedocs/source/doc/Nano/Overview/troubshooting.md deleted file mode 100644 index 7f5a5edb..00000000 --- a/docs/readthedocs/source/doc/Nano/Overview/troubshooting.md +++ /dev/null @@ -1,77 +0,0 @@ -# Troubleshooting Guide for BigDL-Nano - -Refer to this section for common issues faced while using BigDL-Nano. - -## Installation -### Why I fail to install openvino-dev==2022.2 when ``pip install bigdl-nano[inference]``? - -Please check your system first as openvino-dev 2022.2 does not support centos. Refer [this](https://pypi.org/project/openvino-dev/) for more details. You can install bigdl-nano[inference] >= 2.2 instead, as bigdl-nano[inference] >= 2.2 use openvino-dev >= 2022.3 which supports centos again. - -### How to solve SYNK issue caused by onnx when ``pip install bigdl-nano[inference]``? - -We are trying to solve this issue by upgrading onnx to 1.13. But there exists conflict between onnx 1.13 and other dependencies (such as intel-tensorflow and pytorch-lightning), mainly because of protobuf version. If you are concerned about security of onnx, we recommend to ``pip install onnx==1.13`` after ``pip install bigdl-nano[inference]``. - -## Inference - -### ``could not create a primitive descriptor iterator`` when using bf16 related methods - -Please make sure you use context manager provided by ``InferenceOptimizer.get_context``, you can refer this [howto guide for context manager]() for more details. 
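A minimal sketch of the intended usage might look like the following. This is not taken from the deleted guide: the toy model, the sample input, and the use of `precision='bf16'` are illustrative assumptions, and it assumes your bigdl-nano version exposes `InferenceOptimizer.get_context` and bf16 conversion through `InferenceOptimizer.quantize`.

```python
import torch
from bigdl.nano.pytorch import InferenceOptimizer

# hypothetical FP32 model and sample input, used only for illustration
model = torch.nn.Linear(224, 10).eval()
x = torch.rand(1, 224)

# obtain a bf16-accelerated model (one of the "bf16 related methods")
bf16_model = InferenceOptimizer.quantize(model, precision='bf16')

# run inference inside the context manager returned by InferenceOptimizer.get_context,
# which sets up the autocast/thread environment the accelerated model expects
with InferenceOptimizer.get_context(bf16_model):
    y_hat = bf16_model(x)
```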
-
-### ``assert precision in list(self.cur_config['ops'].keys())`` when using ipex quantization with inc on a machine with the BF16 instruction set
-
-It's a known issue of [Intel® Neural Compressor](https://github.com/intel/neural-compressor) that BF16 ops are not handled well in version 1.13.1. This has been fixed in version 2.0. You can install bigdl-nano[inference] >= 2.2 to fix this problem.
-
-### Why is my output not bf16 dtype when using bf16+ipex related methods?
-
-Please check your torch version and ipex version first. We currently only have CI/CD for ipex>=1.12, and we can't guarantee 100% normal operation when the version is lower than this.
-
-### ``TypeError: cannot serialize xxx object`` when ``InferenceOptimizer.optimize()`` runs the ipex related methods, or when using ``InferenceOptimizer.trace(use_ipex=True)`` / ``InferenceOptimizer.quantize(use_ipex=True)``
-
-In ``InferenceOptimizer.optimize()``, we actually use ``ipex.optimize(model, dtype=torch.bfloat16, inplace=False)`` to make sure the original model is not changed. If your model can't be deepcopied, you should use ``InferenceOptimizer.trace(use_ipex=True, xxx, inplace=True)`` or ``InferenceOptimizer.quantize(use_ipex=True, xxx, inplace=True)`` instead and make sure to set ``inplace=True``.
-
-### Error message like ``set strict=False`` when ``InferenceOptimizer.trace(accelerator='jit')`` or ``InferenceOptimizer.quantize(accelerator='jit')``
-
-You can set ``strict=False`` for ``torch.jit.trace`` by passing ``jit_strict=False`` in ``InferenceOptimizer.trace(accelerator='jit', xxx, jit_strict=False)`` or ``InferenceOptimizer.quantize(accelerator='jit', xxx, jit_strict=False)``.
-Refer to the [API usage of torch.jit.trace](https://pytorch.org/docs/stable/generated/torch.jit.trace.html#torch.jit.trace) for more details.
-
-
-### Why does the ``channels_last`` option fail for my computer vision model?
-
-Please check the shape of your input data first; we don't support ``channels_last`` for 3D input for now. If your model is a 3D model or your input data is not a 4D Tensor, the ``channels_last`` option will normally fail.
-
-### Why does ``InferenceOptimizer.load(dir)`` fail to load my model saved by ``InferenceOptimizer.save(model, dir)``?
-
-If you accelerated the model with ``accelerator=None`` via ``InferenceOptimizer.trace``/``InferenceOptimizer.quantize``, or it's just a normal torch.nn.Module, you have to pass the original FP32 model to load the pytorch model via ``InferenceOptimizer.load(dir, model=model)``.
-
-### Why is my bf16 model slower than the fp32 model?
-
-You can first check whether your machine supports the bf16 instruction set by running ``lscpu | grep "bf16"``. If there is no ``avx512_bf16`` or ``amx_bf16`` in the output, then, without instruction set support, the performance of bf16 cannot be guaranteed, and generally its performance will deteriorate.
-
-### ``INVALID_ARGUMENT : Got invalid dimensions for input`` or ``[ PARAMETER_MISMATCH ] Can not clone with new dims.`` when doing inference with an OpenVINO / ONNXRuntime accelerated model
-
-This error usually occurs when your dataset produces data with dynamic shapes; in such a case you need to manually set the ``dynamic_axes`` parameter and pass ``dynamic_axes`` to ``trace/quantize``.
-
-For example, if your forward function looks like ``def forward(x: torch.Tensor):``, and it receives a 4D Tensor as input.
However, your input data's shape will vary during inference, it will be (1, 3, 224, 224) or (1, 3, 256, 256), then in such case, you should: -``` -dynamic_axes['x'] = {0: 'batch_size', 2: 'width', 3: 'height'} # this means the 0/2/3 dim of your input data may vary during inference -input_sample = torch.randn(1, 3, 224, 224) -acce_model = trace(model=model, input_sample=x, dynamic_axes=dynamic_axes) -``` - -You can refer to [API usage of torch.onnx.export](https://pytorch.org/docs/stable/onnx.html#functions) for more details. - -### Why jit didn't work on my model? - -Please check first if you use `patch_cuda(disable_jit=True)` command of Nano, if you have used it to disable cuda operation, it will disable jit at the same time by `torch.jit._state.disable()`, so jit has no effect now. - -### How to cope with out-of-memory during workload with Intel® Extension for PyTorch* - -If you found the workload runs with Intel® Extension for PyTorch* occupies a remarkably large amount of memory, you can try to reduce the occupied memory size by setting `weights_prepack=False` when calling `InferenceOptimizer.trace` \ `InferenceOptimizer.quantize`. - -### RuntimeError: Check 'false' failed at src/frontends/common/src/frontend.cpp - -You may see this error when you do inference with accelerator=`OpenVINO` in keras. It only occurs when you use `intel-tensorflow` >= 2.8 and you forget `source bigdl-nano-init`. The way to solve this problem is just `source bigdl-nano-init` or `source bigdl-nano-init -j`. - -### TypeError: deprecated() got an unexpected keyword argument 'name' - -If a version problem caused by too low cryptography version. You can fix it by just `pip install cryptography==38.0.0` . diff --git a/docs/readthedocs/source/doc/Nano/Overview/userguide.rst b/docs/readthedocs/source/doc/Nano/Overview/userguide.rst deleted file mode 100644 index e69de29b..00000000 diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/index.md b/docs/readthedocs/source/doc/Nano/QuickStart/index.md deleted file mode 100644 index d3cd9e0f..00000000 --- a/docs/readthedocs/source/doc/Nano/QuickStart/index.md +++ /dev/null @@ -1,115 +0,0 @@ -# Nano Tutorial -- [**BigDL-Nano PyTorch Trainer Quickstart**](./pytorch_train_quickstart.html) - - > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_pytorch_training] - - In this guide we will describe how to scale out PyTorch programs using Nano Trainer - ---------------------------- - -- [**BigDL-Nano PyTorch TorchNano Quickstart**](./pytorch_nano.html) - - > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_pytorch_nano] - - In this guide we will describe how to use BigDL-Nano to accelerate custom training loop easily with very few changes - ---------------------------- - -- [**BigDL-Nano TensorFlow Training Quickstart**](./tensorflow_train_quickstart.html) - - > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_tensorflow_training] - - In this guide we will describe how to accelerate TensorFlow Keras applications on training workloads with BigDL-Nano - ---------------------------- -- [**BigDL-Nano PyTorch ONNXRuntime Acceleration Quickstart**](./pytorch_onnxruntime.html) - - > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_pytorch_onnxruntime] - - In this guide we will describe how to apply ONNXRuntime Acceleration on inference pipeline with the APIs delivered by BigDL-Nano - ---------------------------- - -- [**BigDL-Nano PyTorch OpenVINO Acceleration 
Quickstart**](./pytorch_openvino.html) - - > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_pytorch_openvino] - - In this guide we will describe how to apply OpenVINO Acceleration on inference pipeline with the APIs delivered by BigDL-Nano - ---------------------------- - -- [**BigDL-Nano PyTorch Quantization with INC Quickstart**](./pytorch_quantization_inc.html) - - > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_pytorch_Quantization_inc] - - In this guide we will describe how to obtain a quantized model with the APIs delivered by BigDL-Nano - ---------------------------- - -- [**BigDL-Nano PyTorch Quantization with ONNXRuntime accelerator Quickstart**](./pytorch_quantization_inc_onnx.html) - - > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_pytorch_quantization_inc_onnx] - - In this guide we will describe how to obtain a quantized model running inference in the ONNXRuntime engine with the APIs delivered by BigDL-Nano - ---------------------------- - -- [**BigDL-Nano PyTorch Quantization with POT Quickstart**](./pytorch_quantization_openvino.html) - - > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_pytorch_quantization_openvino] - - In this guide we will describe how to obtain a quantized model with the APIs delivered by BigDL-Nano - - ---------------------------- - -- [**BigDL-Nano TensorFlow Quantization with INC Quickstart**](./tensorflow_quantization_quickstart.html) - > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_tensorflow_quantization_inc] - - In this guide we will demonstrates how to apply Post-training quantization on a keras model with BigDL-Nano. - - ---------------------------- - -- [**BigDL-Nano TensorFlow SparseEmbedding and SparseAdam**](./tensorflow_embedding.html) - - > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_tensorflow_embedding] - - In this guide we demonstrates how to use SparseEmbedding and SparseAdam to obtain stroger performance with sparse gradient - - -------------------------- - - -- [**BigDL-Nano Hyperparameter Tuning (Tensorflow Sequential/Functional API) Quickstart**](../Tutorials/seq_and_func.html) - - - > ![](../../../../image/colab_logo_32px.png)[Run in Google Colab][Nano_hpo_tf_seq_func_colab]  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_hpo_tf_seq_func] - - - In this guide we will describe how to use Nano's built-in HPO utils to do hyperparameter tuning for models defined using Tensorflow Sequential or Functional API. - - ---------------------------- - -- [**BigDL-Nano Hyperparameter Tuning (Tensorflow Subclassing Model) Quickstart**](../Tutorials/custom.html) - - > ![](../../../../image/colab_logo_32px.png)[Run in Google Colab][Nano_hpo_tf_subclassing_colab]  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_hpo_tf_subclassing] - - In this guide we will describe how to use Nano's built-in HPO utils to do hyperparameter tuning for models defined by subclassing tf.keras.Model. 
- - -[Nano_pytorch_training]: -[Nano_pytorch_nano]: -[Nano_tensorflow_training]: -[Nano_pytorch_onnxruntime]: -[Nano_pytorch_openvino]: -[Nano_pytorch_Quantization_inc]: -[Nano_pytorch_quantization_inc_onnx]: -[Nano_pytorch_quantization_openvino]: -[Nano_tensorflow_quantization_inc]: -[Nano_tensorflow_embedding]: -[Nano_hpo_tf_seq_func]: -[Nano_hpo_tf_seq_func_colab]: -[Nano_hpo_tf_subclassing]: -[Nano_hpo_tf_subclassing_colab]: \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_nano.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_nano.md deleted file mode 100644 index 2c4feded..00000000 --- a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_nano.md +++ /dev/null @@ -1,174 +0,0 @@ -# BigDL-Nano Pytorch TorchNano Quickstart - -**In this guide we'll demonstrate how to use BigDL-Nano to accelerate custom train loop easily with very few changes.** - -### Step 0: Prepare Environment - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. - -```bash -conda create py37 python==3.7.10 setuptools==58.0.4 -conda activate py37 -# nightly bulit version -pip install --pre --upgrade bigdl-nano[pytorch] -# set env variables for your conda environment -source bigdl-nano-init -``` - -### Step 1: Load the Data - -Import Cifar10 dataset from torch_vision and modify the train transform. You could access [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) for a view of the whole dataset. - -Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing a drop-in replacement of torch_vision's `datasets` and `transforms`. - -```python -from torch.utils.data import DataLoader, Subset - -from bigdl.nano.pytorch.vision import transforms -from bigdl.nano.pytorch.vision.datasets import CIFAR10 - -def create_dataloader(data_path, batch_size): - train_transform = transforms.Compose([ - transforms.Resize(256), - transforms.ColorJitter(), - transforms.RandomCrop(224), - transforms.RandomHorizontalFlip(), - transforms.Resize(128), - transforms.ToTensor() - ]) - - full_dataset = CIFAR10(root=data_path, train=True, - download=True, transform=train_transform) - - # use a subset of full dataset to shorten the training time - train_dataset = Subset(dataset=full_dataset, indices=list(range(len(full_dataset) // 40))) - - train_loader = DataLoader(train_dataset, batch_size=batch_size, - shuffle=True, num_workers=0) - - return train_loader -``` - -### Step 2: Define the Model - -You may define your model in the same way as the standard PyTorch models. 
- -```python -from torch import nn - -from bigdl.nano.pytorch.vision.models import vision - -class ResNet18(nn.Module): - def __init__(self, num_classes, pretrained=True, include_top=False, freeze=True): - super().__init__() - backbone = vision.resnet18(pretrained=pretrained, include_top=include_top, freeze=freeze) - output_size = backbone.get_output_size() - head = nn.Linear(output_size, num_classes) - self.model = nn.Sequential(backbone, head) - - def forward(self, x): - return self.model(x) -``` - -### Step 3: Define Train Loop - -Suppose the custom train loop is as follows: - -```python -import os -import torch - -data_path = os.environ.get("DATA_PATH", ".") -batch_size = 256 -max_epochs = 10 -lr = 0.01 - -model = ResNet18(10, pretrained=False, include_top=False, freeze=True) -loss_func = nn.CrossEntropyLoss() -optimizer = torch.optim.Adam(model.parameters(), lr=lr) -train_loader = create_dataloader(data_path, batch_size) - -model.train() - -for _i in range(max_epochs): - total_loss, num = 0, 0 - for X, y in train_loader: - optimizer.zero_grad() - loss = loss_func(model(X), y) - loss.backward() - optimizer.step() - - total_loss += loss.sum() - num += 1 - print(f'avg_loss: {total_loss / num}') -``` - -The `TorchNano` (`bigdl.nano.pytorch.TorchNano`) class is what we use to accelerate raw pytorch code. By using it, we only need to make very few changes to accelerate custom training loop. - -We only need the following steps: - -- define a class `MyNano` derived from our `TorchNano` -- copy all lines of code into the `train` method of `MyNano` -- add one line to setup model, optimizer and dataloader -- replace the `loss.backward()` with `self.backward(loss)` - -```python -import os -import torch - -from bigdl.nano.pytorch import TorchNano - -class MyNano(TorchNano): - def train(self): - # copy all lines of code into this method - data_path = os.environ.get("DATA_PATH", ".") - batch_size = 256 - max_epochs = 10 - lr = 0.01 - - model = ResNet18(10, pretrained=False, include_top=False, freeze=True) - loss_func = nn.CrossEntropyLoss() - optimizer = torch.optim.Adam(model.parameters(), lr=lr) - train_loader = create_dataloader(data_path, batch_size) - - # add this line to setup model, optimizer and dataloaders - model, optimizer, train_loader = self.setup(model, optimizer, train_loader) - - model.train() - - for _i in range(max_epochs): - total_loss, num = 0, 0 - for X, y in train_loader: - optimizer.zero_grad() - loss = loss_func(model(X), y) - self.backward(loss) # modify this line - optimizer.step() - - total_loss += loss.sum() - num += 1 - print(f'avg_loss: {total_loss / num}') -``` - -### Step 4: Run with Nano TorchNano - -```python -MyNano().train() -``` - -At this stage, you may already experience some speedup due to the optimized environment variables set by source bigdl-nano-init. Besides, you can also enable optimizations delivered by BigDL-Nano by setting a paramter or calling a method to accelerate PyTorch application on training workloads. - -#### Increase the number of processes in distributed training to accelerate training. - -```python -MyNano(num_processes=2, distributed_backend="subprocess").train() -``` - -- Note: BigDL-Nano now support 'spawn', 'subprocess' and 'ray' backends for distributed training, but only the 'subprocess' backend can be used in interactive environment. - -#### Intel Extension for Pytorch (a.k.a. [IPEX](https://github.com/intel/intel-extension-for-pytorch)) - -IPEX extends Pytorch with optimizations on intel hardware. 
BigDL-Nano also integrates IPEX into the `TorchNano`, you can turn on IPEX optimization by setting `use_ipex=True`. - -```python -MyNano(use_ipex=True, num_processes=2, distributed_backend="subprocess").train() -``` diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_onnxruntime.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_onnxruntime.md deleted file mode 100644 index 3f7a6b52..00000000 --- a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_onnxruntime.md +++ /dev/null @@ -1,92 +0,0 @@ -# BigDL-Nano PyTorch ONNXRuntime Acceleration Quickstart - -**In this guide we will describe how to apply ONNXRuntime Acceleration on inference pipeline with the APIs delivered by BigDL-Nano in 4 simple steps** - -### Step 0: Prepare Environment -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. - -```bash -conda create -n py37 python==3.7.10 setuptools==58.0.4 -conda activate py37 -# nightly bulit version -pip install --pre --upgrade bigdl-nano[pytorch] -# set env variables for your conda environment -source bigdl-nano-init -``` - -Before you start with ONNXRuntime accelerator, you need to install some ONNX packages as follows to set up your environment with ONNXRuntime acceleration. -```bash -pip install onnx onnxruntime -``` -### Step 1: Load the data -```python -import torch -from torchvision.io import read_image -from torchvision import transforms -from torchvision.datasets import OxfordIIITPet -from torch.utils.data.dataloader import DataLoader - -train_transform = transforms.Compose([transforms.Resize(256), - transforms.RandomCrop(224), - transforms.RandomHorizontalFlip(), - transforms.ColorJitter(brightness=.5, hue=.3), - transforms.ToTensor(), - transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) -val_transform = transforms.Compose([transforms.Resize([224, 224]), transforms.ToTensor(), transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) -# Apply data augmentation to the tarin_dataset -train_dataset = OxfordIIITPet(root = ".", transform=train_transform) -val_dataset = OxfordIIITPet(root=".", transform=val_transform) -# obtain training indices that will be used for validation -indices = torch.randperm(len(train_dataset)) -val_size = len(train_dataset) // 4 -train_dataset = torch.utils.data.Subset(train_dataset, indices[:-val_size]) -val_dataset = torch.utils.data.Subset(val_dataset, indices[-val_size:]) -# prepare data loaders -train_dataloader = DataLoader(train_dataset, batch_size=32) -``` - -### Step 2: Prepare the Model -```python -import torch -from torchvision.models import resnet18 -from bigdl.nano.pytorch import Trainer -model_ft = resnet18(pretrained=True) -num_ftrs = model_ft.fc.in_features - -# Here the size of each output sample is set to 37. -model_ft.fc = torch.nn.Linear(num_ftrs, 37) -loss_ft = torch.nn.CrossEntropyLoss() -optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4) - -# Compile our model with loss function, optimizer. 
-model = Trainer.compile(model_ft, loss_ft, optimizer_ft) -trainer = Trainer(max_epochs=5) -trainer.fit(model, train_dataloader=train_dataloader) - -# Inference/Prediction -x = torch.stack([val_dataset[0][0], val_dataset[1][0]]) -model_ft.eval() -y_hat = model_ft(x) -y_hat.argmax(dim=1) -``` - -### Step 3: Apply ONNXRumtime Acceleration -When you're ready, you can simply append the following part to enable your ONNXRuntime acceleration. -```python -# trace your model as an ONNXRuntime model -# The argument `input_sample` is not required in the following cases: -# you have run `trainer.fit` before trace -# Model has `example_input_array` set -# Model is a LightningModule with any dataloader attached. -from bigdl.nano.pytorch import InferenceOptimizer -ort_model = InferenceOptimizer.trace(model_ft, accelerator="onnxruntime", input_sample=torch.rand(1, 3, 224, 224)) - -# The usage is almost the same with any PyTorch module -y_hat = ort_model(x) -y_hat.argmax(dim=1) -``` - -```eval_rst -.. note:: - ``ort_model`` is not trainable any more, so you cannot use it in ``fit`` such as ``trainer.fit(ort_model, dataloader)``. -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_openvino.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_openvino.md deleted file mode 100644 index 894dd917..00000000 --- a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_openvino.md +++ /dev/null @@ -1,89 +0,0 @@ -# BigDL-Nano PyTorch OpenVINO Acceleration Quickstart - -**In this guide we will describe how to apply OpenVINO Acceleration on inference pipeline with the APIs delivered by BigDL-Nano in 4 simple steps** - -### Step 0: Prepare Environment -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. 
- -```bash -conda create py37 python==3.7.10 setuptools==58.0.4 -conda activate py37 -# nightly bulit version -pip install --pre --upgrade bigdl-nano[pytorch] -# set env variables for your conda environment -source bigdl-nano-init -``` - -To use OpenVINO acceleration, you have to install the OpenVINO toolkit: -```bash -pip install openvino-dev -``` - -### Step 1: Load the data -```python -import torch -from torchvision.io import read_image -from torchvision import transforms -from torchvision.datasets import OxfordIIITPet -from torch.utils.data.dataloader import DataLoader - -train_transform = transforms.Compose([transforms.Resize(256), - transforms.RandomCrop(224), - transforms.RandomHorizontalFlip(), - transforms.ColorJitter(brightness=.5, hue=.3), - transforms.ToTensor(), - transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) -val_transform = transforms.Compose([transforms.Resize([224, 224]), transforms.ToTensor(), transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) -# Apply data augmentation to the tarin_dataset -train_dataset = OxfordIIITPet(root = ".", transform=train_transform, target_transform=transforms.Lambda(lambda label: torch.tensor(label, dtype=torch.long))) # Quantization using POT expect a tensor as label for now -val_dataset = OxfordIIITPet(root=".", transform=val_transform) -# obtain training indices that will be used for validation -indices = torch.randperm(len(train_dataset)) -val_size = len(train_dataset) // 4 -train_dataset = torch.utils.data.Subset(train_dataset, indices[:-val_size]) -val_dataset = torch.utils.data.Subset(val_dataset, indices[-val_size:]) -# prepare data loaders -train_dataloader = DataLoader(train_dataset, batch_size=32) -``` - -### Step 2: Prepare the Model -```python -import torch -from torchvision.models import resnet18 -from bigdl.nano.pytorch import Trainer -model_ft = resnet18(pretrained=True) -num_ftrs = model_ft.fc.in_features - -# Here the size of each output sample is set to 37. -model_ft.fc = torch.nn.Linear(num_ftrs, 37) -loss_ft = torch.nn.CrossEntropyLoss() -optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4) - -# Compile our model with loss function, optimizer. -model = Trainer.compile(model_ft, loss_ft, optimizer_ft) -trainer = Trainer(max_epochs=5) -trainer.fit(model, train_dataloader=train_dataloader) - -# Inference/Prediction -x = torch.stack([val_dataset[0][0], val_dataset[1][0]]) -model_ft.eval() -y_hat = model_ft(x) -y_hat.argmax(dim=1) -``` - -### Step 3: Apply OpenVINO Acceleration -When you're ready, you can simply append the following part to enable your OpenVINO acceleration. 
-```python -# trace your model as an OpenVINO model -# The argument `input_sample` is not required in the following cases: -# you have run `trainer.fit` before trace -# The Model has `example_input_array` set -from bigdl.nano.pytorch import InferenceOptimizer -ov_model = InferenceOptimizer.trace(model_ft, accelerator="openvino", input_sample=torch.rand(1, 3, 224, 224)) - -# The usage is almost the same with any PyTorch module -y_hat = ov_model(x) -y_hat.argmax(dim=1) -``` -- Note - The `ov_model` is not trainable any more, so you can't use like trainer.fit(ov_model, dataloader) \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_quantization_inc.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_quantization_inc.md deleted file mode 100644 index 289eaf34..00000000 --- a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_quantization_inc.md +++ /dev/null @@ -1,89 +0,0 @@ -# BigDL-Nano PyTorch Quantization with INC Quickstart - -**In this guide we will describe how to obtain a quantized model with the APIs delivered by BigDL-Nano in 4 simple steps** - -### Step 0: Prepare Environment -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. - -```bash -conda create py37 python==3.7.10 setuptools==58.0.4 -conda activate py37 -# nightly bulit version -pip install --pre --upgrade bigdl-nano[pytorch] -# set env variables for your conda environment -source bigdl-nano-init -``` - -By default, Intel Neural Compressor is not installed with BigDL-Nano. So if you determine to use it as your quantization backend, you'll need to install it first: -```bash -pip install neural-compressor==1.11 -``` -### Step 1: Load the data -```python -import torch -from torchvision.io import read_image -from torchvision import transforms -from torchvision.datasets import OxfordIIITPet -from torch.utils.data.dataloader import DataLoader - -train_transform = transforms.Compose([transforms.Resize(256), - transforms.RandomCrop(224), - transforms.RandomHorizontalFlip(), - transforms.ColorJitter(brightness=.5, hue=.3), - transforms.ToTensor(), - transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) -val_transform = transforms.Compose([transforms.Resize([224, 224]), transforms.ToTensor(), transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) -# Apply data augmentation to the tarin_dataset -train_dataset = OxfordIIITPet(root = ".", transform=train_transform) -val_dataset = OxfordIIITPet(root=".", transform=val_transform) - -# obtain training indices that will be used for validation -indices = torch.randperm(len(train_dataset)) -val_size = len(train_dataset) // 4 -train_dataset = torch.utils.data.Subset(train_dataset, indices[:-val_size]) -val_dataset = torch.utils.data.Subset(val_dataset, indices[-val_size:]) - -# prepare data loaders -train_dataloader = DataLoader(train_dataset, batch_size=32) -``` - -### Step 2: Prepare the Model -```python -import torch -from torchvision.models import resnet18 -from bigdl.nano.pytorch import Trainer -from torchmetrics.classification import MulticlassAccuracy -model_ft = resnet18(pretrained=True) -num_ftrs = model_ft.fc.in_features - -# Here the size of each output sample is set to 37. 
-model_ft.fc = torch.nn.Linear(num_ftrs, 37) -loss_ft = torch.nn.CrossEntropyLoss() -optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4) - -# Compile our model with loss function, optimizer. -model = Trainer.compile(model_ft, loss_ft, optimizer_ft, metrics=[MulticlassAccuracy(num_classes=37)]) -trainer = Trainer(max_epochs=5) -trainer.fit(model, train_dataloader=train_dataloader) - -# Inference/Prediction -x = torch.stack([val_dataset[0][0], val_dataset[1][0]]) -model_ft.eval() -y_hat = model_ft(x) -y_hat.argmax(dim=1) -``` - -### Step 3: Quantization using Intel Neural Compressor -Quantization is widely used to compress models to a lower precision, which not only reduces the model size but also accelerates inference. BigDL-Nano provides `InferenceOptimizer.quantize()` API for users to quickly obtain a quantized model with accuracy control by specifying a few arguments. - -Without extra accelerator, `InferenceOptimizer.quantize()` returns a pytorch module with desired precision and accuracy. You can add quantization as below: -```python -from bigdl.nano.pytorch import InferenceOptimizer -from torchmetrics.classification import MulticlassAccuracy -q_model = InferenceOptimizer.quantize(model, calib_data=train_dataloader, metric=MulticlassAccuracy(num_classes=37)) - -# run simple prediction -y_hat = q_model(x) -y_hat.argmax(dim=1) -``` -This is a most basic usage to quantize a model with defaults, INT8 precision, and without search tuning space to control accuracy drop. \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_quantization_inc_onnx.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_quantization_inc_onnx.md deleted file mode 100644 index 51c2dc15..00000000 --- a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_quantization_inc_onnx.md +++ /dev/null @@ -1,87 +0,0 @@ -# BigDL-Nano PyTorch Quantization with ONNXRuntime accelerator Quickstart - -**In this guide we will describe how to obtain a quantized model running inference in the ONNXRuntime engine with the APIs delivered by BigDL-Nano in 4 simple steps** - -### Step 0: Prepare Environment -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. 
- -```bash -conda create py37 python==3.7.10 setuptools==58.0.4 -conda activate py37 -# nightly bulit version -pip install --pre --upgrade bigdl-nano[pytorch] -# set env variables for your conda environment -source bigdl-nano-init -``` - -To quantize model using ONNXRuntime as backend, it is required to install Intel Neural Compressor, onnxruntime-extensions as a dependency of INC and some onnx packages as below -```python -pip install neural-compressor==1.11 -pip install onnx onnxruntime onnxruntime-extensions -``` -### Step 1: Load the data -```python -import torch -from torchvision.io import read_image -from torchvision import transforms -from torchvision.datasets import OxfordIIITPet -from torch.utils.data.dataloader import DataLoader - -train_transform = transforms.Compose([transforms.Resize(256), - transforms.RandomCrop(224), - transforms.RandomHorizontalFlip(), - transforms.ColorJitter(brightness=.5, hue=.3), - transforms.ToTensor(), - transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) -val_transform = transforms.Compose([transforms.Resize([224, 224]), transforms.ToTensor(), transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) -# Apply data augmentation to the tarin_dataset -train_dataset = OxfordIIITPet(root = ".", transform=train_transform) -val_dataset = OxfordIIITPet(root=".", transform=val_transform) -# obtain training indices that will be used for validation -indices = torch.randperm(len(train_dataset)) -val_size = len(train_dataset) // 4 -train_dataset = torch.utils.data.Subset(train_dataset, indices[:-val_size]) -val_dataset = torch.utils.data.Subset(val_dataset, indices[-val_size:]) - -train_dataloader = DataLoader(train_dataset, batch_size=32) -``` - -### Step 2: Prepare your Model -```python -import torch -from torchvision.models import resnet18 -from bigdl.nano.pytorch import Trainer -from torchmetrics.classification import MulticlassAccuracy -model_ft = resnet18(pretrained=True) -num_ftrs = model_ft.fc.in_features - -# Here the size of each output sample is set to 37. -model_ft.fc = torch.nn.Linear(num_ftrs, 37) -loss_ft = torch.nn.CrossEntropyLoss() -optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4) - -# Compile our model with loss function, optimizer. -model = Trainer.compile(model_ft, loss_ft, optimizer_ft, metrics=[MulticlassAccuracy(num_classes=37)]) -trainer = Trainer(max_epochs=5) -trainer.fit(model, train_dataloader=train_dataloader) - -# Inference/Prediction -x = torch.stack([val_dataset[0][0], val_dataset[1][0]]) -model_ft.eval() -y_hat = model_ft(x) -y_hat.argmax(dim=1) -``` - -### Step 3: Quantization with ONNXRuntime accelerator -With the ONNXRuntime accelerator, `InferenceOptimizer.quantize()` will return a model with compressed precision but running inference in the ONNXRuntime engine. 
- -you can add quantization as below: -```python -from bigdl.nano.pytorch import InferenceOptimizer -from torchmetrics.classification import MulticlassAccuracy -ort_q_model = InferenceOptimizer.quantize(model, accelerator='onnxruntime', calib_data=train_dataloader, metric=MulticlassAccuracy(num_classes=37)) - -# run simple prediction -y_hat = ort_q_model(x) -y_hat.argmax(dim=1) -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_quantization_openvino.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_quantization_openvino.md deleted file mode 100644 index d59accd5..00000000 --- a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_quantization_openvino.md +++ /dev/null @@ -1,85 +0,0 @@ -# BigDL-Nano PyTorch Quantization with POT Quickstart - -**In this guide we will describe how to obtain a quantized model with the APIs delivered by BigDL-Nano in 4 simple steps** - -### Step 0: Prepare Environment -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. - -```bash -conda create py37 python==3.7.10 setuptools==58.0.4 -conda activate py37 -# nightly bulit version -pip install --pre --upgrade bigdl-nano[pytorch] -# set env variables for your conda environment -source bigdl-nano-init -``` - -The POT(Post-training Optimization Tools) is provided by OpenVINO toolkit. To use POT, you need to install OpenVINO -```python -pip install openvino-dev -``` - -### Step 1: Load the data -```python -import torch -from torchvision.io import read_image -from torchvision import transforms -from torchvision.datasets import OxfordIIITPet -from torch.utils.data.dataloader import DataLoader - -train_transform = transforms.Compose([transforms.Resize(256), - transforms.RandomCrop(224), - transforms.RandomHorizontalFlip(), - transforms.ColorJitter(brightness=.5, hue=.3), - transforms.ToTensor(), - transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) -val_transform = transforms.Compose([transforms.Resize([224, 224]), transforms.ToTensor(), transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) -# Apply data augmentation to the tarin_dataset -train_dataset = OxfordIIITPet(root = ".", - transform=train_transform, - target_transform=transforms.Lambda(lambda label: torch.tensor(label, dtype=torch.long))) # Quantization using POT expect a tensor as label -val_dataset = OxfordIIITPet(root=".", transform=val_transform) -# obtain training indices that will be used for validation -indices = torch.randperm(len(train_dataset)) -val_size = len(train_dataset) // 4 -train_dataset = torch.utils.data.Subset(train_dataset, indices[:-val_size]) -val_dataset = torch.utils.data.Subset(val_dataset, indices[-val_size:]) -# prepare data loaders -train_dataloader = DataLoader(train_dataset, batch_size=32) -``` - -### Step 2: Prepare the Model -```python -import torch -from torchvision.models import resnet18 -from bigdl.nano.pytorch import Trainer -model_ft = resnet18(pretrained=True) -num_ftrs = model_ft.fc.in_features - -# Here the size of each output sample is set to 37. -model_ft.fc = torch.nn.Linear(num_ftrs, 37) -loss_ft = torch.nn.CrossEntropyLoss() -optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4) - -# Compile our model with loss function, optimizer. 
-model = Trainer.compile(model_ft, loss_ft, optimizer_ft) -trainer = Trainer(max_epochs=5) -trainer.fit(model, train_dataloader=train_dataloader) - -# Inference/Prediction -x = torch.stack([val_dataset[0][0], val_dataset[1][0]]) -model_ft.eval() -y_hat = model_ft(x) -y_hat.argmax(dim=1) -``` - -### Step 3: Quantization using Post-training Optimization Tools -Accelerator='openvino' means using OpenVINO POT to do quantization. The quantization can be added as below: -```python -from bigdl.nano.pytorch import InferenceOptimizer -ov_q_model = InferenceOptimizer.quantize(model, accelerator="openvino", calib_data=data_loader) - -# run simple prediction -batch = torch.stack([data_set[0][0], data_set[1][0]]) -ov_q_model(batch) -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train_quickstart.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train_quickstart.md deleted file mode 100644 index 37efd61b..00000000 --- a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train_quickstart.md +++ /dev/null @@ -1,129 +0,0 @@ -# BigDL-Nano PyTorch Trainer Quickstart - -**In this guide we will describe how to scale out PyTorch programs using Nano Trainer in 5 simple steps** - -### Step 0: Prepare Environment - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. - -```bash -conda create py37 python==3.7.10 setuptools==58.0.4 -conda activate py37 -# nightly bulit version -pip install --pre --upgrade bigdl-nano[pytorch] -# set env variables for your conda environment -source bigdl-nano-init -pip install lightning-bolts -``` - -### Step 1: Import BigDL-Nano -The PyTorch Trainer (`bigdl.nano.pytorch.Trainer`) is the place where we integrate most optimizations. It extends PyTorch Lightning's Trainer and has a few more parameters and methods specific to BigDL-Nano. The Trainer can be directly used to train a `LightningModule`. -```python -from bigdl.nano.pytorch import Trainer -``` -Computer Vision task often needs a data processing pipeline that sometimes constitutes a non-trivial part of the whole training pipeline. Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing a drop-in replacement of torch_vision's `datasets` and `transforms`. -```python -from bigdl.nano.pytorch.vision import transforms -``` - -### Step 2: Load the Data -You can define the datamodule using standard [LightningDataModule](https://pytorch-lightning.readthedocs.io/en/latest/data/datamodule.html) -```python -from pl_bolts.datamodules import CIFAR10DataModule -train_transforms = transforms.Compose( - [ - transforms.RandomCrop(32, 4), - transforms.RandomHorizontalFlip(), - transforms.ToTensor() - ] -) -cifar10_dm = CIFAR10DataModule( - data_dir=os.environ.get('DATA_PATH', '.'), - batch_size=64, - train_transforms=train_transforms -) -return cifar10_dm -``` - -### Step 3: Define the Model - -You may define your model, loss and optimizer in the same way as in any standard PyTorch Lightning program. 
- -```python -import torch -import torch.nn as nn -import torch.nn.functional as F -import torchvision -from pytorch_lightning import LightningModule - -def create_model(): - model = torchvision.models.resnet18(pretrained=False, num_classes=10) - model.conv1 = nn.Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) - model.maxpool = nn.Identity() - return model - -class LitResnet(LightningModule): - def __init__(self, learning_rate=0.05, num_processes=1): - super().__init__() - - self.save_hyperparameters('learning_rate', 'num_processes') - self.model = create_model() - - def forward(self, x): - out = self.model(x) - return F.log_softmax(out, dim=1) - - def training_step(self, batch, batch_idx): - x, y = batch - logits = self(x) - loss = F.nll_loss(logits, y) - self.log("train_loss", loss) - return loss - - def configure_optimizers(self): - optimizer = torch.optim.SGD( - self.parameters(), - lr=self.hparams.learning_rate, - momentum=0.9, - weight_decay=5e-4, - ) - steps_per_epoch = 45000 // BATCH_SIZE // self.hparams.num_processes - scheduler_dict = { - "scheduler": OneCycleLR( - optimizer, - 0.1, - epochs=self.trainer.max_epochs, - steps_per_epoch=steps_per_epoch, - ), - "interval": "step", - } - return {"optimizer": optimizer, "lr_scheduler": scheduler_dict} -``` -For regular PyTorch modules, we also provide a "compile" method, that takes in a PyTorch module, an optimizer, and other PyTorch objects and "compiles" them into a `LightningModule`. You can find more information from [here](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Nano/pytorch.html#bigdl-nano-pytorch) - - -### Step 4: Fit with Nano PyTorch Trainer -```python -model = LitResnet(learning_rate=0.05) -single_trainer = Trainer(max_epochs=30) -single_trainer.fit(model, datamodule=cifar10_dm) -``` -At this stage, you may already experience some speedup due to the optimized environment variables set by source bigdl-nano-init. Besides, you can also enable optimizations delivered by BigDL-Nano by setting a paramter or calling a method to accelerate PyTorch or PyTorch Lightning application on training workloads. -#### Increase the number of processes in distributed training to accelerate training. -```python -model = LitResnet(learning_rate=0.1, num_processes=4) -single_trainer = Trainer(max_epochs=30, num_processes=4) -single_trainer.fit(model, datamodule=cifar10_dm) -``` -- Note: Here we use linear scaling rule to imporve the performance of model on distributed training. You can find more useful tricks on distributed computing from the [paper](https://arxiv.org/abs/1706.02677) published by Facebook AI research(FAIR).
- Note: If you're using a step-related `lr_scheduler`, the value of the scheduler's `steps_per_epoch` needs to be modified accordingly, or the learning rate may not change as expected. The change in learning rate is shown in the following figure, where the blue line is the expected change and the red one is the case when `steps_per_epoch` remains unchanged.
-![](../Image/learning_rate.png)
-#### Intel Extension for Pytorch (a.k.a. [IPEX](https://github.com/intel/intel-extension-for-pytorch))
-IPEX extends PyTorch with optimizations for an extra performance boost on Intel hardware. BigDL-Nano integrates IPEX through the Trainer. Users can turn on IPEX by setting `use_ipex=True`.
-```python
-model = LitResnet(learning_rate=0.1, num_processes=4)
-single_trainer = Trainer(max_epochs=30, num_processes=4, use_ipex=True)
-single_trainer.fit(model, datamodule=cifar10_dm)
-```
-Get more information about the optimizations from [here](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Nano/pytorch.html#bigdl-nano-pytorch)
-
-You can find the detailed result of training from [here](https://github.com/intel-analytics/BigDL/blob/main/python/nano/notebooks/pytorch/tutorial/pytorch_train.ipynb)
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/tensorflow_embedding.md b/docs/readthedocs/source/doc/Nano/QuickStart/tensorflow_embedding.md
deleted file mode 100644
index 3e2bea78..00000000
--- a/docs/readthedocs/source/doc/Nano/QuickStart/tensorflow_embedding.md
+++ /dev/null
@@ -1,130 +0,0 @@
-# BigDL-Nano TensorFlow SparseEmbedding and SparseAdam
-**In this guide we demonstrate how to use `SparseEmbedding` and `SparseAdam` to obtain stronger performance with sparse gradients.**
-
-### Step 0: Prepare Environment
-
-We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details.
-
-```bash
-conda create -n py37 python==3.7.10 setuptools==58.0.4
-conda activate py37
-# nightly built version
-pip install --pre --upgrade bigdl-nano[tensorflow]
-# set env variables for your conda environment
-source bigdl-nano-init
-pip install tensorflow-datasets
-```
-
-### Step 1: Import BigDL-Nano
-The optimizations in BigDL-Nano are delivered through BigDL-Nano's `Model` and `Sequential` classes. In most cases, you can just replace `tf.keras.Model` with `bigdl.nano.tf.keras.Model` and `tf.keras.Sequential` with `bigdl.nano.tf.keras.Sequential` to benefit from BigDL-Nano.
-```python
-from bigdl.nano.tf.keras import Model, Sequential
-```
-
-### Step 2: Load the data
-We demonstrate with imdb_reviews, a large movie review dataset.
-```python
-import tensorflow_datasets as tfds
-(raw_train_ds, raw_val_ds, raw_test_ds), info = tfds.load(
-    "imdb_reviews",
-    split=['train[:80%]', 'train[80%:]', 'test'],
-    as_supervised=True,
-    batch_size=32,
-    shuffle_files=False,
-    with_info=True
-)
-```
-
-### Step 3: Prepare the Data
-In particular, we remove `<br />`
tags. -```python -import tensorflow as tf -from tensorflow.keras.layers import TextVectorization -import string -import re - -def custom_standardization(input_data): - lowercase = tf.strings.lower(input_data) - stripped_html = tf.strings.regex_replace(lowercase, "
", " ") - return tf.strings.regex_replace( - stripped_html, f"[{re.escape(string.punctuation)}]", "" - ) - -max_features = 20000 -embedding_dim = 128 -sequence_length = 500 - -vectorize_layer = TextVectorization( - standardize=custom_standardization, - max_tokens=max_features, - output_mode="int", - output_sequence_length=sequence_length, -) - -# Let's make a text-only dataset (no labels): -text_ds = raw_train_ds.map(lambda x, y: x) -# Let's call `adapt`: -vectorize_layer.adapt(text_ds) - -def vectorize_text(text, label): - text = tf.expand_dims(text, -1) - return vectorize_layer(text), label - - -# Vectorize the data. -train_ds = raw_train_ds.map(vectorize_text) -val_ds = raw_val_ds.map(vectorize_text) -test_ds = raw_test_ds.map(vectorize_text) - -# Do async prefetching / buffering of the data for best performance on GPU. -train_ds = train_ds.cache().prefetch(buffer_size=10) -val_ds = val_ds.cache().prefetch(buffer_size=10) -test_ds = test_ds.cache().prefetch(buffer_size=10) -``` - -### Step 4: Build Model -`bigdl.nano.tf.keras.Embedding` is a slightly modified version of `tf.keras.Embedding` layer, this embedding layer only applies regularizer to the output of the embedding layer, so that the gradient to embeddings is sparse. `bigdl.nano.tf.optimzers.Adam` is a variant of the `Adam` optimizer that handles sparse updates more efficiently. -Here we create two models, one using normal Embedding layer and Adam optimizer, the other using `SparseEmbedding` and `SparseAdam`. -```python -from tensorflow.keras import layers -from bigdl.nano.tf.keras.layers import Embedding -from bigdl.nano.tf.optimizers import SparseAdam - -from tensorflow.keras import layers -from bigdl.nano.tf.keras.layers import Embedding -from bigdl.nano.tf.optimizers import SparseAdam - -inputs = tf.keras.Input(shape=(None,), dtype="int64") - -# Embedding layer can only be used as the first layer in a model, -# you need to provide the argument inputShape (a Single Shape, does not include the batch dimension). -x = Embedding(max_features, embedding_dim)(inputs) -x = layers.Dropout(0.5)(x) - -# Conv1D + global max pooling -x = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(x) -x = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(x) -x = layers.GlobalMaxPooling1D()(x) - -# We add a vanilla hidden layer: -x = layers.Dense(128, activation="relu")(x) -x = layers.Dropout(0.5)(x) - -# We project onto a single unit output layer, and squash it with a sigmoid: -predictions = layers.Dense(1, activation="sigmoid", name="predictions")(x) - -model = Model(inputs, predictions) - -# Compile the model with binary crossentropy loss and an SparseAdam optimizer. -model.compile(loss="binary_crossentropy", optimizer=SparseAdam(), metrics=["accuracy"]) -``` - -### Step 5: Training -```python -# Fit the model using the train and val datasets. 
-model.fit(train_ds, validation_data=val_ds, epochs=3) - -model.evaluate(test_ds) -``` - -You can find the detailed result of training from [here](https://github.com/intel-analytics/BigDL/blob/main/python/nano/notebooks/tensorflow/tutorial/tensorflow_embedding.ipynb) \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/tensorflow_quantization_quickstart.md b/docs/readthedocs/source/doc/Nano/QuickStart/tensorflow_quantization_quickstart.md deleted file mode 100644 index 7a6fcf0b..00000000 --- a/docs/readthedocs/source/doc/Nano/QuickStart/tensorflow_quantization_quickstart.md +++ /dev/null @@ -1,89 +0,0 @@ -## BigDL-Nano TensorFLow Quantization Quickstart -**In this guide we will demonstrates how to apply post-training quantization on a keras model with BigDL-Nano in 4 simple steps.** - -### Step 0: Prepare Environment - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. - -```bash -conda create py37 python==3.7.10 setuptools==58.0.4 -conda activate py37 -# nightly bulit version -pip install --pre --upgrade bigdl-nano[tensorflow] -# set env variables for your conda environment -source bigdl-nano-init -``` - -By default, [Intel Neural Compressor](https://github.com/intel/neural-compressor) is not installed with BigDL-Nano. So if you determine to use it as your quantization backend, you'll need to install it first: -```bash -pip install neural-compressor==1.11.0 -``` - -BigDL-Nano provides several APIs which can help users easily apply optimizations on inference pipelines to improve latency and throughput. The Keras Model(`bigdl.nano.tf.keras.Model`) and InferenceOptimizer(`bigdl.nano.tf.keras.InferenceOptimizer`) provides the APIs for all optimizations you need for inference. - -```python -from bigdl.nano.tf.keras import Model, InferenceOptimizer -``` - -### Step 1: Loading Data - -Here we load data from tensorflow_datasets. The [Imagenette](https://github.com/fastai/imagenette) is a subset of 10 easily classified classes from the Imagenet dataset. - -```python -import tensorflow_datasets as tfds -DATANAME = 'imagenette/320px-v2' -(train_ds, test_ds), info = tfds.load(DATANAME, data_dir='../data/', - split=['train', 'validation'], - with_info=True, - as_supervised=True) -``` - -#### Prepare Inputs -Here we resize the input image to uniform `IMG_SIZE` and the labels are put into one_hot. - -```python -import tensorflow as tf -img_size = 224 -num_classes = info.features['label'].num_classes -train_ds = train_ds.map(lambda img, label: (tf.image.resize(img, (img_size, img_size)), tf.one_hot(label, num_classes))).batch(32) -test_ds = test_ds.map(lambda img, label: (tf.image.resize(img, (img_size, img_size)), tf.one_hot(label, num_classes))).batch(32) -``` - -### Step 2: Build Model -Here we initialize the ResNet50 from `tf.keras.applications` with pre-trained ImageNet weights. 
-```python -from tensorflow.keras.applications import ResNet50 -from tensorflow.keras import layers -inputs = tf.keras.layers.Input(shape=(224, 224, 3)) -x = tf.cast(inputs, tf.float32) -x = tf.keras.applications.resnet50.preprocess_input(x) -backbone = ResNet50(weights='imagenet') -backbone.trainable = False -x = backbone(x) -x = layers.Dense(512, activation='relu')(x) -outputs = layers.Dense(num_classes, activation='softmax')(x) - -model = Model(inputs=inputs, outputs=outputs) -model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy']) - -# fit -model.fit(train_ds, epochs=1) -``` - -### Step 3: Quantization with Intel Neural Compressor -[`InferenceOptimizer.quantize()`](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Nano/tensorflow.html#bigdl.nano.tf.keras.InferenceOptimizer.quantize) return a Keras module with desired precision and accuracy. Taking Resnet50 as an example, you can add quantization as below. - -```python -from tensorflow.keras.metrics import CategoricalAccuracy -q_model = InferenceOptimizer.quantize(model, - calib_dataset=dataset, - metric=CategoricalAccuracy(), - tuning_strategy='basic' - ) -``` -The quantized model can be called to do inference as normal keras model. -```python -# run simple prediction with transparent acceleration -for img, _ in dataset: - q_model(img) -``` diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/tensorflow_train_quickstart.md b/docs/readthedocs/source/doc/Nano/QuickStart/tensorflow_train_quickstart.md deleted file mode 100644 index 6d40180f..00000000 --- a/docs/readthedocs/source/doc/Nano/QuickStart/tensorflow_train_quickstart.md +++ /dev/null @@ -1,130 +0,0 @@ -# BigDL-Nano TensorFlow Training Quickstart -**In this guide we will describe how to accelerate TensorFlow Keras application on training workloads using BigDL-Nano in 5 simple steps** - -### Step 0: Prepare Environment - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. - -```bash -conda create py37 python==3.7.10 setuptools==58.0.4 -conda activate py37 -# nightly bulit version -pip install --pre --upgrade bigdl-nano[tensorflow] -# set env variables for your conda environment -source bigdl-nano-init -pip install tensorflow-datasets -``` - -### Step 1: Import BigDL-Nano -The optimizations in BigDL-Nano are delivered through BigDL-Nano’s `Model` and `Sequential` classes. For most cases, you can just replace your `tf.keras.Model` to `bigdl.nano.tf.keras.Model` and `tf.keras.Sequential` to `bigdl.nano.tf.keras.Sequential` to benefits from BigDL-Nano. -```python -from bigdl.nano.tf.keras import Model, Sequential -``` - -### Step 2: Load the Data -Here we load data from tensorflow_datasets(hereafter [TFDS](https://www.tensorflow.org/datasets)). The [Stanford Dogs](http://vision.stanford.edu/aditya86/ImageNetDogs/main.html) dataset contains images of 120 breeds of dogs around the world. There are 20,580 images, out of which 12,000 are used for training and 8580 for testing. -```python -import tensorflow_datasets as tfds -(ds_train, ds_test), ds_info = tfds.load( - "stanford_dogs", - data_dir="../data/", - split=['train', 'test'], - with_info=True, - as_supervised=True -) -``` -#### Prepare Inputs -When the dataset include images with various size, we need to resize them into a shared size. The labels are put into one-hot. The dataset is batched. 
-```python -import tensorflow as tf -img_size = 224 -num_classes = ds_info.features['label'].num_classes -batch_size = 64 -def preprocessing(img, label): - return tf.image.resize(img, (img_size, img_size)), tf.one_hot(label, num_classes) -AUTOTUNE = tf.data.AUTOTUNE -ds_train = ds_train.cache().repeat().shuffle(1000).map(preprocessing).batch(batch_size, drop_remainder=True).prefetch(AUTOTUNE) -ds_test = ds_test.map(preprocessing).batch(batch_size, drop_remainder=True).prefetch(AUTOTUNE) -``` - -### Step 3: Build Model -BigDL-Nano's `Model` (`bigdl.nano.tf.keras.Model`) and `Sequential` (`bigdl.nano.tf.keras.Sequential`) classes have identical APIs with `tf.keras.Model` and `tf.keras.Sequential`. -Here we initialize the model with pre-trained ImageNet weights, and we fine-tune it on the Stanford Dogs dataset. -```python -from tensorflow.keras import layers -from tensorflow.keras.applications import EfficientNetB0 -data_augmentation = Sequential([ - layers.RandomRotation(factor=0.15), - layers.RandomTranslation(height_factor=0.1, width_factor=0.1), - layers.RandomFlip(), - layers.RandomContrast(factor=0.1), - ]) -def make_model(learning_rate=1e-2): - inputs = layers.Input(shape = (img_size, img_size, 3)) - - x = data_augmentation(inputs) - backbone = EfficientNetB0(include_top=False, input_tensor=x) - - backbone.trainable = False - - x = layers.GlobalAveragePooling2D(name='avg_pool')(backbone.output) - x = layers.BatchNormalization()(x) - - top_dropout_rate = 0.2 - x = layers.Dropout(top_dropout_rate, name="top_dropout")(x) - outputs = layers.Dense(num_classes, activation="softmax", name="pred")(x) - - model = Model(inputs, outputs, name='EfficientNet') - optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate) - model.compile( - loss="categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'] - ) - return model - -def unfreeze_model(model): - for layer in model.layers[-20:]: - if not isinstance(layer, layers.BatchNormalization): - layer.trainable = True - - optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4) - model.compile( - loss="categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'] - ) -``` - -### Step 4: Training -```python -steps_per_epoch = ds_info.splits['train'].num_examples // batch_size -model_default = make_model() - -model_default.fit(ds_train, - epochs=15, - validation_data=ds_test, - steps_per_epoch=steps_per_epoch) -unfreeze_model(model_default) -his_default = model_default.fit(ds_train, - epochs=10, - validation_data=ds_test, - steps_per_epoch=steps_per_epoch) -``` -#### Multi-Instance Training -BigDL-Nano makes it very easy to conduct multi-instance training correctly. You can just set the `num_processes` parameter in the `fit` method in your `Model` or `Sequential` object and BigDL-Nano will launch the specific number of processes to perform data-parallel training. 
-```python -model_multi = make_model() - -model_multi.fit(ds_train, - epochs=15, - validation_data=ds_test, - steps_per_epoch=steps_per_epoch, - num_processes=4, - backend='multiprocessing') -unfreeze_model(model_multi) -his_multi = model_multi.fit(ds_train, - epochs=10, - validation_data=ds_test, - steps_per_epoch=steps_per_epoch, - num_processes=4, - backend='multiprocessing') -``` - -You can find the detailed result of training from [here](https://github.com/intel-analytics/BigDL/blob/main/python/nano/notebooks/tensorflow/tutorial/tensorflow_fit.ipynb) \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Tutorials/custom.nblink b/docs/readthedocs/source/doc/Nano/Tutorials/custom.nblink deleted file mode 100644 index f3150216..00000000 --- a/docs/readthedocs/source/doc/Nano/Tutorials/custom.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/nano/notebooks/hpo/custom.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/Tutorials/seq_and_func.nblink b/docs/readthedocs/source/doc/Nano/Tutorials/seq_and_func.nblink deleted file mode 100644 index 065776d1..00000000 --- a/docs/readthedocs/source/doc/Nano/Tutorials/seq_and_func.nblink +++ /dev/null @@ -1,3 +0,0 @@ -{ - "path": "../../../../../../python/nano/notebooks/hpo/seq_and_func.ipynb" -} \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/index.rst b/docs/readthedocs/source/doc/Nano/index.rst deleted file mode 100644 index dfde59f2..00000000 --- a/docs/readthedocs/source/doc/Nano/index.rst +++ /dev/null @@ -1,63 +0,0 @@ -BigDL-Nano -========================= - -**BigDL-Nano** (or **Nano** for short) is a Python package to transparently accelerate PyTorch and TensorFlow applications on Intel XPU. It provides a unified and easy-to-use API for several optimization techniques and tools, so that users can only apply a few lines of code changes to make their PyTorch or TensorFlow code run faster. - -------- - - -.. grid:: 1 2 2 2 - :gutter: 2 - - .. grid-item-card:: - - **Get Started** - ^^^ - - Documents in these sections helps you getting started quickly with Nano. - - +++ - :bdg-link:`Nano in 5 minutes <./Overview/nano.html>` | - :bdg-link:`Installation <./Overview/install.html>` | - :bdg-link:`Tutorials <./QuickStart/index.html>` - - .. grid-item-card:: - - **Key Features Guide** - ^^^ - - Each guide in this section provides you with in-depth information, concepts and knowledges about Nano key features. - - +++ - - :bdg:`PyTorch` :bdg-link:`Infer <./Overview/pytorch_inference.html>` :bdg-link:`Train <./Overview/pytorch_train.html>` | - :bdg:`TensorFlow` :bdg-link:`Infer <./Overview/tensorflow_inference.html>` :bdg-link:`Train <./Overview/tensorflow_train.html>` - - .. grid-item-card:: - - **How-to Guide** - ^^^ - - How-to Guide provides bite-sized, actionable examples of how to use specific Nano features, different from our tutorials - which are full-length examples each implementing a full usage scenario. - - +++ - - :bdg-link:`How-to-Guide <./Howto/index.html>` - - .. grid-item-card:: - - **API Document** - ^^^ - - API Document provides detailed description of Nano APIs. - - +++ - - :bdg-link:`API Document <../PythonAPI/Nano/index.html>` - - -.. 
toctree:: - :hidden: - - BigDL-Nano Document diff --git a/docs/readthedocs/source/doc/Orca/Howto/autoestimator-pytorch-quickstart.md b/docs/readthedocs/source/doc/Orca/Howto/autoestimator-pytorch-quickstart.md deleted file mode 100644 index 9fe8380f..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/autoestimator-pytorch-quickstart.md +++ /dev/null @@ -1,161 +0,0 @@ -# Enable AutoML for PyTorch - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/autoestimator_pytorch_lenet_mnist.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/autoestimator_pytorch_lenet_mnist.ipynb) - ---- - -**In this guide we will describe how to enable automated hyper-parameter search for PyTorch using Orca `AutoEstimator`.** - -### Step 0: Prepare Environment - -[Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) is needed to prepare the Python environment for running this example. Please refer to the [install guide](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/distributed-tuning.html#install) for more details. - -```bash -conda create -n bigdl-orca-automl python=3.7 # bigdl-orca-automl is conda environment name, you can use any name you like. -conda activate bigdl-orca-automl -pip install bigdl-orca[automl] -pip install torch==1.8.1 torchvision==0.9.1 -``` - -### Step 1: Init Orca Context -```python -from bigdl.orca import init_orca_context, stop_orca_context - -if cluster_mode == "local": - init_orca_context(cores=4, memory="2g", init_ray_on_spark=True) # run in local mode -elif cluster_mode == "k8s": - init_orca_context(cluster_mode="k8s", num_nodes=2, cores=4, init_ray_on_spark=True) # run on K8s cluster -elif cluster_mode == "yarn": - init_orca_context( - cluster_mode="yarn-client", cores=4, num_nodes=2, memory="2g", init_ray_on_spark=True, - driver_memory="10g", driver_cores=1) # run on Hadoop YARN cluster -``` - -This is the only place where you need to specify local or distributed mode. View [Orca Context](./../Overview/orca-context.md) for more details. - -**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](./../../UserGuide/hadoop.md) for more details. - -### Step 2: Define the Model - -You may define your model, loss and optimizer in the same way as in any standard PyTorch program. - -```python -import torch -import torch.nn as nn -import torch.nn.functional as F - -class LeNet(nn.Module): - def __init__(self, fc1_hidden_size=500): - super(LeNet, self).__init__() - self.conv1 = nn.Conv2d(1, 20, 5, 1) - self.conv2 = nn.Conv2d(20, 50, 5, 1) - self.fc1 = nn.Linear(4*4*50, fc1_hidden_size) - self.fc2 = nn.Linear(fc1_hidden_size, 10) - - def forward(self, x): - x = F.relu(self.conv1(x)) - x = F.max_pool2d(x, 2, 2) - x = F.relu(self.conv2(x)) - x = F.max_pool2d(x, 2, 2) - x = x.view(-1, 4*4*50) - x = F.relu(self.fc1(x)) - x = self.fc2(x) - return F.log_softmax(x, dim=1) - -criterion = nn.NLLLoss() -``` -After defining your model, you need to define a *Model Creator Function* that returns an instance of your model, and a *Optimizer Creator Function* that returns a PyTorch optimizer. Note that both the *Model Creator Function* and the *Optimizer Creator Function* should take `config` as input and get the hyper-parameter values from `config`. 
- -```python -def model_creator(config): - model = LeNet(fc1_hidden_size=config["fc1_hidden_size"]) - return model - -def optim_creator(model, config): - return torch.optim.Adam(model.parameters(), lr=config["lr"]) -``` - -### Step 3: Define Dataset - -You can define the train and validation datasets using *Data Creator Function* that takes `config` as input and returns a PyTorch `DataLoader`. - -```python -import torch -from torchvision import datasets, transforms - -torch.manual_seed(0) -dir = './dataset' -test_batch_size = 640 - -def train_loader_creator(config): - train_loader = torch.utils.data.DataLoader( - datasets.MNIST(dir, train=True, download=True, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=config["batch_size"], shuffle=True) - return train_loader - -def test_loader_creator(config): - test_loader = torch.utils.data.DataLoader( - datasets.MNIST(dir, train=False, download=True, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=test_batch_size, shuffle=False) - return test_loader -``` - -### Step 4: Define Search Space -You should define a dictionary as your hyper-parameter search space. - -The keys are hyper-parameter names which should be the same with those in your creators, and you can specify how you want to sample each hyper-parameter in the values of the search space. See [automl.hp](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/AutoML/automl.html#orca-automl-hp) for more details. - -```python -from bigdl.orca.automl import hp - -search_space = { - "fc1_hidden_size": hp.choice([500, 600]), - "lr": hp.choice([0.001, 0.003]), - "batch_size": hp.choice([160, 320, 640]), -} -``` - -### Step 5: Automatically Fit and Search with Orca AutoEstimator - -First, create an `AutoEstimator`. You can refer to [AutoEstimator API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/AutoML/automl.html#orca-automl-auto-estimator) for more details. - -```python -from bigdl.orca.automl.auto_estimator import AutoEstimator - -auto_est = AutoEstimator.from_torch(model_creator=model_creator, - optimizer=optim_creator, - loss=criterion, - logs_dir="/tmp/orca_automl_logs", - resources_per_trial={"cpu": 2}, - name="lenet_mnist") -``` - -Next, use the `AutoEstimator` to fit and search for the best hyper-parameter set. - -```python -auto_est.fit(data=train_loader_creator, - validation_data=test_loader_creator, - search_space=search_space, - n_sampling=2, - epochs=1, - metric="accuracy") -``` - -Finally, you can get the best learned model and the best hyper-parameters. - -```python -best_model = auto_est.get_best_model() -best_config = auto_est.get_best_config() -``` - -**Note:** You should call `stop_orca_context()` when your application finishes. 
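To make the note above concrete, here is a minimal wrap-up sketch. It assumes the `auto_est` object from the previous step is still in scope, and it treats the return value of `get_best_config()` as a plain dictionary of the searched hyper-parameters — that last point is an assumption for illustration, not something stated in the original guide.

```python
# Minimal wrap-up sketch. Assumes `auto_est` from the steps above is in scope and
# that `get_best_config()` returns a plain dict of the searched hyper-parameters.
from bigdl.orca import stop_orca_context

best_config = auto_est.get_best_config()
print("Selected hyper-parameters:", best_config)

# Release the cluster resources once the application is done.
stop_orca_context()
```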
diff --git a/docs/readthedocs/source/doc/Orca/Howto/autoxgboost-quickstart.md b/docs/readthedocs/source/doc/Orca/Howto/autoxgboost-quickstart.md deleted file mode 100644 index d71f2383..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/autoxgboost-quickstart.md +++ /dev/null @@ -1,82 +0,0 @@ -# Enable AutoML for XGBoost - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/autoxgboost_regressor_sklearn_boston.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/autoxgboost_regressor_sklearn_boston.ipynb) - ---- - -**In this guide we will describe how to use Orca AutoXGBoost for automated xgboost tuning** - -Orca AutoXGBoost enables distributed automated hyper-parameter tuning for XGBoost, which includes `AutoXGBRegressor` and `AutoXGBClassifier` for sklearn`XGBRegressor` and `XGBClassifier` respectively. See more about [xgboost scikit-learn API](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn). -### Step 0: Prepare Environment - -[Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) is needed to prepare the Python environment for running this example. Please refer to the [install guide](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/distributed-tuning.html#install) for more details. - - -### Step 1: Init Orca Context -```python -from bigdl.orca import init_orca_context, stop_orca_context - -if cluster_mode == "local": - init_orca_context(cores=6, memory="2g", init_ray_on_spark=True) # run in local mode -elif cluster_mode == "k8s": - init_orca_context(cluster_mode="k8s", num_nodes=2, cores=4, init_ray_on_spark=True) # run on K8s cluster -elif cluster_mode == "yarn": - init_orca_context( - cluster_mode="yarn-client", cores=4, num_nodes=2, memory="2g", init_ray_on_spark=True, - driver_memory="10g", driver_cores=1) # run on Hadoop YARN cluster -``` - -This is the only place where you need to specify local or distributed mode. View [Orca Context](./../Overview/orca-context.md) for more details. - -**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](./../../UserGuide/hadoop.md) for more details. - -### Step 2: Define Search space - -You should define a dictionary as your hyper-parameter search space. - -The keys are hyper-parameter names you want to search for `XGBRegressor`, and you can specify how you want to sample each hyper-parameter in the values of the search space. See [automl.hp](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/AutoML/automl.html#orca-automl-hp) for more details. - -```python -from bigdl.orca.automl import hp - -search_space = { - "n_estimators": hp.grid_search([50, 100, 200]), - "max_depth": hp.choice([2, 4, 6]), -} -``` - -### Step 3: Automatically fit and search with Orca AutoXGBoost - -First create an `AutoXGBRegressor`. - -```python -from bigdl.orca.automl.xgboost import AutoXGBRegressor - -auto_xgb_reg = AutoXGBRegressor(cpus_per_trial=2, - name="auto_xgb_classifier", - min_child_weight=3, - random_state=2) -``` - -Next, use the `AutoXGBRegressor` to fit and search for the best hyper-parameter set. 
- -```python -auto_xgb_reg.fit(data=(X_train, y_train), - validation_data=(X_test, y_test), - search_space=search_space, - n_sampling=2, - metric="rmse") -``` - -### Step 4: Get best model and hyper parameters - -You can get the best learned model and the best hyper-parameter set for further deployment. The best model is an sklearn `XGBRegressor` instance. - -```python -best_model = auto_xgb_reg.get_best_model() -best_config = auto_xgb_reg.get_best_config() -``` - -**Note:** You should call `stop_orca_context()` when your application finishes. diff --git a/docs/readthedocs/source/doc/Orca/Howto/index.rst b/docs/readthedocs/source/doc/Orca/Howto/index.rst deleted file mode 100644 index 88171b7f..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/index.rst +++ /dev/null @@ -1,26 +0,0 @@ -Orca How-to Guides -========================= - -**TensorFlow:** - -* `Scale TensorFlow 2 Applications `__ -* `Scale TensorFlow 1.15 Applications `__ -* `Scale Keras 2.3 Applications `__ - -**PyTorch:** - -* `Scale PyTorch Applications `__ - -**Ray:** - -* `Run Ray programs on Big Data clusters `__ - -**Data Processing:** - -* `Use Spark DataFrames for Deep Learning `__ -* `Use Distributed Pandas for Deep Learning `__ - -**AutoML:** - -* `Enable AutoML for PyTorch `__ -* `Enable AutoML for XGBoost `__ diff --git a/docs/readthedocs/source/doc/Orca/Howto/pytorch-quickstart-bigdl.md b/docs/readthedocs/source/doc/Orca/Howto/pytorch-quickstart-bigdl.md deleted file mode 100644 index d52063df..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/pytorch-quickstart-bigdl.md +++ /dev/null @@ -1,149 +0,0 @@ -# PyTorch Quickstart - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/pytorch_lenet_mnist_bigdl.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/pytorch_lenet_mnist_bigdl.ipynb) - ---- - -**In this guide we will describe how to scale out _PyTorch_ programs using Orca in 4 simple steps.** - -### Step 0: Prepare Environment - -[Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) is needed to prepare the Python environment for running this example. Please refer to the [install guide](../../UserGuide/python.md) for more details. - - -```bash -conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like. -conda activate py37 -pip install bigdl-orca -pip install torch==1.7.1 torchvision==0.8.2 -pip install six cloudpickle -pip install jep==3.9.0 -``` - -### Step 1: Init Orca Context -```python -from bigdl.orca import init_orca_context, stop_orca_context - -cluster_mode = "local" -if cluster_mode == "local": # For local machine - init_orca_context(cores=4, memory="10g") -elif cluster_mode == "k8s": # For K8s cluster - init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1) -elif cluster_mode == "yarn": # For Hadoop/YARN cluster - init_orca_context( - cluster_mode="yarn", cores=2, num_nodes=2, memory="10g", - driver_memory="10g", driver_cores=1, - conf={"spark.rpc.message.maxSize": "1024", - "spark.task.maxFailures": "1", - "spark.driver.extraJavaOptions": "-Dbigdl.failure.retryTimes=1"}) -``` - -This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details. 
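Since `cluster_mode` is the only switch between local and distributed execution, one common pattern is to read it from the command line so the same script can be launched locally or submitted to a K8s/YARN cluster without code changes. The snippet below is an illustrative sketch (the `--cluster_mode` argument is hypothetical and not part of the original quickstart); it simply feeds the branches shown above.

```python
# Illustrative sketch (not part of the original quickstart): take the cluster mode
# from the command line instead of hard-coding it.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--cluster_mode", type=str, default="local",
                    choices=["local", "k8s", "yarn"],
                    help="Where to run: local machine, K8s cluster or Hadoop/YARN.")
args = parser.parse_args()

cluster_mode = args.cluster_mode  # then reuse the if/elif branches shown above
```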
- -**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](../../UserGuide/hadoop.md) for more details. - -### Step 2: Define the Model - -You may define your model, loss and optimizer in the same way as in any standard (single node) PyTorch program. - -```python -import torch -import torch.nn as nn -import torch.nn.functional as F - -class LeNet(nn.Module): - def __init__(self): - super(LeNet, self).__init__() - self.conv1 = nn.Conv2d(1, 20, 5, 1) - self.conv2 = nn.Conv2d(20, 50, 5, 1) - self.fc1 = nn.Linear(4*4*50, 500) - self.fc2 = nn.Linear(500, 10) - - def forward(self, x): - x = F.relu(self.conv1(x)) - x = F.max_pool2d(x, 2, 2) - x = F.relu(self.conv2(x)) - x = F.max_pool2d(x, 2, 2) - x = x.view(-1, 4*4*50) - x = F.relu(self.fc1(x)) - x = self.fc2(x) - return F.log_softmax(x, dim=1) - -model = LeNet() -model.train() -criterion = nn.NLLLoss() -adam = torch.optim.Adam(model.parameters(), 0.001) -``` - -### Step 3: Define Train Dataset - -You can define the dataset using standard [Pytorch DataLoader](https://pytorch.org/docs/stable/data.html). - -```python -import torch -from torchvision import datasets, transforms - -torch.manual_seed(0) -dir='./' - -batch_size=64 -test_batch_size=64 -train_loader = torch.utils.data.DataLoader( - datasets.MNIST(dir, train=True, download=True, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=batch_size, shuffle=True) -test_loader = torch.utils.data.DataLoader( - datasets.MNIST(dir, train=False, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=test_batch_size, shuffle=False) -``` - -Alternatively, we can also use a [Data Creator Function](https://github.com/intel-analytics/BigDL/blob/main/docs/docs/colab-notebook/orca/quickstart/pytorch_lenet_mnist_data_creator_func.ipynb) or [Orca XShards](../Overview/data-parallel-processing) as the input data, especially when the data size is very large) - -### Step 4: Fit with Orca Estimator - -First, Create an Estimator - -```python -from bigdl.orca.learn.pytorch import Estimator -from bigdl.orca.learn.metrics import Accuracy - -est = Estimator.from_torch(model=model, optimizer=adam, loss=criterion, metrics=[Accuracy()]) -``` - -Next, fit and evaluate using the Estimator - -```python -from bigdl.orca.learn.trigger import EveryEpoch - -est.fit(data=train_loader, epochs=10, validation_data=test_loader, - checkpoint_trigger=EveryEpoch()) - -result = est.evaluate(data=test_loader) -for r in result: - print(r, ":", result[r]) -``` - -### Step 5: Save and Load the Model - -Save the Estimator states (including model and optimizer) to the provided model path. - -```python -est.save("mnist_model") -``` - -Load the Estimator states (model and possibly with optimizer) from the provided model path. - -```python -est.load("mnist_model") -``` - -**Note:** You should call `stop_orca_context()` when your application finishes. 
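As a brief illustration of how the final steps fit together, the sketch below restores the saved states, re-runs evaluation with the restored Estimator, and then stops the Orca context. It assumes `est` and `test_loader` from the earlier steps are still in scope.

```python
# Sketch of the final steps, assuming `est` and `test_loader` defined above are
# still in scope.
from bigdl.orca import stop_orca_context

est.load("mnist_model")                  # restore the model (and optimizer) states
result = est.evaluate(data=test_loader)  # the restored Estimator can be used as before
for r in result:
    print(r, ":", result[r])

stop_orca_context()                      # stop the Orca context when finished
```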
diff --git a/docs/readthedocs/source/doc/Orca/Howto/pytorch-quickstart-ray.md b/docs/readthedocs/source/doc/Orca/Howto/pytorch-quickstart-ray.md deleted file mode 100644 index 6720d8de..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/pytorch-quickstart-ray.md +++ /dev/null @@ -1,147 +0,0 @@ -# Use `torch.distributed` in Orca - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/pytorch_lenet_mnist_ray.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/pytorch_lenet_mnist_ray.ipynb) - ---- - -**In this guide we will describe how to scale out _PyTorch_ programs using the `torch.distributed` package in Orca.** - -### Step 0: Prepare Environment - -[Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) is needed to prepare the Python environment for running this example. Please refer to the [install guide](../../UserGuide/python.md) for more details. - -```bash -conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like. -conda activate py37 -pip install bigdl-orca[ray] -pip install torch==1.7.1 torchvision==0.8.2 -``` - -### Step 1: Init Orca Context -```python -from bigdl.orca import init_orca_context, stop_orca_context - -if cluster_mode == "local": # For local machine - init_orca_context(cores=4, memory="10g") -elif cluster_mode == "k8s": # For K8s cluster - init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1) -elif cluster_mode == "yarn": # For Hadoop/YARN cluster - init_orca_context(cluster_mode="yarn", cores=2, num_nodes=2, memory="10g", driver_memory="10g", driver_cores=1) -``` - -This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details. - -**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](../../UserGuide/hadoop.md) for more details. - -### Step 2: Define the Model - -You may define your model, loss and optimizer in the same way as in any standard (single node) PyTorch program. - -```python -import torch -import torch.nn as nn -import torch.nn.functional as F - -class LeNet(nn.Module): - def __init__(self): - super(LeNet, self).__init__() - self.conv1 = nn.Conv2d(1, 20, 5, 1) - self.conv2 = nn.Conv2d(20, 50, 5, 1) - self.fc1 = nn.Linear(4*4*50, 500) - self.fc2 = nn.Linear(500, 10) - - def forward(self, x): - x = F.relu(self.conv1(x)) - x = F.max_pool2d(x, 2, 2) - x = F.relu(self.conv2(x)) - x = F.max_pool2d(x, 2, 2) - x = x.view(-1, 4*4*50) - x = F.relu(self.fc1(x)) - x = self.fc2(x) - return F.log_softmax(x, dim=1) - -criterion = nn.NLLLoss() -``` -After defining your model, you need to define a *Model Creator Function* that returns an instance of your model, and a *Optimizer Creator Function* that returns a PyTorch optimizer. - -```python -def model_creator(config): - model = LeNet() - return model - -def optim_creator(model, config): - return torch.optim.Adam(model.parameters(), lr=0.001) -``` - -### Step 3: Define Train Dataset - -You can define the dataset using a *Data Creator Function* that returns a PyTorch `DataLoader`. Orca also supports [Orca SparkXShards](../Overview/data-parallel-processing). 
- -```python -import torch -from torchvision import datasets, transforms - -torch.manual_seed(0) -batch_size = 320 -test_batch_size = 320 -dir = './dataset' - -def train_loader_creator(config, batch_size): - train_loader = torch.utils.data.DataLoader( - datasets.MNIST(dir, train=True, download=True, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=batch_size, shuffle=True) - return train_loader - -def test_loader_creator(config, batch_size): - test_loader = torch.utils.data.DataLoader( - datasets.MNIST(dir, train=False, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=batch_size, shuffle=False) - return test_loader -``` - -### Step 4: Fit with Orca Estimator - -First, Create an Estimator - -```python -from bigdl.orca.learn.pytorch import Estimator -from bigdl.orca.learn.metrics import Accuracy - -est = Estimator.from_torch(model=model_creator, optimizer=optim_creator, loss=criterion, metrics=[Accuracy()], - backend="ray") -``` - -Next, fit and evaluate using the Estimator - -```python -est.fit(data=train_loader_creator, epochs=1, batch_size=batch_size) -result = est.evaluate(data=test_loader_creator, batch_size=test_batch_size) -for r in result: - print(r, ":", result[r]) -``` - -### Step 5: Save and Load the Model - -Save the Estimator states (including model and optimizer) to the provided model path. - -```python -est.save("mnist_model") -``` - -Load the Estimator states (model and possibly with optimizer) from provided model path. - -```python -est.load("mnist_model") -``` - -**Note:** You should call `stop_orca_context()` when your application finishes. diff --git a/docs/readthedocs/source/doc/Orca/Howto/pytorch-quickstart.md b/docs/readthedocs/source/doc/Orca/Howto/pytorch-quickstart.md deleted file mode 100644 index b9b4c12e..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/pytorch-quickstart.md +++ /dev/null @@ -1,148 +0,0 @@ -# Scale PyTorch Applications - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/pytorch_lenet_mnist_spark.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/pytorch_lenet_mnist_spark.ipynb) - ---- - -**In this guide we will describe how to scale out _PyTorch_ programs using Orca in 5 simple steps.** - -### Step 0: Prepare Environment - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/install.md) for more details. - -```bash -conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like. -conda activate py37 - -pip install bigdl-orca -pip install torch torchvision -pip install tqdm -``` - -### Step 1: Init Orca Context -```python -from bigdl.orca import init_orca_context, stop_orca_context - -if cluster_mode == "local": # For local machine - init_orca_context(cores=4, memory="4g") -elif cluster_mode == "k8s": # For K8s cluster - init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="4g", master=..., container_image=...) 
-elif cluster_mode == "yarn": # For Hadoop/YARN cluster - init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="4g") -``` - -This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details. - -Please check the tutorials if you want to run on [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters. - -### Step 2: Define the Model - -You may define your model, loss and optimizer in the same way as in any standard (single node) PyTorch program. - -```python -import torch -import torch.nn as nn -import torch.nn.functional as F - -class LeNet(nn.Module): - def __init__(self): - super(LeNet, self).__init__() - self.conv1 = nn.Conv2d(1, 20, 5, 1) - self.conv2 = nn.Conv2d(20, 50, 5, 1) - self.fc1 = nn.Linear(4*4*50, 500) - self.fc2 = nn.Linear(500, 10) - - def forward(self, x): - x = F.relu(self.conv1(x)) - x = F.max_pool2d(x, 2, 2) - x = F.relu(self.conv2(x)) - x = F.max_pool2d(x, 2, 2) - x = x.view(-1, 4*4*50) - x = F.relu(self.fc1(x)) - x = self.fc2(x) - return F.log_softmax(x, dim=1) - -loss = nn.NLLLoss() -``` - -You need to define a *Model Creator Function* that takes the parameter `config` and returns an instance of your PyTorch model, and an *Optimizer Creator Function* that takes two parameters `model` and `config` and returns an instance of your PyTorch optimizer. - -```python -def model_creator(config): - model = LeNet() - return model - -def optim_creator(model, config): - return torch.optim.Adam(model.parameters(), lr=config.get("lr", 0.001)) -``` - -### Step 3: Define Train Dataset - -You can define the dataset using a *Data Creator Function* that has two parameters `config` and `batch_size` and returns a [Pytorch DataLoader](https://pytorch.org/docs/stable/data.html). Orca also supports [Spark DataFrame](./spark-dataframe.md) and [Orca XShards](./xshards-pandas.md). - -```python -from torchvision import datasets, transforms - -dir = '/tmp/dataset' - -def train_loader_creator(config, batch_size): - train_loader = torch.utils.data.DataLoader( - datasets.MNIST(dir, train=True, download=True, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=batch_size, shuffle=True) - return train_loader - -def test_loader_creator(config, batch_size): - test_loader = torch.utils.data.DataLoader( - datasets.MNIST(dir, train=False, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=batch_size, shuffle=False) - return test_loader -``` - -### Step 4: Fit with Orca Estimator - -First, Create an Orca Estimator for PyTorch. - -```python -from bigdl.orca.learn.pytorch import Estimator -from bigdl.orca.learn.metrics import Accuracy - -est = Estimator.from_torch(model=model_creator, optimizer=optim_creator, loss=loss, - metrics=[Accuracy()], use_tqdm=True) -``` - -Next, fit and evaluate using the Estimator. - -```python -batch_size = 64 - -train_stats = est.fit(data=train_loader_creator, epochs=1, batch_size=batch_size) -eval_stats = est.evaluate(data=test_loader_creator, batch_size=batch_size) -print(eval_stats) -``` - -### Step 5: Save and Load the Model - -Save the Estimator states (including model and optimizer) to the provided model path. -```python -est.save("mnist_model") -``` - -Load the Estimator states (including model and optimizer) from the provided model path. 
- -```python -est.load("mnist_model") -``` - -**Note:** You should call `stop_orca_context()` when your application finishes. - -That's it, the same code can run seamlessly on your local laptop and scale to [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters. diff --git a/docs/readthedocs/source/doc/Orca/Howto/ray-quickstart.md b/docs/readthedocs/source/doc/Orca/Howto/ray-quickstart.md deleted file mode 100644 index d71884cf..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/ray-quickstart.md +++ /dev/null @@ -1,129 +0,0 @@ -# Run Ray programs on Big Data clusters - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb) - ---- - -**In this guide, we will describe how to use RayOnSpark to directly run Ray programs on Big Data clusters in 2 simple steps.** - -### Step 0: Prepare Environment - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/install.md) for more details. - -```bash -conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like. -conda activate py37 - -pip install bigdl-orca[ray] -``` - -### Step 1: Init Orca Context - -The Ray cluster would be launched automatically by specifying `init_ray_on_spark=True` in `init_orca_context`. - -```python -from bigdl.orca import init_orca_context, stop_orca_context - -if cluster_mode == "local": # For local machine - sc = init_orca_context(cluster_mode="local", cores=4, memory="4g", init_ray_on_spark=True) -elif cluster_mode == "k8s": # For K8s cluster - sc = init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="4g", init_ray_on_spark=True, master=..., container_image=...) -elif cluster_mode == "yarn": # For Hadoop/YARN cluster - sc = init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="4g", init_ray_on_spark=True) -``` - -This is the only place where you need to specify local or distributed mode. See [here](../Overview/ray.md#initialize) for more RayOnSpark related arguments when you `init_orca_context`. - -By default, the Ray cluster would be launched using Spark barrier execution mode, you can turn it off via the configurations of `OrcaContext`: - -```python -from bigdl.orca import OrcaContext - -OrcaContext.barrier_mode = False -``` - -You can retrieve the information of the Ray cluster via `OrcaContext`: - -```python -from bigdl.orca import OrcaContext - -ray_ctx = OrcaContext.get_ray_context() -address_info = ray_ctx.address_info # The dictionary information of the ray cluster, including node_ip_address, object_store_address, webui_url, etc. -redis_address = ray_ctx.redis_address # The redis address of the ray cluster. -``` - -View [Orca Context](../Overview/orca-context.md) for more details. - -Please check the tutorials if you want to run on [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters. - - -### Step 2: Run Ray Applications - -After the initialization, you can directly write Ray code inline with your Spark code, and run Ray programs on the underlying existing Big Data clusters. 
Ray [tasks](https://docs.ray.io/en/master/walkthrough.html#remote-functions-tasks) and [actors](https://docs.ray.io/en/master/actors.html) would be launched across the cluster. - -The following example uses actor handles to implement a parameter server example for distributed asynchronous stochastic gradient descent. This is a simple Ray example for demonstration purpose. You can write other Ray applications as you wish in a similar way. - -A parameter server is simply an object that stores the parameters (or "weights") of a machine learning model (this could be a neural network, a linear model, or something else). It exposes two methods: one for getting the parameters and one for updating the parameters. - -By adding the `@ray.remote` decorator, the `ParameterServer` class becomes a Ray actor. - -```python -import ray -import numpy as np - -dim = 10 -@ray.remote -class ParameterServer(object): - def __init__(self, dim): - self.parameters = np.zeros(dim) - - def get_parameters(self): - return self.parameters - - def update_parameters(self, update): - self.parameters += update - -ps = ParameterServer.remote(dim) -``` - -In a typical machine learning training application, worker processes will run in an infinite loop that does the following: - -1. Get the latest parameters from the parameter server. -2. Compute an update to the parameters (using the current parameters and some data). -3. Send the update to the parameter server. - -By adding the `@ray.remote` decorator, the `worker` function becomes a Ray remote function. - -```python -import time - -@ray.remote -def worker(ps, dim, num_iters): - for _ in range(num_iters): - # Get the latest parameters. - parameters = ray.get(ps.get_parameters.remote()) - # Compute an update. - update = 1e-3 * parameters + np.ones(dim) - # Update the parameters. - ps.update_parameters.remote(update) - # Sleep a little to simulate a real workload. - time.sleep(0.5) - -# Test that worker is implemented correctly. -ray.get(worker.remote(ps, dim, 1)) - -# Start two workers. -worker_results = [worker.remote(ps, dim, 100) for _ in range(2)] -``` - -As the worker tasks are executing, you can query the parameter server from the driver and see the parameters changing in the background. - -``` -print(ray.get(ps.get_parameters.remote())) -``` - -**Note:** You should call `stop_orca_context()` when your program finishes. - -That's it, the same code can run seamlessly on your local laptop and scale to [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters. 
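One practical detail worth spelling out: the two worker tasks above run asynchronously, so you will usually want to block until they have finished before shutting everything down. The following is a hedged shutdown sketch (not part of the original example) that assumes `ps` and `worker_results` from the code above are still in scope; calling `ray.get` on the list of object refs blocks until both workers complete.

```python
# Clean shutdown sketch, assuming `ps` and `worker_results` from the example above
# are still in scope.
import ray
from bigdl.orca import stop_orca_context

ray.get(worker_results)                     # block until both worker tasks finish
print(ray.get(ps.get_parameters.remote()))  # inspect the final parameter values

stop_orca_context()                         # stop the Orca context when finished
```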
diff --git a/docs/readthedocs/source/doc/Orca/Howto/spark-dataframe.md b/docs/readthedocs/source/doc/Orca/Howto/spark-dataframe.md deleted file mode 100644 index a12ae143..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/spark-dataframe.md +++ /dev/null @@ -1,111 +0,0 @@ -# Use Spark DataFrames for Deep Learning - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/ncf_dataframe.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/ncf_dataframe.ipynb) - ---- - -**In this guide we will describe how to use Apache Spark Dataframes to scale-out data processing for distributed deep learning.** - -The dataset used in this guide is [movielens-1M](https://grouplens.org/datasets/movielens/1m/), which contains 1 million ratings of 5 levels from 6000 users on 4000 movies. We will read the data into Spark Dataframe and directly use the Spark Dataframe as the input to the distributed training. - -### 1. Read input data into Spark DataFrame - -First, read the input data into Spark Dataframes. - -```python -from bigdl.orca import OrcaContext - -spark = OrcaContext.get_spark_session() -# read csv with specifying column names -df = spark.read.csv(new_rating_files, sep=':', inferSchema=True).toDF( - "user", "item", "label", "timestamp") -``` - -### 2. Process data using Spark Dataframe - -Next, process the data using Spark Dataframe operations. - -```python -# update label starting from 0. That's because ratings go from 1 to 5, while the matrix column index goes from 0 to 4 -df = df.withColumn('label', df.label-1) - -# split to train/test dataset -train_data, test_data = df.randomSplit([0.8, 0.2], 100) -``` - -### 3. Define NCF model - -This example defines NCF model in the _Creator Function_ using TensorFlow 2 APIs as follows. 
- -```python -from tensorflow import keras -import tensorflow as tf - -def model_creator(config): - embedding_size=16 - user = keras.layers.Input(dtype=tf.int32, shape=(None,)) - item = keras.layers.Input(dtype=tf.int32, shape=(None,)) - label = keras.layers.Input(dtype=tf.int32, shape=(None,)) - - with tf.name_scope("GMF"): - user_embed_GMF = keras.layers.Embedding(max_user_id + 1, embedding_size)(user) - item_embed_GMF = keras.layers.Embedding(max_item_id + 1, embedding_size)(item) - GMF = keras.layers.Multiply()([user_embed_GMF, item_embed_GMF]) - - with tf.name_scope("MLP"): - user_embed_MLP = keras.layers.Embedding(max_user_id + 1, embedding_size)(user) - item_embed_MLP = keras.layers.Embedding(max_item_id + 1, embedding_size)(item) - interaction = concat([user_embed_MLP, item_embed_MLP], axis=-1) - layer1_MLP = keras.layers.Dense(units=embedding_size * 2, activation='relu')(interaction) - layer1_MLP = keras.layers.Dropout(rate=0.2)(layer1_MLP) - layer2_MLP = keras.layers.Dense(units=embedding_size, activation='relu')(layer1_MLP) - layer2_MLP = keras.layers.Dropout(rate=0.2)(layer2_MLP) - layer3_MLP = keras.layers.Dense(units=embedding_size // 2, activation='relu')(layer2_MLP) - layer3_MLP = keras.layers.Dropout(rate=0.2)(layer3_MLP) - - # Concate the two parts together - with tf.name_scope("concatenation"): - concatenation = tf.concat([GMF, layer3_MLP], axis=-1) - outputs = keras.layers.Dense(units=5, activation='softmax')(concatenation) - - model = keras.Model(inputs=[user, item], outputs=outputs) - model.compile(optimizer="adam", - loss="sparse_categorical_crossentropy", - metrics=['accuracy']) - return model -``` - -### 4. Fit with Orca Estimator - -Finally, run distributed model training/inference on the Spark Dataframes directly. - -```python -from bigdl.orca.learn.tf2 import Estimator - -# create an Estimator -est = Estimator.from_keras(model_creator=model_creator) # the model accept two inputs and one label - -# fit with Estimator -stats = est.fit(train_data, - epochs=epochs, - batch_size=batch_size, - feature_cols=['user', 'item'], # specifies which column(s) to be used as inputs - label_cols=['label'], # specifies which column(s) to be used as labels - steps_per_epoch=800000 // batch_size, - validation_data=test_data, - validation_steps=200000 // batch_size) - -checkpoint_path = os.path.join(model_dir, "NCF.ckpt") -est.save(checkpoint_path) - -# evaluate with Estimator -stats = est.evaluate(test_data, - feature_cols=['user', 'item'], # specifies which column(s) to be used as inputs - label_cols=['label'], # specifies which column(s) to be used as labels - num_steps=100000 // batch_size) -est.shutdown() -print(stats) -``` - diff --git a/docs/readthedocs/source/doc/Orca/Howto/tf1-quickstart.md b/docs/readthedocs/source/doc/Orca/Howto/tf1-quickstart.md deleted file mode 100644 index baaf670a..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/tf1-quickstart.md +++ /dev/null @@ -1,120 +0,0 @@ -# Scale TensorFlow 1.15 Applications - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/tf_lenet_mnist.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/tf_lenet_mnist.ipynb) - ---- - -**In this guide we will describe how to scale out _TensorFlow 1.15_ programs using Orca in 4 simple steps.** - -### Step 0: Prepare Environment - -We recommend 
using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/install.md) for more details. - -```bash -conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like. -conda activate py37 - -pip install bigdl-orca -pip install tensorflow==1.15 -pip install tensorflow-datasets==2.0 -pip install psutil -``` - -### Step 1: Init Orca Context -```python -from bigdl.orca import init_orca_context, stop_orca_context - -if cluster_mode == "local": # For local machine - init_orca_context(cluster_mode="local", cores=4, memory="4g") -elif cluster_mode == "k8s": # For K8s cluster - init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="4g", master=..., container_image=...) -elif cluster_mode == "yarn": # For Hadoop/YARN cluster - init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="4g") -``` - -This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details. - -Please check the tutorials if you want to run on [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters. - -### Step 2: Define the Model - -You may define your model, loss and metrics in the same way as in any standard (single node) TensorFlow program. - -```python -import tensorflow as tf - -def accuracy(logits, labels): - predictions = tf.argmax(logits, axis=1, output_type=labels.dtype) - is_correct = tf.cast(tf.equal(predictions, labels), dtype=tf.float32) - return tf.reduce_mean(is_correct) - -def lenet(images): - with tf.variable_scope('LeNet', [images]): - net = tf.layers.conv2d(images, 32, (5, 5), activation=tf.nn.relu, name='conv1') - net = tf.layers.max_pooling2d(net, (2, 2), 2, name='pool1') - net = tf.layers.conv2d(net, 64, (5, 5), activation=tf.nn.relu, name='conv2') - net = tf.layers.max_pooling2d(net, (2, 2), 2, name='pool2') - net = tf.layers.flatten(net) - net = tf.layers.dense(net, 1024, activation=tf.nn.relu, name='fc3') - logits = tf.layers.dense(net, 10) - return logits - -# tensorflow inputs -images = tf.placeholder(dtype=tf.float32, shape=(None, 28, 28, 1)) -# tensorflow labels -labels = tf.placeholder(dtype=tf.int32, shape=(None,)) - -logits = lenet(images) -loss = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(logits=logits, labels=labels)) -acc = accuracy(logits, labels) -``` -### Step 3: Define Train Dataset - -You can define the dataset using standard [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). Orca also supports [Spark DataFrame](./spark-dataframe.md) and [Orca XShards](./xshards-pandas.md). - -```python -import tensorflow_datasets as tfds - -def preprocess(data): - data['image'] = tf.cast(data["image"], tf.float32) / 255. - return data['image'], data['label'] - -# get DataSet -mnist_train = tfds.load(name="mnist", split="train", data_dir=...) -mnist_test = tfds.load(name="mnist", split="test", data_dir=...) - -mnist_train = mnist_train.map(preprocess) -mnist_test = mnist_test.map(preprocess) -``` - -### Step 4: Fit with Orca Estimator - -First, create an Orca Estimator for TensorFlow. - -```python -from bigdl.orca.learn.tf.estimator import Estimator - -est = Estimator.from_graph(inputs=images, - outputs=logits, - labels=labels, - loss=loss, - optimizer=tf.train.AdamOptimizer(), - metrics={"acc": acc}) -``` - -Next, fit and evaluate using the Estimator. 
-```python -est.fit(data=mnist_train, - batch_size=320, - epochs=5, - validation_data=mnist_test) - -result = est.evaluate(mnist_test) -print(result) -``` - -**Note:** You should call `stop_orca_context()` when your program finishes. - -That's it, the same code can run seamlessly on your local laptop and scale to [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters. diff --git a/docs/readthedocs/source/doc/Orca/Howto/tf1keras-quickstart.md b/docs/readthedocs/source/doc/Orca/Howto/tf1keras-quickstart.md deleted file mode 100644 index aa66da28..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/tf1keras-quickstart.md +++ /dev/null @@ -1,111 +0,0 @@ -# Scale Keras 2.3 Applications - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/keras_lenet_mnist.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/keras_lenet_mnist.ipynb) - ---- - -**In this guide we will describe how to scale out _Keras 2.3_ programs using Orca in 4 simple steps.** - - -### Step 0: Prepare Environment - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/install.md) for more details. - -```bash -conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like. -conda activate py37 - -pip install bigdl-orca -pip install tensorflow==1.15.0 -pip install tensorflow-datasets==2.1.0 -pip install psutil -pip install pandas -pip install scikit-learn -``` - -### Step 1: Init Orca Context -```python -from bigdl.orca import init_orca_context, stop_orca_context - -if cluster_mode == "local": # For local machine - init_orca_context(cluster_mode="local", cores=4, memory="4g") -elif cluster_mode == "k8s": # For K8s cluster - init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="4g", master=..., container_image=...) -elif cluster_mode == "yarn": # For Hadoop/YARN cluster - init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="4g") -``` - -This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details. - -Please check the tutorials if you want to run on [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters. - -### Step 2: Define the Model - -You may define your model, loss and metrics in the same way as in any standard (single node) Keras program. 
- -```python -from tensorflow import keras - -model = keras.Sequential( - [keras.layers.Conv2D(20, kernel_size=(5, 5), strides=(1, 1), activation='tanh', - input_shape=(28, 28, 1), padding='valid'), - keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'), - keras.layers.Conv2D(50, kernel_size=(5, 5), strides=(1, 1), activation='tanh', - padding='valid'), - keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'), - keras.layers.Flatten(), - keras.layers.Dense(500, activation='tanh'), - keras.layers.Dense(10, activation='softmax'), - ] -) - -model.compile(optimizer=keras.optimizers.RMSprop(), - loss='sparse_categorical_crossentropy', - metrics=['accuracy']) -``` -### Step 3: Define Train Dataset - -You can define the dataset using standard [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). Orca also supports [Spark DataFrame](./spark-dataframe.md) and [Orca XShards](./xshards-pandas.md). - -```python -import tensorflow as tf -import tensorflow_datasets as tfds - -def preprocess(data): - data['image'] = tf.cast(data["image"], tf.float32) / 255. - return data['image'], data['label'] - -# get DataSet -mnist_train = tfds.load(name="mnist", split="train", data_dir=...) -mnist_test = tfds.load(name="mnist", split="test", data_dir=...) - -mnist_train = mnist_train.map(preprocess) -mnist_test = mnist_test.map(preprocess) -``` - -### Step 4: Fit with Orca Estimator - -First, create an Orca Estimator for TensorFlow. - -```python -from bigdl.orca.learn.tf.estimator import Estimator - -est = Estimator.from_keras(keras_model=model) -``` - -Next, fit and evaluate using the Estimator. -```python -est.fit(data=mnist_train, - batch_size=320, - epochs=5, - validation_data=mnist_test) - -result = est.evaluate(mnist_test) -print(result) -``` - -**Note:** You should call `stop_orca_context()` when your program finishes. - -That's it, the same code can run seamlessly on your local laptop and scale to [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters. diff --git a/docs/readthedocs/source/doc/Orca/Howto/tf2keras-quickstart.md b/docs/readthedocs/source/doc/Orca/Howto/tf2keras-quickstart.md deleted file mode 100644 index 86cd15f7..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/tf2keras-quickstart.md +++ /dev/null @@ -1,147 +0,0 @@ -# Scale TensorFlow 2 Applications - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/tf2_keras_lenet_mnist.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/tf2_keras_lenet_mnist.ipynb) - ---- - -**In this guide we will describe how to to scale out _TensorFlow 2_ programs using Orca in 4 simple steps.** - -### Step 0: Prepare Environment - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/install.md) for more details. - -```bash -conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like. 
-conda activate py37 - -pip install bigdl-orca[ray] -pip install tensorflow -``` - -### Step 1: Init Orca Context -```python -from bigdl.orca import init_orca_context, stop_orca_context - -if cluster_mode == "local": # For local machine - init_orca_context(cluster_mode="local", cores=4, memory="4g") -elif cluster_mode == "k8s": # For K8s cluster - init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="4g", master=..., container_image=...) -elif cluster_mode == "yarn": # For Hadoop/YARN cluster - init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="4g") -``` - -This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details. - -Please check the tutorials if you want to run on [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters. - -### Step 2: Define the Model - -You can then define and compile the Keras model in the _Creator Function_ using the standard TensorFlow 2 Keras APIs. - -```python -import tensorflow as tf - -def model_creator(config): - model = tf.keras.Sequential( - [tf.keras.layers.Conv2D(20, kernel_size=(5, 5), strides=(1, 1), activation='tanh', - input_shape=(28, 28, 1), padding='valid'), - tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'), - tf.keras.layers.Conv2D(50, kernel_size=(5, 5), strides=(1, 1), activation='tanh', - padding='valid'), - tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'), - tf.keras.layers.Flatten(), - tf.keras.layers.Dense(500, activation='tanh'), - tf.keras.layers.Dense(10, activation='softmax'), - ] - ) - - model.compile(optimizer=tf.keras.optimizers.RMSprop(), - loss='sparse_categorical_crossentropy', - metrics=['accuracy']) - return model -``` -### Step 3: Define the Dataset - -You can define the dataset in the _Creator Function_ using standard [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) APIs. Orca also supports [Spark DataFrame](./spark-dataframe.md) and [Orca XShards](./xshards-pandas.md). - - -```python -def preprocess(x, y): - x = tf.cast(tf.reshape(x, (28, 28, 1)), dtype=tf.float32) / 255.0 - return x, y - -def train_data_creator(config, batch_size): - (train_feature, train_label), _ = tf.keras.datasets.mnist.load_data() - dataset = tf.data.Dataset.from_tensor_slices((train_feature, train_label)) - dataset = dataset.repeat() - dataset = dataset.map(preprocess) - dataset = dataset.shuffle(1000) - dataset = dataset.batch(batch_size) - return dataset - -def val_data_creator(config, batch_size): - _, (val_feature, val_label) = tf.keras.datasets.mnist.load_data() - dataset = tf.data.Dataset.from_tensor_slices((val_feature, val_label)) - dataset = dataset.repeat() - dataset = dataset.map(preprocess) - dataset = dataset.batch(batch_size) - return dataset -``` - -### Step 4: Fit with Orca Estimator - -First, create an Orca Estimator for TensorFlow 2. - -```python -from bigdl.orca.learn.tf2 import Estimator - -est = Estimator.from_keras(model_creator=model_creator, workers_per_node=2) -``` - -Next, fit and evaluate using the Estimator. 
-```python -batch_size = 320 -train_stats = est.fit(train_data_creator, - epochs=5, - batch_size=batch_size, - steps_per_epoch=60000 // batch_size, - validation_data=val_data_creator, - validation_steps=10000 // batch_size) - -eval_stats = est.evaluate(val_data_creator, num_steps=10000 // batch_size) -print(eval_stats) -``` - -### Step 5: Save and Load the Model - -Orca TensorFlow 2 Estimator supports two formats to save and load the entire model (**TensorFlow SavedModel and Keras H5 Format**). The recommended format is SavedModel, which is the default format when you use `estimator.save()`. - -You could also save the model to Keras H5 format by passing `save_format='h5'` or a filename that ends in `.h5` or `.keras` to `estimator.save()`. - -**Note that if you run on Apache Hadoop/YARN cluster, you are recommended to save the model to HDFS and load from HDFS as well.** - -**1. SavedModel Format** - -```python -# save model in SavedModel format -est.save("lenet_model") - -# load model -est.load("lenet_model") -``` - -**2. HDF5 format** - -```python -# save model in H5 format -est.save("lenet_model.h5", save_format='h5') - -# load model -est.load("lenet_model.h5") -``` - -**Note:** You should call `stop_orca_context()` when your program finishes. - -That's it, the same code can run seamlessly on your local laptop and scale to [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters. diff --git a/docs/readthedocs/source/doc/Orca/Howto/xshards-pandas.md b/docs/readthedocs/source/doc/Orca/Howto/xshards-pandas.md deleted file mode 100644 index f862f741..00000000 --- a/docs/readthedocs/source/doc/Orca/Howto/xshards-pandas.md +++ /dev/null @@ -1,121 +0,0 @@ -# Use Distributed Pandas for Deep Learning - ---- - -![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/ncf_xshards_pandas.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/ncf_xshards_pandas.ipynb) - ---- - -**In this guide we will describe how to use [XShards](../Orca/Overview/data-parallel-processing.md) to scale-out Pandas data processing for distributed deep learning.** - -### 1. Read input data into XShards of Pandas DataFrame - -First, read CVS, JSON or Parquet files into an `XShards` of Pandas Dataframe (i.e., a distributed and sharded dataset where each partition contained a Pandas Dataframe), as shown below: - -```python -from bigdl.orca.data.pandas import read_csv -full_data = read_csv(new_rating_files, sep=':', header=None, - names=['user', 'item', 'label'], usecols=[0, 1, 2], - dtype={0: np.int32, 1: np.int32, 2: np.int32}) -``` - -### 2. Process Pandas Dataframes using XShards - -Next, use XShards to efficiently process large-size Pandas Dataframes in a distributed and data-parallel fashion. You may run standard Python code on each partition in a data-parallel fashion using `XShards.transform_shard`, as shown below: - -```python -# update label starting from 0. 
That's because ratings go from 1 to 5, while the matrix columns go from 0 to 4 -def update_label(df): - df['label'] = df['label'] - 1 - return df - -full_data = full_data.transform_shard(update_label) -``` - -```python -from sklearn.model_selection import train_test_split - -# split to train/test dataset -def split_train_test(data): - train, test = train_test_split(data, test_size=0.2, random_state=100) - return train, test - -train_data, test_data = full_data.transform_shard(split_train_test).split() -``` - -### 3. Define NCF model - -Define the NCF model using TensorFlow 1.15 APIs: - -```python -import tensorflow as tf - -class NCF(object): - def __init__(self, embed_size, user_size, item_size): - self.user = tf.placeholder(dtype=tf.int32, shape=(None,)) - self.item = tf.placeholder(dtype=tf.int32, shape=(None,)) - self.label = tf.placeholder(dtype=tf.int32, shape=(None,)) - - with tf.name_scope("GMF"): - user_embed_GMF = tf.contrib.layers.embed_sequence(self.user, vocab_size=user_size + 1, - embed_dim=embed_size) - item_embed_GMF = tf.contrib.layers.embed_sequence(self.item, vocab_size=item_size + 1, - embed_dim=embed_size) - GMF = tf.multiply(user_embed_GMF, item_embed_GMF) - - with tf.name_scope("MLP"): - user_embed_MLP = tf.contrib.layers.embed_sequence(self.user, vocab_size=user_size + 1, - embed_dim=embed_size) - item_embed_MLP = tf.contrib.layers.embed_sequence(self.item, vocab_size=item_size + 1, - embed_dim=embed_size) - interaction = tf.concat([user_embed_MLP, item_embed_MLP], axis=-1) - layer1_MLP = tf.layers.dense(inputs=interaction, units=embed_size * 2) - layer1_MLP = tf.layers.dropout(layer1_MLP, rate=0.2) - layer2_MLP = tf.layers.dense(inputs=layer1_MLP, units=embed_size) - layer2_MLP = tf.layers.dropout(layer2_MLP, rate=0.2) - layer3_MLP = tf.layers.dense(inputs=layer2_MLP, units=embed_size // 2) - layer3_MLP = tf.layers.dropout(layer3_MLP, rate=0.2) - - # Concate the two parts together - with tf.name_scope("concatenation"): - concatenation = tf.concat([GMF, layer3_MLP], axis=-1) - self.logits = tf.layers.dense(inputs=concatenation, units=5) - self.logits_softmax = tf.nn.softmax(self.logits) - self.class_number = tf.argmax(self.logits_softmax, 1) - - with tf.name_scope("loss"): - self.loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits( - labels=self.label, logits=self.logits, name='loss')) - - with tf.name_scope("optimzation"): - self.optim = tf.train.AdamOptimizer(1e-3, name='Adam') - self.optimizer = self.optim.minimize(self.loss) - -embedding_size=16 -model = NCF(embedding_size, max_user_id, max_item_id) -``` -### 4. Fit with Orca Estimator - -Finally, directly run distributed model training/inference on the XShards of Pandas DataFrames. - -```python -from bigdl.orca.learn.tf.estimator import Estimator - -# create an Estimator. 
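# The graph endpoints defined by the NCF class above are wired in directly:
# `inputs` and `labels` take the user/item/label placeholders, `outputs` takes
# the predicted class tensor, and `loss`/`optimizer` take the loss tensor and
# the Adam optimizer, so Orca can drive training on each XShards partition.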
-estimator = Estimator.from_graph( - inputs=[model.user, model.item], # the model accept two inputs and one label - outputs=[model.class_number], - labels=[model.label], - loss=model.loss, - optimizer=model.optim, - model_dir=model_dir, - metrics={"loss": model.loss}) - -# fit the Estimator -estimator.fit(data=train_data, - batch_size=1280, - epochs=1, - feature_cols=['user', 'item'], # specifies which column(s) to be used as inputs - label_cols=['label'], # specifies which column(s) to be used as labels - validation_data=test_data) -``` diff --git a/docs/readthedocs/source/doc/Orca/Overview/data-parallel-processing.md b/docs/readthedocs/source/doc/Orca/Overview/data-parallel-processing.md deleted file mode 100644 index 8fb138c5..00000000 --- a/docs/readthedocs/source/doc/Orca/Overview/data-parallel-processing.md +++ /dev/null @@ -1,143 +0,0 @@ -# Distributed Data Processing - ---- - -**Orca provides efficient support of distributed data-parallel processing pipeline, a critical component for large-scale AI applications.** - -### 1. TensorFlow Dataset and PyTorch DataLoader - -Orca will seamlessly parallelize the standard `tf.data.Dataset` or `torch.utils.data.DataLoader` pipelines across a large cluster in a data-parallel fashion, which can be directly used for distributed deep learning training, as shown below: - -TensorFlow Dataset: -```python -import tensorflow as tf -import tensorflow_datasets as tfds -from bigdl.orca.learn.tf.estimator import Estimator - -def preprocess(data): - data['image'] = tf.cast(data["image"], tf.float32) / 255. - return data['image'], data['label'] - -dataset = tfds.load(name="mnist", split="train", data_dir=dataset_dir) -dataset = dataset.map(preprocess) -dataset = dataset.shuffle(1000) - -est = Estimator.from_keras(keras_model=model) -est.fit(data=dataset) -``` - -Pytorch DataLoader: -```python -import torch -from torchvision import datasets, transforms -from bigdl.orca.learn.pytorch import Estimator - -train_loader = torch.utils.data.DataLoader( - datasets.MNIST("/tmp/mnist", train=True, download=True, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=batch_size, shuffle=True) - -est = Estimator.from_torch(model=torch_model, optimizer=torch_optim, loss=torch_criterion) -est.fit(data=train_loader) -``` - -Under the hood, Orca will automatically replicate the _TensorFlow Dataset_ or _PyTorch DataLoader_ pipeline on each node in the cluster, shard the input data, and execute the data pipelines using Apache Spark and/or Ray distributedly. - -_**Note:** Known limitations include:_ -1. _TensorFlow Dataset pipeline that contains transformations defined in native python function, such as `tf.py_func`, `tf.py_function` -and `tf.numpy_function` are currently not supported._ -2. _TensorFlow Dataset pipeline created from generators, such as `Dataset.from_generators` are currently not supported._ -3. _For TensorFlow Dataset and Pytorch DataLoader pipelines that read from files (including `tf.data.TFRecordDataset` and `tf.data.TextLineDataset`), one needs to ensure that the same file paths can be accessed on every node in the cluster._ - -#### 1.1. Data Creator Function -Alternatively, the user may also pass a *Data Creator Function* as the input to the distributed training and inference. Inside the *Data Creator Function*, the user needs to create and return a `tf.data.Dataset` or `torch.utils.data.DataLoader` object, as shown below. 
- -TensorFlow: -```python -import tensorflow as tf -import tensorflow_datasets as tfds -def preprocess(data): - data['image'] = tf.cast(data["image"], tf.float32) / 255. - return data['image'], data['label'] - -def train_data_creator(config, batch_size): - dataset = tfds.load(name="mnist", split="train", data_dir=dataset_dir) - dataset = dataset.map(preprocess) - dataset = dataset.shuffle(1000) - dataset = dataset.batch(batch_size) - return dataset -``` - -Pytorch: -```python -def train_data_creator(config, batch_size): - train_loader = torch.utils.data.DataLoader( - datasets.MNIST(config["dir"], train=True, download=True, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=batch_size, shuffle=True) - return train_loader -``` - -### 2. Spark Dataframes -Orca supports Spark Dataframes as the input to the distributed training, and as the input/output of the distributed inference. Consequently, the user can easily process large-scale dataset using Apache Spark, and directly apply AI models on the distributed (and possibly in-memory) Dataframes without data conversion or serialization. - -```python -df = spark.read.parquet("data.parquet") -est = Estimator.from_keras(keras_model=model) # the model accept two inputs and one label -est.fit(data=df, - feature_cols=['user', 'item'], # specifies which column(s) to be used as inputs - label_cols=['label']) # specifies which column(s) to be used as labels -``` - -### 3. XShards (Distributed Data-Parallel Python Processing) - -`XShards` in Orca allows the user to process large-scale dataset using *existing* Python codes in a distributed and data-parallel fashion, as shown below. - -```python -import numpy as np -from bigdl.orca.data import XShards - -train_images = np.random.random((20, 3, 224, 224)) -train_label_images = np.zeros(20) -train_shards = XShards.partition([train_images, train_label_images]) - -def transform_to_dict(train_data): - return {"x": train_data[0], "y": train_data[1]} - -train_shards = train_shards.transform_shard(transform_to_dict) -``` - -In essence, an `XShards` contains an automatically sharded (or partitioned) Python object (e.g., Pandas Dataframe, Numpy NDArray, Python Dictionary or List, etc.). Each partition of the `XShards` stores a subset of the Python object and is distributed across different nodes in the cluster; and the user may run arbitrary Python codes on each partition in a data-parallel fashion using `XShards.transform_shard`. - -View the related [Python API doc](./data) for more details. - -#### 3.1 Data-Parallel Pandas -The user may use `XShards` to efficiently process large-size Pandas Dataframes in a distributed and data-parallel fashion. - -First, the user can read CVS, JSON or Parquet files (stored on local disk, HDFS, AWS S3, etc.) to obtain an `XShards` of Pandas Dataframe, as shown below: -```python -from bigdl.orca.data.pandas import read_csv -csv_path = "/path/to/csv_file_or_folder" -shard = read_csv(csv_path) -``` - -Each partition of the returned `XShards` stores a Pandas Dataframe object (containing a subset of the entire dataset), and then the user can apply Pandas operations as well as other (e.g., sklearn) operations on each partition, as shown below: -```python -def negative(df, column_name): - df[column_name] = df[column_name] * (-1) - return df - -train_shards = shard.transform_shard(negative, 'value') -``` - -In addition, some global operations (such as `partition_by`, `unique`, etc.) 
are also supported on the `XShards` of Pandas Dataframe, as shown below:
-```python
-shard.partition_by(cols="location", num_partitions=4)
-location_list = shard["location"].unique()
-```
diff --git a/docs/readthedocs/source/doc/Orca/Overview/distributed-training-inference.md b/docs/readthedocs/source/doc/Orca/Overview/distributed-training-inference.md
deleted file mode 100644
index a8b4b5a5..00000000
--- a/docs/readthedocs/source/doc/Orca/Overview/distributed-training-inference.md
+++ /dev/null
@@ -1,346 +0,0 @@
-# Distributed Training and Inference
-
----
-
-**Orca `Estimator` provides sklearn-style APIs for transparently distributed model training and inference.**
-
-### 1. Estimator
-
-To perform distributed training and inference, the user can first create an Orca `Estimator` from any standard (single-node) TensorFlow, Keras or PyTorch model, and then call the `Estimator.fit` or `Estimator.predict` methods (using the [data-parallel processing pipeline](./data-parallel-processing.md) as input).
-
-Under the hood, the Orca `Estimator` will replicate the model on each node in the cluster, feed the data partition (generated by the data-parallel processing pipeline) on each node to the local model replica, and synchronize model parameters using various *backend* technologies (such as *Horovod*, `tf.distribute.MirroredStrategy`, `torch.distributed`, or the parameter sync layer in [*BigDL*](https://github.com/intel-analytics/BigDL)).
-
-### 2. TensorFlow/Keras Estimator
-
-#### 2.1 TensorFlow 1.15 and Keras 2.3
-
-There are two ways to create an Estimator for TensorFlow 1.15: either from a low-level computation graph or from a Keras model. Examples are as follows:
-
-TensorFlow Computation Graph:
-```python
-# define inputs to the graph
-images = tf.placeholder(dtype=tf.float32, shape=(None, 28, 28, 1))
-labels = tf.placeholder(dtype=tf.int32, shape=(None,))
-
-# define the network and loss
-logits = lenet(images)
-loss = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(logits=logits, labels=labels))
-
-# define a metric
-acc = accuracy(logits, labels)
-
-# create an estimator using endpoints of the graph
-est = Estimator.from_graph(inputs=images,
-                           outputs=logits,
-                           labels=labels,
-                           loss=loss,
-                           optimizer=tf.train.AdamOptimizer(),
-                           metrics={"acc": acc})
-```
-
-Keras Model:
-```python
-model = create_keras_lenet_model()
-model.compile(optimizer=keras.optimizers.RMSprop(),
-              loss='sparse_categorical_crossentropy',
-              metrics=['accuracy'])
-est = Estimator.from_keras(keras_model=model)
-```
-
-Then users can perform distributed model training and inference as follows:
-
-```python
-dataset = tfds.load(name="mnist", split="train")
-dataset = dataset.map(preprocess)
-est.fit(data=dataset,
-        batch_size=320,
-        epochs=max_epoch)
-predictions = est.predict(data=df,
-                          feature_cols=['image'])
-```
-The `data` argument in the `fit` method can be a Spark DataFrame, an *XShards* or a `tf.data.Dataset`. The `data` argument in the `predict` method can be a Spark DataFrame or an *XShards*. See the *data-parallel processing pipeline* [page](./data-parallel-processing.md) for more details.
-
-View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/orca.html#module-bigdl.orca.learn.tf.estimator) for more details.
-
-#### 2.2 TensorFlow 2.x and Keras 2.4+
-
-**Using `ray` or *Horovod* backend**
-
-Users can create an `Estimator` for TensorFlow 2.x from a Keras model (using a _Model Creator Function_) when the backend is `ray` (currently the default for TF2) or *Horovod*.
For example: - -```python -def model_creator(config): - model = create_keras_lenet_model() - model.compile(optimizer=keras.optimizers.RMSprop(), - loss='sparse_categorical_crossentropy', - metrics=['accuracy']) - return model -est = Estimator.from_keras(model_creator=model_creator) # or backend="horovod" -``` - -The `model_creator` argument should be a function that takes a `config` dictionary and returns a compiled Keras model. - -Then users can perform distributed model training and inference as follows: - -```python -def train_data_creator(config, batch_size): - dataset = tfds.load(name="mnist", split="train") - dataset = dataset.map(preprocess) - dataset = dataset.batch(batch_size) - return dataset -stats = est.fit(data=train_data_creator, - epochs=max_epoch, - steps_per_epoch=total_size // batch_size) -predictions = est.predict(data=df, - feature_cols=['image']) -``` - -The `data` argument in `fit` method can be a spark DataFrame, an *XShards* or a *Data Creator Function* (that returns a `tf.data.Dataset`). The `data` argument in `predict` method can be a spark DataFrame or an *XShards*. See the *data-parallel processing pipeline* [page](./data-parallel-processing.md) for more details. - -View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/orca.html#orca-learn-tf2-tf2-ray-estimator) for more details. - -**Using *spark* backend** - -Users can create an `Estimator` for TensorFlow 2.x using the *spark* backend as follows: - -```python -def model_creator(config): - model = create_keras_lenet_model() - model.compile(**compile_args(config)) - return model - -def compile_args(config): - if "lr" in config: - lr = config["lr"] - else: - lr = 1e-2 - args = { - "optimizer": keras.optimizers.SGD(lr), - "loss": "mean_squared_error", - "metrics": ["mean_squared_error"] - } - return args - -est = Estimator.from_keras(model_creator=model_creator, - config={"lr": 1e-2}, - workers_per_node=2, - backend="spark", - model_dir=model_dir) -``` - -The `model_creator` argument should be a function that takes a `config` dictionary and returns a compiled Keras model. -The `model_dir` argument is required for *spark* backend, it should be a share filesystem path which can be accessed by executors for culster mode. - -Then users can perform distributed model training and inference as follows: - -```python -def train_data_creator(config, batch_size): - dataset = tfds.load(name="mnist", split="train") - dataset = dataset.map(preprocess) - dataset = dataset.batch(batch_size) - return dataset -stats = est.fit(data=train_data_creator, - epochs=max_epoch, - steps_per_epoch=total_size // batch_size) -predictions = est.predict(data=df, - feature_cols=['image']).collect() -``` - -The `data` argument in `fit` method can be a spark DataFrame, an *XShards* or a *Data Creator Function* (that returns a `tf.data.Dataset`). The `data` argument in `predict` method can be a spark DataFrame or an *XShards*. See the *data-parallel processing pipeline* [page](./data-parallel-processing.md) for more details. - -View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/orca.html#orca-learn-tf2-tf2-spark-estimator) for more details. - -### 3. 
PyTorch Estimator - -**Using *BigDL* backend** - -Users may create a PyTorch `Estimator` using the *Spark* backend (currently default for PyTorch) as follows: - -```python -def model_creator(config): - model = LeNet() # a torch.nn.Module - model.train() - return model - -def optimizer_creator(model, config): - return torch.optim.Adam(model.parameters(), config["lr"]) - -est = Estimator.from_torch(model=model_creator, - optimizer=optimizer_creator, - loss=nn.NLLLoss(), - config={"lr": 1e-2}) -``` - -Then users can perform distributed model training and inference as follows: - -```python -est.fit(data=train_loader, epochs=args.epochs) -predictions = est.predict(xshards) -``` - -The input to `fit` methods can be a `torch.utils.data.DataLoader`, a Spark Dataframe, an *XShards*, or a *Data Creator Function* (that returns a `torch.utils.data.DataLoader`). The input to `predict` methods should be a Spark Dataframe, or an *XShards*. See the *data-parallel processing pipeline* [page](./data-parallel-processing.md) for more details. - -View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/orca.html#orca-learn-pytorch-pytorch-spark-estimator) for more details. - -**Using `torch.distributed` or *Horovod* backend** - -Alternatively, users can create a PyTorch `Estimator` using `torch.distributed` or *Horovod* backend by specifying the `backend` argument to be "ray" or "horovod". In this case, the `model` and `optimizer` should be wrapped in _Creater Functions_. For example: - -```python -def model_creator(config): - model = LeNet() # a torch.nn.Module - model.train() - return model - -def optimizer_creator(model, config): - return torch.optim.Adam(model.parameters(), config["lr"]) - -est = Estimator.from_torch(model=model_creator, - optimizer=optimizer_creator, - loss=nn.NLLLoss(), - config={"lr": 1e-2}, - backend="ray") # or backend="horovod" -``` - -Then users can perform distributed model training and inference as follows: - -```python -est.fit(data=train_loader_func, epochs=args.epochs) -predictions = est.predict(data=df, - feature_cols=['image']) -``` - -The input to `fit` methods can be a Spark DataFrame, an *XShards*, or a *Data Creator Function* (that returns a `torch.utils.data.DataLoader`). The `data` argument in `predict` method can be a Spark DataFrame or an *XShards*. See the *data-parallel processing pipeline* [page](./data-parallel-processing.md) for more details. - -View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/orca.html#orca-learn-pytorch-pytorch-ray-estimator) for more details. - -### 4. 
MXNet Estimator - -The user may create a MXNet `Estimator` as follows: -```python -from bigdl.orca.learn.mxnet import Estimator, create_config - -def get_model(config): - net = LeNet() # a mxnet.gluon.Block - return net - -def get_loss(config): - return gluon.loss.SoftmaxCrossEntropyLoss() - -config = create_config(log_interval=2, optimizer="adam", - optimizer_params={'learning_rate': 0.02}) -est = Estimator.from_mxnet(config=config, - model_creator=get_model, - loss_creator=get_loss, - num_workers=2) -``` - -Then the user can perform distributed model training as follows: -```python -import numpy as np - -def get_train_data_iter(config, kv): - train = mx.io.NDArrayIter(data_ndarray, label_ndarray, - batch_size=config["batch_size"], shuffle=True) - return train - -est.fit(get_train_data_iter, epochs=2) -``` - -The input to `fit` methods can be an *XShards*, or a *Data Creator Function* (that returns an `MXNet DataIter/DataLoader`). See the *data-parallel processing pipeline* [page](./data-parallel-processing.html) for more details. - -### 5. BigDL Estimator - -The user may create a BigDL `Estimator` as follows: -```python -from bigdl.dllib.nn.criterion import * -from bigdl.dllib.nn.layer import * -from bigdl.dllib.optim.optimizer import * -from bigdl.orca.learn.bigdl import Estimator - -linear_model = Sequential().add(Linear(2, 2)) -mse_criterion = MSECriterion() -est = Estimator.from_bigdl(model=linear_model, loss=mse_criterion, optimizer=Adam()) -``` - -Then the user can perform distributed model training and inference as follows: -```python -# read spark Dataframe -df = spark.read.parquet("data.parquet") - -# distributed model training -est.fit(df, 1, batch_size=4) - -#distributed model inference -result_df = est.predict(df) -``` - -The input to `fit` and `predict` methods can be a *Spark Dataframe*, or an *XShards*. See the *data-parallel processing pipeline* [page](./data-parallel-processing.html) for more details. - -View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/orca.html#module-bigdl.orca.learn.bigdl.estimator) for more details. - -### 6. OpenVINO Estimator - -The user may create a OpenVINO `Estimator` as follows: -```python -from bigdl.orca.learn.openvino import Estimator - -model_path = "The/file_path/to/the/OpenVINO_IR_xml_file" -est = Estimator.from_openvino(model_path=model_path) -``` - -Then the user can perform distributed model inference as follows: -```python -# ndarray -input_data = np.random.random([20, 4, 3, 224, 224]) -result = est.predict(input_data) - -# xshards -shards = XShards.partition({"x": input_data}) -result_shards = est.predict(shards) -``` - -The input to `predict` methods can be an *XShards*, or a *numpy array*. See the *data-parallel processing pipeline* [page](./data-parallel-processing.html) for more details. - -View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/orca.html#orca-learn-openvino-estimator) for more details. - -### 7. MPI Estimator -The Orca MPI Estimator is to run distributed training job based on MPI. - -#### Preparation: -* Configure password-less ssh from the master node (the one you'll launch training from) to all other nodes. - -* All hosts have the same working directory. -* All hosts have the same Python environment in the same location. 
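With the cluster prepared as above, the MPI-specific values that the estimator in the next step expects (`hosts`, `workers_per_node` and an optional `init_func`) can be collected up front. The snippet below is only an illustrative sketch; the host names and worker count are placeholders and not part of the original example:

```python
# Hypothetical values for the MPIEstimator arguments used in the next step;
# `init` (passed as init_func below) would be your own per-rank setup function, if any.
hosts = ["worker-node-1", "worker-node-2"]   # reachable via password-less ssh from the master
workers_per_node = 2                         # MPI workers launched on each host
```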
- -#### Train -Then the user may create a MPI Estimator as follows: -```python -from bigdl.orca.learn.mpi import MPIEstimator - -est = MPIEstimator(model_creator=model_creator, - optimizer_creator=optimizer_creator, - loss_creator=None, - metrics=None, - scheduler_creator=None, - config=config, - init_func=init, # Init the distributed environment for MPI if any - hosts=hosts, - workers_per_node=workers_per_node, - env=None) -``` -Then the user can perform distributed model training as follows: -```python -# read spark Dataframe -df = spark.read.parquet("data.parquet") - -# distributed model training -est.fit(data=df, epochs=1, batch_size=4, feature_col="feature", label_cols="label") - -``` -The input to `fit` methods can be an Spark Dataframe, or a callable function to return a `torch.utils.data.DataLoader`. - -View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/orca.html#orca-learn-mpi-mpi-estimator) for more details. - - diff --git a/docs/readthedocs/source/doc/Orca/Overview/distributed-tuning.md b/docs/readthedocs/source/doc/Orca/Overview/distributed-tuning.md deleted file mode 100644 index 5e15254f..00000000 --- a/docs/readthedocs/source/doc/Orca/Overview/distributed-tuning.md +++ /dev/null @@ -1,213 +0,0 @@ -# Distributed Hyper-Parameter Tuning - ---- - -**Orca `AutoEstimator` provides similar APIs as Orca `Estimator` for distributed hyper-parameter tuning.** - - - -### 1. AutoEstimator - -To perform distributed hyper-parameter tuning, user can first create an Orca `AutoEstimator` from standard TensorFlow Keras or PyTorch model, and then call `AutoEstimator.fit`. - -Under the hood, the Orca `AutoEstimator` generates different trials and schedules them on each mode in the cluster. Each trial runs a different combination of hyper parameters, sampled from the user-desired hyper-parameter space. -HDFS is used to save temporary results of each trial and all the results will be finally transferred to driver for further analysis. - -### 2. Pytorch AutoEstimator - -User could pass *Creator Function*s, including *Data Creator Function*, *Model Creator Function* and *Optimizer Creator Function* to `AutoEstimator` for training. - -The *Creator Function*s should take a parameter of `config` as input and get the hyper-parameter values from `config` to enable hyper parameter search. - -#### 2.1 Data Creator Function -You can define the train and validation datasets using *Data Creator Function*. The *Data Creator Function* takes `config` as input and returns a `torch.utils.data.DataLoader` object, as shown below. -```python -# "batch_size" is the hyper-parameter to be tuned. -def train_loader_creator(config): - train_loader = torch.utils.data.DataLoader( - datasets.MNIST(dir, train=True, download=True, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=config["batch_size"], shuffle=True) - return train_loader -``` -The input data for Pytorch `AutoEstimator` can be a *Data Creator Function* or a tuple of numpy ndarrays in the form of (x, y), where x is training input data and y is training target data. - -#### 2.2 Model Creator Function -*Model Creator Function* also takes `config` as input and returns a `torch.nn.Module` object, as shown below. 
- -```python -import torch.nn as nn -class LeNet(nn.Module): - def __init__(self, fc1_hidden_size=500): - super(LeNet, self).__init__() - self.conv1 = nn.Conv2d(1, 20, 5, 1) - self.conv2 = nn.Conv2d(20, 50, 5, 1) - self.fc1 = nn.Linear(4*4*50, fc1_hidden_size) - self.fc2 = nn.Linear(fc1_hidden_size, 10) - - def forward(self, x): - pass - -def model_creator(config): - # "fc1_hidden_size" is the hyper-parameter to be tuned. - model = LeNet(fc1_hidden_size=config["fc1_hidden_size"]) - return model -``` - -#### 2.3 Optimizer Creator Function -*Optimizer Creator Function* takes `model` and `config` as input, and returns a `torch.optim.Optimizer` object. -```python -import torch -def optim_creator(model, config): - return torch.optim.Adam(model.parameters(), lr=config["lr"]) -``` - -Note that the `optimizer` argument in Pytorch `AutoEstimator` constructor could be a *Optimizer Creator Function* or a string, which is the name of Pytorch Optimizer. The above *Optimizer Creator Function* has the same functionality with "Adam". - -#### 2.4 Create and Fit Pytorch AutoEstimator -User could create a Pytorch `AutoEstimator` as below. -```python -from bigdl.orca.automl.auto_estimator import AutoEstimator - -auto_est = AutoEstimator.from_torch(model_creator=model_creator, - optimizer=optim_creator, - loss=nn.NLLLoss(), - logs_dir="/tmp/orca_automl_logs", - resources_per_trial={"cpu": 2}, - name="lenet_mnist") -``` -Then user can perform distributed hyper-parameter tuning as follows. For more details about the `search_space` argument, view the *search space and search algorithms* [page](#search-space-and-search-algorithms). -```python -auto_est.fit(data=train_loader_creator, - validation_data=test_loader_creator, - search_space=search_space, - n_sampling=2, - epochs=1, - metric="accuracy") -``` -Finally, user can get the best learned model and the best hyper-parameters for further deployment. -```python -best_model = auto_est.get_best_model() # a `torch.nn.Module` object -best_config = auto_est.get_best_config() # a dictionary of hyper-parameter names and values. -``` -View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/automl.html#orca-automl-auto-estimator) for more details. - -### 3. TensorFlow/Keras AutoEstimator -Users can create an `AutoEstimator` for TensorFlow Keras from a `tf.keras` model (using a *Model Creator Function*). For example: - -```python -def model_creator(config): - model = tf.keras.models.Sequential([tf.keras.layers.Dense(config["hidden_size"], - input_shape=(1,)), - tf.keras.layers.Dense(1)]) - model.compile(loss="mse", - optimizer=tf.keras.optimizers.SGD(config["lr"]), - metrics=["mse"]) - return model - -auto_est = AutoEstimator.from_keras(model_creator=model_creator, - logs_dir="/tmp/orca_automl_logs", - resources_per_trial={"cpu": 2}, - name="auto_keras") -``` - -Then user can perform distributed hyper-parameter tuning as follows. For more details about `search_space`, view the *search space and search algorithms* [page](#search-space-and-search-algorithms). -```python -auto_est.fit(data=train_data, - validation_data=val_data, - search_space=search_space, - n_sampling=2, - epochs=1, - metric="accuracy") -``` -The `data` and `validation_data` in `fit` method can only be a tuple of numpy ndarrays. We haven't support *Data Create Function* now. The numpy ndarray should also be in the form of (x, y), where x is training input data and y is training target data. 
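For example, a minimal sketch of such numpy input for the `model_creator` above (synthetic 1-D regression data, plus an illustrative `search_space` covering its `hidden_size` and `lr` hyper-parameters; none of these values come from the original example) could look like:

```python
import numpy as np
from bigdl.orca.automl import hp

# Synthetic data in the required (x, y) tuple form.
x = np.random.rand(1000, 1).astype(np.float32)
y = (2.0 * x + 0.1 * np.random.randn(1000, 1)).astype(np.float32)
train_data = (x[:800], y[:800])
val_data = (x[800:], y[800:])

# Sample the hyper-parameters referenced in model_creator above.
search_space = {
    "hidden_size": hp.choice([16, 32, 64]),
    "lr": hp.loguniform(0.001, 0.1),
}
```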
- -Finally, user can get the best learned model and the best hyper-parameters for further deployment. -```python -best_model = auto_est.get_best_model() # a `torch.nn.Module` object -best_config = auto_est.get_best_config() # a dictionary of hyper-parameter names and values. -``` -View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/automl.html#orca-automl-auto-estimator) for more details. - -### 4. Search Space and Search Algorithms -For Hyper-parameter Optimization, user should define the search space of various hyper-parameter values for neural network training, as well as how to search through the chosen hyper-parameter space. - -#### 4.1 Basic Search Algorithms - -For basic search algorithms like **Grid Search** and **Random Search**, we provide several sampling functions with `automl.hp`. See [API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/automl.html#orca-automl-hp) for more details. - -`AutoEstimator` requires a dictionary for the `search_space` argument in `fit`. -In the dictionary, the keys are the hyper-parameter names, and the values specify how to sample the search spaces for the hyper-parameters. - -```python -from bigdl.orca.automl import hp - -search_space = { - "fc1_hidden_size": hp.grid_search([500, 600]), - "lr": hp.loguniform(0.001, 0.1), - "batch_size": hp.choice([160, 320, 640]), -} -``` - -#### 4.2 Advanced Search Algorithms -Beside grid search and random search, user could also choose to use some advanced hyper-parameter optimization methods, -such as [Ax](https://ax.dev/), [Bayesian Optimization](https://github.com/fmfn/BayesianOptimization), [Scikit-Optimize](https://scikit-optimize.github.io), etc. We supported all *Search Algorithms* in [Ray Tune](https://docs.ray.io/en/master/index.html). View the [Ray Tune Search Algorithms](https://docs.ray.io/en/master/tune/api_docs/suggestion.html) for more details. -Note that you should install the dependency for your search algorithm manually. - -Take bayesian optimization as an instance. You need to first install the dependency with - -```bash -pip install bayesian-optimization -``` - -And pass the search algorithm name to `search_alg` in `AutoEstimator.fit`. -```python -from bigdl.orca.automl import hp - -search_space = { - "width": hp.uniform(0, 20), - "height": hp.uniform(-100, 100) -} - -auto_estimator.fit( - data, - search_space=search_space, - metric="mean_loss", - mode="min", - search_alg="bayesopt", -) -``` -See [API Doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Orca/automl.html#orca-automl-auto-estimator) for more details. - -### 5. Scheduler -*Scheduler* can stop/pause/tweak the hyper-parameters of running trials, making the hyper-parameter tuning process much efficient. - -We support all *Schedulers* in [Ray Tune](https://docs.ray.io/en/master/index.html). See [Ray Tune Schedulers](https://docs.ray.io/en/master/tune/api_docs/schedulers.html#schedulers-ref) for more details. - -User can pass the *Scheduler* name to `scheduler` in `AutoEstimator.fit`. The *Scheduler* names supported are "fifo", "hyperband", "async_hyperband", "median_stopping_rule", "hb_bohb", "pbt", "pbt_replay". -The default `scheduler` is "fifo", which just runs trials in submission order. - -See examples below about how to use *Scheduler* in `AutoEstimator`. 
-```python -scheduler_params = dict( - max_t=50, - grace_period=1, - reduction_factor=3, - brackets=3, - ) - -auto_estimator.fit( - data, - search_space=search_space, - metric="mean_loss", - mode="min", - search_alg="skopt", - scheduler = "AsyncHyperBand", - scheduler_params=scheduler_params -) -``` -*Scheduler* shares the same parameters as ray tune schedulers. -And `scheduler_params` are extra parameters for `scheduler` other than `metric` and `mode`. diff --git a/docs/readthedocs/source/doc/Orca/Overview/index.rst b/docs/readthedocs/source/doc/Orca/Overview/index.rst deleted file mode 100644 index c2eec58f..00000000 --- a/docs/readthedocs/source/doc/Orca/Overview/index.rst +++ /dev/null @@ -1,8 +0,0 @@ -Orca Key Features -================================= - -* `Orca Context `_ -* `Distributed Data Processing `_ -* `Distributed Training and Inference `_ -* `Distributed Hyper Parameter Tuning `_ -* `RayOnSpark `_ \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Orca/Overview/install.md b/docs/readthedocs/source/doc/Orca/Overview/install.md deleted file mode 100644 index 36b4bce9..00000000 --- a/docs/readthedocs/source/doc/Orca/Overview/install.md +++ /dev/null @@ -1,145 +0,0 @@ -# Installation - ---- -## Prepare the environment -You can follow the commands in this section to install Java and conda before installing BigDL Orca. - -### Install Java -You need to download and install JDK in the environment, and properly set the environment variable `JAVA_HOME`. JDK8 is highly recommended. - -```bash -# For Ubuntu -sudo apt-get install openjdk-8-jre -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ - -# For CentOS -su -c "yum install java-1.8.0-openjdk" -export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.el7_9.x86_64/jre - -export PATH=$PATH:$JAVA_HOME/bin -java -version # Verify the version of JDK. -``` - -### Install Anaconda -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the Python environment. - -You can follow the steps below to install conda: -```bash -# Download Anaconda installation script -wget -P /tmp https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh - -# Execute the script to install conda -bash /tmp/Anaconda3-2020.02-Linux-x86_64.sh - -# Run this command in your terminal to activate conda -source ~/.bashrc -``` - -Then create a Python environment for BigDL Orca: -```bash -conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like. -conda activate py37 -``` - ---- -## Install BigDL Orca - -This section demonstrates how to install BigDL Orca via `pip`, which is the most recommended way. - -__Notes:__ -* Installing BigDL Orca from pip will automatically install `pyspark`. To avoid possible conflicts, you are highly recommended to **unset the environment variable `SPARK_HOME`** if it exists in your environment. - -* If you are using a custom URL of Python Package Index to install the latest version, you may need to check whether the latest packages have been sync'ed with pypi. Or you can add the option `-i https://pypi.python.org/simple` when pip install to use pypi as the index-url. 
- - -### To use basic Orca features -You can install Orca in your created conda environment for distributed data processing, training and inference with the following command: -```bash -pip install bigdl-orca # For the official release version -``` - -or for the nightly build version, use: -```bash -pip install --pre --upgrade bigdl-orca # For the latest nightly build version -``` - -Note that installing Orca will automatically install the dependencies including `bigdl-dllib`, `bigdl-tf`, `bigdl-math`, `packaging`, `filelock`, `pyzmq` and their dependencies if they haven't been detected in your conda environment._ - -### To additionally use RayOnSpark - -If you wish to run [RayOnSpark](ray.md) or [sklearn-style Estimator APIs in Orca](distributed-training-inference.md) with **"ray" backend**, use the extra key `[ray]` during the installation above: - -```bash -pip install bigdl-orca[ray] # For the official release version -``` - -or for the nightly build version, use: -```bash -pip install --pre --upgrade bigdl-orca[ray] # For the latest nightly build version -``` - -Note that with the extra key of [ray], `pip` will automatically install the additional dependencies for RayOnSpark, -including `ray[default]==1.9.2`, `aiohttp==3.9.0`, `async-timeout==4.0.1`, `aioredis==1.3.1`, `hiredis==2.0.0`, `prometheus-client==0.11.0`, `psutil`, `setproctitle`. - -### To additionally use AutoML - -If you wish to run AutoML, use the extra key `[automl]` during the installation above: - -```bash -pip install bigdl-orca[automl] # For the official release version -```` - -or for the nightly build version, use: -```bash -pip install --pre --upgrade bigdl-orca[automl] # For the latest nightly build version -``` - -Note that with the extra key of [automl], `pip` will automatically install the additional dependencies for distributed hyper-parameter tuning, -including `ray[tune]==1.9.2`, `scikit-learn`, `tensorboard`, `xgboost` together with the dependencies given by the extra key [ray]. - -- To use [Pytorch AutoEstimator](distributed-tuning.md#pytorch-autoestimator), you need to install Pytorch with `pip install torch==1.8.1`. - -- To use [TensorFlow/Keras AutoEstimator](distributed-tuning.md#tensorflow-keras-autoestimator), you need to install TensorFlow with `pip install tensorflow==1.15.0`. - -### To install Orca for Spark3 - -By default, Orca is built on top of Spark 2.4.6 (with pyspark==2.4.6 as a dependency). If you want to install Orca built on top of Spark 3.1.3 (with pyspark==3.1.3 as a dependency), you can use the following command instead: - -```bash -# For the official release version -pip install bigdl-orca-spark3 -pip install bigdl-orca-spark3[ray] -pip install bigdl-orca-spark3[automl] - -# For the latest nightly build version -pip install --pre --upgrade bigdl-orca-spark3 -pip install --pre --upgrade bigdl-orca-spark3[ray] -pip install --pre --upgrade bigdl-orca-spark3[automl] -``` - -__Note__: You should only install Orca built on top of __ONE__ Spark version, but not both. If you want to switch the Spark version, please [**uninstall**](#to-uninstall-orca) Orca cleanly before reinstall. 
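Before switching between the two builds, it may help to confirm which `pyspark` version is currently active in your conda environment. A quick check (not part of the original instructions) is:

```python
# Print the pyspark version pulled in by the installed Orca package:
# bigdl-orca depends on pyspark 2.4.6, bigdl-orca-spark3 on pyspark 3.1.3.
import pyspark
print(pyspark.__version__)
```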
- -### To uninstall Orca -```bash -# For default Orca built on top of Spark 2.4.6 -pip uninstall bigdl-orca bigdl-dllib bigdl-tf bigdl-math bigdl-core - -# For Orca built on top of Spark 3.1.3 -pip uninstall bigdl-orca-spark3 bigdl-dllib-spark3 bigdl-tf bigdl-math bigdl-core -``` - -__Note__: If necessary, you need to manually uninstall `pyspark` and other [dependencies](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) introduced by Orca. - ---- -## Download BigDL Orca - -You can also download the BigDL package via the download links below. - -| |
2.4.0
| 2.5.0-SNAPSHOT | -| :-: | :-: | :-: | -| Spark 2.4 | [download](https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-assembly-spark_2.4.6/2.4.0/bigdl-assembly-spark_2.4.6-2.4.0-fat-jars.zip) | [download](https://oss.sonatype.org/content/repositories/snapshots/com/intel/analytics/bigdl/bigdl-assembly-spark_2.4.6/2.5.0-SNAPSHOT/bigdl-assembly-spark_2.4.6-2.5.0-20240129.115705-78-fat-jars.zip) | -| Spark 3.1 | [download](https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-assembly-spark_3.1.3/2.4.0/bigdl-assembly-spark_3.1.3-2.4.0-fat-jars.zip) | [download](https://oss.sonatype.org/content/repositories/snapshots/com/intel/analytics/bigdl/bigdl-assembly-spark_3.1.3/2.5.0-SNAPSHOT/bigdl-assembly-spark_3.1.3-2.5.0-20240129.113204-80-fat-jars.zip) | - -Note that *SNAPSHOT* indicates the latest nightly build version of BigDL. - -If you wish to download the BigDL package in the command line, you can run this [script](https://github.com/intel-analytics/BigDL/blob/main/scripts/download-bigdl.sh) instead. \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Orca/Overview/known_issues.md b/docs/readthedocs/source/doc/Orca/Overview/known_issues.md deleted file mode 100644 index 33f6add2..00000000 --- a/docs/readthedocs/source/doc/Orca/Overview/known_issues.md +++ /dev/null @@ -1,226 +0,0 @@ -# Orca Known Issues - -## Estimator Issues - -### UnkownError: Could not start gRPC server - -This error occurs while running Orca TF2 Estimator with spark backend, which may because the previous pyspark tensorflow job was not cleaned completely. You can retry later or you can set spark config `spark.python.worker.reuse=false` in your application. - -If you are using `init_orca_context(cluster_mode="yarn-client")`: - ``` - conf = {"spark.python.worker.reuse": "false"} - init_orca_context(cluster_mode="yarn-client", conf=conf) - ``` - If you are using `init_orca_context(cluster_mode="spark-submit")`: - ``` - spark-submit --conf spark.python.worker.reuse=false - ``` - -### RuntimeError: Inter op parallelism cannot be modified after initialization - -This error occurs if you build your TensorFlow model on the driver rather than on workers. You should build the complete model in `model_creator` which runs on each worker node. You can refer to the following examples: - -**Wrong Example** - ``` - model = ... - - def model_creator(config): - model.compile(...) - return model - - estimator = Estimator.from_keras(model_creator=model_creator,...) - ... - ``` - -**Correct Example** - ``` - def model_creator(config): - model = ... - model.compile(...) - return model - - estimator = Estimator.from_keras(model_creator=model_creator,...) - ... - ``` - -## OrcaContext Issues - -### Exception: Failed to read dashbord log: [Errno 2] No such file or directory: '/tmp/ray/.../dashboard.log' - -This error occurs when initialize an orca context with `init_ray_on_spark=True`. We have not locate the root cause of this problem, but it might be caused by an atypical python environment. - -You could follow below steps to workaround: - -1. If you only need to use functions in ray (e.g. `bigdl.orca.learn` with `backend="ray"`, `bigdl.orca.automl` for pytorch/tensorflow model, `bigdl.chronos.autots` for time series model's auto-tunning), we may use ray as the first-class. - - 1. Start a ray cluster by `ray start --head`. if you already have a ray cluster started, please direcetly jump to step 2. - 2. 
Initialize an orca context with `runtime="ray"` and `init_ray_on_spark=False`, please refer to detailed information [here](./orca-context.html). - 3. If you are using `bigdl.orca.automl` or `bigdl.chronos.autots` on a single node, please set: - ```python - ray_ctx = OrcaContext.get_ray_context() - ray_ctx.is_local=True - ``` - -2. If you really need to use ray on spark, please install bigdl-orca under a conda environment. Detailed information please refer to [here](./orca.html). - -## Ray Issues - -### ValueError: Ray component worker_ports is trying to use a port number ... that is used by other components. - -This error is because that some port in worker port list is occupied by other processes. To handle this issue, you can set range of the worker port list by using the parameters `min-worker-port` and `max-worker-port` in `init_orca_context` as follows: - -```python -init_orca_context(extra_params={"min-worker-port": "30000", "max-worker-port": "30033"}) -``` - -### ValueError: Failed to bind to 0.0.0.0:8265 because it's already occupied. You can use `ray start --dashboard-port ...` or `ray.init(dashboard_port=...)` to select a different port. - -This error is because that ray dashboard port is occupied by other processes. To handle this issue, you can end the process that occupies the port or you can manually set the ray dashboard port by using the parameter `dashboard-port` in `init_orca_context` as follows: - -```python -init_orca_context(extra_params={"dashboard-port": "50005"}) -``` - -Note that, the similar error can happen to ray redis port as well, you can also set the ray redis port by using the parameter `redis_port` in `init_orca_context` as follows: - -```python -init_orca_context(redis_port=50006) -``` - -## Other Issues - -### OSError: Unable to load libhdfs: ./libhdfs.so: cannot open shared object file: No such file or directory - -This error is because PyArrow fails to locate `libhdfs.so` in default path of `$HADOOP_HOME/lib/native` when you run with YARN on Cloudera. -To solve this issue, you need to set the path of `libhdfs.so` in Cloudera to the environment variable of `ARROW_LIBHDFS_DIR` on Spark driver and executors with the following steps: - -1. Run `locate libhdfs.so` on the client node to find `libhdfs.so` -2. `export ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64` (replace with the result of `locate libhdfs.so` in your environment) -3. If you are using `init_orca_context(cluster_mode="yarn-client")`: - ``` - conf = {"spark.executorEnv.ARROW_LIBHDFS_DIR": "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"} - init_orca_context(cluster_mode="yarn-client", conf=conf) - ``` - If you are using `init_orca_context(cluster_mode="spark-submit")`: - ``` - # For yarn-client mode - spark-submit --conf spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64 - - # For yarn-cluster mode - spark-submit --conf spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64 \ - --conf spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64 - -### Spark Dynamic Allocation - -By design, BigDL does not support Spark Dynamic Allocation mode, and needs to allocate fixed resources for deep learning model training. 
Thus if your environment has already configured Spark Dynamic Allocation, or stipulated that Spark Dynamic Allocation must be used, you may encounter the following error: - -> **requirement failed: Engine.init: spark.dynamicAllocation.maxExecutors and spark.dynamicAllocation.minExecutors must be identical in dynamic allocation for BigDL** -> - -Here we provide a workaround for running BigDL under Spark Dynamic Allocation mode. - -For `spark-submit` cluster mode, the first solution is to disable the Spark Dynamic Allocation mode in `SparkConf` when you submit your application as follows: - -```bash -spark-submit --conf spark.dynamicAllocation.enabled=false -``` - -Otherwise, if you can not set this configuration due to your cluster settings, you can set `spark.dynamicAllocation.minExecutors` to be equal to `spark.dynamicAllocation.maxExecutors` as follows: - -```bash -spark-submit --conf spark.dynamicAllocation.enabled=true \ - --conf spark.dynamicAllocation.minExecutors 2 \ - --conf spark.dynamicAllocation.maxExecutors 2 -``` - -For other cluster modes, such as `yarn` and `k8s`, our program will initiate `SparkContext` for you, and the Spark Dynamic Allocation mode is disabled by default. Thus, generally you wouldn't encounter such problem. - -If you are using Spark Dynamic Allocation, you have to disable barrier execution mode at the very beginning of your application as follows: - -```python -from bigdl.orca import OrcaContext - -OrcaContext.barrier_mode = False -``` - -For Spark Dynamic Allocation mode, you are also recommended to manually set `num_ray_nodes` and `ray_node_cpu_cores` equal to `spark.dynamicAllocation.minExecutors` and `spark.executor.cores` respectively. You can specify `num_ray_nodes` and `ray_node_cpu_cores` in `init_orca_context` as follows: - -```python -init_orca_context(..., num_ray_nodes=2, ray_node_cpu_cores=4) -``` - -### No Space Left on Device -This error may happen when your disk even has free space, the reason could be: -1. Inodes do not have enough space. -2. Processes are still using deleted files. - -To solve this issue, please follow the steps below: -1. Checkout Spaces on Inodes - - Please check the space on available inodes using the command below: - ```bash - sudo df -i - ``` - - Then you will see the overview information of all Inodes and the availability state. - ```bash - Filesystem Inodes IUsed IFree IUse% Mounted on - udev 98880204 3552 98876652 1% /dev - tmpfs 98889585 3381 98886204 1% /run - /dev/sda2 14622720 2119508 12503212 15% / - tmpfs 98889585 18225 98871360 1% /dev/shm - tmpfs 98889585 5 98889580 1% /run/lock - tmpfs 98889585 19 98889566 1% /sys/fs/cgroup - ``` - - If there is a disk that uses a small part but the Inode table is full, you should delete useless files. - -2. Restart the Process to Free Space - - Files that were deleted (while processes are still running) could keep the space reserved, you should restart processes to free up the space. - - Please run the command below to see which processes have opened descriptors to deleted files: - ```bash - sudo lsof | grep deleted - ``` - - Then you could restart the processes to free up the reserved space. - ```bash - sudo systemctl restart service_name - ``` - -### Current incarnation doesn't match with one in the group - -Full error log example: -```shell -tensorflow.python.framework.errors_impl.FailedPreconditionError: Collective ops is aborted by: Device /job:worker/replica:0/task:14/device:CPU:0 current incarnation doesn't match with one in the group. 
This usually means this worker has restarted but the collective leader hasn't, or this worker connects to a wrong cluster. Additional GRPC error information from remote target /job:worker/replica:0/task:0: :{"created":"@1681905587.420462284","description":"Error received from peer ipv4:172.16.0.150:47999","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Device /job:worker/replica:0/task:14/device:CPU:0 current incarnation doesn't match with one in the group. This usually means this worker has restarted but the collective leader hasn't, or this worker connects to a wrong cluster.","grpc_status":9} The error could be from a previous operation. Restart your program to reset. [Op:CollectiveReduceV2] -``` -This error may happen when Spark reduce locality shuffle is enabled. To eliminate this issue, you can disable it by setting `spark.shuffle.reduceLocality.enabled` to false, or load property file named `spark-bigdl.conf` from bigdl release package. - -```shell -# set spark.shuffle.reduceLocality.enabled to false -spark-submit \ - --conf spark.shuffle.reduceLocality.enabled=false \ - ... - -# load property file -spark-submit \ - --properties-file /path/to/spark-bigdl.conf \ - ... -``` - -### Start Spark task before all executor is scheduled. - -This issue may lead to slower data processing. To avoid this, you can set `spark.scheduler.maxRegisteredResourcesWaitingTime` to a larger number, the default value is `30s`. Or you can load property file named `spark-bigdl.conf` from bigdl release package. - -```shell -# set spark.scheduler.maxRegisteredResourcesWaitingTime -spark-submit \ - --conf spark.scheduler.maxRegisteredResourcesWaitingTime=3600s \ - ... - -# load property file -spark-submit \ - --properties-file /path/to/spark-bigdl.conf \ - ... -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Orca/Overview/orca-context.md b/docs/readthedocs/source/doc/Orca/Overview/orca-context.md deleted file mode 100644 index beee6a77..00000000 --- a/docs/readthedocs/source/doc/Orca/Overview/orca-context.md +++ /dev/null @@ -1,82 +0,0 @@ -# Orca Context - ---- - -`OrcaContext` is the main entry for provisioning the Orca program on the underlying cluster (such as K8s or Hadoop cluster), or just on a single laptop. - ---- -### 1. Initialization - -An Orca program usually starts with the initialization of `OrcaContext` as follows: - -```python -from bigdl.orca import init_orca_context - -init_orca_context(...) -``` - -In `init_orca_context`, the user may specify necessary runtime configurations for the Orca program, including: - -- *Cluster mode*: Users can specify the computing environment for the program (a local machine, K8s cluster, Hadoop/YARN cluster, etc.). -- *Runtime*: Users can specify the backend for the program (spark and ray, etc.) to create SparkContext and/or OrcaRayContext, the cluster mode would work based on the specified runtime backend. -- *Physical resources*: Users can specify the amount of physical resources to be allocated for the program on the underlying cluster, including the number of nodes in the cluster, the cores and memory allocated for each node, etc. - -The Orca program simply runs `init_orca_context` on the local machine, which will automatically provision the runtime Python environment and distributed execution engine on the underlying computing environment (such as a single laptop, a large K8s or Hadoop cluster, etc.). - -View the related [Python API doc]() for more details. - ---- -### 2. 
Python Dependencies - -A key challenge for scaling out Python program across a distributed cluster is how to properly install the required Python environment (libraries and dependencies) on each node in the cluster (preferably in an automatic and dynamic fashion). - -For K8s cluster, the user may install required Python packages in the container and specify the `container_image` argument when `init_orca_context`. For Hadoop/YARN cluster, the user may use `conda` to create the Python virtual environment with required dependencies on the local machine, and `init_orca_context` will automatically detect the active `conda` environment and provision it on each node in the cluster. - -You can also add .py, .zip or .egg files to distribute with your application by specifying `extra_python_lib` in `init_orca_context`. If you depend on multiple Python files we recommend packaging them into a .zip or .egg. Those files will be added to each node's python search path. - -```python -init_orca_context(..., extra_python_lib="func1.py,func2.py,lib3.zip") -``` - -View the user guide for [K8s](../../UserGuide/k8s.md) and [Hadoop/YARN](../../UserGuide/hadoop.md) for more details. - ---- -### 3. Execution Engine - -Under the hood, `OrcaContext` will automatically provision Apache Spark and/or Ray as the underlying execution engine for the distributed data processing and model training/inference. - -Users can easily retrieve `SparkContext` and `OrcaRayContext`, the main entry point for Spark and Ray respectively, via `OrcaContext`: - -```python -from bigdl.orca import OrcaContext - -sc = OrcaContext.get_spark_context() -ray_ctx = OrcaContext.get_ray_context() -``` - ---- -### 4. Extra Configurations - -Users can make extra configurations when using the functionalities of Project Orca via `OrcaContext`. - -* `OrcaContext.log_output`: Default to be False. `OrcaContext.log_output = True` is recommended when running Jupyter notebook (this will display all the program output in the notebook). Make sure you set it before `init_orca_context`. - -* `OrcaContext.serialize_data_creator`: Default to be False. `OrcaContext.serialize_data_creator = True` would add a file lock when initializing data for distributed training (this may be useful if you run multiple workers on a single node and they download data to the same destination). - -* `OrcaContext.pandas_read_backend`: The backend to be used for reading data as Panda DataFrame. Default to be "spark". See [here](./data-parallel-processing.html#data-parallel-pandas) for more details. - -* `OrcaContext.train_data_store`: Default to be "DRAM". `OrcaContext.train_data_store = "DISK_n"` (e.g., "DISK_2") if the training data cannot fit in memory (this will store the data on disk, and cache only 1/n of the data in memory; after going through the 1/n, it will release the current cache, and load another 1/n into memory). Currently it works for TensorFlow and Keras Estimators only. - -* `OrcaContext.barrier_mode`: Whether to use Spark barrier execution mode to launch Ray. Default to be True. You can set it to be False if you are using Spark below 2.4 or you need to have dynamic allocation enabled. - ---- - -### 5. Termination - -After the Orca program finishes, the user can call `stop_orca_context` to release resources and shut down the underlying Spark and/or Ray execution engine. 
- -```python -from bigdl.orca import stop_orca_context - -stop_orca_context() -``` diff --git a/docs/readthedocs/source/doc/Orca/Overview/orca.md b/docs/readthedocs/source/doc/Orca/Overview/orca.md deleted file mode 100644 index 8d26ea91..00000000 --- a/docs/readthedocs/source/doc/Orca/Overview/orca.md +++ /dev/null @@ -1,104 +0,0 @@ -# Orca in 5 minutes - -### Overview - -The _**Orca**_ library in BigDL can seamlessly scale out your single node Python notebook across large clusters to process large-scale data. - -This page demonstrates how to scale the distributed training and inference of a standard TensorFlow model to a large cluster with minimum code changes to your notebook using Orca. We use [Neural Collaborative Filtering](https://arxiv.org/abs/1708.05031) for recommendation as an example. - ---- - -### TensorFlow Bite-sized Example - -Before running this example, follow the steps [here](install.md) to prepare the environment and install Orca in your environment. - -This section uses **TensorFlow 2.x**, and you should also install TensorFlow before running this example: -```bash -pip install tensorflow -``` - -First, initialize [Orca Context](orca-context.md): - -```python -from bigdl.orca import init_orca_context, stop_orca_context, OrcaContext - -# cluster_mode can be "local", "k8s" or "yarn" -sc = init_orca_context(cluster_mode="local", cores=4, memory="10g", num_nodes=1) -``` - -Next, perform [data-parallel processing in Orca](data-parallel-processing.md) (supporting standard Spark DataFrames, TensorFlow Dataset, PyTorch DataLoader, Pandas, etc.). Here to make things simple, we just generate some random data with Spark DataFrame: - -```python -import random -from pyspark.sql.types import StructType, StructField, IntegerType -from bigdl.orca import OrcaContext - -spark = OrcaContext.get_spark_session() - -num_users, num_items = 200, 100 -rdd = sc.range(0, 512).map( - lambda x: [random.randint(0, num_users-1), random.randint(0, num_items-1), random.randint(0, 1)]) -schema = StructType([StructField("user", IntegerType(), False), - StructField("item", IntegerType(), False), - StructField("label", IntegerType(), False)]) -df = spark.createDataFrame(rdd, schema) -train_df, test_df = df.randomSplit([0.8, 0.2], seed=1) -``` - -Finally, use [sklearn-style Estimator APIs in Orca](distributed-training-inference.md) to perform distributed _TensorFlow_, _PyTorch_, _Keras_ and _BigDL_ training and inference: - -```python -from bigdl.orca.learn.tf2.estimator import Estimator - -# Define the NCF model in standard TensorFlow API -def model_creator(config): - from tensorflow import keras - - user_input = keras.layers.Input(shape=(1,), dtype="int32", name="use_input") - item_input = keras.layers.Input(shape=(1,), dtype="int32", name="item_input") - - mlp_embed_user = keras.layers.Embedding(input_dim=config["num_users"], output_dim=config["embed_dim"], - input_length=1)(user_input) - mlp_embed_item = keras.layers.Embedding(input_dim=config["num_items"], output_dim=config["embed_dim"], - input_length=1)(item_input) - - user_latent = keras.layers.Flatten()(mlp_embed_user) - item_latent = keras.layers.Flatten()(mlp_embed_item) - - mlp_latent = keras.layers.concatenate([user_latent, item_latent], axis=1) - predictions = keras.layers.Dense(1, activation="sigmoid")(mlp_latent) - model = keras.models.Model(inputs=[user_input, item_input], outputs=predictions) - model.compile(optimizer='adam', - loss='binary_crossentropy', - metrics=['accuracy']) - return model - - -batch_size = 64 -train_steps = 
int(train_df.count() / batch_size) -val_steps = int(test_df.count() / batch_size) - -est = Estimator.from_keras(model_creator=model_creator, backend="spark", - config={"embed_dim": 8, "num_users": num_users, "num_items": num_items}) - -# Distributed training -est.fit(data=train_df, - batch_size=batch_size, - epochs=4, - feature_cols=['user', 'item'], - label_cols=['label'], - steps_per_epoch=train_steps, - validation_data=test_df, - validation_steps=val_steps) - -# Distributed inference -prediction_df = est.predict(test_df, - batch_size=batch_size, - feature_cols=['user', 'item']) -``` - -Stop [Orca Context](orca-context.md) after you finish your program: - -```python -stop_orca_context() -``` diff --git a/docs/readthedocs/source/doc/Orca/Overview/ray.md b/docs/readthedocs/source/doc/Orca/Overview/ray.md deleted file mode 100644 index e53a6319..00000000 --- a/docs/readthedocs/source/doc/Orca/Overview/ray.md +++ /dev/null @@ -1,142 +0,0 @@ -# RayOnSpark - ---- - -[Ray](https://github.com/ray-project/ray) is an open source distributed framework for emerging AI applications. -With the _**RayOnSpark**_ support packaged in [BigDL Orca](../Overview/orca.md), -Users can seamlessly integrate Ray applications into the big data processing pipeline on the underlying Big Data cluster -(such as [Hadoop/YARN](../../UserGuide/hadoop.md) or [K8s](../../UserGuide/k8s.md)). - -_**Note:** BigDL has been tested on Ray 1.9.2 and you are highly recommended to use this tested version._ - - -### 1. Install - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the Python environment. -When installing bigdl-orca with pip, you can specify the extras key `[ray]` to install the additional dependencies -for running Ray (i.e. `ray[default]==1.9.2`, `aiohttp==3.9.0`, `async-timeout==4.0.1`, `aioredis==1.3.1`, `hiredis==2.0.0`, `prometheus-client==0.11.0`, `psutil`, `setproctitle`): - -```bash -conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like. -conda activate py37 - -pip install bigdl-orca[ray] -``` - -View [Python User Guide](../../UserGuide/python.html#install) and [Orca User Guide](../Overview/orca.md) for more installation instructions. - ---- -### 2. Initialize - -We recommend using `init_orca_context` to initiate and run RayOnSpark on the underlying cluster. The Ray cluster would be launched by specifying `init_ray_on_spark=True`. For example, to launch Spark and Ray on standard Hadoop/YARN clusters in [YARN client mode](https://spark.apache.org/docs/latest/running-on-yarn.html#launching-spark-on-yarn): - -```python -from bigdl.orca import init_orca_context - -sc = init_orca_context(cluster_mode="yarn-client", cores=4, memory="10g", num_nodes=2, init_ray_on_spark=True) -``` - -You can input the following RayOnSpark related arguments when you `init_orca_context` for Ray configurations: -- `redis_port`: The redis port for the ray head node. The value would be randomly picked if not specified. -- `redis_password`: The password for redis. The value would be ray's default password if not specified. -- `object_store_memory`: The memory size for ray object_store in string. This can be specified in bytes(b), kilobytes(k), megabytes(m) or gigabytes(g). For example, "50b", "100k", "250m", "30g". -- `verbose`: True for more logs when starting ray. Default is False. -- `env`: The environment variable dict for running ray processes. Default is None. -- `extra_params`: The key value dict for extra options to launch ray. 
For example, `extra_params={"dashboard-port": "11281", "temp-dir": "/tmp/ray/"}`. -- `include_webui`: Default is True for including web ui when starting ray. -- `system_config`: The key value dict for overriding RayConfig defaults. Mainly for testing purposes. An example for system_config could be: `{"object_spilling_config":"{\"type\":\"filesystem\", \"params\":{\"directory_path\":\"/tmp/spill\"}}"}`. -- `num_ray_nodes`: The number of ray processes to start across the cluster. For Spark local mode, you don't need to specify this value. -For Spark cluster mode, it is default to be the number of Spark executors. If spark.executor.instances can't be detected in your SparkContext, you need to explicitly specify this. It is recommended that num_ray_nodes is not larger than the number of Spark executors to make sure there are enough resources in your cluster. -- `ray_node_cpu_cores`: The number of available cores for each ray process. For Spark local mode, it is default to be the number of Spark local cores. -For Spark cluster mode, it is default to be the number of cores for each Spark executor. If spark.executor.cores or spark.cores.max can't be detected in your SparkContext, you need to explicitly specify this. It is recommended that ray_node_cpu_cores is not larger than the number of cores for each Spark executor to make sure there are enough resources in your cluster. - -By default, the Ray cluster would be launched using Spark barrier execution mode, you can turn it off via the configurations of `OrcaContext`: - -```python -from bigdl.orca import OrcaContext - -OrcaContext.barrier_mode = False -``` - -View [Orca Context](../Overview/orca-context.md) for more details. - ---- -### 3. Run - -- After the initialization, you can directly run Ray applications on the underlying cluster. [Ray tasks](https://docs.ray.io/en/master/walkthrough.html#remote-functions-tasks) or [actors](https://docs.ray.io/en/master/actors.html) would be launched across the cluster. The following code shows a simple example: - - ```python - import ray - - @ray.remote - class Counter(object): - def __init__(self): - self.n = 0 - - def increment(self): - self.n += 1 - return self.n - - - counters = [Counter.remote() for i in range(5)] - print(ray.get([c.increment.remote() for c in counters])) - ``` - -- You can retrieve the information of the Ray cluster via [`OrcaContext`](../Overview/orca-context.md): - - ```python - from bigdl.orca import OrcaContext - - ray_ctx = OrcaContext.get_ray_context() - address_info = ray_ctx.address_info # The dictionary information of the ray cluster, including node_ip_address, object_store_address, webui_url, etc. - redis_address = ray_ctx.redis_address # The redis address of the ray cluster. - ``` - -- You should call `stop_orca_context()` when your program finishes: - - ```python - from bigdl.orca import stop_orca_context - - stop_orca_context() - ``` - ---- -### 4. Known Issue -If you encounter the following error when launching Ray on the underlying cluster, especially when you are using a [Spark standalone](https://spark.apache.org/docs/latest/spark-standalone.html) cluster: - -``` -This system supports the C.UTF-8 locale which is recommended. 
You might be able to resolve your issue by exporting the following environment variables: - - export LC_ALL=C.UTF-8 - export LANG=C.UTF-8 -``` - -Add the environment variables when calling `init_orca_context` would resolve the issue: - -```python -sc = init_orca_context(cluster_mode, init_ray_on_spark=True, env={"LANG": "C.UTF-8", "LC_ALL": "C.UTF-8"}) -``` - ---- -### 5. FAQ -- **ValueError: Ray component worker_ports is trying to use a port number ... that is used by other components.** - - This error is because that some port in worker port list is occupied by other processes. To handle this issue, you can set range of the worker port list by using the parameters `min-worker-port` and `max-worker-port` in `init_orca_context` as follows: - - ```python - init_orca_context(extra_params={"min-worker-port": "30000", "max-worker-port": "30033"}) - ``` - -- **ValueError: Failed to bind to 0.0.0.0:8265 because it's already occupied. You can use `ray start --dashboard-port ...` or `ray.init(dashboard_port=...)` to select a different port.** - - This error is because that ray dashboard port is occupied by other processes. To handle this issue, you can end the process that occupies the port or you can manually set the ray dashboard port by using the parameter `dashboard-port` in `init_orca_context` as follows: - - ```python - init_orca_context(extra_params={"dashboard-port": "50005"}) - ``` - - Note that, the similar error can happen to ray redis port as well, you can also set the ray redis port by using the parameter `redis_port` in `init_orca_context` as follows: - - ```python - init_orca_context(redis_port=50006) - ``` diff --git a/docs/readthedocs/source/doc/Orca/Tutorial/index.rst b/docs/readthedocs/source/doc/Orca/Tutorial/index.rst deleted file mode 100644 index 0a263e05..00000000 --- a/docs/readthedocs/source/doc/Orca/Tutorial/index.rst +++ /dev/null @@ -1,7 +0,0 @@ -Orca Tutorials -================================= - -* `Run on Hadoop/YARN clusters `_ -* `Run on Kubernetes clusters `_ -* `Run on Azure Databricks <../../UserGuide/databricks.html>`_ -* `Run on Google Colab <../../UserGuide/colab.html>`_ \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Orca/Tutorial/k8s.md b/docs/readthedocs/source/doc/Orca/Tutorial/k8s.md deleted file mode 100644 index 8548c510..00000000 --- a/docs/readthedocs/source/doc/Orca/Tutorial/k8s.md +++ /dev/null @@ -1,707 +0,0 @@ -# Run on Kubernetes Clusters - -This tutorial provides a step-by-step guide on how to run BigDL-Orca programs on Kubernetes (K8s) clusters, using a [PyTorch Fashin-MNIST program](https://github.com/intel-analytics/BigDL/tree/main/python/orca/tutorial/pytorch/FashionMNIST) as a working example. - -In this tutorial, the __Develop Node__ is the host machine where you launch the client container or create a Kubernetes Deployment. The **Client Container** is the created BigDL K8s Docker container where you launch or submit your applications. - ---- -## 1. Basic Concepts -### 1.1 init_orca_context -A BigDL Orca program usually starts with the initialization of OrcaContext. 
For every BigDL Orca program, you should call `init_orca_context` at the beginning of the program as below: - -```python -from bigdl.orca import init_orca_context - -init_orca_context(cluster_mode, master, container_image, - cores, memory, num_nodes, driver_cores, driver_memory, - extra_python_lib, conf) -``` - -In `init_orca_context`, you may specify necessary runtime configurations for running the example on K8s, including: -* `cluster_mode`: one of `"k8s-client"`, `"k8s-cluster"` or `"spark-submit"` when you run on K8s clusters. -* `master`: a URL format to specify the master address of the K8s cluster. -* `container_image`: the name of Docker container image for K8s pods. The Docker container image for BigDL is `intelanalytics/bigdl-k8s`. -* `cores`: the number of cores for each executor (default to be `2`). -* `memory`: the memory for each executor (default to be `"2g"`). -* `num_nodes`: the number of executors (default to be `1`). -* `driver_cores`: the number of cores for the driver node (default to be `4`). -* `driver_memory`: the memory for the driver node (default to be `"2g"`). -* `extra_python_lib`: the path to extra Python packages, separated by comma (default to be `None`). `.py`, `.zip` or `.egg` files are supported. -* `conf`: a dictionary to append extra conf for Spark (default to be `None`). - -__Note__: -* All arguments __except__ `cluster_mode` will be ignored when using [`spark-submit`](#use-spark-submit) or [`Kubernetes deployment`](#use-kubernetes-deployment) to submit and run Orca programs, in which case you are supposed to specify these configurations via the submit command. - -After Orca programs finish, you should always call `stop_orca_context` at the end of the program to release resources and shutdown the underlying distributed runtime engine (such as Spark or Ray). -```python -from bigdl.orca import stop_orca_context - -stop_orca_context() -``` - -For more details, please see [OrcaContext](../Overview/orca-context.md). - - -### 1.2 K8s-Client & K8s-Cluster -The difference between k8s-client mode and k8s-cluster mode is where you run your Spark driver. - -For k8s-client, the Spark driver runs in the client process (outside the K8s cluster), while for k8s-cluster the Spark driver runs inside the K8s cluster. - -Please see more details in [K8s-Cluster](https://spark.apache.org/docs/latest/running-on-kubernetes.html#cluster-mode) and [K8s-Client](https://spark.apache.org/docs/latest/running-on-kubernetes.html#client-mode). - -For **k8s-client** mode, you can directly find the driver logs in the console. - -For **k8s-cluster** mode, a `driver-pod-name` (`train-py-fc5bec85fca28cb3-driver` in the following log) will be returned when the application completes. 
-``` -23-01-29 08:34:47 INFO LoggingPodStatusWatcherImpl:57 - Application status for spark-9341aa0ec6b249ad974676c696398b4e (phase: Succeeded) -23-01-29 08:34:47 INFO LoggingPodStatusWatcherImpl:57 - Container final statuses: - container name: spark-kubernetes-driver - container image: intelanalytics/bigdl-k8s:latest - container state: terminated - container started at: 2023-01-29T08:26:56Z - container finished at: 2023-01-29T08:35:07Z - exit code: 0 - termination reason: Completed -23-01-29 08:34:47 INFO LoggingPodStatusWatcherImpl:57 - Application train.py with submission ID default:train-py-fc5bec85fca28cb3-driver finished -23-01-29 08:34:47 INFO ShutdownHookManager:57 - Shutdown hook called -23-01-29 08:34:47 INFO ShutdownHookManager:57 - Deleting directory /tmp/spark-fa8eeb45-bebf-4da9-9c0b-8bb59543842d -``` - -You can access the results of the driver pod on the __Develop Node__ following the commands below: - -* Retrieve the logs on the driver pod: -```bash -kubectl logs -``` - -* Check the pod status or get basic information of the driver pod: -```bash -kubectl describe pod -``` - -* You may need to delete the driver pod manually after the application finishes: -```bash -kubectl delete pod -``` - - -### 1.3 Load Data from Volumes -When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use Network File Systems (NFS) with path `/bigdl/nfsdata` in this tutorial as an example. You are recommended to put your working directory in the Volume (NFS) as well. - -To load data from Volumes, please set the corresponding Volume configurations for spark using `--conf` option in Spark scripts or specifying `conf` in `init_orca_context`. Here we list the configurations for using NFS as the Volume. - -For **k8s-client** mode: -* `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName`: specify the claim name of persistentVolumeClaim with volumnName `nfsvolumeclaim` to mount into executor pods. -* `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path`: specify the NFS path (`/bigdl/nfsdata` in our example) to be mounted as nfsvolumeclaim into executor pods. - -Besides the above two configurations, you need to additionally set the following configurations for **k8s-cluster** mode: -* `spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName`: specify the claim name of persistentVolumeClaim with volumnName `nfsvolumeclaim` to mount into the driver pod. -* `spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path`: specify the NFS path (`/bigdl/nfsdata` in our example) to be mounted as nfsvolumeclaim into the driver pod. -* `spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod. -* `spark.kubernetes.file.upload.path`: the path to store files at spark submit side in k8s-cluster mode. In this example we use the NFS path as well. 
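Assuming the NFS claim name `nfsvolumeclaim` and mount path `/bigdl/nfsdata` used throughout this tutorial, a minimal sketch of passing these configurations through the `conf` argument of `init_orca_context` (k8s-client mode) might look like the following; the master URL and image tag are placeholders, and the full sample conf is given right after this sketch:

```python
from bigdl.orca import init_orca_context

# Volume configurations for k8s-client mode (see the sample conf below for the k8s-cluster extras).
conf = {
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName": "nfsvolumeclaim",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path": "/bigdl/nfsdata",
}

sc = init_orca_context(cluster_mode="k8s-client",
                       master="k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>",  # placeholder
                       container_image="intelanalytics/bigdl-k8s:latest",
                       cores=4, memory="2g", num_nodes=2,
                       conf=conf)
```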
- -Sample conf for NFS in the Fashion-MNIST example provided by this tutorial is as follows: -```python -{ - # For k8s-client mode - "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName": "nfsvolumeclaim", - "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path": "/bigdl/nfsdata", - - # Additionally for k8s-cluster mode - "spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName": "nfsvolumeclaim", - "spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path": "/bigdl/nfsdata", - "spark.kubernetes.authenticate.driver.serviceAccountName": "spark", - "spark.kubernetes.file.upload.path": "/bigdl/nfsdata/" -} -``` - -After mounting the Volume (NFS) into the pods, the Fashion-MNIST example could load data from NFS as local storage. - -```python -import torch -import torchvision -import torchvision.transforms as transforms - -def train_data_creator(config, batch_size): - transform = transforms.Compose([transforms.ToTensor(), - transforms.Normalize((0.5,), (0.5,))]) - - trainset = torchvision.datasets.FashionMNIST(root="/bigdl/nfsdata/dataset", train=True, - download=False, transform=transform) - - trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, - shuffle=True, num_workers=0) - return trainloader -``` - ---- -## 2 Pull Docker Image -Please pull the BigDL [`bigdl-k8s`](https://hub.docker.com/r/intelanalytics/bigdl-k8s/tags) image (built on top of Spark 3.1.3) from Docker Hub beforehand as follows: -```bash -# For the release version, e.g. 2.2.0 -sudo docker pull intelanalytics/bigdl-k8s:version - -# For the latest nightly build version -sudo docker pull intelanalytics/bigdl-k8s:latest -``` - -* The environment for Spark (including SPARK_VERSION and SPARK_HOME) and BigDL (including BIGDL_VERSION and BIGDL_HOME) are already configured in the BigDL K8s Docker image. -* Spark executor containers are scheduled by K8s at runtime and you don't need to create them manually. - ---- -## 3. Create BigDL K8s Container -Note that you can __SKIP__ this section if you want to run applications with [`Kubernetes deployment`](#use-kubernetes-deployment). - -You need to create a BigDL K8s client container only when you use [`python` command](#use-python-command) or [`spark-submit`](#use-spark-submit). - -### 3.1 Create a K8s Client Container -Please create the __Client Container__ using the script below: -```bash -export RUNTIME_DRIVER_HOST=$( hostname -I | awk '{print $1}' ) - -sudo docker run -itd --net=host \ - -v /etc/kubernetes:/etc/kubernetes \ - -v /root/.kube:/root/.kube \ - -v /path/to/nfsdata:/bigdl/nfsdata \ - -e http_proxy=http://your-proxy-host:your-proxy-port \ - -e https_proxy=https://your-proxy-host:your-proxy-port \ - -e RUNTIME_SPARK_MASTER=k8s://https://: \ - -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \ - -e RUNTIME_K8S_SPARK_IMAGE=intelanalytics/bigdl-k8s:version \ - -e RUNTIME_PERSISTENT_VOLUME_CLAIM=nfsvolumeclaim \ - -e RUNTIME_DRIVER_HOST=${RUNTIME_DRIVER_HOST} \ - intelanalytics/bigdl-k8s:version bash -``` - -In the script: -* **Please modify the version tag according to the BigDL K8s Docker image you pull.** -* **Please make sure you are mounting the correct Volume path (e.g. NFS) into the container.** -* `--net=host`: use the host network stack for the Docker container. -* `-v /etc/kubernetes:/etc/kubernetes`: specify the path of Kubernetes configurations to mount into the Docker container. 
-* `-v /root/.kube:/root/.kube`: specify the path of Kubernetes installation to mount into the Docker container. -* `-v /path/to/nfsdata:/bigdl/nfsdata`: mount NFS path on the host into the Docker container as the specified path (e.g. "/bigdl/nfsdata"). -* `RUNTIME_SPARK_MASTER`: a URL format that specifies the Spark master: `k8s://https://:`. -* `RUNTIME_K8S_SERVICE_ACCOUNT`: the service account for the driver pod. -* `RUNTIME_K8S_SPARK_IMAGE`: the name of the BigDL K8s Docker image. Note that you need to change the version accordingly. -* `RUNTIME_PERSISTENT_VOLUME_CLAIM`: the Kubernetes volumeName (e.g. "nfsvolumeclaim"). -* `RUNTIME_DRIVER_HOST`: a URL format that specifies the driver localhost (only required if you use k8s-client mode). - - -### 3.2 Launch the K8s Client Container -Once the container is created, a `containerID` would be returned and with which you can enter the container following the command below: -```bash -sudo docker exec -it bash -``` -In the remaining part of this tutorial, you are supposed to operate and run commands *__inside__* this __Client Container__ if you use [`python` command](#use-python-command) or [`spark-submit`](#use-spark-submit). - - ---- -## 4. Prepare Environment -In the launched BigDL K8s **Client Container** (if you use [`python` command](#use-python-command) or [`spark-submit`](#use-spark-submit)) or on the **Develop Node** (if you use [`Kubernetes deployment`](#use-kubernetes-deployment)), please setup the environment following the steps below: - -- See [here](../Overview/install.md#install-anaconda) to install conda and prepare the Python environment. - -- See [here](../Overview/install.md#to-install-orca-for-spark3) to install BigDL Orca in the created conda environment. Note that if you use [`spark-submit`](#use-spark-submit) or [`Kubernetes deployment`](#use-kubernetes-deployment), please __SKIP__ this step and __DO NOT__ install BigDL Orca with pip install command in the conda environment. - -- You should install all the other Python libraries that you need in your program in the conda environment as well. `torch`, `torchvision` and `tqdm` are needed to run the Fashion-MNIST example we provide: -```bash -pip install torch torchvision tqdm -``` - - ---- -## 5. Prepare Dataset -To run the Fashion-MNIST example provided by this tutorial on K8s, you should upload the dataset to the Volume (e.g. NFS) beforehand. - -Please manually download the Fashion-MNIST dataset and put the data into the Volume. Note that PyTorch `FashionMNIST Dataset` requires unzipped files located in `FashionMNIST/raw/` under the dataset folder. - -```bash -# PyTorch official dataset download link -git clone https://github.com/zalandoresearch/fashion-mnist.git - -# Copy the dataset files to the folder FashionMNIST/raw in NFS -cp /path/to/fashion-mnist/data/fashion/* /path/to/nfs/dataset/FashionMNIST/raw - -# Extract FashionMNIST archives -gzip -d /path/to/nfs/dataset/FashionMNIST/raw/* -``` - -In the given example, you can specify the argument `--data_dir` to be the directory on NFS for the Fashion-MNIST dataset. The directory should contain `FashionMNIST/raw/train-images-idx3-ubyte` and `FashionMNIST/raw/t10k-images-idx3`. - - ---- -## 6. Prepare Custom Modules -Spark allows to upload Python files(`.py`), and zipped Python packages(`.zip`) across the cluster by setting `--py-files` option in Spark scripts or specifying `extra_python_lib` in `init_orca_context`. 
- -The FasionMNIST example needs to import the modules from [`model.py`](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/model.py). - -__Note:__ Please upload the extra Python dependency files to the Volume (e.g. NFS) when running the program on k8s-cluster mode (see __[Section 6.2.2](#id2)__ for more details). - -* When using [`python` command](#use-python-command), please specify `extra_python_lib` in `init_orca_context`. -```python -init_orca_context(..., extra_python_lib="/path/to/model.py") -``` -For more details, please see [BigDL Python Dependencies](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/orca-context.html#python-dependencies). - - -* When using [`spark-submit`](#use-spark-submit), please specify `--py-files` option in the submit command. -```bash -spark-submit - ... - --py-files /path/to/model.py - ... -``` -For more details, please see [Spark Python Dependencies](https://spark.apache.org/docs/latest/submitting-applications.html). - - -* After uploading `model.py` to K8s, you can import this custom module as follows: -```python -from model import model_creator, optimizer_creator -``` - - -If your program depends on a nested directory of Python files, you are recommended to follow the steps below to use a zipped package instead. - -1. Compress the directory into a zipped package. - ```bash - zip -q -r FashionMNIST_zipped.zip FashionMNIST - ``` -2. Upload the zipped package (`FashionMNIST_zipped.zip`) to K8s by setting `--py-files` or specifying `extra_python_lib` as discussed above. - -3. You can then import the custom modules from the unzipped file in your program as follows: - ```python - from FashionMNIST.model import model_creator, optimizer_creator - ``` - - ---- -## 7. Run Jobs on K8s -In the remaining part of this tutorial, we will illustrate three ways to submit and run BigDL Orca applications on K8s. - -* Use `python` command -* Use `spark-submit` -* Use Kubernetes Deployment - -You can choose one of them based on your preference or cluster settings. - -We provide the running command for the [Fashion-MNIST example](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/) in this section. - -### 7.1 Use `python` command -This is the easiest and most recommended way to run BigDL Orca on K8s as a normal Python program. - -See [here](#init-orca-context) for the runtime configurations. - -#### 7.1.1 K8s-Client -Run the example with the following command by setting the cluster_mode to "k8s-client": -```bash -python train.py --cluster_mode k8s-client --data_dir /bigdl/nfsdata/dataset -``` - - -#### 7.1.2 K8s-Cluster -Before running the example on k8s-cluster mode in the __Client Container__, you should: - -1. Pack the current activate conda environment to an archive: - ```bash - conda pack -o environment.tar.gz - ``` -2. Upload the conda archive to NFS: - ```bash - cp /path/to/environment.tar.gz /bigdl/nfsdata - ``` -3. Upload the Python script (`train.py` in our example) to NFS: - ```bash - cp /path/to/train.py /bigdl/nfsdata - ``` -4. 
Upload the extra Python dependency files (`model.py` in our example) to NFS: - ```bash - cp /path/to/model.py /bigdl/nfsdata - ``` - -Run the example with the following command by setting the cluster_mode to "k8s-cluster": -```bash -python /bigdl/nfsdata/train.py --cluster_mode k8s-cluster --data_dir /bigdl/nfsdata/dataset -``` - - -### 7.2 Use `spark-submit` - -If you prefer to use `spark-submit`, please follow the steps below in the __Client Container__ before submitting the application. . - -1. Download the requirement file(s) from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs. - ```bash - pip install -r /path/to/requirements.txt - ``` - Note that you are recommended **NOT** to install BigDL Orca with pip install command in the conda environment if you use spark-submit to avoid possible conflicts. - -2. Pack the current activate conda environment to an archive: - ```bash - conda pack -o environment.tar.gz - ``` - -3. Set the cluster_mode to "spark-submit" in `init_orca_context`: - ```python - sc = init_orca_context(cluster_mode="spark-submit") - ``` - -Some runtime configurations for Spark are as follows: - -* `--master`: a URL format that specifies the Spark master: `k8s://https://:`. -* `--name`: the name of the Spark application. -* `--conf spark.kubernetes.container.image`: the name of Docker container image for K8s pods. The Docker container image for BigDL is `intelanalytics/bigdl-k8s`. -* `--num-executors`: the number of executors. -* `--executor-cores`: the number of cores for each executor. -* `--total-executor-cores`: the total number of executor cores. -* `--executor-memory`: the memory for each executor. -* `--driver-cores`: the number of cores for the driver. -* `--driver-memory`: the memory for the driver. -* `--properties-file`: the BigDL configuration properties to be uploaded to K8s. -* `--py-files`: the extra Python dependency files to be uploaded to K8s. -* `--archives`: the conda archive to be uploaded to K8s. -* `--conf spark.driver.extraClassPath`: upload and register BigDL jars files to the driver's classpath. -* `--conf spark.executor.extraClassPath`: upload and register BigDL jars files to the executors' classpath. -* `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName`: specify the claim name of `persistentVolumeClaim` to mount `persistentVolume` into executor pods. -* `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as `persistentVolumeClaim` into executor pods. 
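Note that the example program itself stays the same across these submission approaches: it reads the cluster mode from the command line and forwards it to `init_orca_context`. The following is a hypothetical sketch of such a dispatch (the actual `train.py` in the BigDL repository may organize this differently); when `--cluster_mode spark-submit` is used, all resource, volume and dependency settings are expected to come from the submit command itself:

```python
import argparse

from bigdl.orca import init_orca_context, stop_orca_context

parser = argparse.ArgumentParser()
parser.add_argument("--cluster_mode", type=str, default="k8s-client",
                    help="k8s-client, k8s-cluster or spark-submit")
parser.add_argument("--data_dir", type=str, default="/bigdl/nfsdata/dataset",
                    help="Directory on the mounted Volume containing FashionMNIST/raw")
args = parser.parse_args()

if args.cluster_mode == "spark-submit":
    # Resources, volumes and dependencies are specified in the spark-submit command.
    sc = init_orca_context(cluster_mode="spark-submit")
else:
    # For k8s-client / k8s-cluster, pass the runtime configurations here instead;
    # the master URL and image tag below are placeholders, and additional settings
    # (e.g. the Volume conf from Section 1.3) can be passed via the conf argument.
    sc = init_orca_context(cluster_mode=args.cluster_mode,
                           master="k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>",
                           container_image="intelanalytics/bigdl-k8s:latest",
                           cores=4, memory="2g", num_nodes=2)

# ... define the model/data creators and run the Orca Estimator using args.data_dir ...

stop_orca_context()
```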
- - -#### 7.2.1 K8s Client -Submit and run the program for k8s-client mode following the `spark-submit` script below: -```bash -${SPARK_HOME}/bin/spark-submit \ - --master ${RUNTIME_SPARK_MASTER} \ - --deploy-mode client \ - --name orca-k8s-client-tutorial \ - --conf spark.driver.host=${RUNTIME_DRIVER_HOST} \ - --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \ - --num-executors 2 \ - --executor-cores 4 \ - --total-executor-cores 8 \ - --executor-memory 2g \ - --driver-cores 2 \ - --driver-memory 2g \ - --archives /path/to/environment.tar.gz#environment \ - --conf spark.pyspark.driver.python=python \ - --conf spark.pyspark.python=environment/bin/python \ - --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \ - --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/path/to/model.py \ - --conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \ - --conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl/nfsdata \ - train.py --cluster_mode spark-submit --data_dir /bigdl/nfsdata/dataset -``` - -In the `spark-submit` script: -* `deploy-mode`: set it to `client` when running programs on k8s-client mode. -* `--conf spark.driver.host`: the localhost for the driver pod. -* `--conf spark.pyspark.driver.python`: set the activate Python location in __Client Container__ as the driver's Python environment. -* `--conf spark.pyspark.python`: set the Python location in the conda archive as each executor's Python environment. - - -#### 7.2.2 K8s Cluster - -Before running the example on k8s-cluster mode in the __Client Container__, you should: - -1. Upload the conda archive to NFS: - ```bash - cp /path/to/environment.tar.gz /bigdl/nfsdata - ``` -2. Upload the Python script (`train.py` in our example) to NFS: - ```bash - cp /path/to/train.py /bigdl/nfsdata - ``` -3. 
Upload the extra Python dependency files (`model.py` in our example) to NFS: - ```bash - cp /path/to/model.py /bigdl/nfsdata - ``` - -Submit and run the program for k8s-cluster mode following the `spark-submit` script below: -```bash -${SPARK_HOME}/bin/spark-submit \ - --master ${RUNTIME_SPARK_MASTER} \ - --deploy-mode cluster \ - --name orca-k8s-cluster-tutorial \ - --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \ - --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \ - --num-executors 2 \ - --executor-cores 4 \ - --total-executor-cores 8 \ - --executor-memory 2g \ - --driver-cores 2 \ - --driver-memory 2g \ - --archives /bigdl/nfsdata/environment.tar.gz#environment \ - --conf spark.pyspark.driver.python=environment/bin/python \ - --conf spark.pyspark.python=environment/bin/python \ - --conf spark.kubernetes.file.upload.path=/bigdl/nfsdata \ - --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \ - --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/bigdl/nfsdata/train.py,/bigdl/nfsdata/model.py \ - --conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \ - --conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \ - --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \ - --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl/nfsdata \ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl/nfsdata \ - /bigdl/nfsdata/train.py --cluster_mode spark-submit --data_dir /bigdl/nfsdata/dataset -``` - -In the `spark-submit` script: -* `deploy-mode`: set it to `cluster` when running programs on k8s-cluster mode. -* `--conf spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod. -* `--conf spark.pyspark.driver.python`: set the Python location in the conda archive as the driver's Python environment. -* `--conf spark.pyspark.python`: also set the Python location in the conda archive as each executor's Python environment. -* `--conf spark.kubernetes.file.upload.path`: the path to store files at spark submit side in k8s-cluster mode. -* `--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName`: specify the claim name of `persistentVolumeClaim` to mount `persistentVolume` into the driver pod. -* `--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as `persistentVolumeClaim` into the driver pod. - - -### 7.3 Use Kubernetes Deployment -BigDL supports users who want to execute programs directly on __Develop Node__ to run an application by creating a Kubernetes Deployment object. -After preparing the [Conda environment](#prepare-environment) on the __Develop Node__, follow the steps below before submitting the application. - -1. Download the requirement file(s) from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs. 
- ```bash - pip install -r /path/to/requirements.txt - ``` - Note that you are recommended **NOT** to install BigDL Orca with pip install command in the conda environment if you use spark-submit to avoid possible conflicts. - -2. Pack the current activate conda environment to an archive before: - ```bash - conda pack -o environment.tar.gz - ``` - -3. Upload the conda archive, Python script (`train.py` in our example) and extra Python dependency files (`model.py` in our example) to NFS. - ```bash - cp /path/to/environment.tar.gz /path/to/nfs - - cp /path/to/train.py /path/to/nfs - - cp /path/to/model.py /path/to/nfs - ``` - -4. Set the cluster_mode to "spark-submit" in `init_orca_context`. - ```python - sc = init_orca_context(cluster_mode="spark-submit") - ``` - -We define a Kubernetes Deployment in a YAML file. Some fields of the YAML are explained as follows: - -* `metadata`: a nested object filed that every deployment object must specify. - * `name`: a string that uniquely identifies this object and job. We use "orca-pytorch-job" in our example. -* `restartPolicy`: the restart policy for all containers within the pod. One of Always, OnFailure, Never. Default to be Always. -* `containers`: a single application container to run within a pod. - * `name`: the name of the container. Each container in a pod will have a unique name. - * `image`: the name of the BigDL K8s Docker image. Note that you need to change the version accordingly. - * `imagePullPolicy`: the pull policy of the Docker image. One of Always, Never and IfNotPresent. Default to be Always if `latest` tag is specified, or IfNotPresent otherwise. - * `command`: the command for the containers to run in the pod. - * `args`: the arguments to submit the spark application in the pod. See more details in [`spark-submit`](#use-spark-submit). - * `securityContext`: the security options the container should be run with. - * `env`: a list of environment variables to set in the container, which will be used when submitting the application. Note that you need to change the environment variables including `BIGDL_VERSION` and `BIGDL_HOME` accordingly. - * `name`: the name of the environment variable. - * `value`: the value of the environment variable. - * `volumeMounts`: the paths to mount Volumes into containers. - * `name`: the name of a Volume. - * `mountPath`: the path in the container to mount the Volume to. - * `subPath`: the sub-path within the volume to mount into the container. -* `volumes`: specify the volumes for the pod. We use NFS as the persistentVolumeClaim in our example. - - -#### 7.3.1 K8s Client -BigDL has provided an example [orca-tutorial-k8s-client.yaml](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/orca-tutorial-k8s-client.yaml) to directly run the Fashion-MNIST example for k8s-client mode. -The environment variables for Spark (including SPARK_VERSION and SPARK_HOME) and BigDL (including BIGDL_VERSION and BIGDL_HOME) are already configured in the BigDL K8s Docker image. 
- -You need to uncompress the conda archive in NFS before submitting the job: -```bash -cd /path/to/nfs -mkdir environment -tar -xzvf environment.tar.gz --directory environment -``` - -```bash -apiVersion: batch/v1 -kind: Job -metadata: - name: orca-pytorch-job -spec: - template: - spec: - serviceAccountName: spark - restartPolicy: Never - hostNetwork: true - containers: - - name: spark-k8s-client - image: intelanalytics/bigdl-k8s:latest - imagePullPolicy: IfNotPresent - command: ["/bin/sh","-c"] - args: [" - export RUNTIME_DRIVER_HOST=$( hostname -I | awk '{print $1}' ); - ${SPARK_HOME}/bin/spark-submit \ - --master ${RUNTIME_SPARK_MASTER} \ - --deploy-mode client \ - --name orca-k8s-client-tutorial \ - --conf spark.driver.host=${RUNTIME_DRIVER_HOST} \ - --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \ - --num-executors 2 \ - --executor-cores 4 \ - --executor-memory 2g \ - --total-executor-cores 8 \ - --driver-cores 2 \ - --driver-memory 2g \ - --conf spark.pyspark.driver.python=/bigdl/nfsdata/environment/bin/python \ - --conf spark.pyspark.python=/bigdl/nfsdata/environment/bin/python \ - --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \ - --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/bigdl/nfsdata/model.py \ - --conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \ - --conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \ - --conf spark.kubernetes.executor.deleteOnTermination=True \ - --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \ - --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl/nfsdata/ \ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl/nfsdata/ \ - /bigdl/nfsdata/train.py - --cluster_mode spark-submit - --data_dir /bigdl/nfsdata/dataset - "] - securityContext: - privileged: true - env: - - name: RUNTIME_K8S_SPARK_IMAGE - value: intelanalytics/bigdl-k8s:latest - - name: RUNTIME_SPARK_MASTER - value: k8s://https://: - volumeMounts: - - name: nfs-storage - mountPath: /bigdl/nfsdata - - name: nfs-storage - mountPath: /root/.kube/config - subPath: kubeconfig - volumes: - - name: nfs-storage - persistentVolumeClaim: - claimName: nfsvolumeclaim -``` - - -Submit the application using `kubectl`: -```bash -kubectl apply -f orca-tutorial-k8s-client.yaml -``` - -Note that you need to delete the job __BEFORE__ re-submitting another one: -```bash -kubectl delete job orca-pytorch-job -``` - -After submitting the job, you can list all the pods and find the driver pod with name `orca-pytorch-job-xxx`: -```bash -kubectl get pods -kubectl get pods | grep orca-pytorch-job -``` - -Retrieve the logs on the driver pod: -```bash -kubectl logs orca-pytorch-job-xxx -``` - -After the task finishes, delete the job and all related pods if necessary: -```bash -kubectl delete job orca-pytorch-job -``` - -#### 7.3.2 K8s Cluster -BigDL has provided an example [orca-tutorial-k8s-cluster.yaml](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/orca-tutorial-k8s-cluster.yaml) to run the Fashion-MNIST example for k8s-cluster mode. -The environment variables for Spark (including SPARK_VERSION and SPARK_HOME) and BigDL (including BIGDL_VERSION and BIGDL_HOME) are already configured in the BigDL K8s Docker image. 
- -```bash -apiVersion: batch/v1 -kind: Job -metadata: - name: orca-pytorch-job -spec: - template: - spec: - serviceAccountName: spark - restartPolicy: Never - hostNetwork: true - containers: - - name: spark-k8s-cluster - image: intelanalytics/bigdl-k8s:latest - imagePullPolicy: IfNotPresent - command: ["/bin/sh","-c"] - args: [" - ${SPARK_HOME}/bin/spark-submit \ - --master ${RUNTIME_SPARK_MASTER} \ - --name orca-k8s-cluster-tutorial \ - --deploy-mode cluster \ - --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \ - --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \ - --num-executors 2 \ - --executor-cores 4 \ - --total-executor-cores 8 \ - --executor-memory 2g \ - --driver-cores 2 \ - --driver-memory 2g \ - --archives /bigdl/nfsdata/environment.tar.gz#environment \ - --conf spark.pyspark.driver.python=environment/bin/python \ - --conf spark.pyspark.python=environment/bin/python \ - --conf spark.kubernetes.file.upload.path=/bigdl/nfsdata \ - --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \ - --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/bigdl/nfsdata/train.py,/bigdl/nfsdata/model.py \ - --conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \ - --conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \ - --conf spark.kubernetes.executor.deleteOnTermination=True \ - --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \ - --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl/nfsdata/ \ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl/nfsdata/ \ - /bigdl/nfsdata/train.py - --cluster_mode spark-submit - --data_dir /bigdl/nfsdata/dataset - "] - securityContext: - privileged: true - env: - - name: RUNTIME_K8S_SPARK_IMAGE - value: intelanalytics/bigdl-k8s:latest - - name: RUNTIME_SPARK_MASTER - value: k8s://https://: - - name: RUNTIME_K8S_SERVICE_ACCOUNT - value: spark - volumeMounts: - - name: nfs-storage - mountPath: /bigdl/nfsdata - - name: nfs-storage - mountPath: /root/.kube/config - subPath: kubeconfig - volumes: - - name: nfs-storage - persistentVolumeClaim: - claimName: nfsvolumeclaim -``` - -Submit the application using `kubectl`: -```bash -kubectl apply -f orca-tutorial-k8s-cluster.yaml -``` - -Note that you need to delete the job __BEFORE__ re-submitting another one: -```bash -kubectl delete job orca-pytorch-job -``` - -After submitting the job, you can list all the pods and find the driver pod with name `orca-k8s-cluster-tutorial-xxx-driver`. 
-```bash -kubectl get pods -kubectl get pods | grep orca-k8s-cluster-tutorial -# Then find the pod of the driver: orca-k8s-cluster-tutorial-xxx-driver -``` - -Retrieve the logs on the driver pod: -```bash -kubectl logs orca-k8s-cluster-tutorial-xxx-driver -``` - -After the task finishes, delete the job and all related pods if necessary: -```bash -kubectl delete job orca-pytorch-job -kubectl delete pod orca-k8s-cluster-tutorial-xxx-driver -``` diff --git a/docs/readthedocs/source/doc/Orca/Tutorial/yarn.md b/docs/readthedocs/source/doc/Orca/Tutorial/yarn.md deleted file mode 100644 index 23e8a99d..00000000 --- a/docs/readthedocs/source/doc/Orca/Tutorial/yarn.md +++ /dev/null @@ -1,402 +0,0 @@ -# Run on Hadoop/YARN Clusters - -This tutorial provides a step-by-step guide on how to run BigDL-Orca programs on Apache Hadoop/YARN clusters, using a [PyTorch Fashion-MNIST program](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/) as a working example. - -The **Client Node** that appears in this tutorial refer to the machine where you launch or submit your applications. - ---- -## 1. Basic Concepts -### 1.1 init_orca_context -A BigDL Orca program usually starts with the initialization of OrcaContext. For every BigDL Orca program, you should call `init_orca_context` at the beginning of the program as below: - -```python -from bigdl.orca import init_orca_context - -sc = init_orca_context(cluster_mode, cores, memory, num_nodes, - driver_cores, driver_memory, extra_python_lib, conf) -``` - -In `init_orca_context`, you may specify necessary runtime configurations for running the example on YARN, including: -* `cluster_mode`: one of `"yarn-client"`, `"yarn-cluster"`, `"bigdl-submit"` or `"spark-submit"` when you run on Hadoop/YARN clusters. -* `cores`: the number of cores for each executor (default to be `2`). -* `memory`: memory for each executor (default to be `"2g"`). -* `num_nodes`: the number of executors (default to be `1`). -* `driver_cores`: the number of cores for the driver node (default to be `4`). -* `driver_memory`: the memory for the driver node (default to be `"2g"`). -* `extra_python_lib`: the path to extra Python packages, separated by comma (default to be `None`). `.py`, `.zip` or `.egg` files are supported. -* `conf`: a dictionary to append extra conf for Spark (default to be `None`). - -__Note__: -* All the arguments __except__ `cluster_mode` will be ignored when using [`bigdl-submit`](#use-bigdl-submit) or [`spark-submit`](#use-spark-submit) to submit and run Orca programs, in which case you are supposed to specify these configurations via the submit command. - -After Orca programs finish, you should always call `stop_orca_context` at the end of the program to release resources and shutdown the underlying distributed runtime engine (such as Spark or Ray). -```python -from bigdl.orca import stop_orca_context - -stop_orca_context() -``` - -For more details, please see [OrcaContext](../Overview/orca-context.md). - -### 1.2 Yarn-Client & Yarn-Cluster -The difference between yarn-client mode and yarn-cluster mode is where you run your Spark driver. - -For yarn-client, the Spark driver runs in the client process, and the application master is only used for requesting resources from YARN, while for yarn-cluster the Spark driver runs inside an application master process which is managed by YARN in the cluster. - -Please see more details in [Launching Spark on YARN](https://spark.apache.org/docs/latest/running-on-yarn.html#launching-spark-on-yarn). 
- -For **yarn-client** mode, you can directly find the driver logs in the console. - -For **yarn-cluster** mode, an `application_time_id` will be returned (`application_1668477395550_1045` in the following log) when the application master process completes. - -```bash -23/02/15 15:30:26 INFO yarn.Client: Application report for application_1668477395550_1045 (state: FINISHED) -23/02/15 15:30:26 INFO yarn.Client: - client token: N/A - diagnostics: N/A - ApplicationMaster host: ... - ApplicationMaster RPC port: 46652 - queue: ... - start time: 1676446090408 - final status: SUCCEEDED - tracking URL: http://.../application_1668477395550_1045/ - user: ... -``` - -Visit the tracking URL and then click `logs` in the table `ApplicationMaster` to see the driver logs. - -### 1.3 Distributed storage on YARN -__Note__: -* When you run programs on YARN, you are highly recommended to load/write data from/to a distributed storage (e.g. [HDFS](https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html) or [S3](https://aws.amazon.com/s3/)) instead of the local file system. - -The Fashion-MNIST example in this tutorial uses a utility function `get_remote_dir_to_local` provided by BigDL to download datasets and create the PyTorch DataLoader on each executor. - -```python -import torch -import torchvision -import torchvision.transforms as transforms -from bigdl.orca.data.file import get_remote_dir_to_local - -def train_data_creator(config, batch_size): - transform = transforms.Compose([transforms.ToTensor(), - transforms.Normalize((0.5,), (0.5,))]) - - get_remote_dir_to_local(remote_path="hdfs://path/to/dataset", local_path="/tmp/dataset") - - trainset = torchvision.datasets.FashionMNIST(root="/tmp/dataset", train=True, - download=False, transform=transform) - - trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, - shuffle=True, num_workers=0) - - return trainloader -``` - ---- -## 2. Prepare Environment -Before running BigDL Orca programs on YARN, you need to properly setup the environment following the steps in this section. - -__Note__: -* When using [`python` command](#use-python-command) or [`bigdl-submit`](#use-bigdl-submit), we would directly use the corresponding `pyspark` (which is a dependency of BigDL Orca) for the Spark environment. Thus to avoid possible conflicts, you *DON'T* need to download Spark by yourself or set the environment variable `SPARK_HOME` unless you use [`spark-submit`](#use-spark-submit). - - -### 2.1 Setup JAVA & Hadoop Environment -- See [here](../Overview/install.md#install-java) to prepare Java in your cluster. - -- Check the Hadoop setup and configurations of your cluster. Make sure you correctly set the environment variable `HADOOP_CONF_DIR`, which is needed to initialize Spark on YARN: - ```bash - export HADOOP_CONF_DIR=/path/to/hadoop/conf - ``` - -### 2.2 Install Python Libraries -- See [here](../Overview/install.md#install-anaconda) to install conda and prepare the Python environment on the __Client Node__. - -- See [here](../Overview/install.md#install-bigdl-orca) to install BigDL Orca in the created conda environment. Note that if you use [`spark-submit`](#use-spark-submit), please __SKIP__ this step and __DO NOT__ install BigDL Orca with pip install command in the conda environment. - -- You should install all the other Python libraries that you need in your program in the conda environment as well. 
`torch`, `torchvision` and `tqdm` are needed to run the Fashion-MNIST example:
-  ```bash
-  pip install torch torchvision tqdm
-  ```
-
-
-### 2.3 Run on CDH
-* For [CDH](https://www.cloudera.com/products/open-source/apache-hadoop/key-cdh-components.html) users, the environment variable `HADOOP_CONF_DIR` should be `/etc/hadoop/conf` by default.
-
-* The __Client Node__ may have already installed a different version of Spark than the one installed with BigDL. To avoid conflicts, unset all Spark-related environment variables (you may use `env | grep SPARK` to find all of them):
-  ```bash
-  unset SPARK_HOME
-  unset ...
-  ```
-
----
-## 3. Prepare Dataset
-To run the Fashion-MNIST example provided by this tutorial on YARN, you should upload the Fashion-MNIST dataset to a distributed storage (such as HDFS or S3) beforehand.
-
-First, download the Fashion-MNIST dataset manually on your __Client Node__. Note that the PyTorch `FashionMNIST` dataset requires the unzipped files to be located in `FashionMNIST/raw/` under the dataset folder.
-```bash
-# PyTorch official dataset download link
-git clone https://github.com/zalandoresearch/fashion-mnist.git
-
-# Copy the dataset files to the folder FashionMNIST/raw
-cp /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
-
-# Extract FashionMNIST archives
-gzip -d /path/to/local/data/FashionMNIST/raw/*
-```
-Then upload it to a distributed storage. A sample command to upload the data to HDFS is as follows:
-```bash
-hdfs dfs -put /path/to/local/data/FashionMNIST hdfs://path/to/remote/data
-```
-In the given example, you can specify the argument `--data_dir` to be the directory on the distributed storage that holds the Fashion-MNIST dataset. The directory should contain `FashionMNIST/raw/train-images-idx3-ubyte` and `FashionMNIST/raw/t10k-images-idx3-ubyte`.
-
----
-## 4. Prepare Custom Modules
-Spark allows you to upload Python files (`.py`) and zipped Python packages (`.zip`) to the cluster by setting the `--py-files` option in Spark scripts or by specifying `extra_python_lib` in `init_orca_context`.
-
-The Fashion-MNIST example needs to import the modules from [`model.py`](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/model.py).
-* When using the [`python` command](#use-python-command), please specify `extra_python_lib` in `init_orca_context`.
-  ```python
-  init_orca_context(..., extra_python_lib="model.py")
-  ```
-
-For more details, please see [BigDL Python Dependencies](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/orca-context.html#python-dependencies).
-
-* When using [`bigdl-submit`](#use-bigdl-submit) or [`spark-submit`](#use-spark-submit), please specify the `--py-files` option in the submit command.
-  ```bash
-  bigdl-submit # or spark-submit
-  ...
-  --py-files model.py
-  ...
-  ```
-
-For more details, please see [Spark Python Dependencies](https://spark.apache.org/docs/latest/submitting-applications.html).
-
-* After uploading `model.py` to YARN, you can import this custom module as follows:
-  ```python
-  from model import model_creator, optimizer_creator
-  ```
-
-
-If your program depends on a nested directory of Python files, we recommend following the steps below to use a zipped package instead.
-
-1. Compress the directory into a zipped package.
-   ```bash
-   zip -q -r FashionMNIST_zipped.zip FashionMNIST
-   ```
-2. Upload the zipped package (`FashionMNIST_zipped.zip`) to YARN by setting `--py-files` or specifying `extra_python_lib` as discussed above (see the sketch after this list).
-3. You can then import the custom modules from the unzipped package in your program as follows:
-   ```python
-   from FashionMNIST.model import model_creator, optimizer_creator
-   ```
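-
-For instance, a minimal sketch of the upload step (the zipped package is assumed to sit in your current working directory; adapt the path as needed):
-```python
-# `python` command: let Orca upload and distribute the zipped package
-init_orca_context(..., extra_python_lib="FashionMNIST_zipped.zip")
-
-# For bigdl-submit / spark-submit, pass the same package via --py-files instead,
-# e.g. add `--py-files FashionMNIST_zipped.zip` to the submit command.
-```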
-
----
-## 5. Run Jobs on YARN
-In the remaining part of this tutorial, we will illustrate three ways to submit and run BigDL Orca applications on YARN:
-
-* Use `python` command
-* Use `bigdl-submit`
-* Use `spark-submit`
-
-You can choose one of them based on your preference or cluster settings.
-
-We provide the running command for the [Fashion-MNIST example](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/) on the __Client Node__ in this section.
-
-### 5.1 Use `python` Command
-This is the easiest and most recommended way to run BigDL Orca on YARN, as a normal Python program. With this approach, you only need to prepare the environment on the __Client Node__; the environment is then automatically packaged and distributed to the YARN cluster.
-
-See [here](#init-orca-context) for the runtime configurations.
-
-
-#### 5.1.1 Yarn Client
-Run the example with the following command by setting the cluster_mode to "yarn-client":
-```bash
-python train.py --cluster_mode yarn-client --data_dir hdfs://path/to/remote/data
-```
-
-
-#### 5.1.2 Yarn Cluster
-Run the example with the following command by setting the cluster_mode to "yarn-cluster":
-```bash
-python train.py --cluster_mode yarn-cluster --data_dir hdfs://path/to/remote/data
-```
-
-
-#### 5.1.3 Jupyter Notebook
-You can easily run the example in a Jupyter Notebook using __`yarn-client` mode__. Launch the notebook using the following command:
-```bash
-jupyter notebook --notebook-dir=/path/to/notebook/directory --ip=* --no-browser
-```
-
-You can copy the code in [train.py](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/train.py) to the notebook and run the cells. Set the cluster_mode to "yarn-client" in `init_orca_context`.
-```python
-sc = init_orca_context(cluster_mode="yarn-client", cores=4, memory="2g", num_nodes=2,
-                       driver_cores=2, driver_memory="2g",
-                       extra_python_lib="model.py")
-```
-Note that Jupyter Notebook cannot run in `yarn-cluster` mode, as the driver is not running on the __Client Node__ (where you run the notebook).
-
-
-### 5.2 Use `bigdl-submit`
-For users who prefer a submit script to the `python` command, BigDL provides an easy-to-use `bigdl-submit` script, which automatically sets up the BigDL configuration and jar files from the currently active conda environment.
-
-Set the cluster_mode to "bigdl-submit" in `init_orca_context`.
-```python
-sc = init_orca_context(cluster_mode="bigdl-submit")
-```
-
-Pack the currently active conda environment into an archive on the __Client Node__ before submitting the example:
-```bash
-conda pack -o environment.tar.gz
-```
-
-Some runtime configurations for `bigdl-submit` are as follows:
-
-* `--master`: the Spark master, set it to "yarn".
-* `--num-executors`: the number of executors.
-* `--executor-cores`: the number of cores for each executor.
-* `--executor-memory`: the memory for each executor.
-* `--driver-cores`: the number of cores for the driver.
-* `--driver-memory`: the memory for the driver.
-* `--py-files`: the extra Python dependency files to be uploaded to YARN.
-* `--archives`: the conda archive to be uploaded to YARN.
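-
-For reference, the following is a minimal sketch of how a training script such as `train.py` might map the `--cluster_mode` argument onto `init_orca_context` (the actual script in the BigDL repository may organize this differently):
-```python
-import argparse
-
-from bigdl.orca import init_orca_context, stop_orca_context
-
-parser = argparse.ArgumentParser()
-parser.add_argument("--cluster_mode", type=str, default="yarn-client",
-                    help="yarn-client, yarn-cluster, bigdl-submit or spark-submit")
-parser.add_argument("--data_dir", type=str, required=True,
-                    help="Dataset directory on the distributed storage (e.g. HDFS)")
-args = parser.parse_args()
-
-if args.cluster_mode in ("yarn-client", "yarn-cluster"):
-    # `python` command: runtime resources are specified here and the conda
-    # environment is packaged and distributed by Orca automatically.
-    sc = init_orca_context(cluster_mode=args.cluster_mode, cores=4, memory="2g",
-                           num_nodes=2, driver_cores=2, driver_memory="2g",
-                           extra_python_lib="model.py")
-else:
-    # bigdl-submit / spark-submit: resources are taken from the submit command,
-    # so only the cluster_mode needs to be passed here.
-    sc = init_orca_context(cluster_mode=args.cluster_mode)
-
-# ... create the Orca Estimator with model.py, then train and evaluate using args.data_dir ...
-
-stop_orca_context()
-```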
- -#### 5.2.1 Yarn Client -Submit and run the example for `yarn-client` mode following the `bigdl-submit` script below: -```bash -bigdl-submit \ - --master yarn \ - --deploy-mode client \ - --num-executors 2 \ - --executor-cores 4 \ - --executor-memory 2g \ - --driver-cores 2 \ - --driver-memory 2g \ - --py-files model.py \ - --archives /path/to/environment.tar.gz#environment \ - --conf spark.pyspark.driver.python=python \ - --conf spark.pyspark.python=environment/bin/python \ - train.py --cluster_mode bigdl-submit --data_dir hdfs://path/to/remote/data -``` -In the `bigdl-submit` script: -* `--deploy-mode`: set it to `client` when running programs on yarn-client mode. -* `--conf spark.pyspark.driver.python`: set the activate Python location on __Client Node__ as the driver's Python environment. -* `--conf spark.pyspark.python`: set the Python location in the conda archive as each executor's Python environment. - - -#### 5.2.2 Yarn Cluster -Submit and run the program for `yarn-cluster` mode following the `bigdl-submit` script below: -```bash -bigdl-submit \ - --master yarn \ - --deploy-mode cluster \ - --num-executors 2 \ - --executor-cores 4 \ - --executor-memory 2g \ - --driver-cores 2 \ - --driver-memory 2g \ - --py-files model.py \ - --archives /path/to/environment.tar.gz#environment \ - --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \ - --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \ - train.py --cluster_mode bigdl-submit --data_dir hdfs://path/to/remote/data -``` -In the `bigdl-submit` script: -* `--deploy-mode`: set it to `cluster` when running programs on yarn-cluster mode. -* `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in the conda archive as the Python environment of the Application Master. -* `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in the conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment. - - -### 5.3 Use `spark-submit` -If you prefer to use `spark-submit` instead of `bigdl-submit`, please follow the steps below to prepare the environment on the __Client Node__. - -1. Download the requirement file(s) from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs. - ```bash - pip install -r /path/to/requirements.txt - ``` - Note that you are recommended **NOT** to install BigDL Orca with pip install command in the conda environment if you use spark-submit to avoid possible conflicts. - - If you are using `requirements_ray.txt`, you need to additionally install `ray[default]` with version 1.9.2 in your environment. - -2. Pack the current activate conda environment to an archive: - ```bash - conda pack -o environment.tar.gz - ``` - -3. Download the BigDL assembly package from [here](../Overview/install.html#download-bigdl-orca) and unzip it. Then setup the environment variables `${BIGDL_HOME}` and `${BIGDL_VERSION}`. - ```bash - export BIGDL_VERSION="downloaded BigDL version" - export BIGDL_HOME=/path/to/unzipped_BigDL # the folder path where you extract the BigDL package - ``` - -4. Download and extract [Spark](https://archive.apache.org/dist/spark/). BigDL is currently released for [Spark 2.4](https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz) and [Spark 3.1](https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz). 
Make sure the version of your downloaded Spark matches the one that your downloaded BigDL is released with. Then setup the environment variables `${SPARK_HOME}` and `${SPARK_VERSION}`. - ```bash - export SPARK_VERSION="downloaded Spark version" - export SPARK_HOME=/path/to/uncompressed_spark # the folder path where you extract the Spark package - ``` - -5. Set the cluster_mode to "spark-submit" in `init_orca_context`: - ```python - sc = init_orca_context(cluster_mode="spark-submit") - ``` - -Some runtime configurations for `spark-submit` are as follows: - -* `--master`: the spark master, set it to "yarn". -* `--num_executors`: the number of executors. -* `--executor-cores`: the number of cores for each executor. -* `--executor-memory`: the memory for each executor. -* `--driver-cores`: the number of cores for the driver. -* `--driver-memory`: the memory for the driver. -* `--py-files`: the extra Python dependency files to be uploaded to YARN. -* `--archives`: the conda archive to be uploaded to YARN. -* `--properties-file`: the BigDL configuration properties to be uploaded to YARN. -* `--jars`: upload and register BigDL jars to YARN. - -#### 5.3.1 Yarn Client -Submit and run the program for `yarn-client` mode following the `spark-submit` script below: -```bash -${SPARK_HOME}/bin/spark-submit \ - --master yarn \ - --deploy-mode client \ - --num-executors 2 \ - --executor-cores 4 \ - --executor-memory 2g \ - --driver-cores 2 \ - --driver-memory 2g \ - --archives /path/to/environment.tar.gz#environment \ - --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \ - --conf spark.pyspark.driver.python=python \ - --conf spark.pyspark.python=environment/bin/python \ - --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \ - --jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \ - train.py --cluster_mode spark-submit --data_dir hdfs://path/to/remote/data -``` -In the `spark-submit` script: -* `--deploy-mode`: set it to `client` when running programs on yarn-client mode. -* `--conf spark.pyspark.driver.python`: set the activate Python location on __Client Node__ as the driver's Python environment. -* `--conf spark.pyspark.python`: set the Python location in the conda archive as each executor's Python environment. - -#### 5.3.2 Yarn Cluster -Submit and run the program for `yarn-cluster` mode following the `spark-submit` script below: -```bash -${SPARK_HOME}/bin/spark-submit \ - --master yarn \ - --deploy-mode cluster \ - --num-executors 2 \ - --executor-cores 4 \ - --executor-memory 2g \ - --driver-cores 2 \ - --driver-memory 2g \ - --archives /path/to/environment.tar.gz#environment \ - --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \ - --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \ - --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \ - --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \ - --jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \ - train.py --cluster_mode spark-submit --data_dir hdfs://path/to/remote/data -``` -In the `spark-submit` script: -* `--deploy-mode`: set it to `cluster` when running programs on yarn-cluster mode. -* `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in the conda archive as the Python environment of the Application Master. 
-* `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in the conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment. diff --git a/docs/readthedocs/source/doc/Orca/index.rst b/docs/readthedocs/source/doc/Orca/index.rst deleted file mode 100644 index a947207b..00000000 --- a/docs/readthedocs/source/doc/Orca/index.rst +++ /dev/null @@ -1,64 +0,0 @@ -BigDL-Orca -========================= - -Most AI projects start with a Python notebook running on a single laptop; however, one usually needs to go through a mountain of pains to scale it to handle larger data set in a distributed fashion. The **BigDL-Orca** (or **Orca** for short) library seamlessly scales out your single node Python notebook across large clusters (so as to process distributed Big Data). - - -------- - - -.. grid:: 1 2 2 2 - :gutter: 2 - - .. grid-item-card:: - - **Get Started** - ^^^ - - For those who are new to Orca. - - +++ - :bdg-link:`Orca in 5 minutes <./Overview/orca.html>` | - :bdg-link:`Installation <./Overview/install.html>` - - .. grid-item-card:: - - **Tutorials** - ^^^ - - Quick examples to get familiar with Orca and step-by-step tutorials to run Orca on large clusters. - - +++ - - :bdg-link:`How-to Guides <./Howto/index.html>` | - :bdg-link:`Tutorials <./Tutorial/index.html>` - - .. grid-item-card:: - - **Key Features Guide** - ^^^ - - In-depth information, concepts and knowledge about the key features in Orca. - - +++ - - :bdg-link:`Data <./Overview/data-parallel-processing.html>` | - :bdg-link:`Estimator <./Overview/distributed-training-inference.html>` | - :bdg-link:`RayOnSpark <./Overview/ray.html>` - - .. grid-item-card:: - - **API Document** - ^^^ - - Detailed descriptions of Orca APIs. - - +++ - - :bdg-link:`API Document <../PythonAPI/Orca/index.html>` - - -.. toctree:: - :hidden: - - BigDL-Orca Document diff --git a/docs/readthedocs/source/doc/PPML/Dev/python_test.md b/docs/readthedocs/source/doc/PPML/Dev/python_test.md deleted file mode 100644 index 057c99b1..00000000 --- a/docs/readthedocs/source/doc/PPML/Dev/python_test.md +++ /dev/null @@ -1,61 +0,0 @@ -# PPML Python Test Develop Guide - -### Write a test -All tests locate in `python/ppml/test`. - -#### Single party test -Writing a single party test is just the same as running PPML pipeline, for example, a simple FGBoostRegression pipeline -```python -fl_server = FLServer() -fl_server.build() -fl_server.start() - -init_fl_context() -fgboost_regression = FGBoostRegression() -fgboost_regression.fit(...) -``` -#### Multiple party test -There are some extra steps for multiple party test, `python/ppml/test/bigdl/ppml/algorithms/test_fgboost_regression.py` could be refered as an example. - -Multiple party test requires multiprocessing package. Import the package by -``` -import multiprocessing -``` -and set the subprocess create config in your class method -```python -class YourTest(unittest.TestCase): - @classmethod - def setUpClass(cls) -> None: - multiprocessing.set_start_method('spawn') -``` -And define the function of subprocess -```python -def mock_process(arg1, arg2): - init_fl_context() - algo = Algo() # The algorithm to test - algo.fit(...) -``` -and start the process in test method -```python -mock_party1 = Process(target=mock_process, args=(v1, v2)) -mock_party1.start() -``` - -### Debug a test -#### How it works -BigDL uses Spark integrated Py4J to do Python call Java. - -Spark starts the JVM when PySpark code create the SparkContext. 
This method use a Popen subprocess to call `spark-submit`, which call `spark-class`, and call `java` - -#### Set JVM debug mode -First, direct to the `spark-class` file (as there may be multiple class in different packages or copied by python setup during installation) called by PySpark, this could be get by adding a breakpoint after `command = [os.path.join(SPARK_HOME, script)]` in `java_gateway.py` in PySpark lib. - -To enable debug, add the JVM args in `spark-class` when call `java`, in the last line `CMD`, change following -``` -CMD=("${CMD[@]:0:$LAST}") -``` -to -``` -CMD=("${CMD[0]}" -agentlib:jdwp=transport=dt_socket,server=y,address=4000,suspend=n "${CMD[@]:1:$LAST}") -``` -And in IDEA, create a Run Configuration remote JVM profile. The IDEA will create the VM args automatically. diff --git a/docs/readthedocs/source/doc/PPML/Dev/scala_test.md b/docs/readthedocs/source/doc/PPML/Dev/scala_test.md deleted file mode 100644 index e69de29b..00000000 diff --git a/docs/readthedocs/source/doc/PPML/Overview/ali_ecs_occlum_cn.md b/docs/readthedocs/source/doc/PPML/Overview/ali_ecs_occlum_cn.md deleted file mode 100644 index 25174bbe..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/ali_ecs_occlum_cn.md +++ /dev/null @@ -1,535 +0,0 @@ -# BigDL PPML Occlum阿里云ECS中文开发文档 - -## 概要 - -本文档介绍了如何用BigDL PPML和Occlum实现基于SGX的端到端、分布式的隐私计算应用。BigDL PPML和Occlum为开发人员提供了一个简单易用的隐私计算环境。开发人员可以将现有应用无缝迁移到该运行环境中,实现端到端安全,并且可验证的分布式隐私计算。该类应用的计算性能接近明文计算,并且可以横向拓展以支持大规模数据集。 - -文档分为以下几部分: -1. 环境部署。介绍基于阿里云的PPML基本的环境部署和依赖安装。 -2. 快速上手。介绍迁移或者开发新的隐私计算应用的基本流程。 -3. 应用部署。介绍如何将PPML应用部署到生产环境中。 -4. 背景知识。介绍SGX、Occlum和BigDL PPML的基本概念。 - -使用建议: -* 建议1234或者4123的顺序阅读本文档。 -* 如果要将应用部署到生成环境中,请和管理员确认3和4中的内容是否符合内部的安全策略。 - - -![PPML基本架构](../images/spark_sgx_occlum.png) - - -## 1. 环境部署 - -以下以阿里云环境为例,如果是基于裸金属机器搭建,请参考附录。 -首先,我们需要一个安装了SGX Plugin的K8S集群环境。在本例中,我们在阿里云申请了两台g7t的ECS实例(ecs.g7t.4xlarge),基本配置如下。 -| CPU核数 | 内存 | 安全内存(EPC)| 操作系统 | -| ------------- | ------------- | ------------- | ------------- | -| 32 | 64GB | 32GB | Ubuntu 20.04 LTS 2 | - -用户也可以根据自己的需求申请不同配置的ECS安全实例。 -VM OS选择Ubuntu20.04 LTS2, 这也是Occlum标准release所基于的操作系统。 -另外,系统内核需要升级以支持SGX。 - -```bash -sudo apt install --install-recommends linux-generic-hwe-20.04 -``` - -然后,需要在每台实例上配置安装K8S环境,并配置安装K8S SGX plugin。 -细节不再赘述,用户可以参考技术文档《在K8S上部署可扩展的基于Occlum的安全推理实例》或者附录的相关部分。 - -## 2. 
快速上手 - -本章会介绍PPML基本概念,以及如何用BigDL PPML occlum image在SGX中执行应用程序。 -需要注意的是:为了简化上手流程,我们会在运行环境中编译和运行SGX enclave;这种运行方式会有安全风险,仅能用于开发和测试,实际部署需要参照后面的生产环境部署章节。 - -### 2.1 基本概念 - -SGX应用需要编译(build)成SGX enclave,才能加载到SGX中运行。通常情况下,开发人员需要用SGX SDK重新编写应用,才能编译成合法的enclave,但这样的开发代价较大,维护成本也较高。为了避免上述问题,我们可以用Occlum实现应用的无缝迁移。Occlum是为SGX开发的LibOS应用,它可以将应用的系统调用翻译成SGX可以识别的调用,从而避免修改应用。BigDL PPML在Occlum的基础上,又进行了一次封装和优化,使得大数据应用,如Spark/Flink可以无缝的运行在SGX上。 - -![SGX enclave结构](../images/ppml_sgx_enclave.png) - -作为硬件级的可信执行环境,SGX的攻击面非常小,攻击者即使攻破操作系统和BIOS也无法获取SGX中的应用和数据。但在端到端的应用中,用户还需要确保其他阶段的安全性。简而言之,用户需要确保数据或者文件在SGX外部是加密的,仅在SGX内部被解密和计算,如下图所示。为了实现这个目的,我们往往需要借助密钥管理服务 (Key Management Service, KMS) 的帮助。用户可以将密钥托管到KMS,等应用在SGX中启动后,再从KMS申请和下载密钥。 - -![SGX应用设计原则](../images/ppml_dev_basic.png) - -PPML项目的核心功能是帮助用户迁移现有的应用,用户可以选择迁移现有的大数据AI应用,也可以选择开发全新的应用。PPML应用的开发和常规应用基本相同。例如PySpark的应用代码和常规应用并没有区别。但在设计、编译和部署时有一定的差异。具体表现为: -* 设计时需要考虑加解密流程,确保明文数据只出现在SGX内部 -* 编译时,需要通过Occlum将应用编译成SGX enclave -* 部署时,需要将SGX enclave部署到有SGX环境的节点 - -在剩下的章节中,我们以PySpark运行SQL和sklearn求线性回归方程为例,介绍如何 -* 通过docker部署单机PySpark应用。 -* 通过K8S部署分布式PySpark应用。 - -前者主要针对小数据量的单机环境,后者主要针对大数据量的分布式环境。 - -### 2.2 PySpark执行SQL任务 - -SparkSQL是Spark生态中的核心功能之一。通过Spark提供的SQL接口,数据分析师和开发人员可以通撰写简单的SQL语句实现对TB/PB级别数据的高效查询。在下面的例子中,我们将介绍如何通过Python格式的SQL文件,查询大规模数据。 - -#### 2.2.1 部署运行在docker容器中 -1. 配置合适的资源,启动运行脚本`start-spark-local.sh`进入docker image中。 - -```bash -# Clean up old container -sudo docker rm -f bigdl-ppml-trusted-big-data-ml-scala-occlum -``` - -```bash -# Run new command in container -sudo docker run -it --rm \ ---net=host \ ---name=bigdl-ppml-trusted-big-data-ml-scala-occlum \ ---cpuset-cpus 10-14 \ ---device=/dev/sgx/enclave \ #需提前配置好的sgx环境 ---device=/dev/sgx/provision \ --v /var/run/aesmd:/var/run/aesmd \ --v data:/opt/occlum_spark/data \ --e SGX_MEM_SIZE=24GB \ #EPC即使用的SGX内存大小 --e SGX_THREAD=1024 \ --e SGX_HEAP=1GB \ --e SGX_KERNEL_HEAP=1GB \ --e SGX_LOG_LEVEL=off \intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum:2.2.0-SNAPSHOT \ -bash -``` -2. 编写python源码,如sql_example.py 并将其放置在image的目录py-examples下 -3. 修改/opt/run_spark_on_occlum_glibc.sh文件,设置程序启动入口。 -```bash -run_pyspark_sql_example() { - init_instance spark #执行occlum init初始化occlum文件结构并设置对应配置 - build_spark #拷贝依赖并执行occlum build 构建可执行程序 - cd /opt/occlum_spark - echo -e "${BLUE}occlum run pyspark SQL example${NC}" - occlum run /usr/lib/jvm/java-8-openjdk-amd64/bin/java \ - -XX:-UseCompressedOops \ - -XX:ActiveProcessorCount=4 \ - -Divy.home="/tmp/.ivy" \ - -Dos.name="Linux" \ - -Djdk.lang.Process.launchMechanism=vfork \ - -cp "$SPARK_HOME/conf/:$SPARK_HOME/jars/*" \ - -Xmx3g org.apache.spark.deploy.SparkSubmit \ #选择合适的jvm大小 - /py-examples/sql_example.py #新添加的文件位置 -} - -# switch case in the last - pysql) - run_pyspark_sql_example - cd ../ - ;; -``` - -4. 运行PySpark SQL example在container里 -```bash -bash /opt/run_spark_on_occlum_glibc.sh pysql -``` - -注: 脚本里的build_spark是做”occlum build”来生成Occlum可执行的镜像,这一步骤会耗费不少时间(数分钟左右),请耐心等待。 -非即时部署需提前配置源码和程序入口,并将步骤1的最后一行改为 bash /opt/run_spark_on_occlum_glibc.sh $1,即可直接通过运行bash start-spark-local.sh pysql 启动运行SQL example。 - -#### 2.2.2 将PySpark SQL任务部署运行在k8s集群中 - -##### 前提条件: -1. 阿里云实例上k8s集群已经配置好,k8s SGX device plugin已经安装好。 -设置环境变量`kubernetes_master_url`。 -```bash -export kubernetes_master_url=${master_node_ip} -``` - -2. 
阿里云实例上安装spark client工具(以3.1.2版本为例),用于提交spark任务。 -```bash -wget https://downloads.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz -sudo mkdir /opt/spark -sudo tar -xf spark*.tgz -C /opt/spark --strip-component 1 -sudo chmod -R 777 /opt/spark -export SPARK_HOME=/opt/spark -``` - -3. 下载BigDL的代码,为后续的修改做准备。 -```bash -git clone https://github.com/intel-analytics/BigDL.git -``` - -接下来的改动位于路径`BigDL/ppml/trusted-big-data-ml/scala/docker-occlum/kubernetes`。 - -##### 运行步骤: -1. 配置合适的资源在driver.yml和executor.yaml中 - -```yaml -#driver.yaml 同executor.yaml - env: - - name: DRIVER_MEMORY - value: "1g" - - name: SGX_MEM_SIZE #EPC即使用的SGX内存大小 - value: "15GB" - - name: SGX_THREAD - value: "1024" - - name: SGX_HEAP - value: "1GB" - - name: SGX_KERNEL_HEAP - value: "1GB" -``` -2. 运行脚本 run_pyspark_sql_example.sh,需提前配置好Spark和K8s环境。 - -```bash -${SPARK_HOME}/bin/spark-submit \ - --master k8s://https://${kubernetes_master_url}:6443 \ - --deploy-mode cluster \ - --name pyspark-sql \ - --conf spark.executor.instances=1 \ - --conf spark.rpc.netty.dispatcher.numThreads=32 \ - --conf spark.kubernetes.container.image=intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum:2.2.0-SNAPSHOT \ - --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ - --conf spark.kubernetes.executor.deleteOnTermination=false \ - --conf spark.kubernetes.driver.podTemplateFile=./driver.yaml \ #资源配置 - --conf spark.kubernetes.executor.podTemplateFile=./executor.yaml \ #资源配置 - --conf spark.kubernetes.sgx.log.level=off \ - --executor-memory 1g \ - --conf spark.kubernetes.driverEnv.SGX_DRIVER_JVM_MEM_SIZE="1g" \ - --conf spark.executorEnv.SGX_EXECUTOR_JVM_MEM_SIZE="6g" \ - local:/py-examples/sql_example.py - # hdfs://ServerIP:Port/path/sql_example.py -``` -注:若用云存储或HDFS或者云存储传入源文件则无需提前在image里传入源文件。 - -### 2.3 PySpark运行sklearn LinearRegression - -#### 2.3.1 部署运行在docker容器中 -1. 配置合适的资源,启动运行脚本 start-spark-local.sh 进入docker image中。 - -```bash -# Clean up old container -sudo docker rm -f bigdl-ppml-trusted-big-data-ml-scala-occlum -``` - -```bash -# Run new command in container -sudo docker run -it --rm \ ---net=host \ ---name=bigdl-ppml-trusted-big-data-ml-scala-occlum \ ---cpuset-cpus 10-14 \ ---device=/dev/sgx/enclave \ #需提前配置好的sgx环境 ---device=/dev/sgx/provision \ --v /var/run/aesmd:/var/run/aesmd \ --v data:/opt/occlum_spark/data \ --e SGX_MEM_SIZE=24GB \ #EPC即使用的SGX内存大小 --e SGX_THREAD=1024 \ --e SGX_HEAP=1GB \ --e SGX_KERNEL_HEAP=1GB \ --e SGX_LOG_LEVEL=off \intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum:2.2.0-SNAPSHOT \ -bash -``` - -2. 编写python源码,如sklearn_example.py , 并将其放置在image的目录py-examples下。 - -```python -# sklearn_example.py -import numpy as np -from sklearn.linear_model import LinearRegression -from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error, median_absolute_error - -# Generate test data: -nSample = 100 -x = np.linspace(0, 10, nSample) -e = np.random.normal(size=len(x)) -y = 2.36 + 1.58 * x + e # y = b0 + b1*x1 - -x = x.reshape(-1, 1) -y = y.reshape(-1, 1) -# print(x.shape,y.shape) - -# OLS -modelRegL = LinearRegression() -modelRegL.fit(x, y) -yFit = modelRegL.predict(x) - -print('intercept: w0={}'.format(modelRegL.intercept_)) -print('coef: w1={}'.format(modelRegL.coef_)) - -print('R2_score :{:.4f}'.format(modelRegL.score(x, y))) -print('mean_squared_error:{:.4f}'.format(mean_squared_error(y, yFit))) -print('mean_absolute_error:{:.4f}'.format(mean_absolute_error(y, yFit))) -print('median_absolute_error:{:.4f}'.format(median_absolute_error(y, yFit))) -``` - -3. 
修改/opt/run_spark_on_occlum_glibc.sh文件,设置程序启动入口。 -```bash -run_pyspark_sklearn_example() { - init_instance spark #执行occlum init初始化occlum文件结构并设置对应配置 - build_spark #拷贝依赖并执行occlum build 构建可执行程序 - cd /opt/occlum_spark - echo -e "${BLUE}occlum run pyspark sklearn example${NC}" - occlum run /usr/lib/jvm/java-8-openjdk-amd64/bin/java \ - -XX:-UseCompressedOops \ - -XX:ActiveProcessorCount=4 \ - -Divy.home="/tmp/.ivy" \ - -Dos.name="Linux" \ - -Djdk.lang.Process.launchMechanism=vfork \ - -cp "$SPARK_HOME/conf/:$SPARK_HOME/jars/*" \ - -Xmx3g org.apache.spark.deploy.SparkSubmit \ #选择合适的jvm大小 - /py-examples/sklearn_example.py #新添加的文件位置 -} - -# switch case in the last - pysql) - run_pyspark_sklearn_example - cd ../ - ;; -``` - -4. 运行PySpark sklearn example在container里 -```bash -bash /opt/run_spark_on_occlum_glibc.sh pysklearn -``` - -注: 脚本里的build_spark是做”occlum build”来生成Occlum可执行的镜像,这一步骤会耗费不少时间(数分钟左右),请耐心等待。 -非即时部署需提前配置源码和程序入口,并将步骤1的最后一行改为 bash /opt/run_spark_on_occlum_glibc.sh $1,即可直接通过运行bash start-spark-local.sh pysklearn 启动运行 sklearn example。 - -#### 2.3.2 部署运行在k8s集群中 - -**前提条件**参考前述章节的配置。 -运行步骤: -1. 配置合适的资源在driver.yml和executor.yaml中 -```yaml -#driver.yaml 同executor.yaml - env: - - name: DRIVER_MEMORY - value: "1g" - - name: SGX_MEM_SIZE #EPC即使用的SGX内存大小 - value: "15GB" - - name: SGX_THREAD - value: "1024" - - name: SGX_HEAP - value: "1GB" - - name: SGX_KERNEL_HEAP - value: "1GB" -``` -2. 运行脚本 run_pyspark_sklearn_example.sh,需配置Spark和K8s环境。 -```bash -${SPARK_HOME}/bin/spark-submit \ - --master k8s://https://${kubernetes_master_url}:6443 \ - --deploy-mode cluster \ - --name pyspark-sql \ - --conf spark.executor.instances=1 \ - --conf spark.rpc.netty.dispatcher.numThreads=32 \ - --conf spark.kubernetes.container.image=intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum:2.2.0-SNAPSHOT \ - --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ - --conf spark.kubernetes.executor.deleteOnTermination=false \ - --conf spark.kubernetes.driver.podTemplateFile=./driver.yaml \ #资源配置 - --conf spark.kubernetes.executor.podTemplateFile=./executor.yaml \ #资源配置 - --conf spark.kubernetes.sgx.log.level=off \ - --executor-memory 1g \ - --conf spark.kubernetes.driverEnv.SGX_DRIVER_JVM_MEM_SIZE="1g" \ - --conf spark.executorEnv.SGX_EXECUTOR_JVM_MEM_SIZE="6g" \ - local:/py-examples/sklearn_example.py - # hdfs://ServerIP:Port/path/sklearn_example.py -``` - -注:若用云存储或者HDFS传入源文件则无需提前在image里传入源文件。 - -## 3. 生产环境部署 - -与快速上手阶段不同,生产部署需要考虑完整的数据流和密钥安全,并且需要根据现有的服务和设施进行对接。考虑到用户所用的服务有所差异,我们以开源和云服务为案例,介绍部署和配置KMS的基本过程;以及在安全环境中,构建生成环境中所需的image。 -安装和配置KMS -KMS是SGX应用部署中的核心服务。用户可以直接使用阿里云提供的KMS,并配合云存储实现数据的透明加解密服务,详情请参照《对象存储客户端加密》。通过运行在SGX中的客户端加解密数据,可以保证明文数据只出现在SGX中。其他开源的分布式存储,例如HDFS也提供了类似的方案,请参考Hadoop官方文档配置HDFS透明加密,这里不再赘述。 -为了提升安全水位,我们提供了带TEE 保护的开源KMS的部署方式供用户参考。即EHSM(运行在SGX中的KMS)。 - -### 3.1 安装和配置EHSM -安装EHSM的教程请参照文档[Deploy BigDL-eHSM-KMS on Kubernetes](https://github.com/intel-analytics/BigDL/tree/main/ppml/services/ehsm/kubernetes)。 -**使用PPMLContext和EHSM实现输入输出数据加解密。**基本流程如下: -1. 按照EHSM教程配置好PCCS和EHSM等环境。 -注意因为是部署在阿里云上,阿里云有可用的PCCS服务,所以对于教程里的第一步“Deploy BigDL-PCCS on Kubernetes”可以忽略。 -2. 注册获取app_id和api_key。 -```bash -# Enroll -curl -v -k -G "https://:9000/ehsm?Action=Enroll" -...... - -{"code":200,"message":"successful","result":{"apikey":"E8QKpBBapaknprx44FaaTY20rptg54Sg","appid":"8d5dd3b8-3996-40f5-9785-dcb8265981ba"}} -``` -3. 
填入相关参数,启动运行脚本 start-spark-local.sh 进入docker image。 - -其中,参数PCCS_URL可以根据阿里云安全增强型实例所在区域,设置为相对应的地址,细节请参考阿里云文档。 -```bash -# Clean up old container -sudo docker rm -f bigdl-ppml-trusted-big-data-ml-scala-occlum - ``` - -```bash -# Run new command in container -sudo docker run -it \ ---net=host \ ---name=bigdl-ppml-trusted-big-data-ml-scala-occlum \ ---cpuset-cpus 10-14 \ ---device=/dev/sgx/enclave \ ---device=/dev/sgx/provision \ --v /var/run/aesmd:/var/run/aesmd \ --v data:/opt/occlum_spark/data \ --e SGX_MEM_SIZE=24GB \ --e SGX_THREAD=512 \ --e SGX_HEAP=512MB \ --e SGX_KERNEL_HEAP=1GB \ --e ATTESTATION=false \ --e PCCS_URL=$PCCS_URL \ #1 --e ATTESTATION_URL=ESHM_IP:EHSM_PORT \ #2 --e APP_ID=your_app_id \ #3 --e API_KEY=your_api_key \ #4 --e CHALLENGE=cHBtbAo= \ --e REPORT_DATA=ppml \ --e SGX_LOG_LEVEL=off \ --e RUNTIME_ENV=native \ -intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum:2.2.0-SNAPSHOT \ -bash -``` - -4. 生成加解密相关的密钥 -```bash -bash /opt/ehsm_entry.sh generatekeys $APP_ID $API_KEY -``` - -5. 用提供的generate_people_csv.py 生成原始输入数据 -```bash -python generate_people_csv.py /opt/occlum_spark/data/people.csv -``` - -6. 用密钥加密原始输入数据 -```bash -bash /opt/ehsm_entry.sh encrypt $APP_ID $API_KEY /opt/occlum_spark/data/people.csv -``` - -7. 修改加密的文件后缀并移动到合适的位置 -```bash -mv /opt/occlum_spark/data/people.csv.encrypted /opt/occlum_spark/data/encrypt/people.csv.encrypted.cbc -``` -8. 运行 BigDL SimpleQuery e2e Example(同上开发步骤,已提前写好程序入口,程序源码已打成jar包) -```bash -bash /opt/run_spark_on_occlum_glibc.sh sql_e2e -``` -9. 解密计算结果 -```bash -bash /opt/ehsm_entry.sh decrypt $APP_ID $API_KEY /opt/occlum_spark/data/model/{result_file_name}. -``` - -注:需要把SparkContext换成PPMLContext(源码需改动),才能支持密钥管理,和应用自动加解密服务。其大致流程为: -1. 应用通过PPMLContext读取加密文件 -2. PPMLContext自动从指定的密钥管理服务获取解密密钥 -3. 应用解密数据并进行计算 -4. 应用将计算结果加密后,写入到存储系统 - -### 3.2 构建部署生产应用image - - -![编译和部署PPML应用](../images/ppml_scope.png) - -在开发新应用时,SGX程序程序在启动前需要经历occlum init和occlum build两个阶段,才能构建可执行的occlum instance(opt/occlum_spark文件夹,所有依赖和程序都存储在当中)。但是,将build放到部署环境中,会导致build阶段用到的用户密钥(user key)暴露到非安全环境中。为了进一步提高安全性,在实际部署中需要将build阶段和实际运行分开,既在安全环境中build所需的image,然后在部署和运行该image。 -在这个过程中,用户也可对BigDL image直接进行修改,加入自己的程序和配置(User image),并提前执行occlum init和build构建实际部署所需的image(Runnable image)。 -```bash -# Production and Production-build and Customer image -#BigDL image or production image -docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum-production:2.2.0 - -#Runable image or production-build image -docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum-production:2.2.0-build - -#Small size Runable image or customer image -docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum-production-customer:2.2.0-build -``` - -`intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum-production:2.2.0` image是提供给有定制docker image需求的客户的,下面以 pyspark sql example为例,说明如何定制化runnable image。 - -1. 获取production image -```bash -docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum-production:2.2.0 -``` - -2. 运行启动脚本进入容器内部 -```bash -# Clean up old container -export container_name=bigdl-ppml-trusted-big-data-ml-scala-occlum-production -sudo docker rm -f $container_name - -# Run new command in container -sudo docker run -it \ - --net=host \ - --name=$container_name \ - --cpuset-cpus 3-5 \ - -e SGX_MEM_SIZE=30GB \ - -e SGX_THREAD=2048 \ - -e SGX_HEAP=1GB \ - -e SGX_KERNEL_HEAP=1GB \ - -e ENABLE_SGX_DEBUG=true \ - -e ATTESTATION=true \ - intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum-production:2.2.0 \ - bash -``` - -3. 
添加相关python源码(/opt/py-examples/)或jar包依赖($BIGDL_HOME/jars/)或python依赖(/opt/python-occlum/)。如添加sql_example.py到/opt/py-examples/目录下。 - -4. 构建runnable occlum instance。这一步的作用是初始化occlum文件夹,并将源码和相关配置和依赖拷贝进/opt/occlum_spark中,并执行occlum build构建occlum runnable instance即production-build image。 -```bash -bash /opt/run_spark_on_occlum_glibc.sh init -``` - -5. 退出后提交得到最终的runnable image。 intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum-production: 2.2.0-build 即不添加任何外部依赖的runnable image,可直接运行任意已有的example。 -```bash -docker commit $container_name $container_name-build -``` - -得到的未定制的`intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum-production: 2.2.0-build`大小有14.2GB,其中仅有`/opt/occlum_spark`文件夹和少部分配置文件是运行时所需的,其余大多数是拷贝和编译产生的垃圾文件。可在 production-build image的基础上copy occlum runnable instance 并安装Occlum运行时依赖和其他一些依赖得到最终的customer image,其大小仅不到5GB,且其功能与production-build image基本相同,`intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum-production-customer:2.2.0-build`即不经过任何定制的customer image。(通过修改运行build-customer-image.sh文件构建customer image) - -Production-build 或 Customer image的attestation流程 -1. 配置PCCS和EHSM环境,注册得到app_id和api_key,启动任务时,增加相关环境变量(同上)。 -2. 验证ehsm是否可信 -```bash -bash start-spark-local.sh verify -``` -3. 离线注册occlum instance,得到 policy_Id -```bash -bash start-spark-local.sh register -# policy_Id 28da128a-c572-4f5f-993c-6da10d5243f8 -``` -4. 在docker环境或者k8s环境设置policy_Id。 -```yaml -#start-spark-local.sh --e ${policy_Id} - -#driver.yaml and executor.yaml -env: - - name: policy_Id - value: "${policy_Id}" -``` - -5. 在docker或k8s启动应用(同上),仅会在SGX中运行EHSM对应用程序进行验证(IV. attest MREnclave)。 - -## 4. 背景知识 - -### 4.1 Intel SGX - -英特尔软件防护扩展(英语:Intel Software Guard Extensions,SGX)是一组安全相关的指令,它被内置于一些现代Intel 中央处理器(CPU)中。它们允许用户态及内核态代码定义将特定内存区域,设置为私有区域,此区域也被称作飞地(Enclaves)。其内容受到保护,不能被本身以外的任何进程存取,包括以更高权限级别运行的进程。 - -CPU对受SGX保护的内存进行加密处理。受保护区域的代码和数据的加解密操作在CPU内部动态(on the fly)完成。因此,处理器可以保护代码不被其他代码窥视或检查。SGX使用的威胁模型如下:Enclaves是可信的,但Enclaves之外的任何进程都不可信(包括操作系统本身和任何虚拟化管理程序),所有这些不可信的主体都被视为有存在恶意行为的风险。Enclaves之外代码的任何代码读取受保护区域,只能得到加密后的内容。 - -SGX被设计用于实现安全远程计算、安全网页浏览和数字版权管理(DRM)。其他应用也包括保护专有算法和加密密钥。 - -### 4.2 Occlum - -2014年正式成立的蚂蚁集团服务于超10亿用户,是全球领先的金融科技企业之一。蚂蚁集团一直积极探索隐私保护机器学习领域,并发起了开源项目 Occlum。Occlum 是用于英特尔® SGX 的内存安全多进程用户态操作系统(LibOS)。 -使用 Occlum 后,机器学习工作负载等只需修改极少量(甚至无需修改)源代码即可在英特尔® SGX 上运行,以高度透明的方式保护了用户数据的机密性和完整性。用于英特尔® SGX 的 Occlum 架构如图所示。 - - -![Occlum架构](https://raw.githubusercontent.com/occlum/occlum/master/docs/images/arch_overview.png) - -Occlum有以下显著特征: -* 高效的多任务处理。 Occlum提供轻量级LibOS流程:它们是轻量级的,因为所有LibOS流程共享同一个SGX enclave。 与重型、per-enclave的LibOS进程相比,Occlum的轻型LibOS进程在启动时最高快1000倍,在IPC上快3倍。 此外,如果需要,Occlum还提供了一个可选的多域软件故障隔离方案来隔离Occlum LibOS进程。 -* 支持多个文件系统。 支持多种类型的文件系统,如只读散列文件系统(用于完整性保护)、可写加密文件系统(用于机密保护)、内存文件系统,不受信任的主机文件系统(用于LibOS和主机操作系统之间方便的数据交换)等等,满足应用的各种文件I/O需求。 -* 内存安全。 Occlum是第一个用内存安全编程语言(Rust)编写的SGX LibOS。Rust语言是为追求内存安全,且不会带来额外的性能损耗的编程语言。因此,在Occlum中杜绝了低级的内存安全错误,对于托管安全关键的应用程序更值得信任。 -* 支持musl-libc和glibc应用,支持超过150个常用系统调用,绝大多数程序无需改动(甚至无需重新编译)或者只需少许改动即可运行在Occlum LibOS之上。 -* 支持多种语言开发的应用,包括但不限于c/c++,Java,Python,Go和Rust。 -* 易用性。 Occlum提供了类容器的用户友好的构建和命令行工具。 在SGX enclave内的Occlum上运行应用程序可以非常简单。 - -### 4.3 BigDL PPML -在Occlum提供的安全内存运行环境上,英特尔和蚂蚁集团基于BigDL构建了一个分布式的隐私保护机器学习(Privacy Preserving Machine Learning, PPML)平台,能够保护端到端(包括数据输入、数据分析、机器学习、深度学习等各个阶段)的分布式人工智能应用。 - -![BigDL PPML 软件栈](https://user-images.githubusercontent.com/61072813/177922914-f670111c-e174-40d2-b95a-aafe92485024.png) - -与传统的隐私计算框架不同,BigDL PPML提供了一个可以运行标准大数据应用的环境,希望帮助现有的大数据/分布式应用无缝的迁移到端到端安全的环境中,并且强化每个环节的安全性。在此基础上,PPML也提供了安全参数聚集、隐私求交和联邦学习等高阶功能,帮助行业客户打破数据孤岛,进一步实现数据赋能。 -以Apache 
Spark为例,通过BigDL PPML和Occlum提供的Spark in SGX功能,可以让现有的Spark应用,直接运行到SGX环境中,而不用做任何代码修改。受益于第三代至强平台提供的大容量SGX EPC,Spark的内存计算可以完全被SGX保护,并且可以根据数据规模进行横向拓展,从而轻松支持TB级别的数据规模;另一方面,负责完整性的远程证明功能,也被无感的添加到了整个流程中,应用开发者不需要显式的增加远程证明代码,即可通过Occlum和PPML提供的远程证明功能实现实例的远程证明和校验。 diff --git a/docs/readthedocs/source/doc/PPML/Overview/attestation_basic.md b/docs/readthedocs/source/doc/PPML/Overview/attestation_basic.md deleted file mode 100644 index 887169e3..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/attestation_basic.md +++ /dev/null @@ -1,97 +0,0 @@ -# Ensure Integrity and Build Trust with Attestation - -The process of validating the integrity of a computing device such as a server needed for trusted computing. It is widely used in a Trusted Execution Environment (TEE) or Trusted Platform Module (TPM) for ensuring integrity and building trust. - -### Attestation Basic - -The basic idea of attestation is to verify: -1. The platform is secured. Trusted Computing Base (TCB) is secured. -2. Running in TEE/TPM. -3. Application is as expected (same hash or HMAC). - -Local or remote attestation: - -* Verifying a local enclave (TEE env) on the same node/server is called local attestation. -* Verifying a remote enclave on another node/server is called remote attestation. - -Due to platform differences, Intel SGX has 2 kinds of attestations: - -1. Elliptic Curve Digital Signature Algorithm (ECDSA) Attestation for 3rd generation Intel® Xeon® Scalable processors and selected Intel® Xeon® E3 processors. -2. Intel® Enhanced Privacy ID (Intel® EPID) Attestation for desktop and Xeon E3 processors, and selected Intel® Xeon® E processor. - -*Note that SGX attestation mentioned in BigDL PPML should be ECDSA attestation with DCAP.* - -The basic workflow of attestation: - -```eval_rst -.. mermaid:: - - sequenceDiagram - Verifier->>App in SGX: Challenge(Prove YourSelf) - Note right of App in SGX: Generate Quote(Signed Context) - App in SGX->>Verifier: Evidence(App Quote) - Note left of Verifier: Verify Quote - Verifier ->>App in SGX: Response(Pass/Fail) -``` - -The key steps in attestation: -* Quote Generation. Generate a Quote/Evidence with SDK/API. This quote is signed by a pre-defined key, and it cannot be modified. You can add 128bits user data into a SGX quote. -* Quote Verification. Verify a Quote/Evidence with SDK/API. - -### Attestation in E2E PPML applications - -Attestation is not hard if you are running a new written application. Because you can directly integrate `quote generation` and `quote verification` into your application code. However, if you are migrating an existing application, attestation may cause some additional effort. Especially, when you are running distributed applications like PPML applications in multi-nodes. That means you have to add attestation into your distributed applications or frameworks, e.g., add attestation when modules running on different nodes build connections. - -To avoid such changes, we can utilize a third-party attestation service to offload `quote verification` from your existing applications. This service will help us to verify if a running application is as expected. - -#### Attestation Service - -When working with an attestation service, we can define a policy/requirement for each application. During application initialization (server or worker), we can require each module to generate its quote and send it to an attestation service. This attestation service will check these quotes based on pre-defined policy/requirement, then send back responses (`success/fail`). 
If we get a `success` result, we continue starting this module. Otherwise, we simply quit or kill this module.
-
-```eval_rst
-.. mermaid::
-
-    graph TD
-        Admin --Policy--> as(Attestation Service)
-        subgraph Production Env/Cloud
-            sgxserver(Server in SGX) -.- sgxworker1
-            sgxserver(Server in SGX) -.- sgxworker2
-            sgxworker1(Worker1 in SGX)
-            sgxworker2(Worker2 in SGX)
-        end
-        sgxserver --Quote--> as
-        sgxworker1 --Quote--> as
-        sgxworker2 --Quote--> as
-        as --response-->sgxserver
-        as --response-->sgxworker1
-        as --response-->sgxworker2
-```
-
-With this attestation service design, we can avoid adding malicious applications or modules to distributed applications.
-
-#### Attestation Service from Cloud Service Provider (CSP)
-
-Azure provides an attestation service for applications running in TEE VMs or containers provided by Azure. Before we submit our applications to a cloud service, we need to verify the identity and security posture of the platform. Azure Attestation receives evidence from the platform, validates it against security standards, evaluates it against configurable policies, and produces an attestation token for claims-based applications.
-
-The actors involved in the Azure Attestation workflow:
-* Relying party: the component which relies on Azure Attestation to verify enclave validity.
-* Client: the component which collects information from an enclave and sends requests to Azure Attestation.
-* Azure Attestation: the component which accepts enclave evidence from the client, validates it and returns an attestation token to the client.
-
-![Azure Attestation Workflow](https://learn.microsoft.com/en-us/azure/attestation/media/sgx-validation-flow.png)
-
-The general steps in a typical SGX enclave attestation workflow (using Azure Attestation) are:
-1. The client collects the evidence from the enclave by generating a quote and sends it to a URI which refers to an instance of Azure Attestation.
-2. Azure Attestation validates the submitted information and evaluates it against a configured policy. If the verification succeeds, Azure Attestation issues an attestation token and returns it to the client.
-3. The client sends the attestation token to the relying party.
-4. The relying party calls the public key metadata endpoint of Azure Attestation to retrieve the signing certificates, verifies the signature of the attestation token, and thereby ensures the enclave's trustworthiness.
-
-### Advanced Usage
-
-During remote attestation, the attestation protocol builds a secure channel. It can help build a [TLS connection with integrity](https://arxiv.org/pdf/1801.05863.pdf). Meanwhile, attestation can be [integrated with the HTTP protocol to provide a trusted end-to-end web service](https://arxiv.org/abs/2205.01052).
-
-### References
-
-1. https://sgx101.gitbook.io/sgx101/sgx-bootstrap/attestation
-2. https://www.intel.com/content/www/us/en/developer/articles/technical/quote-verification-attestation-with-intel-sgx-dcap.html
-3. https://download.01.org/intel-sgx/sgx-dcap/1.9/linux/docs/Intel_SGX_DCAP_ECDSA_Orientation.pdf
-4. https://azure.microsoft.com/en-us/products/azure-attestation/
-5. https://en.wikipedia.org/wiki/Trusted_Computing
-6. [Integrating Intel SGX Remote Attestation with Transport Layer Security](https://arxiv.org/pdf/1801.05863.pdf)
-7.
[HTTPA/2: a Trusted End-to-End Protocol for Web Services](https://arxiv.org/abs/2205.01052) diff --git a/docs/readthedocs/source/doc/PPML/Overview/azure_ppml.md b/docs/readthedocs/source/doc/PPML/Overview/azure_ppml.md deleted file mode 100644 index d8dee10f..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/azure_ppml.md +++ /dev/null @@ -1,543 +0,0 @@ -# Privacy Preserving Machine Learning (PPML) on Azure User Guide - -## 1. Introduction -Protecting privacy and confidentiality is critical for large-scale data analysis and machine learning. BigDL ***PPML*** combines various low-level hardware and software security technologies (e.g., [Intel® Software Guard Extensions (Intel® SGX)](https://www.intel.com/content/www/us/en/architecture-and-technology/software-guard-extensions.html), [Library Operating System (LibOS)](https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Library-OS-is-the-New-Container-Why-is-Library-OS-A-Better-Option-for-Compatibility-and-Sandboxing-Chia-Che-Tsai-UC-Berkeley.pdf) such as [Graphene](https://github.com/gramineproject/graphene) and [Occlum](https://github.com/occlum/occlum), [Federated Learning](https://en.wikipedia.org/wiki/Federated_learning), etc.), so that users can continue to apply standard Big Data and AI technologies (such as Apache Spark, Apache Flink, Tensorflow, PyTorch, etc.) without sacrificing privacy. - -BigDL PPML on Azure solution integrate BigDL ***PPML*** technology with Azure Services(Azure Kubernetes Service, Azure Storage Account, Azure Key Vault, etc.) to facilitate Azure customer to create Big Data and AI applications while getting high privacy and confidentiality protection. - -### Overall Architecture -![](../images/ppml_azure_latest.png) - -### End-to-End Workflow -![](../images/ppml_azure_workflow.png) - -## 2. Setup -### 2.1 Install Azure CLI -Before you setup your environment, please install Azure CLI on your machine according to [Azure CLI guide](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli). - -Then run `az login` to login to Azure system before you run the following Azure commands. - -### 2.2 Create Azure Linux VM for hosting BigDL PPML image -#### 2.2.1 Create Resource Group -On your machine, create resource group or use your existing resource group. Example code to create resource group with Azure CLI: -``` -az group create \ - --name myResourceGroup \ - --location myLocation \ - --output none -``` - -#### 2.2.2 Create Linux VM with SGX support -Create Linux VM through Azure [CLI](https://docs.microsoft.com/en-us/azure/developer/javascript/tutorial/nodejs-virtual-machine-vm/create-linux-virtual-machine-azure-cli)/[Portal](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal)/Powershell. -For size of the VM, please choose DCSv3 Series VM with more than 4 vCPU cores. 
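-
-For example, a minimal Azure CLI sketch for creating such a VM (the resource group, VM name and size below are placeholders; any DCsv3-series size with enough vCPUs and a Gen2 Ubuntu 20.04 image should work):
-```bash
-az vm create \
-  --resource-group myResourceGroup \
-  --name myPPMLVM \
-  --size Standard_DC8s_v3 \
-  --image Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:latest \
-  --admin-username azureuser \
-  --generate-ssh-keys
-```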
- -#### 2.2.3 Start AESM service on Linux VM -* ubuntu 20.04 -```bash -echo 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu focal main' | tee /etc/apt/sources.list.d/intelsgx.list -wget -qO - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | apt-key add - -sudo apt update -apt-get install libsgx-dcap-ql -apt install sgx-aesm-service -``` -* ubuntu 18.04 -```bash -echo 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu bionic main' | tee /etc/apt/sources.list.d/intelsgx.list -wget -qO - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | apt-key add - -sudo apt update -apt-get install libsgx-dcap-ql -apt install sgx-aesm-service -``` - -#### 2.2.4 Pull BigDL PPML image and run on Linux VM -* Go to Azure Marketplace, search "BigDL PPML" and find `BigDL PPML: Secure Big Data AI on Intel SGX` product. Click "Create" button which will lead you to `Subscribe` page. -On `Subscribe` page, input your subscription, your Azure container registry, your resource group and your location. Then click `Subscribe` to subscribe BigDL PPML to your container registry. - -* Go to your Azure container regsitry (i.e. myContainerRegistry), check `Repostirories`, and find `intel_corporation/bigdl-ppml-trusted-bigdata-gramine` -* Login to the created VM. Then login to your Azure container registry, pull BigDL PPML image as needed. - * If you want to run with 16G SGX memory, you can pull the image as below: - ```bash - docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-bigdata-gramine:2.3.0-SNAPSHOT-16g - ``` - * If you want to run with 32G SGX memory, you can pull the image as below: - ```bash - docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-bigdata-gramine:2.3.0-SNAPSHOT-32g - ``` - * If you want to run with 64G SGX memory, you can pull the image as below: - ```bash - docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-bigdata-gramine:2.3.0-SNAPSHOT-64g - ``` -* Start container of this image - The example script to start the image is as below: - ```bash - #!/bin/bash - - export LOCAL_IP=YOUR_LOCAL_IP - export DOCKER_IMAGE=myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-bigdata-gramine:2.3.0-SNAPSHOT-16g - - sudo docker run -itd \ - --privileged \ - --net=host \ - --cpuset-cpus="0-5" \ - --oom-kill-disable \ - --device=/dev/gsgx \ - --device=/dev/sgx/enclave \ - --device=/dev/sgx/provision \ - -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \ - --name=spark-local \ - -e LOCAL_IP=$LOCAL_IP \ - $DOCKER_IMAGE bash - ``` - -### 2.3 Create AKS(Azure Kubernetes Services) or use existing AKs -First, login to your client VM and enter your BigDL PPML container: -```bash -docker exec -it spark-local bash -``` -Then run `az login` to login to Azure system. - -Create AKS or use existing AKS with Intel SGX support. -In your BigDL PPML container, you can run `/ppml/trusted-big-data-ml/azure/create-aks.sh` to create AKS with confidential computing support. - -Note: Please use the same VNet information of your client to create AKS. And use DC-Series VM size(i.e.Standard_DC8ds_v3) to create AKS. 
-```bash -/ppml/trusted-big-data-ml/azure/create-aks.sh \ - --resource-group myResourceGroup \ - --vnet-resource-group myVnetResourceGroup \ - --vnet-name myVnetName \ - --subnet-name mySubnetName \ - --cluster-name myAKSName \ - --vm-size myAKSNodeVMSize \ - --node-count myAKSInitNodeCount - -``` -You can check the information by running: -```bash -/ppml/trusted-big-data-ml/azure/create-aks.sh --help -``` - -### 2.4 Create Azure Data Lake Store Gen 2 -#### 2.4.1 Create Data Lake Storage account or use an existing one. -The example command to create Data Lake store is as below: -```bash -az dls account create --account myDataLakeAccount --location myLocation --resource-group myResourceGroup -``` -* Create Container to put user data - - Example command to create container - ```bash - az storage fs create -n myFS --account-name myDataLakeAccount --auth-mode login - ``` -* Create folder, upload file/folder - - Example command to create folder - ```bash - az storage fs directory create -n myDirectory -f myFS --account-name myDataLakeAccount --auth-mode login - ``` - Example command to upload file - ```bash - az storage fs file upload -s "path/to/file" -p myDirectory/file -f myFS --account-name myDataLakeAccount --auth-mode login - ``` - Example command to upload directory - ```bash - az storage fs directory upload -f myFS --account-name myDataLakeAccount -s "path/to/directory" -d myDirectory --recursive - ``` -#### 2.4.2 Access data in Hadoop through ABFS(Azure Blob Filesystem) driver -You can access Data Lake Storage in Hadoop filesytem by such URI: ```abfs[s]://file_system@account_name.dfs.core.windows.net///``` -##### Authentication -The ABFS driver supports two forms of authentication so that the Hadoop application may securely access resources contained within a Data Lake Storage Gen2 capable account. -- Shared Key: This permits users to access to ALL resources in the account. The key is encrypted and stored in Hadoop configuration. - -- Azure Active Directory OAuth Bearer Token: Azure AD bearer tokens are acquired and refreshed by the driver using either the identity of the end user or a configured Service Principal. Using this authentication model, all access is authorized on a per-call basis using the identity associated with the supplied token and evaluated against the assigned POSIX Access Control List (ACL). - -By default, in our solution, we use shared key authentication. -- Get Access key list of the storage account: - ```bash - az storage account keys list -g MyResourceGroup -n myDataLakeAccount - ``` -Use one of the keys for authentication. - -### 2.5 Create Azure Key Vault -#### 2.5.1 Create or use an existing Azure Key Vault -Example command to create key vault -```bash -az keyvault create -n myKeyVault -g myResourceGroup -l location -``` -Take note of the following properties for use in the next section: - -* The name of the secret object in the key vault -* The object type (secret, key, or certificate) -* The name of your Azure key vault resource -* The Azure tenant ID that the subscription belongs to - -#### 2.5.2 Set access policy for the client VM -* Run such command to get the system identity: - ```bash - az vm identity assign -g myResourceGroup -n myVM - ``` - The output would be like this: - ```bash - { - "systemAssignedIdentity": "ff5505d6-8f72-4b99-af68-baff0fbd20f5", - "userAssignedIdentities": {} - } - ``` - Take note of the systemAssignedIdentity of the client VM. 
- -* Set access policy for client VM - - Example command: - ```bash - az keyvault set-policy --name myKeyVault --object-id --secret-permissions all --key-permissions all unwrapKey wrapKey - ``` - -#### 2.5.3 AKS access Key Vault -##### 2.5.3.1 Set access for AKS VM ScaleSet -###### a. Find your VM ScaleSet in your AKS, and assign system managed identity to VM ScaleSet. -```bash -az vm identity assign -g myResourceGroup -n myAKSVMSS -``` -The output would be like below: -```bash -principalId: xxxxxxxxx -tenantId: xxxxxxxxxxx -type: SystemAssigned, UserAssigned -userAssignedIdentities: - ? /subscriptions/xxxx/resourceGroups/xxxxx/providers/Microsoft.ManagedIdentity/userAssignedIdentities/bigdl-ks-agentpool - : clientId: xxxxxx - principalId: xxxxx -``` -Take note of principalId of the first line as System Managed Identity of your VMSS. -###### b. Set access policy for AKS VM ScaleSet -Example command: -```bash -az keyvault set-policy --name myKeyVault --object-id --secret-permissions get --key-permissions get unwrapKey -``` - -## 3. Run Spark PPML jobs -Login to your client VM and enter your BigDL PPML container: -```bash -docker exec -it spark-local bash -``` -Then run `az login` to login to Azure system. -### 3.1 Save kubeconfig to secret -Login to AKS use such command: -```bash -az aks get-credentials --resource-group myResourceGroup --name myAKSCluster -``` -Run such script to save kubeconfig to secret -```bash -/ppml/trusted-big-data-ml/azure/kubeconfig-secret.sh -``` -### 3.2 Generate keys -Run such scripts to generate keys: -```bash -/ppml/trusted-big-data-ml/azure/generate-keys-az.sh -``` -When entering the passphrase or password, you could input the same password by yourself; and these passwords could also be used for the next step of generating other passwords. Password should be longer than 6 bits and contain numbers and letters, and one sample password is "3456abcd". These passwords would be used for future remote attestations and to start SGX enclaves more securely. - -After generate keys, run such command to save keys in Kubernetes. -``` -kubectl apply -f /ppml/trusted-big-data-ml/work/keys/keys.yaml -``` -### 3.3 Generate password -Run such script to save the password to Azure Key Vault -```bash -/ppml/trusted-big-data-ml/azure/generate-password-az.sh myKeyVault used_password_when_generate_keys -``` -### 3.4 Create the RBAC -```bash -kubectl create serviceaccount spark -kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default -``` -### 3.5 Create image pull secret from your Azure container registry - * If you already logged in to your Azure container registry, find your docker config json file (i.e. ~/.docker/config.json), and create secret for your registry credential like below: - ```bash - kubectl create secret generic regcred \ - --from-file=.dockerconfigjson= \ - --type=kubernetes.io/dockerconfigjson - ``` - * If you haven't logged in to your Azure container registry, you can create secret for your registry credential using your username and password: - ```bash - kubectl create secret docker-registry regcred --docker-server=myContainerRegistry.azurecr.io --docker-username= --docker-password= --docker-email= - ``` - -### 3.6 Add image pull secret to service account -```bash -kubectl patch serviceaccount spark -p '{"imagePullSecrets": [{"name": "regcred"}]}' -``` - -### 3.7 (Optional) Enable Microsoft Azure Attestation -First, upload `appid` and `apikey` as place-holder. 
The `appid` and `apikey` will not actually effect but they should be non-empty otherwise the attestation workflow would throw a value-missing error. -```bash -kubectl create secret generic kms-secret \ - --from-literal=app_id=YOUR_APP_ID \ - --from-literal=api_key=YOUR_API_KEY -``` -Then configure attestation related environment variable in the driver-template and executor-template. -Here is an example for `spark-driver-template-az.yaml`: -```yaml -apiVersion: v1 -kind: Pod -spec: - containers: - - name: spark-driver - securityContext: - privileged: true - env: - - name: ATTESTATION - value: true - - name: ATTESTATION_URL - value: your_attestation_url # e.g. https://sharedcus.cus.attest.azure.net - - name: APP_ID - valueFrom: - secretKeyRef: - name: kms-secret # consistent with the above - key: app_id - - name: API_KEY - valueFrom: - secretKeyRef: - name: kms-secret - key: api_key - - name: ATTESTATION_TYPE - value: AzureAttestationService - - name: QUOTE_TYPE - value: gramine -... -``` - -And similar configures should be applied to `spark-executor-template-az.yaml` too. - -### 3.8 Run PPML spark job -The example script to run PPML spark job on AKS is as below. You can also refer to `/ppml/trusted-big-data-ml/azure/submit-spark-sgx-az.sh` -```bash -export RUNTIME_DRIVER_MEMORY=8g -export RUNTIME_DRIVER_PORT=54321 - -RUNTIME_SPARK_MASTER= -AZ_CONTAINER_REGISTRY=myContainerRegistry -BIGDL_VERSION=2.3.0-SNAPSHOT -SGX_MEM=16g -SPARK_EXTRA_JAR_PATH= -SPARK_JOB_MAIN_CLASS= -ARGS= -DATA_LAKE_NAME= -DATA_LAKE_ACCESS_KEY= -KEY_VAULT_NAME= -PRIMARY_KEY_PATH= -DATA_KEY_PATH= - -export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'` - -bash bigdl-ppml-submit.sh \ - --master $RUNTIME_SPARK_MASTER \ - --deploy-mode client \ - --sgx-enabled true \ - --sgx-driver-jvm-memory 2g \ - --sgx-executor-jvm-memory 7g \ - --driver-memory 8g \ - --driver-cores 4 \ - --executor-memory 18g \ - --executor-cores 4 \ - --num-executors 2 \ - --conf spark.cores.max=8 \ - --name spark-decrypt-sgx \ - --conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-bigdata-gramine:$BIGDL_VERSION-$SGX_MEM \ - --driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \ - --executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \ - --jars local://$SPARK_EXTRA_JAR_PATH \ - --conf spark.hadoop.fs.azure.account.auth.type.${DATA_LAKE_NAME}.dfs.core.windows.net=SharedKey \ - --conf spark.hadoop.fs.azure.account.key.${DATA_LAKE_NAME}.dfs.core.windows.net=${DATA_LAKE_ACCESS_KEY} \ - --conf spark.hadoop.fs.azure.enable.append.support=true \ - --conf spark.bigdl.kms.type=AzureKeyManagementService \ - --conf spark.bigdl.kms.azure.vault=$KEY_VAULT_NAME \ - --conf spark.bigdl.kms.key.primary=$PRIMARY_KEY_PATH \ - --conf spark.bigdl.kms.key.data=$DATA_KEY_PATH \ - --class $SPARK_JOB_MAIN_CLASS \ - --verbose \ - $SPARK_EXTRA_JAR_PATH \ - $ARGS -``` -### 3.9 Run simple query python example -This is an example script to run simple query python example job on AKS with data stored in Azure data lake store. 
-```bash -export RUNTIME_DRIVER_MEMORY=6g -export RUNTIME_DRIVER_PORT=54321 - -RUNTIME_SPARK_MASTER= -AZ_CONTAINER_REGISTRY=myContainerRegistry -BIGDL_VERSION=2.3.0-SNAPSHOT -SGX_MEM=16g -SPARK_VERSION=3.1.3 - -DATA_LAKE_NAME= -DATA_LAKE_ACCESS_KEY= -INPUT_DIR_PATH=xxx@$DATA_LAKE_NAME.dfs.core.windows.net/xxx -KEY_VAULT_NAME= -PRIMARY_KEY_PATH= -DATA_KEY_PATH= - -export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'` - -bash bigdl-ppml-submit.sh \ - --master $RUNTIME_SPARK_MASTER \ - --deploy-mode client \ - --sgx-enabled true \ - --sgx-driver-jvm-memory 2g \ - --sgx-executor-jvm-memory 7g \ - --driver-memory 6g \ - --driver-cores 4 \ - --executor-memory 24g \ - --executor-cores 2 \ - --num-executors 1 \ - --name simple-query-sgx \ - --conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-bigdata-gramine:$BIGDL_VERSION-$SGX_MEM \ - --driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \ - --executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \ - --conf spark.hadoop.fs.azure.account.auth.type.${DATA_LAKE_NAME}.dfs.core.windows.net=SharedKey \ - --conf spark.hadoop.fs.azure.account.key.${DATA_LAKE_NAME}.dfs.core.windows.net=${DATA_LAKE_ACCESS_KEY} \ - --conf spark.hadoop.fs.azure.enable.append.support=true \ - --properties-file /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/conf/spark-bigdl.conf \ - --conf spark.executor.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \ - --conf spark.driver.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \ - --py-files /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-ppml-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip,/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip,/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-dllib-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip \ - /ppml/trusted-big-data-ml/work/examples/simple_query_example.py \ - --kms_type AzureKeyManagementService \ - --azure_vault $KEY_VAULT_NAME \ - --primary_key_path $PRIMARY_KEY_PATH \ - --data_key_path $DATA_KEY_PATH \ - --input_encrypt_mode aes/cbc/pkcs5padding \ - --output_encrypt_mode plain_text \ - --input_path $INPUT_DIR_PATH/people.csv \ - --output_path $INPUT_DIR_PATH/simple-query-result.csv -``` -## 4. Run TPC-H example -TPC-H queries are implemented using Spark DataFrames API running with BigDL PPML. - -### 4.1 Generating tables - -Go to [TPC Download](https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp) site, choose `TPC-H` source code, then download the TPC-H toolkits. -After you download the TPC-h tools zip and uncompressed the zip file. Go to `dbgen` directory and create a makefile based on `makefile.suite`, and run `make`. - -This should generate an executable called `dbgen`. - -``` -./dbgen -h -``` - -`dbgen` gives you various options for generating the tables. The simplest case is running: - -``` -./dbgen -``` -which generates tables with extension `.tbl` with scale 1 (default) for a total of rougly 1GB size across all tables. For different size tables you can use the `-s` option: -``` -./dbgen -s 10 -``` -will generate roughly 10GB of input data. 
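As a quick sanity check (assuming `dbgen` writes its output into the current `dbgen` directory), you can confirm that the TPC-H tables were generated before moving on to key generation and encryption:
```bash
# The TPC-H tables should now exist as .tbl files, e.g. lineitem.tbl, orders.tbl, customer.tbl
ls -lh *.tbl
```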
- -### 4.2 Generate primary key and data key -Generate primary key and data key, then save to file system. - -The example code for generating the primary key and data key is like below: - -```bash -BIGDL_VERSION=2.3.0-SNAPSHOT -SPARK_VERSION=3.1.3 -java -cp /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/conf/:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \ - -Xmx10g \ - com.intel.analytics.bigdl.ppml.examples.GenerateKeys \ - --kmsType AzureKeyManagementService \ - --vaultName xxx \ - --primaryKeyPath xxx/keys/primaryKey \ - --dataKeyPath xxx/keys/dataKey -``` - -### 4.3 Encrypt Data -Encrypt data with specified BigDL `AzureKeyManagementService` - -The example code of encrypting data is like below: - -```bash -BIGDL_VERSION=2.3.0-SNAPSHOT -SPARK_VERSION=3.1.3 -java -cp /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/conf/:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \ - -Xmx10g \ - com.intel.analytics.bigdl.ppml.examples.tpch.EncryptFiles \ - --kmsType AzureKeyManagementService \ - --vaultName xxx \ - --primaryKeyPath xxx/keys/primaryKey \ - --dataKeyPath xxx/keys/dataKey \ - --inputPath xxx/dbgen \ - --outputPath xxx/dbgen-encrypted -``` - -After encryption, you may upload encrypted data to Azure Data Lake store. - -The example script is like below: - -```bash -az storage fs directory upload -f myFS --account-name myDataLakeAccount -s xxx/dbgen-encrypted -d myDirectory --recursive -``` - -### 4.4 Running -Make sure you set the INPUT_DIR and OUTPUT_DIR in `TpchQuery` class before compiling to point to the -location of the input data and where the output should be saved. - -The example script to run a query is like: - -```bash -export RUNTIME_DRIVER_MEMORY=8g -export RUNTIME_DRIVER_PORT=54321 - -export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'` - -RUNTIME_SPARK_MASTER= -AZ_CONTAINER_REGISTRY=myContainerRegistry -BIGDL_VERSION=2.3.0-SNAPSHOT -SGX_MEM=16g -SPARK_VERSION=3.1.3 - -DATA_LAKE_NAME= -DATA_LAKE_ACCESS_KEY= -KEY_VAULT_NAME= -PRIMARY_KEY_PATH= -DATA_KEY_PATH= -INPUT_DIR=xxx/dbgen-encrypted -OUTPUT_DIR=xxx/output - -bash bigdl-ppml-submit.sh \ - --master $RUNTIME_SPARK_MASTER \ - --deploy-mode client \ - --sgx-enabled true \ - --sgx-driver-jvm-memory 2g \ - --sgx-executor-jvm-memory 7g \ - --driver-memory 8g \ - --driver-cores 4 \ - --executor-memory 18g \ - --executor-cores 4 \ - --num-executors 2 \ - --conf spark.cores.max=8 \ - --name spark-tpch-sgx \ - --conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-bigdata-gramine:$BIGDL_VERSION-$SGX_MEM \ - --driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \ - --executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \ - --conf spark.sql.auto.repartition=true \ - --conf spark.default.parallelism=400 \ - --conf spark.sql.shuffle.partitions=400 \ - --conf spark.hadoop.fs.azure.account.auth.type.${DATA_LAKE_NAME}.dfs.core.windows.net=SharedKey \ - --conf spark.hadoop.fs.azure.account.key.${DATA_LAKE_NAME}.dfs.core.windows.net=${DATA_LAKE_ACCESS_KEY} \ - --conf spark.hadoop.fs.azure.enable.append.support=true \ - --conf spark.bigdl.kms.type=AzureKeyManagementService \ - --conf spark.bigdl.kms.azure.vault=$KEY_VAULT_NAME \ - --conf spark.bigdl.kms.key.primary=$PRIMARY_KEY_PATH \ - --conf 
spark.bigdl.kms.key.data=$DATA_KEY_PATH \ - --class com.intel.analytics.bigdl.ppml.examples.tpch.TpchQuery \ - --verbose \ - local:///ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/bigdl-ppml-spark_$SPARK_VERSION-$BIGDL_VERSION.jar \ - $INPUT_DIR $OUTPUT_DIR aes/cbc/pkcs5padding plain_text [QUERY] -``` - -INPUT_DIR is the TPC-H's data dir. -OUTPUT_DIR is the dir to write the query result. -The optional parameter [QUERY] is the number of the query to run e.g 1, 2, ..., 22 diff --git a/docs/readthedocs/source/doc/PPML/Overview/azure_ppml_occlum.md b/docs/readthedocs/source/doc/PPML/Overview/azure_ppml_occlum.md deleted file mode 100644 index 5933b995..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/azure_ppml_occlum.md +++ /dev/null @@ -1,149 +0,0 @@ -# BigDL PPML Azure Occlum Example - -## Overview - -This documentation demonstrates how to run standard Apache Spark applications with BigDL PPML and Occlum on Azure Intel SGX enabled Confidential Virtual machines ([DCsv3](https://docs.microsoft.com/en-us/azure/virtual-machines/dcv3-series) or [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/)). These Azure Virtual Machines include the Intel SGX extensions. - -Key points: - -* Azure Cloud Services: - * [Azure Data Lake Storage](https://azure.microsoft.com/en-us/services/storage/data-lake-storage/): a secure cloud storage platform that provides scalable, cost-effective storage for big data analytics. - * [Key Vault](https://azure.microsoft.com/en-us/services/key-vault/): Safeguard cryptographic keys and other secrets used by cloud apps and services. Although, this solution works for all Azure Key Valut types, it is recommended to use [Azure Key Vault Managed HSM](https://learn.microsoft.com/en-us/azure/key-vault/managed-hsm/overview) (FIPS 140-2 Level 3) for better safety. - * [Attestation Service](https://azure.microsoft.com/en-us/services/azure-attestation/): A unified solution for remotely verifying the trustworthiness of a platform and integrity of the binaries running inside it. - - ![Distributed Spark in SGX on Azure](../images/spark_sgx_azure.png) - -* Occlum: Occlum is a memory-safe, multi-process library OS (LibOS) for Intel SGX. As a LibOS, it enables legacy applications to run on Intel® SGX with little to no modifications of source code, thus protecting the confidentiality and integrity of user workloads transparently. - - ![Microsoft Azure Attestation on Azure](../images/occlum_maa.png) - -* For Azure attestation details in Occlum init process please refer to [`maa_init`](https://github.com/occlum/occlum/tree/master/demos/remote_attestation/azure_attestation/maa_init). - -## Prerequisites - -* Set up Azure VM on Azure - * Create a [DCSv3](https://docs.microsoft.com/en-us/azure/virtual-machines/dcv3-series) VM for [single node spark example](#single-node-spark-examples-on-azure). - * Prepare image of Spark (Required for distributed Spark examples only) - * Login to the created VM, then download [Spark 3.1.2](https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz) and extract Spark binary. Install OpenJDK-8, and `export SPARK_HOME=${Spark_Binary_dir}`. - * Go to Azure Marketplace, search "BigDL PPML" and find `BigDL PPML: Secure Big Data AI on Intel SGX (experimental and reference only, Occlum Edition)` product. Click "Create" button which will lead you to `Subscribe` page. - On `Subscribe` page, input your subscription, your Azure container registry, your resource group and your location. 
Then click `Subscribe` to subscribe BigDL PPML Occlum to your container registry. - * On the created VM, login to your Azure container registry, then pull BigDL PPML Occlum image using this command: - ```bash - docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-azure-occlum:latest - ``` -* Set up [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) for [distributed Spark examples](#distributed-spark-example-on-aks). - * Follow the [guide](https://learn.microsoft.com/en-us/azure/confidential-computing/confidential-enclave-nodes-aks-get-started) to deploy an AKS with confidential computing Intel SGX nodes. - * Install Azure CLI on the created VM or your local machine according to [Azure CLI guide](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli). - * Login to AKS with such command: - ```bash - az aks get-credentials --resource-group myResourceGroup --name myAKSCluster - ``` - * Create image pull secret from your Azure container registry - * If you already logged in to your Azure container registry, find your docker config json file (i.e. ~/.docker/config.json), and create secret for your registry credential like below: - ```bash - kubectl create secret generic regcred \ - --from-file=.dockerconfigjson= \ - --type=kubernetes.io/dockerconfigjson - ``` - * If you haven't logged in to your Azure container registry, you can create secret for your registry credential using your username and password: - ```bash - kubectl create secret docker-registry regcred --docker-server=myContainerRegistry.azurecr.io --docker-username= --docker-password= --docker-email= - ``` - * Create the RBAC to AKS - ```bash - kubectl create serviceaccount spark - kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default - ``` - * Add image pull secret to service account - ```bash - kubectl patch serviceaccount spark -p '{"imagePullSecrets": [{"name": "regcred"}]}' - ``` - -## Single Node Spark Examples on Azure -### SparkPi example - -On the VM, Run the SparkPi example with `run_spark_on_occlum_glibc.sh`. - -```bash -docker run --rm -it \ - --name=azure-ppml-example-with-occlum \ - --device=/dev/sgx/enclave \ - --device=/dev/sgx/provision \ - myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-azure-occlum:latest bash -cd /opt -bash run_spark_on_occlum_glibc.sh pi -``` - -### Nytaxi example with Azure NYTaxi - -On the VM, run the Nytaxi example with `run_azure_nytaxi.sh`. - -```bash -docker run --rm -it \ - --name=azure-ppml-example-with-occlum \ - --device=/dev/sgx/enclave \ - --device=/dev/sgx/provision \ - myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-azure-occlum:latest bash -bash run_azure_nytaxi.sh -``` - -You should get Nytaxi dataframe count and aggregation duration when succeed. - -## Distributed Spark Examples on AKS -Clone the repository to the VM: -```bash -git clone https://github.com/intel-analytics/BigDL-PPML-Azure-Occlum-Example.git -``` -### SparkPi on AKS - -In `run_spark_pi.sh` script, update `IMAGE` variable to `myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-azure-occlum:latest`, and configure your AKS address. In addition, configure environment variables in `driver.yaml` and `executor.yaml` too. Then you can submit SparkPi task with `run_spark_pi.sh`. 
- -```bash -bash run_spark_pi.sh -``` - -### Nytaxi on AKS - -In `run_nytaxi_k8s.sh` script, update `IMAGE` variable to `myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-azure-occlum:latest`, and configure your AKS address. In addition, configure environment variables in `driver.yaml` and `executor.yaml` too. Then you can submit Nytaxi query task with `run_nytaxi_k8s.sh`. -```bash -bash run_nytaxi_k8s.sh -``` - -## Known issues - -1. If you meet the following error when running the docker image: - - ```bash - aesm_service[10]: Failed to set logging callback for the quote provider library. - aesm_service[10]: The server sock is 0x5624fe742330 - ``` - - This may be associated with [SGX DCAP](https://github.com/intel/linux-sgx/issues/812). And it's expected error message if not all interfaces in quote provider library are valid, and will not cause a failure. - -2. If you meet the following error when running MAA example: - - ```bash - [get_platform_quote_cert_data ../qe_logic.cpp:352] p_sgx_get_quote_config returned NULL for p_pck_cert_config. - thread 'main' panicked at 'IOCTRL IOCTL_GET_DCAP_QUOTE_SIZE failed', /opt/src/occlum/tools/toolchains/dcap_lib/src/occlum_dcap.rs:70:13 - note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace - [ERROR] occlum-pal: The init process exit with code: 101 (line 62, file src/pal_api.c) - [ERROR] occlum-pal: Failed to run the init process: EINVAL (line 150, file src/pal_api.c) - [ERROR] occlum-pal: Failed to do ECall: occlum_ecall_broadcast_interrupts with error code 0x2002: Invalid enclave identification. (line 26, file src/pal_interrupt_thread.c) - /opt/occlum/build/bin/occlum: line 337: 3004 Segmentation fault (core dumped) RUST_BACKTRACE=1 "$instance_dir/build/bin/occlum-run" "$@" - ``` - - This may be associated with [[RFC] IOCTRL IOCTL_GET_DCAP_QUOTE_SIZE failed](https://github.com/occlum/occlum/issues/899). - -## Reference - -1. -2. -3. -4. -5. -6. -7. -8. -9. -10. -11. diff --git a/docs/readthedocs/source/doc/PPML/Overview/devguide.md b/docs/readthedocs/source/doc/PPML/Overview/devguide.md deleted file mode 100644 index ec3fe7b5..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/devguide.md +++ /dev/null @@ -1,537 +0,0 @@ -# Develop your own Big Data & AI applications with BigDL PPML - -### 0. Understand E2E Security with PPML - -Basic design guidelines for PPML applications are as follows: - -* Data in use/computation should be protected by SGX. -* Data in transmit/network should be protected by encryption or TLS. -* Data at rest/storage should be protected by encryption. - -This design ensures plain text data only be used in SGX, while in all others stages data is fully encrypted. - -![](../images/ppml_dev_basic.png) - -To our knowledge, most existing big data frameworks or systems have already provided network or storage protection. You can find more details in [Secure Your Services](https://bigdl.readthedocs.io/en/latest/doc/PPML/QuickStart/secure_your_services.html). - -Please check with your admin or security department for security features and services available. We recommend building PPML applications based on the following conditions: - -1. If you have network and storage protection enabled, and you want to secure computation with SGX. Then you can directly migrate your application into SGX with BigDL PPML. Please jump to [Migrate existing applications with BigDL PPML](#1-migrate-existing-applications-with-bigdl-ppml). -2. 
If you don't have any security features enabled, especially storage protection. Then you can use PPMLContext and recommended KMS. Please jump to [Enhance your applications with PPMLContext](#2-enhance-your-applications-with-ppmlcontext). - -### 1. Migrate existing applications with BigDL PPML - -This working model doesn't require any code change. You can reuse existing code and applications. The only difference is that your cluster manager/admin needs to set up a new execution environment for PPML applications. - -You can find more details in these articles: - -* [Installation for PPML](https://bigdl.readthedocs.io/en/latest/doc/PPML/Overview/install.html). -* [Hello World Example](https://bigdl.readthedocs.io/en/latest/doc/PPML/Overview/quicktour.html). -* [Deployment for production](https://bigdl.readthedocs.io/en/latest/doc/PPML/QuickStart/deploy_ppml_in_production.html). - -### 2. Enhance your applications with PPMLContext - -In this section, we will introduce how to secure your applications with `PPMLContext`. It requires a few code changes and configurations for your applications. - -First, you need to create a `PPMLContext`, which wraps `SparkSession` and provides methods to read encrypted data files into plain-text RDD/DataFrame and write DataFrame to encrypted data files. Then you can read & write data through `PPMLContext`. - -If you are familiar with Spark, you may find that the usage of `PPMLContext` is very similar to Spark. - -#### 2.1 Create PPMLContext - -- create a PPMLContext with `appName` - - This is the simplest way to create a `PPMLContext`. When you don't need to read/write encrypted files, you can use this way to create a `PPMLContext`. - -
- scala - - ```scala - import com.intel.analytics.bigdl.ppml.PPMLContext - - val sc = PPMLContext.initPPMLContext("MyApp") - ``` - -
- -
- python - - ```python - from bigdl.ppml.ppml_context import * - - sc = PPMLContext("MyApp") - ``` - -
- - If you want to read/write encrypted files, then you need to provide more information. - -- create a PPMLContext with `appName` & `ppmlArgs` - - `ppmlArgs` is a Map of PPML arguments; its contents vary according to the kind of Key Management Service (KMS) you are using. A KMS is used to generate the `primaryKey` and `dataKey` that encrypt/decrypt data. We provide 3 types of KMS: SimpleKeyManagementService, EHSMKeyManagementService and AzureKeyManagementService. - - Refer to [KMS Utils](https://github.com/intel-analytics/BigDL/blob/main/ppml/services/kms-utils/docker/README.md) to generate the `primaryKey` and `dataKey` with your KMS; then you are ready to create a **PPMLContext** with `ppmlArgs`. - - - For `SimpleKeyManagementService`: -
- scala - - ```scala - import com.intel.analytics.bigdl.ppml.PPMLContext - - val ppmlArgs: Map[String, String] = Map( - "spark.bigdl.kms.type" -> "SimpleKeyManagementService", - "spark.bigdl.kms.simple.id" -> "your_app_id", - "spark.bigdl.kms.simple.key" -> "your_app_key", - "spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey", - "spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey" - ) - - val sc = PPMLContext.initPPMLContext("MyApp", ppmlArgs) - ``` - -
- - -
- python - - ```python - from bigdl.ppml.ppml_context import * - - ppml_args = {"kms_type": "SimpleKeyManagementService", - "simple_app_id": "your_app_id", - "simple_app_key": "your_app_key", - "primary_key_path": "/your/primary/key/path/primaryKey", - "data_key_path": "/your/data/key/path/dataKey" - } - - sc = PPMLContext("MyApp", ppml_args) - ``` - -
- - - For `EHSMKeyManagementService`: - -
- scala - - ```scala - import com.intel.analytics.bigdl.ppml.PPMLContext - - val ppmlArgs: Map[String, String] = Map( - "spark.bigdl.kms.type" -> "EHSMKeyManagementService", - "spark.bigdl.kms.ehs.ip" -> "your_server_ip", - "spark.bigdl.kms.ehs.port" -> "your_server_port", - "spark.bigdl.kms.ehs.id" -> "your_app_id", - "spark.bigdl.kms.ehs.key" -> "your_app_key", - "spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey", - "spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey" - ) - - val sc = PPMLContext.initPPMLContext("MyApp", ppmlArgs) - ``` - -
- -
- python - - ```python - from bigdl.ppml.ppml_context import * - - ppml_args = {"kms_type": "EHSMKeyManagementService", - "kms_server_ip": "your_server_ip", - "kms_server_port": "your_server_port" - "ehsm_app_id": "your_app_id", - "ehsm_app_key": "your_app_key", - "primary_key_path": "/your/primary/key/path/primaryKey", - "data_key_path": "/your/data/key/path/dataKey" - } - - sc = PPMLContext("MyApp", ppml_args) - ``` - -
- - - For `AzureKeyManagementService` - - - the parameter `clientId` is optional; you don't have to provide it. -
- scala - - ```scala - import com.intel.analytics.bigdl.ppml.PPMLContext - - val ppmlArgs: Map[String, String] = Map( - "spark.bigdl.kms.type" -> "AzureKeyManagementService", - "spark.bigdl.kms.azure.vault" -> "key_vault_name", - "spark.bigdl.kms.azure.clientId" -> "client_id", - "spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey", - "spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey" - ) - - val sc = PPMLContext.initPPMLContext("MyApp", ppmlArgs) - ``` - -
- -
- python - - ```python - from bigdl.ppml.ppml_context import * - - ppml_args = {"kms_type": "AzureKeyManagementService", - "azure_vault": "your_azure_vault", - "azure_client_id": "your_azure_client_id", - "primary_key_path": "/your/primary/key/path/primaryKey", - "data_key_path": "/your/data/key/path/dataKey" - } - - sc = PPMLContext("MyApp", ppml_args) - ``` - -
- -- create a PPMLContext with `sparkConf` & `appName` & `ppmlArgs` - - If you need to set Spark configurations, you can provide a `SparkConf` with Spark configurations to create a `PPMLContext`. - -
- scala - - ```scala - import com.intel.analytics.bigdl.ppml.PPMLContext - import org.apache.spark.SparkConf - - val ppmlArgs: Map[String, String] = Map( - "spark.bigdl.kms.type" -> "SimpleKeyManagementService", - "spark.bigdl.kms.simple.id" -> "your_app_id", - "spark.bigdl.kms.simple.key" -> "your_app_key", - "spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey", - "spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey" - ) - - val conf: SparkConf = new SparkConf().setMaster("local[4]") - - val sc = PPMLContext.initPPMLContext(conf, "MyApp", ppmlArgs) - ``` - -
- -
- python - - ```python - from bigdl.ppml.ppml_context import * - from pyspark import SparkConf - - ppml_args = {"kms_type": "SimpleKeyManagementService", - "simple_app_id": "your_app_id", - "simple_app_key": "your_app_key", - "primary_key_path": "/your/primary/key/path/primaryKey", - "data_key_path": "/your/data/key/path/dataKey" - } - - conf = SparkConf() - conf.setMaster("local[4]") - - sc = PPMLContext("MyApp", ppml_args, conf) - ``` - -
- -#### 2.2 Read and Write Files - -To read/write data, you should set the `CryptoMode`: - -- `plain_text`: no encryption -- `AES/CBC/PKCS5Padding`: for CSV, JSON and text files -- `AES_GCM_V1`: for PARQUET only -- `AES_GCM_CTR_V1`: for PARQUET only - -To write data, you should set the `write` mode: - -- `overwrite`: Overwrite existing data with the content of the DataFrame. -- `append`: Append the new content of the DataFrame to the existing data or table. -- `ignore`: Ignore the current write operation without any error if the data/table already exists. -- `error`: Throw an exception if the data or table already exists. -- `errorifexists`: Throw an exception if the data or table already exists. -
- scala - -```scala -import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT} - -// read data -val df = sc.read(cryptoMode = PLAIN_TEXT) - ... - -// write data -sc.write(dataFrame = df, cryptoMode = AES_CBC_PKCS5PADDING) -.mode("overwrite") -... -``` - -
- -
- python - -```python -from bigdl.ppml.ppml_context import * - -# read data -df = sc.read(crypto_mode = CryptoMode.PLAIN_TEXT) - ... - -# write data -sc.write(dataframe = df, crypto_mode = CryptoMode.AES_CBC_PKCS5PADDING) -.mode("overwrite") -... -``` - -
- -
expand to see the examples of reading/writing CSV, PARQUET, JSON and text file - -The following examples use `sc` to represent an initialized `PPMLContext` - -**read/write CSV file** - -
- scala - -```scala -import com.intel.analytics.bigdl.ppml.PPMLContext -import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT} - -// read a plain csv file and return a DataFrame -val plainCsvPath = "/plain/csv/path" -val df1 = sc.read(cryptoMode = PLAIN_TEXT).option("header", "true").csv(plainCsvPath) - -// write a DataFrame as a plain csv file -val plainOutputPath = "/plain/output/path" -sc.write(df1, PLAIN_TEXT) -.mode("overwrite") -.option("header", "true") -.csv(plainOutputPath) - -// read a encrypted csv file and return a DataFrame -val encryptedCsvPath = "/encrypted/csv/path" -val df2 = sc.read(cryptoMode = AES_CBC_PKCS5PADDING).option("header", "true").csv(encryptedCsvPath) - -// write a DataFrame as a encrypted csv file -val encryptedOutputPath = "/encrypted/output/path" -sc.write(df2, AES_CBC_PKCS5PADDING) -.mode("overwrite") -.option("header", "true") -.csv(encryptedOutputPath) -``` - -
- -
- python - -```python -# import -from bigdl.ppml.ppml_context import * - -# read a plain csv file and return a DataFrame -plain_csv_path = "/plain/csv/path" -df1 = sc.read(CryptoMode.PLAIN_TEXT).option("header", "true").csv(plain_csv_path) - -# write a DataFrame as a plain csv file -plain_output_path = "/plain/output/path" -sc.write(df1, CryptoMode.PLAIN_TEXT) -.mode('overwrite') -.option("header", True) -.csv(plain_output_path) - -# read a encrypted csv file and return a DataFrame -encrypted_csv_path = "/encrypted/csv/path" -df2 = sc.read(CryptoMode.AES_CBC_PKCS5PADDING).option("header", "true").csv(encrypted_csv_path) - -# write a DataFrame as a encrypted csv file -encrypted_output_path = "/encrypted/output/path" -sc.write(df2, CryptoMode.AES_CBC_PKCS5PADDING) -.mode('overwrite') -.option("header", True) -.csv(encrypted_output_path) -``` - -
- -**read/write PARQUET file** - -
- scala - -```scala -import com.intel.analytics.bigdl.ppml.PPMLContext -import com.intel.analytics.bigdl.ppml.crypto.{AES_GCM_CTR_V1, PLAIN_TEXT} - -// read a plain parquet file and return a DataFrame -val plainParquetPath = "/plain/parquet/path" -val df1 = sc.read(PLAIN_TEXT).parquet(plainParquetPath) - -// write a DataFrame as a plain parquet file -val plainOutputPath = "/plain/output/path" -sc.write(df1, PLAIN_TEXT) -.mode("overwrite") -.parquet(plainOutputPath) - -// read an encrypted parquet file and return a DataFrame -val encryptedParquetPath = "/encrypted/parquet/path" -val df2 = sc.read(AES_GCM_CTR_V1).parquet(encryptedParquetPath) - -// write a DataFrame as an encrypted parquet file -val encryptedOutputPath = "/encrypted/output/path" -sc.write(df2, AES_GCM_CTR_V1) -.mode("overwrite") -.parquet(encryptedOutputPath) -``` -
- - -
- python - -```python -# import -from bigdl.ppml.ppml_context import * - -# read a plain parquet file and return a DataFrame -plain_parquet_path = "/plain/parquet/path" -df1 = sc.read(CryptoMode.PLAIN_TEXT).parquet(plain_parquet_path) - -# write a DataFrame as a plain parquet file -plain_output_path = "/plain/output/path" -sc.write(df1, CryptoMode.PLAIN_TEXT) -.mode('overwrite') -.parquet(plain_output_path) - -# read a encrypted parquet file and return a DataFrame -encrypted_parquet_path = "/encrypted/parquet/path" -df2 = sc.read(CryptoMode.AES_GCM_CTR_V1).parquet(encrypted_parquet_path) - -# write a DataFrame as a encrypted parquet file -encrypted_output_path = "/encrypted/output/path" -sc.write(df2, CryptoMode.AES_GCM_CTR_V1) -.mode('overwrite') -.parquet(encrypted_output_path) -``` - -
- -**read/write JSON file** - -
- scala - -```scala -import com.intel.analytics.bigdl.ppml.PPMLContext -import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT} - -// read a plain json file and return a DataFrame -val plainJsonPath = "/plain/json/path" -val df1 = sc.read(PLAIN_TEXT).json(plainJsonPath) - -// write a DataFrame as a plain json file -val plainOutputPath = "/plain/output/path" -sc.write(df1, PLAIN_TEXT) -.mode("overwrite") -.json(plainOutputPath) - -// read an encrypted json file and return a DataFrame -val encryptedJsonPath = "/encrypted/json/path" -val df2 = sc.read(AES_CBC_PKCS5PADDING).json(encryptedJsonPath) - -// write a DataFrame as an encrypted json file -val encryptedOutputPath = "/encrypted/output/path" -sc.write(df2, AES_CBC_PKCS5PADDING) -.mode("overwrite") -.json(encryptedOutputPath) -``` -
- -
- python - -```python -# import -from bigdl.ppml.ppml_context import * - -# read a plain json file and return a DataFrame -plain_json_path = "/plain/json/path" -df1 = sc.read(CryptoMode.PLAIN_TEXT).json(plain_json_path) - -# write a DataFrame as a plain json file -plain_output_path = "/plain/output/path" -sc.write(df1, CryptoMode.PLAIN_TEXT) -.mode('overwrite') -.json(plain_output_path) - -# read an encrypted json file and return a DataFrame -encrypted_json_path = "/encrypted/json/path" -df2 = sc.read(CryptoMode.AES_CBC_PKCS5PADDING).json(encrypted_json_path) - -# write a DataFrame as an encrypted json file -encrypted_output_path = "/encrypted/output/path" -sc.write(df2, CryptoMode.AES_CBC_PKCS5PADDING) -.mode('overwrite') -.json(encrypted_output_path) -``` -
- -**read textfile** - -
- scala - -```scala -import com.intel.analytics.bigdl.ppml.PPMLContext -import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT} - -// read from a plain csv file and return a RDD -val plainCsvPath = "/plain/csv/path" -val rdd1 = sc.textfile(plainCsvPath) // the default cryptoMode is PLAIN_TEXT - -// read from a encrypted csv file and return a RDD -val encryptedCsvPath = "/encrypted/csv/path" -val rdd2 = sc.textfile(path=encryptedCsvPath, cryptoMode=AES_CBC_PKCS5PADDING) -``` - -
- -
- python - -```python -# import -from bigdl.ppml.ppml_context import * - -# read from a plain csv file and return a RDD -plain_csv_path = "/plain/csv/path" -rdd1 = sc.textfile(plain_csv_path) # the default crypto_mode is "plain_text" - -# read from a encrypted csv file and return a RDD -encrypted_csv_path = "/encrypted/csv/path" -rdd2 = sc.textfile(path=encrypted_csv_path, crypto_mode=CryptoMode.AES_CBC_PKCS5PADDING) -``` - -
- -
- -For more usage with `PPMLContext` Python API, please refer to [PPMLContext Python API](https://github.com/intel-analytics/BigDL/blob/main/python/ppml/src/bigdl/ppml/README.md). diff --git a/docs/readthedocs/source/doc/PPML/Overview/examples.rst b/docs/readthedocs/source/doc/PPML/Overview/examples.rst deleted file mode 100644 index f1c56360..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/examples.rst +++ /dev/null @@ -1,14 +0,0 @@ -Tutorials & Examples -===================================== - -* `A Hello World Example <../Overview/quicktour.html>`__ is a very simple exmaple for getting started. - -* `PPML E2E Example <../QuickStart/end-to-end.html>`__ introduces the end-to-end PPML workflow using SimpleQuery as an example. - -* `Develop PPML application <../Overview/devguide.html>`__ introduces how to migrate/develop PPML applications. - -* `PPML E2E Example on Azure <../Overview/azure_ppml.html>`__ introduces the end-to-end PPML workflow on Azure Cloud using TPC-H as an example. - -* `PPML Occlum E2E Example on Alicloud <../Overview/ali_ecs_occlum_cn.html>`__ introduces the end-to-end PPML with Occlum workflow on Alibaba ECS. - -* You can also find Trusted Data Analysis, Trusted ML, Trusted DL and Trusted FL examples in `more examples `__. \ No newline at end of file diff --git a/docs/readthedocs/source/doc/PPML/Overview/install.md b/docs/readthedocs/source/doc/PPML/Overview/install.md deleted file mode 100644 index c8db1098..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/install.md +++ /dev/null @@ -1,85 +0,0 @@ -# PPML Installation - ---- - -#### OS requirement - - -```eval_rst -.. note:: - **Hardware requirements**: - - Intel SGX: PPML's features (except Homomorphic Encryption) are mainly built upon Intel SGX. Intel SGX requires Intel CPU with SGX feature, e.g., IceLake (3rd Xeon Platform). `Check if your CPU has SGX feature `_ -``` -```eval_rst -.. note:: - **Supported OS**: - - PPML is thoroughly tested on Ubuntu (18.04/20.04), and should works fine on CentOS/Redhat 8. Note that UEFI (Unified Extensible Firmware Interface) is required for remote attestation registration stage. -``` - -#### Enable SGX for your Cluster - -```eval_rst -.. mermaid:: - - graph TD - usesgx{Use SGX?} -- Yes --> installsgx(Install SGX Driver for Node) - usesgx{Use SGX?} -- No --> he(Homomorphic Encryption) - installsgx --> installaesm(Install AESM for Node) - installaesm --> needatt{Need Attestation?} - needatt -- Yes --> installPCCS(Install PCCS for Cluster) -``` - - -##### Install SGX Driver - -Please refer to [Install SGX (Software Guard Extensions) Driver for Xeon Server](https://bigdl.readthedocs.io/en/latest/doc/PPML/QuickStart/install_sgx_driver.html). 
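Once the driver is installed, a quick way to verify it (assuming a DCAP-style driver) is to check that the SGX device nodes are present:
```bash
# The SGX device nodes should be visible after the driver is installed
ls /dev | grep -i sgx   # e.g. sgx_enclave and sgx_provision (or sgx/enclave and sgx/provision)
```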
- -##### Install AESM (Architectural Enclave Service Manager) - -```bash -echo 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu focal main' | sudo tee /etc/apt/sources.list.d/intel-sgx.list > /dev/null -wget -O - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | sudo apt-key add - -sudo apt update -sudo apt-get install libsgx-urts libsgx-dcap-ql libsgx-dcap-default-qpl -``` - -##### Install PCCS (Provisioning Certificate Caching Service) (for attestation) - -Please refer to [Intel® Software Guard Extensions Data Center Attestation Primitives (Intel® SGX DCAP): A Quick Install Guide](https://www.intel.com/content/www/us/en/developer/articles/guide/intel-software-guard-extensions-data-center-attestation-primitives-quick-install-guide.html) - -Note that PCCS requires Internet connection for downloading certificates from Intel PCS. PCCS is fully [open sourced on Github](https://github.com/intel/SGXDataCenterAttestationPrimitives/blob/master/QuoteGeneration/pccs), you can build your own PCCS based on these codes. - -```eval_rst -.. mermaid:: - - graph BT - pcs(Intel PCS) --> PCCS - PCCS --> pcs - subgraph Internet - pcs - end - subgraph Data Center - PCCS --> sgx(SGX Server) - sgx --> PCCS - end -``` - -##### Install Kubernetes SGX Plugin (K8S only) - -Please refer to [Deploy the Intel SGX Device Plugin for Kubernetes](https://bigdl.readthedocs.io/en/latest/doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes.html). - -### FAQs - -1. Is SGX supported on CentOS 6/7? -No. Please upgrade your OS if possible. - -2. Do we need Internet connection for SGX node? -No. We can use PCCS for registration and certificate download. Only PCCS need Internet connection. - -3. Does PCCS require SGX or other hardware? -No. PCCS can be installed on any server with Internet connection. - -4. Can we turn off the attestation? -Of course. But, turning off attestation will break the integrity provided by SGX. Attestation is turned off to simplify installation for quick start. diff --git a/docs/readthedocs/source/doc/PPML/Overview/intro.md b/docs/readthedocs/source/doc/PPML/Overview/intro.md deleted file mode 100644 index 99b2e02a..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/intro.md +++ /dev/null @@ -1,35 +0,0 @@ -# PPML Introduction - -## 1. What is BigDL PPML? - - - ---- - -Protecting data privacy and confidentiality is critical in a world where data is everywhere. In recent years, more and more countries have enacted data privacy legislation or are expected to pass comprehensive legislation to protect data privacy, the importance of privacy and data protection is increasingly recognized. - -To better protect sensitive data, it's necessary to ensure security for all dimensions of data lifecycle: data at rest, data in transit, and data in use. Data being transferred on a network is `in transit`, data in storage is `at rest`, and data being processed is `in use`. - -

- data lifecycle -

- -To protect data in transit, enterprises often choose to encrypt sensitive data prior to moving or use encrypted connections (HTTPS, SSL, TLS, FTPS, etc) to protect the contents of data in transit. For protecting data at rest, enterprises can simply encrypt sensitive files prior to storing them or choose to encrypt the storage drive itself. However, the third state, data in use has always been a weakly protected target. There are three emerging solutions seek to reduce the data-in-use attack surface: homomorphic encryption, multi-party computation, and confidential computing. - -Among these security technologies, [Confidential computing](https://www.intel.com/content/www/us/en/security/confidential-computing.html) protects data in use by performing computation in a hardware-based [Trusted Execution Environment (TEE)](https://en.wikipedia.org/wiki/Trusted_execution_environment). [Intel® SGX](https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/overview.html) is Intel's Trusted Execution Environment (TEE), offering hardware-based memory encryption that isolates specific application code and data in memory. [Intel® TDX](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html) is the next generation Intel's Trusted Execution Environment (TEE), introducing new, architectural elements to help deploy hardware-isolated, virtual machines (VMs) called trust domains (TDs). - -[PPML](https://bigdl.readthedocs.io/en/latest/doc/PPML/Overview/ppml.html) (Privacy Preserving Machine Learning) in [BigDL 2.0](https://github.com/intel-analytics/BigDL) provides a Trusted Cluster Environment for secure Big Data & AI applications, even on untrusted cloud environment. By combining Intel Software Guard Extensions (SGX) with several other security technologies (e.g., attestation, key management service, private set intersection, federated learning, homomorphic encryption, etc.), BigDL PPML ensures end-to-end security enabled for the entire distributed workflows, such as Apache Spark, Apache Flink, XGBoost, TensorFlow, PyTorch, etc. - - -## 2. Why BigDL PPML? -PPML allows organizations to explore powerful AI techniques while working to minimize the security risks associated with handling large amounts of sensitive data. PPML protects data at rest, in transit and in use: compute and memory protected by SGX Enclaves, storage (e.g., data and model) protected by encryption, network communication protected by remote attestation and Transport Layer Security (TLS), and optional Federated Learning support. - -

- data lifecycle -

- -With BigDL PPML, you can run trusted Big Data & AI applications -- **Trusted Spark SQL & Dataframe**: with the trusted Big Data analytics and ML/DL support, users can run standard Spark data analysis (such as Spark SQL, Dataframe, MLlib, etc.) in a secure and trusted fashion. -- **Trusted ML (Machine Learning)**: with the trusted Big Data analytics and ML/DL support, users can run distributed machine learning (such as MLlib, XGBoost) in a secure and trusted fashion. -- **Trusted DL (Deep Learning)**: with the trusted Big Data analytics and ML/DL support, users can run distributed deep learning (such as BigDL, Orca, Nano, DLlib) in a secure and trusted fashion. -- **Trusted FL (Federated Learning)**: with PSI (Private Set Intersection), Secured Aggregation and trusted federated learning support, users can build united model across different parties without compromising privacy, even if these parities have different datasets or features. diff --git a/docs/readthedocs/source/doc/PPML/Overview/misc.rst b/docs/readthedocs/source/doc/PPML/Overview/misc.rst deleted file mode 100644 index eef51670..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/misc.rst +++ /dev/null @@ -1,16 +0,0 @@ -Advanced Topic -==================== - -* `PPML User Guide `_ -* `Ensure Integrity and Build Trust with Attestation `_ -* `Trusted Big Data Analytics and ML `_ -* `Trusted FL (Federated Learning) `_ -* `Secure Your Services <../QuickStart/secure_your_services.html>`_ -* `Deploy PPML Applications in the Production Environment <../QuickStart/deploy_ppml_in_production.html>`_ -* `Install SGX Driver for Xeon Server <../QuickStart/install_sgx_driver.html>`_ -* `Deploy the Intel SGX Device Plugin for Kubernetes <../QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes.html>`_ -* `Trusted Cluster Serving with Graphene on Kubernetes <../QuickStart/trusted-serving-on-k8s-guide.html>`_ -* `TPC-H with Trusted SparkSQL on Kubernetes <../QuickStart/tpc-h_with_sparksql_on_k8s.html>`_ -* `TPC-DS with Trusted SparkSQL on Kubernetes <../QuickStart/tpc-ds_with_sparksql_on_k8s.html>`_ -* `PPML on Azure with Occlum `_ -* `Secure LightGBM on Spark `_ diff --git a/docs/readthedocs/source/doc/PPML/Overview/ppml.md b/docs/readthedocs/source/doc/PPML/Overview/ppml.md deleted file mode 100644 index df61d00a..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/ppml.md +++ /dev/null @@ -1,826 +0,0 @@ -# Privacy Preserving Machine Learning (PPML) User Guide - -## 1. Introduction -Protecting privacy and confidentiality is critical for large-scale data analysis and machine learning. BigDL ***PPML*** combines various low-level hardware and software security technologies (e.g., [Intel® Software Guard Extensions (Intel® SGX)](https://www.intel.com/content/www/us/en/architecture-and-technology/software-guard-extensions.html), [Library Operating System (LibOS)](https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Library-OS-is-the-New-Container-Why-is-Library-OS-A-Better-Option-for-Compatibility-and-Sandboxing-Chia-Che-Tsai-UC-Berkeley.pdf) such as [Graphene](https://github.com/gramineproject/graphene) and [Occlum](https://github.com/occlum/occlum), [Federated Learning](https://en.wikipedia.org/wiki/Federated_learning), etc.), so that users can continue to apply standard Big Data and AI technologies (such as Apache Spark, Apache Flink, Tensorflow, PyTorch, etc.) without sacrificing privacy. 
- -## 1.1 PPML for Big Data AI -BigDL provides a distributed PPML platform for protecting the *end-to-end Big Data AI pipeline* (from data ingestion, data analysis, all the way to machine learning and deep learning). In particular, it extends the single-node [Trusted Execution Environment](https://en.wikipedia.org/wiki/Trusted_execution_environment) to provide a *Trusted Cluster Environment*, so as to run unmodified Big Data analysis and ML/DL programs in a secure fashion on (private or public) cloud: - - * Compute and memory protected by SGX Enclaves - * Network communication protected by remote attestation and [Transport Layer Security (TLS)](https://en.wikipedia.org/wiki/Transport_Layer_Security) - * Storage (e.g., data and model) protected by encryption - * Optional Federated Learning support - -That is, even when the program runs in an untrusted cloud environment, all the data and models are protected (e.g., using encryption) on disk and network, and the compute and memory are also protected using SGX Enclaves, so as to preserve confidentiality and privacy during data analysis and machine learning. - -In the current release, two types of trusted Big Data AI applications are supported: - -1. Big Data analytics and ML/DL (supporting Apache Spark and BigDL) -2. Realtime compute and ML/DL (supporting Apache Flink and [BigDL Cluster Serving](https://www.usenix.org/conference/opml20/presentation/song)) - -## 2. Trusted Big Data Analytics and ML -With the trusted Big Data analytics and Machine Learning(ML)/Deep Learning(DL) support, users can run standard Spark data analysis (such as Spark SQL, Dataframe, Spark MLlib, etc.) and distributed deep learning (using BigDL) in a secure and trusted fashion. - -### 2.1 Prerequisite - -Download scripts and dockerfiles from [here](https://github.com/intel-analytics/BigDL). And do the following commands: -```bash -cd BigDL/ppml/ -``` - -1. Install SGX Driver - - Please check if the current processor supports SGX from [here](https://www.intel.com/content/www/us/en/support/articles/000028173/processors/intel-core-processors.html). Then, enable SGX feature in BIOS. Note that after SGX is enabled, a portion of memory will be assigned to SGX (this memory cannot be seen/used by OS and other applications). - - Check SGX driver with `ls /dev | grep sgx`. If SGX driver is not installed, please install SGX Data Center Attestation Primitives driver from [here](https://github.com/intel/SGXDataCenterAttestationPrimitives/tree/master/driver/linux): - - ```bash - cd scripts/ - ./install-graphene-driver.sh - cd .. - ``` - -2. Generate the signing key for SGX Enclaves - - Generate the enclave key using the command below, keep it safely for future remote attestations and to start SGX Enclaves more securely. It will generate a file `enclave-key.pem` in the current working directory, which will be the enclave key. To store the key elsewhere, modify the output file path. - - ```bash - cd scripts/ - openssl genrsa -3 -out enclave-key.pem 3072 - cd .. - ``` - -3. Prepare keys for TLS with root permission (test only, need input security password for keys). Please also install JDK/OpenJDK and set the environment path of the java path to get `keytool`. - - ```bash - cd scripts/ - ./generate-keys.sh - cd .. - ``` - When entering the passphrase or password, you could input the same password by yourself; and these passwords could also be used for the next step of generating other passwords. 
Password should be longer than 6 bits and contain numbers and letters, and one sample password is "3456abcd". These passwords would be used for future remote attestations and to start SGX enclaves more securely. And This script will generate 6 files in `./ppml/scripts/keys` dir (you can replace them with your own TLS keys). - - ```bash - keystore.jks - keystore.pkcs12 - server.crt - server.csr - server.key - server.pem - ``` - -4. Generate `password` to avoid plain text security password (used for key generation in `generate-keys.sh`) transfer. - - ```bash - cd scripts/ - ./generate-password.sh used_password_when_generate_keys - cd .. - ``` - This script will generate 2 files in `./ppml/scripts/password` dir. - - ```bash - key.txt - output.bin - ``` -### 2.2 Trusted Big Data Analytics and ML on JVM - -#### 2.2.1 Prepare Docker Image - -Pull Docker image from Dockerhub -```bash -docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-graphene:2.1.0-SNAPSHOT -``` - -Alternatively, you can build Docker image from Dockerfile (this will take some time): - -```bash -cd trusted-big-data-ml/python/docker-graphene -./build-docker-image.sh -``` - -#### 2.2.2 Run Trusted Big Data and ML on Single Node - -##### 2.2.2.1 Start PPML Container - -Enter `BigDL/ppml/trusted-big-data-ml/python/docker-graphene` dir. - -1. Copy `keys` and `password` - ```bash - cd trusted-big-data-ml/python/docker-graphene - # copy keys and password into the current directory - cp -r ../.././../scripts/keys/ . - cp -r ../.././../scripts/password/ . - ``` -2. Prepare the data - To train a model with PPML in BigDL, you need to prepare the data first. The Docker image is taking lenet and mnist as examples.
You can download the MNIST data from [here](http://yann.lecun.com/exdb/mnist/). Unzip all the files and put them in one folder (e.g. `mnist`).
There are four files: **train-images-idx3-ubyte** contains the training images, **train-labels-idx1-ubyte** contains the training labels, **t10k-images-idx3-ubyte** contains the validation images, and **t10k-labels-idx1-ubyte** contains the validation labels. For more details, please refer to the download page.
After you decompress the gzip files, they may be renamed by some decompression tools, e.g. **train-images-idx3-ubyte** may become **train-images.idx3-ubyte**. Please change the names back before you run the example.
-3. To start the container, modify the paths in deploy-local-spark-sgx.sh, and then run the following commands: -    ```bash -    ./deploy-local-spark-sgx.sh -    sudo docker exec -it spark-local bash -    cd /ppml/trusted-big-data-ml -    ./init.sh -    ``` -    **ENCLAVE_KEY_PATH** means the absolute path to "enclave-key.pem"; according to the above commands, the path would be like "BigDL/ppml/scripts/enclave-key.pem".
**DATA_PATH** means the absolute path to the data (such as mnist) that will be used later in the Spark program. According to the above commands, the path would be like "BigDL/ppml/trusted-big-data-ml/python/docker-graphene/mnist".
**KEYS_PATH** means the absolute path to the keys you just created and copied. According to the above commands, the path would be like "BigDL/ppml/trusted-big-data-ml/python/docker-graphene/keys".
- **LOCAL_IP** means your local IP address.
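For reference, a filled-in set of these values might look like the following (the paths and IP address are illustrative; adjust them to wherever you cloned BigDL and prepared the data and keys):
```bash
# Illustrative values for the variables referenced by deploy-local-spark-sgx.sh
export ENCLAVE_KEY_PATH=/home/user/BigDL/ppml/scripts/enclave-key.pem
export DATA_PATH=/home/user/BigDL/ppml/trusted-big-data-ml/python/docker-graphene/mnist
export KEYS_PATH=/home/user/BigDL/ppml/trusted-big-data-ml/python/docker-graphene/keys
export LOCAL_IP=192.168.0.112
```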
- -##### 2.2.2.2 Run Your Spark Applications with BigDL PPML on SGX - -To run your PySpark application, you need to prepare your PySpark application and put it under the trusted directory in SGX `/ppml/trusted-big-data-ml/work`. Then run with `bigdl-ppml-submit.sh` using the command: - -```bash -./bigdl-ppml-submit.sh work/YOUR_PROMGRAM.py | tee YOUR_PROGRAM-sgx.log -``` - -When the program finishes, check the results with the log `YOUR_PROGRAM-sgx.log`. - -##### 2.2.2.3 Run Trusted Spark Examples with BigDL PPML SGX - -##### 2.2.2.3.1 Run Trusted Spark Pi - -This example runs a simple Spark PI program, which is an easy way to verify if the Trusted PPML environment is ready. - -Run the script to run trusted Spark Pi: - -```bash -bash start-spark-local-pi-sgx.sh -``` - -Open another terminal and check the log: - -```bash -sudo docker exec -it spark-local cat /ppml/trusted-big-data-ml/spark.local.pi.sgx.log | egrep "###|INFO|Pi" -``` - -The result should look something like this: - -> Pi is roughly 3.1422957114785572 - -##### 2.2.2.3.2 Run Trusted Spark SQL - -This example shows how to run trusted Spark SQL (e.g., TPC-H queries). - -First, download and install sbt from [here](https://www.scala-sbt.org/download.html) and deploy a Hadoop Distributed File System(HDFS) from [here](https://hadoop.apache.org/docs/r2.7.7/hadoop-project-dist/hadoop-common/ClusterSetup.html) for the Transaction Processing Performance Council Benchmark H (TPC-H) dataset and output, then build the source codes with SBT and generate the TPC-H dataset according to the TPC-H example from [here](https://github.com/intel-analytics/zoo-tutorials/tree/master/tpch-spark). After that, check if there is `spark-tpc-h-queries_2.11-1.0.jar` under `tpch-spark/target/scala-2.11`; if so, we have successfully packaged the project. - -Copy the TPC-H package to the container: - -```bash -docker cp tpch-spark/ spark-local:/ppml/trusted-big-data-ml/work -docker cp tpch-spark/start-spark-local-tpc-h-sgx.sh spark-local:/ppml/trusted-big-data-ml/ -sudo docker exec -it spark-local bash -cd /ppml/trusted-big-data-ml/ -``` -Then run the script below: - -```bash -bash start-spark-local-tpc-h-sgx.sh [your_hdfs_tpch_data_dir] [your_hdfs_output_dir] -``` - -Open another terminal and check the log: - -```bash -sudo docker exec -it spark-local cat /ppml/trusted-big-data-ml/spark.local.tpc.h.sgx.log | egrep "###|INFO|finished" -``` - -The result should look like this: - -> ----------------22 finished-------------------- - -##### 2.2.2.3.3 Run Trusted Deep Learning - -This example shows how to run trusted deep learning (using a BigDL LetNet program). - -First, download the MNIST Data from [here](http://yann.lecun.com/exdb/mnist/). Use `gzip -d` to unzip all the downloaded files (train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz, t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz) and put them into folder `/ppml/trusted-big-data-ml/work/data`. - -Then run the following script: - -```bash -bash start-spark-local-train-sgx.sh -``` - -Open another terminal and check the log: -```bash -sudo docker exec -it spark-local cat /ppml/trusted-big-data-ml/spark.local.sgx.log | egrep "###|INFO" -``` -or -```bash -sudo docker logs spark-local | egrep "###|INFO" -``` - -The result should look like this: - -```bash -############# train optimized[P1182:T2:java] ---- end time: 310534 ms return from shim_write(...) = 0x1d -############# ModuleLoader.saveToFile File.saveBytes end, used 827002 ms[P1182:T2:java] ---- end time: 1142754 ms return from shim_write(...) 
= 0x48 -############# ModuleLoader.saveToFile saveWeightsToFile end, used 842543 ms[P1182:T2:java] ---- end time: 1985297 ms return from shim_write(...) = 0x4b -############# model saved[P1182:T2:java] ---- end time: 1985297 ms return from shim_write(...) = 0x19 -``` - -#### 2.2.3 Run Trusted Big Data and ML on Cluster - -WARNING: If you want spark standalone mode, please refer to [standalone/README.md](https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/standalone/README.md). But it is not recommended. - -Follow the guide below to run Spark on Kubernetes manually. Alternatively, you can also use Helm to set everything up automatically. See [Kubernetes/README.md](https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/kubernetes/README.md). - -##### 2.2.3.1 Configure the Environment - -1. Enter `BigDL/ppml/trusted-big-data-ml/python/docker-graphene` dir. Refer to the previous section about [preparing data, keys and passwords](#2221-start-ppml-container). Then run the following commands to generate your enclave key and add it to your Kubernetes cluster as a secret. - - ```bash - kubectl apply -f keys/keys.yaml - kubectl apply -f password/password.yaml - cd kubernetes - bash enclave-key-to-secret.sh - ``` -2. Create the [RBAC(Role-based access control)](https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac) : - - ```bash - kubectl create serviceaccount spark - kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default - ``` - -3. Generate K8s config file, modify `YOUR_DIR` to the location you want to store the config: - - ```bash - kubectl config view --flatten --minify > /YOUR_DIR/kubeconfig - ``` - -4. Create K8s secret, the secret created `YOUR_SECRET` should be the same as the password you specified in step 1: - - ```bash - kubectl create secret generic spark-secret --from-literal secret=YOUR_SECRET - ``` - -##### 2.2.3.2 Start the client container - -Configure the environment variables in the following script before running it. Check [BigDL PPML SGX related configurations](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/python/docker-graphene#1-bigdl-ppml-sgx-related-configurations) for detailed memory configurations. Modify `YOUR_DIR` to the location you specify in section 2.2.3.1. Modify `$LOCAL_IP` to the IP address of your machine. - -```bash -export K8S_MASTER=k8s://$( sudo kubectl cluster-info | grep 'https.*' -o -m 1 ) -echo The k8s master is $K8S_MASTER . 
-export ENCLAVE_KEY=/YOUR_DIR/enclave-key.pem -export DATA_PATH=/YOUR_DIR/data -export KEYS_PATH=/YOUR_DIR/keys -export SECURE_PASSWORD_PATH=/YOUR_DIR/password -export KUBECONFIG_PATH=/YOUR_DIR/kubeconfig -export LOCAL_IP=$LOCAL_IP -export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT -sudo docker run -itd \ - --privileged \ - --net=host \ - --name=spark-local-k8s-client \ - --cpuset-cpus="0-4" \ - --oom-kill-disable \ - --device=/dev/sgx/enclave \ - --device=/dev/sgx/provision \ - -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \ - -v $ENCLAVE_KEY:/graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem \ - -v $DATA_PATH:/ppml/trusted-big-data-ml/work/data \ - -v $KEYS_PATH:/ppml/trusted-big-data-ml/work/keys \ - -v $SECURE_PASSWORD_PATH:/ppml/trusted-big-data-ml/work/password \ - -v $KUBECONFIG_PATH:/root/.kube/config \ - -e RUNTIME_SPARK_MASTER=$K8S_MASTER \ - -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \ - -e RUNTIME_K8S_SPARK_IMAGE=$DOCKER_IMAGE \ - -e RUNTIME_DRIVER_HOST=$LOCAL_IP \ - -e RUNTIME_DRIVER_PORT=54321 \ - -e RUNTIME_DRIVER_CORES=1 \ - -e RUNTIME_EXECUTOR_INSTANCES=1 \ - -e RUNTIME_EXECUTOR_CORES=8 \ - -e RUNTIME_EXECUTOR_MEMORY=1g \ - -e RUNTIME_TOTAL_EXECUTOR_CORES=4 \ - -e RUNTIME_DRIVER_CORES=4 \ - -e RUNTIME_DRIVER_MEMORY=1g \ - -e SGX_DRIVER_MEM=32g \ - -e SGX_DRIVER_JVM_MEM=8g \ - -e SGX_EXECUTOR_MEM=32g \ - -e SGX_EXECUTOR_JVM_MEM=12g \ - -e SGX_ENABLED=true \ - -e SGX_LOG_LEVEL=error \ - -e SPARK_MODE=client \ - -e LOCAL_IP=$LOCAL_IP \ - $DOCKER_IMAGE bash -``` - -##### 2.2.3.3 Init the client and run Spark applications on K8s - -1. Run `docker exec -it spark-local-k8s-client bash` to enter the container. Then run the following command to init the Spark local K8s client. - - ```bash - ./init.sh - ``` - -2. We assume you have a working Network File System (NFS) configured for your Kubernetes cluster. Configure the `nfsvolumeclaim` on the last line to the name of the Persistent Volume Claim (PVC) of your NFS. Please prepare the following and put them in your NFS directory: - - - The data (in a directory called `data`) - - The kubeconfig file. - -3. Run the following command to start Spark-Pi example. When the application runs in `cluster` mode, you can run ` kubectl get pod ` to get the name and status of your K8s pod(e.g., driver-xxxx). Then you can run ` kubectl logs -f driver-xxxx ` to get the output of your application. - - ```bash - #!/bin/bash - secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt &1 | tee spark-pi-sgx-$SPARK_MODE.log - ``` - -You can run your own Spark application after changing `--class` and jar path. - -1. `local:///ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/spark-examples_2.12-3.1.2.jar` => `your_jar_path` -2. `--class org.apache.spark.examples.SparkPi` => `--class your_class_path` - -### 2.3 Trusted Big Data Analytics and ML with Python - -#### 2.3.1 Prepare Docker Image - -Pull Docker image from Dockerhub - -```bash -docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT -``` - -Alternatively, you can build Docker image from Dockerfile (this will take some time): - -```bash -cd ppml/trusted-big-data-ml/python/docker-graphene -./build-docker-image.sh -``` - -#### 2.3.2 Run Trusted Big Data and ML on Single Node - -##### 2.3.2.1 Start PPML Container - -Enter `BigDL/ppml/trusted-big-data-ml/python/docker-graphene` directory. - -1. 
Copy `keys` and `password` to the current directory - - ```bash - cd ppml/trusted-big-data-ml/python/docker-graphene - # copy keys and password into the current directory - cp -r ../keys . - cp -r ../password . - ``` - -2. To start the container, modify the paths in deploy-local-spark-sgx.sh, and then run the following commands: - - ```bash - ./deploy-local-spark-sgx.sh - sudo docker exec -it spark-local bash - cd /ppml/trusted-big-data-ml - ./init.sh - ``` - -##### 2.3.2.2 Run Your PySpark Applications with BigDL PPML on SGX - -To run your PySpark application, you need to prepare your PySpark application and put it under the trusted directory in SGX `/ppml/trusted-big-data-ml/work`. Then run with `bigdl-ppml-submit.sh` using the command: - -```bash -./bigdl-ppml-submit.sh work/YOUR_PROMGRAM.py | tee YOUR_PROGRAM-sgx.log -``` - -When the program finishes, check the results with the log `YOUR_PROGRAM-sgx.log`. - -##### 2.3.2.3 Run Python and PySpark Examples with BigDL PPML on SGX - -##### 2.3.2.3.1 Run Trusted Python Helloworld - -This example runs a simple native python program, which is an easy way to verify if the Trusted PPML environment is correctly set up. - -Run the script to run trusted Python Helloworld: - -```bash -bash work/start-scripts/start-python-helloworld-sgx.sh -``` - -Open another terminal and check the log: - -```bash -sudo docker exec -it spark-local cat /ppml/trusted-big-data-ml/test-helloworld-sgx.log | egrep "Hello World" -``` - -The result should look something like this: - -> Hello World - -##### 2.3.2.3.2 Run Trusted Python Numpy - -This example shows how to run trusted native python numpy. - -Run the script to run trusted Python Numpy: - -```bash -bash work/start-scripts/start-python-numpy-sgx.sh -``` - -Open another terminal and check the log: - -```bash -sudo docker exec -it spark-local cat /ppml/trusted-big-data-ml/test-numpy-sgx.log | egrep "numpy.dot" -``` - -The result should look something like this: - -> numpy.dot: 0.034211914986371994 sec - -##### 2.3.2.3.3 Run Trusted Spark Pi - -This example runs a simple Spark PI program. - -Run the script to run trusted Spark Pi: - -```bash -bash work/start-scripts/start-spark-local-pi-sgx.sh -``` - -Open another terminal and check the log: - -```bash -sudo docker exec -it spark-local cat /ppml/trusted-big-data-ml/test-pi-sgx.log | egrep "roughly" -``` - -The result should look something like this: - -> Pi is roughly 3.146760 - -##### 2.3.2.3.4 Run Trusted Spark Wordcount - -This example runs a simple Spark Wordcount program. - -Run the script to run trusted Spark Wordcount: - -```bash -bash work/start-scripts/start-spark-local-wordcount-sgx.sh -``` - -Open another terminal and check the log: - -```bash -sudo docker exec -it spark-local cat /ppml/trusted-big-data-ml/test-wordcount-sgx.log | egrep "print" -``` - -The result should look something like this: - -> print("Hello: 1 -> -> print(sys.path);: 1 - -##### 2.3.2.3.5 Run Trusted Spark SQL - -This example shows how to run trusted Spark SQL. - -First, make sure that the paths of resource in `/ppml/trusted-big-data-ml/work/spark-2.4.6/examples/src/main/python/sql/basic.py` are the same as the paths of `people.json` and `people.txt`. 
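For example, you can check that the resource files referenced by `basic.py` are where the script expects them (the paths below assume the standard Spark examples layout inside the container and may need adjusting for your image):

```bash
# Assumed locations of the Spark SQL example resources; adjust if your image differs
ls /ppml/trusted-big-data-ml/work/spark-2.4.6/examples/src/main/resources/people.json
ls /ppml/trusted-big-data-ml/work/spark-2.4.6/examples/src/main/resources/people.txt
```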
- -Run the script to run trusted Spark SQL: - -```bash -bash work/start-scripts/start-spark-local-sql-sgx.sh -``` - -Open another terminal and check the log: - -```bash -sudo docker exec -it spark-local cat /ppml/trusted-big-data-ml/test-sql-basic-sgx.log | egrep "Justin" -``` - -The result should look something like this: - ->| 19| Justin| -> ->| Justin| -> ->| Justin| 20| -> ->| 19| Justin| -> ->| 19| Justin| -> ->| 19| Justin| -> ->Name: Justin -> ->| Justin| - -##### 2.3.2.3.6 Run Trusted Spark BigDL - -This example shows how to run trusted Spark BigDL. - -Run the script to run trusted Spark BigDL and it would take some time to show the final results: - -```bash -bash work/start-scripts/start-spark-local-bigdl-sgx.sh -``` - -Open another terminal and check the log: - -```bash -sudo docker exec -it spark-local cat /ppml/trusted-big-data-ml/test-bigdl-lenet-sgx.log | egrep "Accuracy" -``` - -The result should look something like this: - -> creating: createTop1Accuracy -> -> 2021-06-18 01:39:45 INFO DistriOptimizer$:180 - [Epoch 1 60032/60000][Iteration 469][Wall Clock 457.926565s] Top1Accuracy is Accuracy(correct: 9488, count: 10000, accuracy: 0.9488) -> -> 2021-06-18 01:46:20 INFO DistriOptimizer$:180 - [Epoch 2 60032/60000][Iteration 938][Wall Clock 845.747782s] Top1Accuracy is Accuracy(correct: 9696, count: 10000, accuracy: 0.9696) - -##### 2.3.2.3.7 Run Trusted Spark Orca Data - -This example shows how to run trusted Spark Orca Data. - -Before running the example, download the NYC Taxi dataset in Numenta Anomaly Benchmark from [here](https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/nyc_taxi.csv) for demo. After downloading the dataset, make sure that `nyc_taxi.csv` is under `work/data` directory or the same path in the `start-spark-local-orca-data-sgx.sh`. Replace `path_of_nyc_taxi_csv` with your path of `nyc_taxi.csv` in the script. - -Run the script to run trusted Spark Orca Data and it would take some time to show the final results: - -```bash -bash start-spark-local-orca-data-sgx.sh -``` - -Open another terminal and check the log: - -```bash -sudo docker exec -it spark-local cat /ppml/trusted-big-data-ml/test-orca-data-sgx.log | egrep -a "INFO data|Stopping" -A10 -``` - -The result should contain the content look like this: - ->INFO data collected: [ timestamp value -> ->0 2014-07-01 00:00:00 10844 -> ->1 2014-07-01 00:30:00 8127 -> ->2 2014-07-01 01:00:00 6210 -> ->3 2014-07-01 01:30:00 4656 -> ->4 2014-07-01 02:00:00 3820 -> ->... ... ... -> ->10315 2015-01-31 21:30:00 24670 -> ->10316 2015-01-31 22:00:00 25721 -> ->10317 2015-01-31 22:30:00 27309 -> ->10318 2015-01-31 23:00:00 26591 -> ->\-- -> ->INFO data2 collected: [ timestamp value datetime hours awake -> ->0 2014-07-01 00:00:00 10844 2014-07-01 00:00:00 0 1 -> ->1 2014-07-01 00:30:00 8127 2014-07-01 00:30:00 0 1 -> ->2 2014-07-01 03:00:00 2369 2014-07-01 03:00:00 3 0 -> ->3 2014-07-01 04:30:00 2158 2014-07-01 04:30:00 4 0 -> ->4 2014-07-01 05:00:00 2515 2014-07-01 05:00:00 5 0 -> ->... ... ... ... ... ... -> ->5215 2015-01-31 17:30:00 23595 2015-01-31 17:30:00 17 1 -> ->5216 2015-01-31 18:30:00 27286 2015-01-31 18:30:00 18 1 -> ->5217 2015-01-31 19:00:00 28804 2015-01-31 19:00:00 19 1 -> ->5218 2015-01-31 19:30:00 27773 2015-01-31 19:30:00 19 1 -> ->\-- -> ->Stopping orca context - -##### 2.3.2.3.8 Run Trusted Spark Orca Tensorflow Text Classification - -This example shows how to run Trusted Spark Orca Tensorflow text classification. 
- -Run the script to run Trusted Spark Orca Tensorflow text classification and it would take some time to show the final results. To run this example in standalone mode, replace `-e SGX_MEM_SIZE=32G \` with `-e SGX_MEM_SIZE=64G \` in `start-distributed-spark-driver.sh` - -```bash -bash start-spark-local-orca-tf-text.sh -``` - -Open another terminal and check the log: - -```bash -sudo docker exec -it spark-local cat test-orca-tf-text.log | egrep "results" -``` - -The result should be similar to: - ->INFO results: {'loss': 0.6932533979415894, 'acc Top1Accuracy': 0.7544000148773193} - -#### 2.3.3 Run Trusted Big Data and ML on Cluster - -##### 2.3.3.1 Configure the Environment - -Prerequisite: [no password ssh login](http://www.linuxproblem.org/art_9.html) to all the nodes needs to be properly set up first. - -```bash -nano environments.sh -``` - -##### 2.3.3.2 Start Distributed Big Data and ML Platform - -First, run the following command to start the service: - -```bash -./deploy-distributed-standalone-spark.sh -``` - -Then start the service: - -```bash -./start-distributed-spark-driver.sh -``` - -After that, you can run previous examples on the cluster by replacing `--master 'local[4]'` in the start scripts with - -```bash ---master 'spark://your_master_url' \ ---conf spark.authenticate=true \ ---conf spark.authenticate.secret=your_secret_key \ -``` - -##### 2.3.3.3 Stop Distributed Big Data and ML Platform - -First, stop the training: - -```bash -./stop-distributed-standalone-spark.sh -``` - -Then stop the service: - -```bash -./undeploy-distributed-standalone-spark.sh -``` - -## 3. Trusted Realtime Compute and ML - -With the Trusted Realtime Compute and ML/DL support, users can run standard Flink stream processing and distributed DL model inference (using Cluster Serving in a secure and trusted fashion. In this feature, both Graphene and Occlum are supported, users can choose one of them as LibOS layer. - -### 3.1 Prerequisite - -Please refer to [Section 2.1 Prerequisite](#prerequisite). For the Occlum backend, if your kernel version is below 5.11, please install enable_rdfsbase from [here](https://github.com/occlum/enable_rdfsbase). - -### 3.2 Prepare Docker Image - -Pull Docker image from Dockerhub - -```bash -# For Graphene -docker pull intelanalytics/bigdl-ppml-trusted-realtime-ml-scala-graphene:2.1.0-SNAPSHOT -``` - -```bash -# For Occlum -docker pull intelanalytics/bigdl-ppml-trusted-realtime-ml-scala-occlum:2.1.0-SNAPSHOT -``` - -Also, you can build Docker image from Dockerfile (this will take some time). - -```bash -# For Graphene -cd ppml/trusted-realtime-ml/scala/docker-graphene -./build-docker-image.sh -``` - -```bash -# For Occlum -cd ppml/trusted-realtime-ml/scala/docker-occlum -./build-docker-image.sh -``` - -### 3.3 Run Trusted Realtime Compute and ML - -#### 3.3.1 Configure the Environment - -Enter `BigDL/ppml/trusted-realtime-ml/scala/docker-graphene` or `BigDL/ppml/trusted-realtime-ml/scala/docker-occlum` dir. - -Modify `environments.sh`. Change MASTER, WORKER IP and file paths (e.g., `keys` and `password`). - -```bash -nano environments.sh -``` - -#### 3.3.2 Start the service - -Start Flink service: - -```bash -./deploy-flink.sh -``` - -#### 3.3.3 Run Trusted Flink Program - -Submit Flink jobs: - -```bash -cd ${FLINK_HOME} -./bin/flink run ./examples/batch/WordCount.jar -``` - -If Jobmanager is not running on the current node, please add `-m ${FLINK_JOB_MANAGER_IP}`. 
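For example, to submit the same WordCount job to a remote JobManager (the `:8081` REST port below is an assumption; use the port your Flink JobManager actually exposes):

```bash
# Submit to a remote JobManager; host and port are illustrative
./bin/flink run -m ${FLINK_JOB_MANAGER_IP}:8081 ./examples/batch/WordCount.jar
```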
- -The result should look like this: - -```bash -(a,5) -(action,1) -(after,1) -(against,1) -(all,2) -(and,12) -(arms,1) -(arrows,1) -(awry,1) -(ay,1) -(bare,1) -(be,4) -(bear,3) -(bodkin,1) -(bourn,1) -``` - -#### 3.3.4 Run Trusted Cluster Serving - -Start Cluster Serving as follows: - -```bash -./start-local-cluster-serving.sh -``` - -After all cluster serving services are ready, you can directly push inference requests into the queue with [Restful API](https://analytics-zoo.github.io/master/#ClusterServingGuide/ProgrammingGuide/#restful-api). Also, you can push image/input into the queue with Python API - -```python -from bigdl.serving.client import InputQueue -input_api = InputQueue() -input_api.enqueue('my-image1', user_define_key={"path": 'path/to/image1'}) -``` - -Cluster Serving service is a long-running service in containers, you can stop it as follows: - -```bash -docker stop trusted-cluster-serving-local -``` diff --git a/docs/readthedocs/source/doc/PPML/Overview/quicktour.md b/docs/readthedocs/source/doc/PPML/Overview/quicktour.md deleted file mode 100644 index ab6fb427..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/quicktour.md +++ /dev/null @@ -1,92 +0,0 @@ -# A Hello World Example - - -In this section, you can get started with running a simple native python HelloWorld program and a simple native Spark Pi program locally in a BigDL PPML client container to get an initial understanding of the usage of ppml. - - - -## a. Prepare Keys - -* generate ssl_key - - Download scripts from [here](https://github.com/intel-analytics/BigDL). - - ``` - cd BigDL/ppml/ - sudo bash scripts/generate-keys.sh - ``` - This script will generate keys under keys/ folder - -* generate enclave-key.pem - - ``` - openssl genrsa -3 -out enclave-key.pem 3072 - ``` - This script generates a file enclave-key.pem which is used to sign image. - - -## b. Start the BigDL PPML client container - -``` -#!/bin/bash - -# ENCLAVE_KEY_PATH means the absolute path to the "enclave-key.pem" in step a -# KEYS_PATH means the absolute path to the keys folder in step a -# LOCAL_IP means your local IP address. -export ENCLAVE_KEY_PATH=YOUR_LOCAL_ENCLAVE_KEY_PATH -export KEYS_PATH=YOUR_LOCAL_KEYS_PATH -export LOCAL_IP=YOUR_LOCAL_IP -export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:devel - -sudo docker pull $DOCKER_IMAGE - -sudo docker run -itd \ - --privileged \ - --net=host \ - --cpuset-cpus="0-5" \ - --oom-kill-disable \ - --device=/dev/gsgx \ - --device=/dev/sgx/enclave \ - --device=/dev/sgx/provision \ - -v $ENCLAVE_KEY_PATH:/graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem \ - -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \ - -v $KEYS_PATH:/ppml/trusted-big-data-ml/work/keys \ - --name=bigdl-ppml-client-local \ - -e LOCAL_IP=$LOCAL_IP \ - -e SGX_MEM_SIZE=64G \ - $DOCKER_IMAGE bash -``` - -## c. 
Run Python HelloWorld in BigDL PPML Client Container - -Run the [script](https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/start-scripts/start-python-helloworld-sgx.sh) to run trusted [Python HelloWorld](https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/examples/helloworld.py) in BigDL PPML client container: -``` -sudo docker exec -it bigdl-ppml-client-local bash work/start-scripts/start-python-helloworld-sgx.sh -``` -Check the log: -``` -sudo docker exec -it bigdl-ppml-client-local cat /ppml/trusted-big-data-ml/test-helloworld-sgx.log | egrep "Hello World" -``` -The result should look something like this: -> Hello World - - -## d. Run Spark Pi in BigDL PPML Client Container - -Run the [script](https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/start-scripts/start-spark-local-pi-sgx.sh) to run trusted [Spark Pi](https://github.com/apache/spark/blob/v3.1.2/examples/src/main/python/pi.py) in BigDL PPML client container: - -```bash -sudo docker exec -it bigdl-ppml-client-local bash work/start-scripts/start-spark-local-pi-sgx.sh -``` - -Check the log: - -```bash -sudo docker exec -it bigdl-ppml-client-local cat /ppml/trusted-big-data-ml/test-pi-sgx.log | egrep "roughly" -``` - -The result should look something like this: - -> Pi is roughly 3.146760 - -
diff --git a/docs/readthedocs/source/doc/PPML/Overview/secure_lightgbm_on_spark.md b/docs/readthedocs/source/doc/PPML/Overview/secure_lightgbm_on_spark.md deleted file mode 100644 index be21260f..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/secure_lightgbm_on_spark.md +++ /dev/null @@ -1,123 +0,0 @@ -# Secure LightGBM on Spark - - -we provide an option to combine the two ML kits (SparkML and LightGBM), that seamlessly runs LighGBM applications on existing Spark cluster. - - -In such a scenario, LightGBM will utilize DataFrame etc. distribution abstractions to read and process big datasets in parallel, and ML pipeline etc. tools to do preprocessing and feature engineering efficiently. Meanwhile, Intel SGX, Gramine/Occlum LibOS, Key Management Service, and SSL/TLS etc. security tools are applied to protect key steps in cluster computing, such as parameter synchronization in training, model and data storage, and container runtime. - - -The Spark and LightGBM dependencies have already been installed in the custom image prepared in previous steps. For Gramine user, please use [trusted-machine-learning image](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-machine-learning#gramine-machine-learning-toolkit), and [trusted-big-data-ml occlum](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/scala/docker-occlum#trusted-big-data-ml-with-occlum) for Occlum user. - - -## End-to-End LightGBM Fitting and Predication on Spark - - -Here, We illustrate the progress with a Pyspark demo, and Scala is also supported. - -### 1. Overall - - -- In the following example, a **PPMLContext** (entry for kinds of distributed APIs) is initialized first, and it will read CSV-ciphertext dataset with a schema specified in code, where encrypted data will be decrypted automatically and load into memory as DataFrame. - - -- Next, `transform` etc. APIs provided by **SparkML** kit are applied to do preprocessing like feature transformation and dataset splitting. - - -- Then, processed dataframe is feeded to **LightGBMClassifier**, and a training is invoked by `fit`. - - -- Finally, trained classification model is saved in ciphertext on disk, and we demonstrate that by loading the encrypted model into memory (and decrypted automatically) and using the reloaded model to predict on test set. The whole encryption/decryption process here applies the key specified by user configurations when submitting this Spark job. - - -For full-link protection, follow [here](https://github.com/intel-analytics/BigDL/tree/main/ppml#41-create-ppmlcontext) to deploy a KMS (Key Management Service) where you have many kinds of implementation type to choose, and generate a primary key firstly (the below uses `SimpleKeyManagementService`). - - -Next, before start training, download dataset [here](https://github.com/intel-analytics/BigDL/tree/main/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/nnframes/lightGBM#uci-irisdata). - - -### 2. Start Pyspark Example - - -Moving on, there is an application to fit a LightGBM classification model, and save the trained model in ciphertext, and then reload the encrypted model to predict. The source code can be seen [here](https://github.com/intel-analytics/BigDL/blob/main/python/ppml/example/lightgbm/encrypted_lightgbm_model_io.py), and you can follow the APIs to write your own privacy-preserving applications: - - -```python -sc = PPMLContext.init(...) 
-model = an instance of LightGBMClassficationModel/LightGBMRegressionModel/LightGBMRankerModel - -# save trained model to file -sc.saveLightGBMModel( - lightgbm_model = model, - path = ..., - crypto_mode = "PLAIN_TEXT" / "AES/CBC/PKCS5Padding" -) - -# load model from file -classficationModel = sc.loadLightGBMClassificationModel( - model_path = ..., - crypto_mode = "PLAIN_TEXT" / "AES/CBC/PKCS5Padding") - -regressionModel = sc.loadLightGBMRegressionModel(...) - -rankerModel = sc.loadLightGBMRankerModel(...) -``` - - -**Mechanism:** BigDL PPML extract `Boosters` inside LightGBM models, serially convert them to `DataFrames` on Spark JVM, and encrypt them transparently through `codec in Hadoop IO compression`. The decryption is the deserialization process against it. - - -Now, it is time to **submit the spark job and start the LightGBM application**: - - -```bash -java \ --cp "${SPARK_HOME}/conf/:${SPARK_HOME}/jars/*" \ --Xmx512m \ -org.apache.spark.deploy.SparkSubmit \ -/ppml/examples/encrypted_lightgbm_model_io.py \ ---app_id \ ---api_key \ ---primary_key_material \ ---input_path \ ---output_path \ ---output_encrypt_mode AES/CBC/PKCS5Padding \ ---input_encrypt_mode PLAIN_TEXT -``` - - -Parameter `--output_encrypt_mode` means how you want to save the trained model, and `--input_encrypt_mode` is the status of input dataset. Finally, you will get predications output from Spark driver, and find an encrypted classification model file saved on disk. - -### 3. Start Scala Example - -You can also submit a similar [Scala example](https://github.com/intel-analytics/BigDL/blob/main/scala/ppml/src/main/scala/com/intel/analytics/bigdl/ppml/examples/EncryptedLightGBMModelIO.scala), which has the same logic as the Pyspark one, using [PPML CLI](https://github.com/intel-analytics/BigDL/blob/main/ppml/docs/submit_job.md#ppml-cli) like below: - -```shell -bash bigdl-ppml-submit.sh \ - --master local[2] \ - --sgx-enabled false \ - --driver-memory 16g \ - --driver-cores 1 \ - --executor-memory 16g \ - --executor-cores 2 \ - --num-executors 8 \ - --conf spark.cores.max=8 \ - --conf spark.network.timeout=10000000 \ - --conf spark.executor.heartbeatInterval=10000000 \ - --conf spark.hadoop.io.compression.codecs="com.intel.analytics.bigdl.ppml.crypto.CryptoCodec" \ - --conf spark.bigdl.primaryKey.defaultPK.plainText= \ - --class com.intel.analytics.bigdl.ppml.examples.EncryptedLightGBMModelIO \ - --jars ${BIGDL_HOME}/jars/bigdl-ppml-spark_${SPARK_VERSION}-${BIGDL_VERSION}.jar \ - local://${BIGDL_HOME}/jars/bigdl-ppml-spark_${SPARK_VERSION}-${BIGDL_VERSION}.jar \ - -``` - -For demo purpose, we directly apply a plaintext data key `spark.bigdl.primaryKey.defaultPK.plainText`, you can simply generate such a string by: - -```shell -openssl enc -aes-256-cbc -k secret -P -md sha1 -# you will get a key, and copy it to below field -echo | base64 -``` - -Otherwise, only more safe key configurations are allowed in production environment, and please refer to [advanced Crypto in PPMLContext](https://github.com/intel-analytics/BigDL/tree/main/ppml#configurations-of-key-and-kms-in-ppmlcontext). 
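For completeness, the two demo commands above can be combined into one small sketch (illustrative only; the `secret` passphrase and the `awk` filter are assumptions, and, as noted, production deployments should use a KMS-managed key instead):

```bash
# Derive a 256-bit key with openssl and base64-encode it for
# spark.bigdl.primaryKey.defaultPK.plainText (demo purposes only)
raw_key=$(openssl enc -aes-256-cbc -k secret -P -md sha1 | awk -F= '/^key/{print $2}')
echo -n "$raw_key" | base64
```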
diff --git a/docs/readthedocs/source/doc/PPML/Overview/trusted_big_data_analytics_and_ml.md b/docs/readthedocs/source/doc/PPML/Overview/trusted_big_data_analytics_and_ml.md deleted file mode 100644 index 4a443a77..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/trusted_big_data_analytics_and_ml.md +++ /dev/null @@ -1,30 +0,0 @@ -# Trusted Big Data Analytics and ML - -Artificial intelligence on big data is increasingly important to many real-world applications. Many machine learning and data analytics applications are benefiting from the private data in different domains. Most of these applications leverage the private data to offer certain valuable services to the users. But the private data could be repurposed to infer sensitive information, which would jeopardize the privacy of individuals. Privacy-Preserving Machine Learning (PPML) helps address these risks. Using techniques such as cryptography differential privacy, and hardware technologies, PPML aims to protect the privacy of sensitive user data and of the trained model as it performs ML tasks. - -BigDL helps to build PPML applications (including big data analytics, machine learning, and cluster serving etc) on top of Intel® SGX Software Guard Extensions (Intel® SGX) and library OSes such as Graphene and Occlum. In the current release, two types of trusted Big Data AI applications are supported: - -1. Big Data analytics and ML/DL (supporting [Apache Spark](https://spark.apache.org/) and [BigDL](https://github.com/intel-analytics/BigDL)) -2. Realtime compute and ML/DL (supporting [Apache Flink](https://flink.apache.org/) and BigDL [Cluster Serving](https://www.usenix.org/conference/opml20/presentation/song)) - -## [1. Trusted Big Data ML](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml) - -With trusted Big Data analytics and ML/DL support, users can run standard Spark data analysis (such as Spark SQL, Dataframe, MLlib, etc.) and distributed deep learning (using BigDL) in a secure and trusted fashion. - -## [2. Trusted Real Time ML](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-realtime-ml/scala) - -With the trusted real time compute and ML/DL support, users can run standard Flink stream processing and distributed DL model inference (using Cluster Serving) in a secure and trusted fashion. - -## 3. Intel SGX and LibOS - -### [Intel® SGX](https://software.intel.com/content/www/us/en/develop/topics/software-guard-extensions.html) - -Intel® SGX runs on Intel’s Trusted Execution Environment (TEE), offering hardware-based memory encryption that isolates specific application code and data in memory. Intel® SGX enables user-level code to allocate private regions of memory, called enclaves, which are designed to be protected from processes running at higher privilege levels. - -### [Graphene-SGX](https://github.com/oscarlab/graphene) - -Graphene is a lightweight guest OS, designed to run a single application with minimal host requirements. Graphene can run applications in an isolated environment with benefits comparable to running a complete OS in a virtual machine -- including guest customization, ease of porting to different OSes, and process migration. Graphene supports native, unmodified Linux applications on any platform. Currently, Graphene runs on Linux and Intel SGX enclaves on Linux platforms. With Intel SGX support, Graphene can secure a critical application in a hardware-encrypted memory region. 
Graphene can protect applications from a malicious system stack with minimal porting effort. - -### [Occlum](https://github.com/occlum/occlum) - -Occlum is a memory-safe, multi-process library OS (LibOS) for Intel SGX. As a LibOS, it enables legacy applications to run on SGX with little or even no modifications of source code, thus protecting the confidentiality and integrity of user workloads transparently. diff --git a/docs/readthedocs/source/doc/PPML/Overview/trusted_fl.md b/docs/readthedocs/source/doc/PPML/Overview/trusted_fl.md deleted file mode 100644 index ce3491a0..00000000 --- a/docs/readthedocs/source/doc/PPML/Overview/trusted_fl.md +++ /dev/null @@ -1,149 +0,0 @@ -# Trusted FL (Federated Learning) - -[Federated Learning](https://en.wikipedia.org/wiki/Federated_learning) is a new tool in PPML (Privacy Preserving Machine Learning), which empowers multi-parities to build a united model across different parties without compromising privacy, even if these parties have different datasets or features. In FL training stage, sensitive data will be kept locally, and only temp gradients or weights will be safely aggregated by a trusted third-party. In our design, this trusted third-parity is fully protected by Intel SGX. - -A number of FL tools or frameworks have been proposed to enable FL in different areas, i.e., OpenFL, TensorFlow Federated, FATE, Flower and PySyft etc. However, none of them is designed for Big Data scenarios. To enable FL in big data ecosystem, BigDL PPML provides a SGX-based End-to-end Trusted FL platform. With this platform, data scientists and developers can easily setup FL applications upon distributed large-scale datasets with a few clicks. To achieve this goal, we provide the following features: - - * ID & feature align: figure out portions of local data that will participate in the training stage - * Horizontal FL: training across multi-parties with the same features and different entities - * Vertical FL: training across multi-parties with the same entries and different features. - -To ensure sensitive data are fully protected in the training and inference stages, we make sure: - - * Sensitive data and weights are kept local, only temp gradients or weights will be safely aggregated by a trusted third-party - * Trusted third-party, i.e., FL Server, is protected by SGX Enclaves - * Local training environment is protected by SGX Enclaves (recommended but not enforced) - * Network communication and Storage (e.g., data and model) protected by encryption and Transport Layer Security (TLS)](https://en.wikipedia.org/wiki/Transport_Layer_Security) - -That is, even when the program runs in an untrusted cloud environment, all the data and models are protected (e.g., using encryption) on disk and network, and the compute and memory are also protected using SGX Enclaves. - -## Prerequisite - -Please ensure SGX is properly enabled, and SGX driver is installed. If not, please refer to the [Install SGX Driver](https://bigdl.readthedocs.io/en/latest/doc/PPML/Overview/ppml.html#prerequisite). - -### Prepare Keys & Dataset - -1. Generate the signing key for SGX Enclaves - - Generate the enclave key using the command below, keep it safely for future remote attestations and to start SGX Enclaves more securely. It will generate a file `enclave-key.pem` in the current working directory, which will be the enclave key. To store the key elsewhere, modify the output file path. - - ```bash - cd scripts/ - openssl genrsa -3 -out enclave-key.pem 3072 - cd .. 
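   # Note: SGX enclave signing requires a 3072-bit RSA key with public exponent 3,
   # which is exactly what the `-3` and `3072` arguments above request.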
- ``` - - Then modify `ENCLAVE_KEY_PATH` in `deploy_fl_container.sh` with your path to `enclave-key.pem`. - -2. Prepare keys for TLS with root permission (test only, need input security password for keys). Please also install JDK/OpenJDK and set the environment path of the java path to get `keytool`. - - ```bash - cd scripts/ - ./generate-keys.sh - cd .. - ``` - - When entering the passphrase or password, you could input the same password by yourself; and these passwords could also be used for the next step of generating other passwords. Password should be longer than 6 bits and contain numbers and letters, and one sample password is "3456abcd". These passwords would be used for future remote attestations and to start SGX enclaves more securely. And This script will generate 6 files in `./ppml/scripts/keys` dir (you can replace them with your own TLS keys). - - ```bash - keystore.jks - keystore.pkcs12 - server.crt - server.csr - server.key - server.pem - ``` - - If run in container, please modify `KEYS_PATH` to `keys/` you generated in last step in `deploy_fl_container.sh`. This dir will mount to container's `/ppml/trusted-big-data-ml/work/keys`, then modify the `privateKeyFilePath` and `certChainFilePath` in `ppml-conf.yaml` with container's absolute path. If not in container, just modify the `privateKeyFilePath` and `certChainFilePath` in `ppml-conf.yaml` with your local path. If you don't want to build tls channel with certificate, just delete the `privateKeyFilePath` and `certChainFilePath` in `ppml-conf.yaml`. - -3. Prepare dataset for FL training. For demo purposes, we have added a public dataset in [BigDL PPML Demo data](https://github.com/intel-analytics/BigDL/tree/main/scala/ppml/demo/data). Please download these data into your local machine. Then modify `DATA_PATH` to `./data` with absolute path in your machine and your local ip in `deploy_fl_container.sh`. The `./data` path will mount to container's `/ppml/trusted-big-data-ml/work/data`, so if you don't run in container, you need to modify the data path in `runH_VflClient1_2.sh`. - -### Prepare Docker Image - -Pull image from Dockerhub - -```bash -docker pull intelanalytics/bigdl-ppml-trusted-fl-graphene:2.1.0-SNAPSHOT -``` - -If Dockerhub is not accessible, you can build docker image. Modify your `http_proxy` in `build-image.sh` then run: - -```bash -./build-image.sh -``` - -## Start FLServer - -Before starting any local training client or worker, we need to start a Trusted third-parity, i.e., FL Server, for secure aggregation. In our design, this FL Server is running in SGX with help of Graphene or Occlum. Local workers/Clients can verify its integrity with SGX Remote Attestation. - -Running this command will start a docker container and initialize the SGX environment. - -```bash -bash deploy_fl_container.sh -sudo docker exec -it flDemo bash -./init.sh -``` - -In container, run: - -```bash -./runFlServer.sh -``` - -The fl-server will start and listen on 8980 port. Both horizontal fl-demo and vertical fl-demo need two clients. You can change the listening port and client number by editing `BigDL/scala/ppml/demo/ppml-conf.yaml`'s `serverPort` and `clientNum`. - -Note that we skip ID & Feature for simplifying demo. In practice, before we start Federated Learning, we need to align ID & Feature, and figure out portions of local data that will participate in later training stages. In horizontal FL, feature alignment is required to ensure each party is training on the same features. 
In vertical FL, both ID and feature alignment are required to ensure each party training on different features of the same record. - -## HFL Logistic Regression - -Open two new terminals, run: - -```bash -sudo docker exec -it flDemo bash -``` - -to enter the container, then in a terminal run: - -```bash -./runHflClient1.sh -``` - -in another terminal run: - -```bash -./runHflClient2.sh -``` - -Then we start two horizontal fl-clients to cooperate in training a model. - -## VFL Logistic Regression - -Open two new terminals, run: - -```bash -sudo docker exec -it flDemo bash -``` - -to enter the container, then in a terminal run: - -```bash -./runVflClient1.sh -``` - -in another terminal run: - -```bash -./runVflClient2.sh -``` - -Then we start two vertical fl-clients to cooperate in training a model. - -## References - -1. [Intel SGX](https://software.intel.com/content/www/us/en/develop/topics/software-guard-extensions.html) -2. Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated Machine Learning: Concept and Applications. ACM Trans. Intell. Syst. Technol. 10, 2, Article 12 (February 2019), 19 pages. DOI:https://doi.org/10.1145/3298981 -3. [Federated Learning](https://en.wikipedia.org/wiki/Federated_learning) -4. [TensorFlow Federated](https://www.tensorflow.org/federated) -5. [FATE](https://github.com/FederatedAI/FATE) -6. [PySyft](https://github.com/OpenMined/PySyft) -7. [Federated XGBoost](https://github.com/mc2-project/federated-xgboost) diff --git a/docs/readthedocs/source/doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes.md b/docs/readthedocs/source/doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes.md deleted file mode 100644 index 665a6640..00000000 --- a/docs/readthedocs/source/doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes.md +++ /dev/null @@ -1,27 +0,0 @@ -# Deploy the Intel SGX Device Plugin for Kubernetes - -The instructions in this section are modified from the [Intel SGX Device Plugin homepage][intelSGX], to which please refer should questions arise. - -## Prerequisites -Prerequisites for building and running these device plugins include: -- Appropriate hardware. ([3rd Gen Intel Xeon Scalable Processors][GIXSP]) -- A fully configured Kubernetes cluster -- A working Go environment, of at least version v1.16 - -Here we would want to deploy the plugin as a DaemonSet, so pull the [source code][pluginCode]. 
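For example, you might fetch the plugin sources with the commands below (the clone directory is up to you; the repository is the one referenced by `[pluginCode]`):

```bash
# Clone the Intel device plugins for Kubernetes repository and enter it
git clone https://github.com/intel/intel-device-plugins-for-kubernetes.git
cd intel-device-plugins-for-kubernetes
```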
In the working directory, compile with -``` bash -make intel-sgx-plugin -make intel-sgx-initcontainer -``` -Deploy the DaemonSet with -```bash -kubectl apply -k deployments/sgx_plugin/overlays/epc-register/ -``` -Verify with (replace the `` with your own node name) -``` -kubectl describe node | grep sgx.intel.com -``` - -[intelSGX]: https://intel.github.io/intel-device-plugins-for-kubernetes/cmd/sgx_plugin/README.html -[GIXSP]: https://www.intel.com/content/www/us/en/products/docs/processors/xeon/3rd-gen-xeon-scalable-processors-brief.html -[pluginCode]: https://github.com/intel/intel-device-plugins-for-kubernetes diff --git a/docs/readthedocs/source/doc/PPML/QuickStart/deploy_ppml_in_production.md b/docs/readthedocs/source/doc/PPML/QuickStart/deploy_ppml_in_production.md deleted file mode 100644 index a724ba92..00000000 --- a/docs/readthedocs/source/doc/PPML/QuickStart/deploy_ppml_in_production.md +++ /dev/null @@ -1,91 +0,0 @@ -# Deploy PPML (Privacy Preserving Machine Learning) Applications in the Production Environment - -PPML applications built on Intel SGX (Software Guard Extensions) are quite different from normal machine learning applications during deployment. More specifically, user applications are packaged with BigDL, Spark and LibOS etc into SGX enclave. This SGX enclave is runnable in Intel SGX. - -![](../images/ppml_sgx_enclave.png) - -However, Intel SGX requires applications (enclave) to be signed by a user-specified key, i.e., `enclave-key`. This requirement helps SGX applications ensure their integrity and build trust with attestation. However, it also separates PPML deployment into 2 stages: - -1. Stage 1: Test & Development with BigDL PPML. This stage focuses on functionality and performance. Users/customers can use a randomly generated key or a user-specified key for signing, development and testing. -2. Stage 2: Build & Deployment. This stage focuses on safety and security. That means we have to separate signing out of deployment. - * Build & sign applications with `enclave-key` in a secured environment. * Note that `enclave-key` is only involved in this sub-stage. * - * Deploy applications in the production environment. - -![](../images/ppml_scope.png) - -Due to security and privacy considerations (e.g., `enclave-key` security), only stage 1 is fully covered by BigDL PPML image. Customers/users need to handle Stage 2 carefully by themselves, especially when they are building their applications with `enclave-key`. Because `enclave-key` is related to `MRENCLAVE` and `MRSIGNER`. When setting up SGX attestation for integrity, you need to verify MRENCLAVE or MRSIGNER. - -* MRENCLAVE, i.e., Enclave Identity. MRENCLAVE uniquely identifies any particular enclave, so using the Enclave Identity will restrict access to the sealed data only to instances of that enclave. -* MRSIGNER, i.e., Signing Identity. MRSIGNER will be the same for all enclaves signed with the same authority. - -You can find more details in [Intel SGX Developer Guide](https://download.01.org/intel-sgx/linux-1.5/docs/Intel_SGX_Developer_Guide.pdf). - - -```eval_rst -.. mermaid:: - - graph LR - subgraph SGX enclave - MRENCLAVE(fa:fa-file-signature MRENCLAVE) - MRSIGNER(fa:fa-file-signature MRSIGNER) - end - subgraph enclave-key - private_key(fa:fa-key private key) - public_key(fa:fa-key public key) - end - private_key --> MRENCLAVE - ppml_application(PPML Applicaiton) --> MRENCLAVE - public_key --> MRSIGNER -``` - -In this guide, we will demonstrate how to go through these 2 stages step by step. - -## 0. 
Prerequisite - -* Intel Xeon Server with SGX enabled. You can find more details in [Install SGX Driver for Xeon Server](https://bigdl.readthedocs.io/en/latest/doc/PPML/QuickStart/install_sgx_driver.html). -* Docker & Kubernetes. -* BigDL PPML image, e.g., `intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene` or `intelanalytics/bigdl-ppml-trusted-big-data-ml-scala-occlum`. You can pull these images from DockerHub. -* `enclave-key` for signing SGX applications. It should be generated by RSA, with at least [2048 bits](https://en.wikipedia.org/wiki/RSA_numbers#RSA-2048). - - -## 1. Test & Development with PPML image - -BigDL PPML provides necessary dependencies for building, signing, debugging and testing SGX applications. In this stage, we recommend using a random key (RSA-2048) provided by BigDL PPML. This key will be used for building & signing SGX enclave. The whole workflow is as follows: - -1. Configurations. -2. Build & Sign SGX enclave with key (randomly generated key or user-provided key). -3. Run applications. - -![](../images/ppml_test_dev.png) - -Note that all PPML examples are following this workflow. It will greatly accelerate testing and debugging. But, accessing or mounting `enclave-key` in the deployment environment is not safe. Don't use this workflow in production. - -## 2. Build & Deployment your applications - -After finishing development and testing, almost all parameters or configurations are fixed. In that case, we can build customer image with these settings. - -### Build & sign applications with `enclave-key` in a secured environment - -1. Configurations. -2. Build & Sign SGX enclave with `enclave-key` in BigDL PPML image. -3. Package SGX enclave into `customer image`. - -Note that a `secured environment` is required for signing applications & build image. This environment has access to `enclave-key` and can build image based on BigDL PPML image. This environment doesn't need SGX. - -After building & signing, we can get `MRSIGNER` or `MRENCLAVE` in the command line or logs. - -### Deploy applications in the production environment - -During application deployment, users/customers can enable attestation for integrity. To avoid changing applications or frameworks, you can set up an open-source attestation service. This attestation service will verify `MRSIGNER` or `MRENCLAVE` of your applications. We recommend using [Intel eHSM](https://github.com/intel/ehsm) for both key management and attestation service. - -1. Deploy the `customer image`. -2. Run PPML applications in customer image. - -![](../images/ppml_build_deploy.png) - -## References - -1. [Intel SGX (Software Guard Extensions)](https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/overview.html) -2. [Install SGX Driver for Xeon Server](https://bigdl.readthedocs.io/en/latest/doc/PPML/QuickStart/install_sgx_driver.html) -3. [RSA-2048](https://en.wikipedia.org/wiki/RSA_numbers#RSA-2048) -4. [Intel SGX Developer Guide](https://download.01.org/intel-sgx/linux-1.5/docs/Intel_SGX_Developer_Guide.pdf) diff --git a/docs/readthedocs/source/doc/PPML/QuickStart/end-to-end.md b/docs/readthedocs/source/doc/PPML/QuickStart/end-to-end.md deleted file mode 100644 index c6e4e476..00000000 --- a/docs/readthedocs/source/doc/PPML/QuickStart/end-to-end.md +++ /dev/null @@ -1,175 +0,0 @@ -# PPML End-to-End Workflow Example - -## E2E Architecture Overview - -In this section we take SimpleQuery as an example to go through the entire BigDL PPML end-to-end workflow. 
SimpleQuery is a simple example that queries developers between the ages of 20 and 40 from `people.csv`. - - -

- *(figure: data lifecycle)*

- - - ---- - -## Step 0. Preparation your environment -To secure your Big Data & AI applications in BigDL PPML manner, you should prepare your environment first, including K8s cluster setup, K8s-SGX plugin setup, key/password preparation, key management service (KMS) and attestation service (AS) setup, BigDL PPML client container preparation. **Please follow the detailed steps in** [Prepare Environment](./docs/prepare_environment.md). - - -## Step 1. Encrypt and Upload Data -Encrypt the input data of your Big Data & AI applications (here we use SimpleQuery) and then upload encrypted data to the nfs server. More details in [Encrypt Your Data](./services/kms-utils/docker/README.md#3-enroll-generate-key-encrypt-and-decrypt). - -1. Generate the input data `people.csv` for SimpleQuery application -you can use [generate_people_csv.py](https://github.com/analytics-zoo/ppml-e2e-examples/blob/main/spark-encrypt-io/generate_people_csv.py). The usage command of the script is `python generate_people.py `. - -2. Encrypt `people.csv` - ``` - docker exec -i $KMSUTIL_CONTAINER_NAME bash -c "bash /home/entrypoint.sh encrypt $appid $apikey $input_file_path" - ``` -## Step 2. Build Big Data & AI applications -To build your own Big Data & AI applications, refer to [develop your own Big Data & AI applications with BigDL PPML](#4-develop-your-own-big-data--ai-applications-with-bigdl-ppml). The code of SimpleQuery is in [here](https://github.com/intel-analytics/BigDL/blob/main/scala/ppml/src/main/scala/com/intel/analytics/bigdl/ppml/examples/SimpleQuerySparkExample.scala), it is already built into bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar, and the jar is put into PPML image. - -## Step 3. Attestation - -To enable attestation, you should have a running Attestation Service (EHSM-KMS here for example) in your environment. (You can start a KMS refering to [this link](https://github.com/intel-analytics/BigDL/tree/main/ppml/services/kms-utils/docker)). Configure your KMS app_id and app_key with `kubectl`, and then configure KMS settings in `spark-driver-template.yaml` and `spark-executor-template.yaml` in the container. -``` bash -kubectl create secret generic kms-secret --from-literal=app_id=your-kms-app-id --from-literal=app_key=your-kms-app-key -``` -Configure `spark-driver-template.yaml` for example. (`spark-executor-template.yaml` is similar) -``` yaml -apiVersion: v1 -kind: Pod -spec: - containers: - - name: spark-driver - securityContext: - privileged: true - env: - - name: ATTESTATION - value: true - - name: ATTESTATION_URL - value: your_attestation_url - - name: ATTESTATION_ID - valueFrom: - secretKeyRef: - name: kms-secret - key: app_id - - name: ATTESTATION_KEY - valueFrom: - secretKeyRef: - name: kms-secret - key: app_key -... -``` -You should get `Attestation Success!` in logs after you [submit a PPML job](#step-4-submit-job) if the quote generated with user report is verified successfully by Attestation Service, or you will get `Attestation Fail! Application killed!` and the job will be stopped. - -## Step 4. Submit Job -When the Big Data & AI application and its input data is prepared, you are ready to submit BigDL PPML jobs. You need to choose the deploy mode and the way to submit job first. - -* **There are 4 modes to submit job**: - - 1. **local mode**: run jobs locally without connecting to cluster. It is exactly same as using spark-submit to run your application: `$SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[4] target.jar`, driver and executors are not protected by SGX. -

- -

- - - 2. **local SGX mode**: run jobs locally, guarded by SGX. The client JVM runs in an SGX enclave, so the driver and executors are protected. -

- -

- - - 3. **client SGX mode**: run jobs in K8s client mode, guarded by SGX. In K8s client mode, the driver is deployed locally as an external client to the cluster. With **client SGX mode**, both the executors running in the K8s cluster and the driver running on the client are protected by SGX. -

- -

- - - 4. **cluster SGX mode**: run jobs in K8s cluster mode, guarded by SGX. In K8s cluster mode, the driver is deployed on the K8s worker nodes just like the executors. With **cluster SGX mode**, both the driver and the executors running in the K8s cluster are protected by SGX. -

- -

- - - -* **There are two options to submit PPML jobs**: - * use [PPML CLI](./docs/submit_job.md#ppml-cli) to submit jobs manually - * use [helm chart](./docs/submit_job.md#helm-chart) to submit jobs automatically - -Here we use **k8s client mode** and **PPML CLI** to run SimpleQuery. For other modes, please see [PPML CLI Usage Examples](./docs/submit_job.md#usage-examples). Alternatively, you can also use Helm to submit jobs automatically; see the details in [Helm Chart Usage](./docs/submit_job.md#helm-chart). - -
**Details of submitting SimpleQuery** - - 1. Enter the PPML container - ``` - docker exec -it bigdl-ppml-client-k8s bash - ``` - 2. Run SimpleQuery in k8s client mode - ``` - #!/bin/bash - export secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt -
- -## Step 5. Decrypt and Read Result -When the job is done, you can decrypt and read result of the job. More details in [Decrypt Job Result](./services/kms-utils/docker/README.md#3-enroll-generate-key-encrypt-and-decrypt). - - ``` - docker exec -i $KMSUTIL_CONTAINER_NAME bash -c "bash /home/entrypoint.sh decrypt $appid $apikey $input_path" - ``` - -## Video Demo - - diff --git a/docs/readthedocs/source/doc/PPML/QuickStart/install_sgx_driver.md b/docs/readthedocs/source/doc/PPML/QuickStart/install_sgx_driver.md deleted file mode 100644 index 610cf17c..00000000 --- a/docs/readthedocs/source/doc/PPML/QuickStart/install_sgx_driver.md +++ /dev/null @@ -1,115 +0,0 @@ -# Install SGX (Software Guard Extensions) Driver for Xeon Server - -Checklist for SGX Driver: - -1. Please [check if your CPU has SGX feature](https://www.intel.com/content/www/us/en/support/articles/000028173/processors.html). -2. Check if SGX feature is correctly enabled on BIOS. Please ensure enough memory is installed. - * Disable `UMA-Based Clustering`. - * Enable `SGX` or `SW Guard Extensions(SGX)`. Set `PRMRR` to the max. Please ensure Reserved Memory Range Registers (PRMRR) are configured for SGX. - * SGX will reserve some memory from the installed memory. This memory (PRMRR) can not be seen by your system (total memory), e.g., `free -h`. So, `Installed Memory = Total Memory + 2 * PRMRR`. - * Enable `Auto MP Registration`. This setting is for remote attestation. -3. Recommended OS (Operating System): Ubuntu 18.04/20.04, CentOS 8, Redhat 8. - -**Note that SGX driver has been merged to Linux Kernel from 5.11+. After enabling SGX feature during kernel building, SGX driver will be automatically enabled.** So, we recommend our customers upgrade their kernel to 5.14+ with SGX enabled. See [Building Linux Kernel from Source with SGX Enabled](#building-linux-kernel-from-source-with-sgx-enabled). - -If your data center cannot upgrade OS or kernel, then you can [Install SGX Driver through the Installation Package](#install-sgx-driver-through-the-installation-package). - -## Building Linux Kernel from Source with SGX Enabled - -In this guide, we show how to build Kernel 5.14 from the source code and enable SGX feature on Ubuntu 18.04/20.04. You can change the kernel version, i.e., 5.14 if necessary. - -### Prerequisite for kernel build - -Install prerequisites for kernel build. Please follow your distro instruction or your favorite way to build the kernel. - -```bash -sudo apt-get install flex bison git build-essential kernel-package fakeroot libncurses5-dev libssl-dev ccache libelf-dev -``` - -### Main steps - -Clone Linux Kernel source code. - -```bash -# Obtain Linux kernel source tree -mkdir kernel && cd kernel -git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git -cd linux -# You can change this version -git checkout v5.14 -``` - -Build Kernel from source code with SGX enabled. - -```bash -cp /boot/config-`uname -r` .config -yes '' | make oldconfig -# Enable SGX and SGX KVM -/bin/sed -i 's/^# CONFIG_X86_SGX is not set/CONFIG_X86_SGX=y/g' .config -echo 'CONFIG_X86_SGX_KVM=y' >> .config -make -j `getconf _NPROCESSORS_ONLN` deb-pkg -``` - -Install kernel from deb and reboot - -```bash -cd .. 
-sudo dpkg -i linux-headers-5.14.0_5.14.0-1_amd64.deb linux-image-5.14.0_5.14.0-1_amd64.deb -sudo reboot -``` - -Check if Kernel was installed correctly and the SGX driver is working - -```bash -uname -r -ls -l /dev/ | grep sgx -``` - -### Uninstall this kernel - -Uninstall kernel with dpkg (if you want to change back to the previous kernel) - -```bash -sudo dpkg --purge linux-image-5.14.0 linux-headers-5.14.0 -sudo reboot -``` - -## Install SGX Driver through the Installation Package - -**Warning:** This guide is only for customers who cannot enable SGX driver in kernel. - -In this guide, we show how to install SGX driver with the installation package. This allows customers to enable SGX without upgrading their OS or kernel. More details in [Intel_SGX_SW_Installation_Guide_for_Linux.pdf](https://download.01.org/intel-sgx/latest/dcap-latest/linux/docs/Intel_SGX_SW_Installation_Guide_for_Linux.pdf). - -### Prerequisite for SGX Driver - -```bash -sudo apt-get install build-essential ocaml automake autoconf libtool wget python libssl-dev dkms -``` - -### Download & Install SGX Driver binary file - -```bash -wget - https://download.01.org/intel-sgx/latest/linux-latest/distro/ubuntu20.04-server/sgx_linux_x64_driver_1.41.bin -chmod 777 sgx_linux_x64_driver_1.41.bin -sudo ./sgx_linux_x64_driver_1.41.bin -``` - -Check if the SGX driver is installed correctly - -```bash -ls -l /dev/ | grep sgx -``` - -If you encounter any issue during installation, please open an issue on [Intel Software Guard Extensions Data Center Attestation Primitives](https://github.com/intel/SGXDataCenterAttestationPrimitives) - -## Trouble Shooting - -* Building on Ubuntu 5.4.X may encounter - * "dpkg-source: error: cannot represent change to vmlinux-gdb.py:". Remove `vmlinux-gdb.py`, then build again. - * "make[2]: *** No rule to make target 'debian/certs/benh@debian.org.cert.pem', needed by 'certs/x509_certificate_list'. Stop.". Please disable `SYSTEM_TRUSTED_KEYS`, i.e., `CONFIG_SYSTEM_TRUSTED_KEYS=""` in `.config`. Refer to [CONFIG_SYSTEM_TRUSTED_KEYS](https://askubuntu.com/questions/1329538/compiling-the-kernel-5-11-11). - * "make[4]: *** No rule to make target 'debian/canonical-revoked-certs.pem', needed by 'certs/x509_revocation_list'. Stop.". Please disable `SYSTEM_REVOCATION_KEYS`, i.e., `CONFIG_SYSTEM_REVOCATION_KEYS=""` in `.config`. - * "BTF: .tmp_vmlinux.btf: pahole (pahole) is not available. Failed to generate BTF for vmlinux". `dwarves` are missing. `sudo apt-get install dwarves`. -* In some kernels, SGX option is `CONFIG_INTEL_SGX`. -* 5.13 Kernel may encounter nfs problem [Can't mount NFS-shares from Linux-5.13.0](https://forums.gentoo.org/viewtopic-p-8629887.html?sid=f7359b869fb71849d64f3e69bb48503a) -* [Mellanox interface may be disabled on 5.14.0](https://bugzilla.redhat.com/show_bug.cgi?id=2014094). Changes to 5.15.5 will fix this issue. -* Error 404 when downloading binary file. Please go to [intel-sgx-linux](https://download.01.org/intel-sgx/latest/linux-latest/distro) for the latest download link. diff --git a/docs/readthedocs/source/doc/PPML/QuickStart/secure_your_services.md b/docs/readthedocs/source/doc/PPML/QuickStart/secure_your_services.md deleted file mode 100644 index 34dddd49..00000000 --- a/docs/readthedocs/source/doc/PPML/QuickStart/secure_your_services.md +++ /dev/null @@ -1,62 +0,0 @@ -# Secure Your Services - -This document is a gentle reminder for enabling security & privacy features for your services. 
To avoid privacy & security issues during deployment, we recommend Developer/Admin go through this document, which suits users/customers who want to apply BigDL into their production environment (not just for PPML). - -## Security in the data lifecycle - -Almost all Big Data & AI applications are built upon large-scale datasets, we can simply go through security key steps in the data lifecycle. That is data protection: -* In transit, i.e., network. -* At rest, i.e., storage. -* In use, i.e., computation. - -### Secure Network (in transit) - -Big Data & AI applications are mainly distributed applications, which means we need to use lots of nodes to run our applications and get jobs done. During that period, not just control flows (commands used to control applications running on different nodes), data partitions (a division of data) may also go through different nodes. So, we need to ensure all network traffic is fully protected. - -Talking about secure data transit, TLS is commonly used. The server would provide a private key and certificate chain. To make sure it is fully secured, a complete certificate chain is needed (with two or more certificates built). In addition, SSL/TLS protocol and secure cipher tools would be used. It is also recommended to use forward secrecy and strong key exchange. However, it is general that secure approaches would bring some performance problems. To mitigate these problems, a series of approaches are available, including session resumption, cache, etc. For the details of this section, please see [SSL-and-TLS-Deployment-Best-Practices](https://github.com/ssllabs/research/wiki/SSL-and-TLS-Deployment-Best-Practices). -### Secure Storage (in storage) - -Besides network traffic, we also need to ensure data is safely stored in storage. In Big Data & AI applications, data is mainly stored in distributed storage or cloud storage, e.g., HDFS, Ceph and AWS S3 etc. This makes storage security a bit different. We need to ensure each storage node is secured by the correct settings, meanwhile, we need to ensure the whole storage system is secured (network, access control, authentication etc). - -### Secure Computation (in use) - -Even if data is fully encrypted in transit and storage, we still need to decrypt it when we make some computations. If this stage is not safe, then security & secrets never exist. That's why TEE (SGX/TDX) is so important. In Big Data & AI, applications and data are distributed into different nodes. If any of these nodes are controlled by an adversary, he can simply dump sensitive data from memory or crash your applications. There are lots of security technologies to ensure computation safety. Please check if they are correctly enabled. - -## Example: Spark on Kubernetes with data stored on HDFS - -WARNING: This example lists minimum security features that should be enabled for your applications. In production, please confirm with your cluster admin or security reviewer. - -### Prepare & Manage Your keys - -Ensure you are generating, using & managing your keys in the right way. Check with your admin or security reviewer about that. Using Key Management Service (KMS) in your deployment environment is recommended. It will reduce a lot of effort and potential issues. - -Back to our example, please prepare SSL & TLS keys based on [SSL & TLS Private Key and Certificate](https://github.com/ssllabs/research/wiki/SSL-and-TLS-Deployment-Best-Practices#1-private-key-and-certificate). Ensure these keys are correctly configured and stored. 
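If you only need throwaway keys for a functional test (not for production, where keys should come from your organization's PKI or KMS), a self-signed key and certificate could be generated roughly as follows; the file names, subject and export password below are all placeholders:

```bash
# Placeholders for testing only; production certificates should be issued by a trusted CA / your KMS
openssl genrsa -out server.key 3072
openssl req -new -x509 -key server.key -out server.crt -days 365 \
  -subj "/CN=spark.example.com"
# Bundle key + certificate into a PKCS#12 keystore, since Spark/Hadoop SSL settings
# typically expect a JKS or PKCS#12 keystore rather than bare PEM files
openssl pkcs12 -export -in server.crt -inkey server.key -out keystore.p12 -password pass:changeit
```
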
- -In most cases, AES encryption key is not necessary, because Hadoop KMS and Spark will automatically generate keys for your applications or files. However, if you want to use your own keys, please please refer to [generate keys for encryption and decryption](https://docs.microsoft.com/en-us/dotnet/standard/security/generating-keys-for-encryption-and-decryption). - -### [HDFS Security](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SecureMode.html) - -Please ensure authentication and [access control](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html) are correctly configured. Note that HDFS authentication relies on [Kerberos](http://web.mit.edu/kerberos/krb5-1.12/doc/user/user_commands/kinit.html). - -Enable [Data_confidentiality](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SecureMode.html#Data_confidentiality) for network. This will protect PRC, block transfer and http. - -When storing sensitive data in HDFS, please enable [Transparent Encryption](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html) in HDFS. This feature ensures all data blocks are encrypted on data nodes. - -### [Spark Security](https://spark.apache.org/docs/latest/security.html) - -Please ensure [network crypto](https://spark.apache.org/docs/latest/security.html#encryption) and [spark.authenticate](https://spark.apache.org/docs/latest/security.html#spark-rpc-communication-protocol-between-spark-processes) are enabled. - -Enable [Local Storage Encryption](https://spark.apache.org/docs/latest/security.html#local-storage-encryption) to protect local temp data. - -Enable [SSL](https://spark.apache.org/docs/latest/security.html#ssl-configuration) to secure Spark Webui. - -You can enable [Kerberos related settings](https://spark.apache.org/docs/latest/security.html#kerberos) if you have Kerberos service. - -### [Kubernetes Security](https://kubernetes.io/docs/concepts/security/) - -As a huge resource management service, Kubernetes has lots of security features. - -Enable [RBAC](https://kubernetes.io/docs/concepts/security/rbac-good-practices/) to ensure that cluster users and workloads have only access to resources required to execute their roles. - -Enable [Encrypting Secret Data at Rest](https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/) to protect data in rest API. -When mounting key & sensitive configurations into pods, use [Kubernetes Secret](https://kubernetes.io/docs/concepts/configuration/secret/). diff --git a/docs/readthedocs/source/doc/PPML/QuickStart/tpc-ds_with_sparksql_on_k8s.md b/docs/readthedocs/source/doc/PPML/QuickStart/tpc-ds_with_sparksql_on_k8s.md deleted file mode 100644 index 18bdf659..00000000 --- a/docs/readthedocs/source/doc/PPML/QuickStart/tpc-ds_with_sparksql_on_k8s.md +++ /dev/null @@ -1,221 +0,0 @@ -## TPC-DS with Trusted SparkSQL on Kubernetes - -### Prerequisites - -- Hardware that supports SGX -- A fully configured Kubernetes cluster -- Intel SGX Device Plugin to use SGX in K8S cluster (install following instructions [here](https://bigdl.readthedocs.io/en/latest/doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes.html "here")) - -### Prepare TPC-DS kit and data - -1. 
Download and compile TPC-DS kit - -```bash -git clone --recursive https://github.com/intel-analytics/zoo-tutorials.git -cd zoo-tutorials/tpcds-spark -git clone https://github.com/databricks/tpcds-kit.git -cd tpcds-kit/tools -make OS=LINUX -cd ../../ -sbt package -``` - -2. Generate data - -```bash -cd /path/to/zoo-tutorials/tpcds-spark/spark-sql-perf -sbt "test:runMain com.databricks.spark.sql.perf.tpcds.GenTPCDSData -d -s -l -f parquet" -``` - -`dsdgenDir` is the path of `tpcds-kit/tools`, `scaleFactor` indicates data size, for example `-s 1` will generate data of 1GB scale factor, `dataDir` is the path to store generated data. - -### Deploy PPML TPC-DS on Kubernetes -1. Pull docker image - -```bash -sudo docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT -``` - -2. Prepare keys, password and k8s configurations (follow instructions [here](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/python/docker-graphene#11-prepare-the-keyspassworddataenclave-keypem "here")), make sure keys, `tpcds-spark` and generated tpc-ds data can be accessed on each K8S node, e.g. deploy on distributed storage inclusing NFS and HDFS. -3. Start a bigdl-ppml enabled Spark K8S client container with configured local IP, key, tpc-ds and kubeconfig path, also configure data path if your data is stored on local FS - -```bash -export ENCLAVE_KEY=/YOUR_DIR/keys/enclave-key.pem -export TPCDS_PATH=/YOUR_DIR/zoo-tutorials/tpcds-spark -export DATA_PATH=/YOUR_DIR/data -export KEYS_PATH=/YOUR_DIR/keys -export SECURE_PASSWORD_PATH=/YOUR_DIR/password -export KUBECONFIG_PATH=/YOUR_DIR/kubeconfig -export LOCAL_IP=$local_ip -export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT -sudo docker run -itd \ - --privileged \ - --net=host \ - --name=spark-k8s-client \ - --oom-kill-disable \ - --device=/dev/sgx/enclave \ - --device=/dev/sgx/provision \ - -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \ - -v $ENCLAVE_KEY:/graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem \ - -v $TPCDS_PATH:/ppml/trusted-big-data-ml/work/tpcds-spark \ - -v $DATA_PATH:/ppml/trusted-big-data-ml/work/data \ - -v $KEYS_PATH:/ppml/trusted-big-data-ml/work/keys \ - -v $SECURE_PASSWORD_PATH:/ppml/trusted-big-data-ml/work/password \ - -v $KUBECONFIG_PATH:/root/.kube/config \ - -e RUNTIME_SPARK_MASTER=k8s://https://$LOCAL_IP:6443 \ - -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \ - -e RUNTIME_K8S_SPARK_IMAGE=$DOCKER_IMAGE \ - -e RUNTIME_DRIVER_HOST=$LOCAL_IP \ - -e RUNTIME_DRIVER_PORT=54321 \ - -e RUNTIME_EXECUTOR_INSTANCES=1 \ - -e RUNTIME_EXECUTOR_CORES=4 \ - -e RUNTIME_EXECUTOR_MEMORY=20g \ - -e RUNTIME_TOTAL_EXECUTOR_CORES=4 \ - -e RUNTIME_DRIVER_CORES=4 \ - -e RUNTIME_DRIVER_MEMORY=10g \ - -e SGX_MEM_SIZE=64G \ - -e SGX_LOG_LEVEL=error \ - -e LOCAL_IP=$LOCAL_IP \ - $DOCKER_IMAGE bash -``` - -4. Attach to the client container - -```bash -sudo docker exec -it spark-local-k8s-client bash -``` - -5. Create external tables - -```bash -cd /ppml/trusted-big-data-ml/work/tpcds-spark -$SPARK_HOME/bin/spark-submit \ - --class "createTables" \ - --master \ - --driver-memory 20G \ - --executor-cores \ - --total-executor-cores \ - --executor-memory 20G \ - --jars spark-sql-perf/target/scala-2.12/spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar \ - target/scala-2.12/tpcds-benchmark_2.12-0.1.jar -``` -`` and `` are the generated data path and `tpcds-kit/tools` path, both should be accessible in the container. 
After successfully creating tables, there should be a directory `metastore_db` in the current working path. - -6. Modify `/ppml/trusted-big-data-ml/spark-executor-template.yaml`, add path of `enclave-key`, `tpcds-spark` and `kubeconfig`. If data is not stored on HDFS, also configure mount volume `data` and make sure `mountPath` is the same as `` used in create table step. - -```yaml -apiVersion: v1 -kind: Pod -spec: - containers: - - name: spark-executor - securityContext: - privileged: true - volumeMounts: - - name: enclave-key - mountPath: /graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem - ... - - name: tpcds - mountPath: /ppml/trusted-big-data-ml/work/tpcds-spark - - name: data - mountPath: /mounted/path/to/data - - name: kubeconf - mountPath: /root/.kube/config - volumes: - - name: enclave-key - hostPath: - path: /path/to/keys/enclave-key.pem - ... - - name: tpcds - hostPath: - path: /path/to/tpcds-spark - - name: data - hostPath: - path: /path/to/data - - name: kubeconf - hostPath: - path: /path/to/kubeconfig -``` - -7. Execute TPC-DS queries - -Optional argument `QUERY` is the query number to run. Multiple query numbers should be separated by space, e.g. `1 2 3`. If no query number is specified, all 1-99 queries would be executed. Configure `$hdfs_host_ip` and `$hdfs_port` if the output is stored on HDFS. - -```bash -cd /ppml/trusted-big-data-ml/work/tpcds-spark -secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt /performance` directory. diff --git a/docs/readthedocs/source/doc/PPML/QuickStart/tpc-h_with_sparksql_on_k8s.md b/docs/readthedocs/source/doc/PPML/QuickStart/tpc-h_with_sparksql_on_k8s.md deleted file mode 100644 index 0ede024c..00000000 --- a/docs/readthedocs/source/doc/PPML/QuickStart/tpc-h_with_sparksql_on_k8s.md +++ /dev/null @@ -1,205 +0,0 @@ -## TPC-H with Trusted SparkSQL on Kubernetes ## - -### Prerequisites ### -- Hardware that supports SGX -- A fully configured Kubernetes cluster -- Intel SGX Device Plugin to use SGX in K8S cluster (install following instructions [here](https://bigdl.readthedocs.io/en/latest/doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes.html "here")) - -### Prepare TPC-H kit and data ### -1. Generate data - - Go to [TPC Download](https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp) site, choose `TPC-H` source code, then download the TPC-H toolkits. **Follow the download instructions carefully.** - After you download the tpc-h tools zip and uncompressed the zip file. Go to `dbgen` directory, and create `makefile` based on `makefile.suite`, and modify `makefile` according to the prompts inside, and run `make`. - - This should generate an executable called `dbgen` - ``` - ./dbgen -h - ``` - - gives you the various options for generating the tables. The simplest case is running: - ``` - ./dbgen - ``` - which generates tables with extension `.tbl` with scale 1 (default) for a total of rougly 1GB size across all tables. For different size tables you can use the `-s` option: - ``` - ./dbgen -s 10 - ``` - will generate roughly 10GB of input data. - - You need to move all .tbl files to a new directory as raw data. - - You can then either upload your data to remote file system or read them locally. - -2. Encrypt Data - - Encrypt data with specified Key Management Service (`SimpleKeyManagementService`, or `EHSMKeyManagementService` , or `AzureKeyManagementService`). 
Details can be found here: https://github.com/intel-analytics/BigDL/tree/main/ppml/services/kms-utils/docker - - The example code of encrypt data with `SimpleKeyManagementService` is like below: - ``` - java -cp "$BIGDL_HOME/jars/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar:$SPARK_HOME/conf/:$SPARK_HOME/jars/*:$BIGDL_HOME/jars/*" \ - -Xmx10g \ - com.intel.analytics.bigdl.ppml.examples.tpch.EncryptFiles \ - --inputPath xxx/dbgen-input \ - --outputPath xxx/dbgen-encrypted - --kmsType SimpleKeyManagementService - --simpleAPPID xxxxxxxxxxxx \ - --simpleAPPKEY xxxxxxxxxxxx \ - --primaryKeyPath /path/to/simple_encrypted_primary_key \ - --dataKeyPath /path/to/simple_encrypted_data_key - ``` - -### Deploy PPML TPC-H on Kubernetes ### -1. Pull docker image - ``` - sudo docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT - ``` -2. Prepare SGX keys (following instructions [here](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/python/docker-graphene#11-prepare-the-keyspassworddataenclave-keypem "here")), make sure keys and tpch-spark can be accessed on each K8S node -3. Start a bigdl-ppml enabled Spark K8S client container with configured local IP, key, tpch and kuberconfig path - ``` - export ENCLAVE_KEY=/path/to/enclave-key.pem - export SECURE_PASSWORD_PATH=/path/to/password - export DATA_PATH=/path/to/data - export KEYS_PATH=/path/to/keys - export KUBERCONFIG_PATH=/path/to/kuberconfig - export LOCAL_IP=$local_ip - export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT - sudo docker run -itd \ - --privileged \ - --net=host \ - --name=spark-local-k8s-client \ - --oom-kill-disable \ - --device=/dev/sgx/enclave \ - --device=/dev/sgx/provision \ - -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \ - -v $SECURE_PASSWORD_PATH:/ppml/trusted-big-data-ml/work/password \ - -v $ENCLAVE_KEY:/graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem \ - -v $DATA_PATH:/ppml/trusted-big-data-ml/work/data \ - -v $KEYS_PATH:/ppml/trusted-big-data-ml/work/keys \ - -v $KUBERCONFIG_PATH:/root/.kube/config \ - -e RUNTIME_SPARK_MASTER=k8s://https://$LOCAL_IP:6443 \ - -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \ - -e RUNTIME_K8S_SPARK_IMAGE=$DOCKER_IMAGE \ - -e RUNTIME_DRIVER_HOST=$LOCAL_IP \ - -e RUNTIME_DRIVER_PORT=54321 \ - -e RUNTIME_EXECUTOR_INSTANCES=1 \ - -e RUNTIME_EXECUTOR_CORES=4 \ - -e RUNTIME_EXECUTOR_MEMORY=20g \ - -e RUNTIME_TOTAL_EXECUTOR_CORES=4 \ - -e RUNTIME_DRIVER_CORES=4 \ - -e RUNTIME_DRIVER_MEMORY=10g \ - -e SGX_MEM_SIZE=64G \ - -e SGX_LOG_LEVEL=error \ - -e LOCAL_IP=$LOCAL_IP \ - $DOCKER_IMAGE bash - ``` -4. Attach to the client container - ``` - sudo docker exec -it spark-local-k8s-client bash - ``` -5. Modify `spark-executor-template.yaml`, add path of `enclave-key`, `tpch-spark` and `kuberconfig` on host - ``` - apiVersion: v1 - kind: Pod - spec: - containers: - - name: spark-executor - securityContext: - privileged: true - volumeMounts: - ... - - name: tpch - mountPath: /ppml/trusted-big-data-ml/work/tpch-spark - - name: kubeconf - mountPath: /root/.kube/config - volumes: - - name: enclave-key - hostPath: - path: /root/keys/enclave-key.pem - ... - - name: tpch - hostPath: - path: /path/to/tpch-spark - - name: kubeconf - hostPath: - path: /path/to/kuberconfig - ``` -6. 
Run PPML TPC-H - ```bash - secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt Q01 39.80204010 diff --git a/docs/readthedocs/source/doc/PPML/QuickStart/trusted-serving-on-k8s-guide.md b/docs/readthedocs/source/doc/PPML/QuickStart/trusted-serving-on-k8s-guide.md deleted file mode 100644 index aec83359..00000000 --- a/docs/readthedocs/source/doc/PPML/QuickStart/trusted-serving-on-k8s-guide.md +++ /dev/null @@ -1,153 +0,0 @@ -# Trusted Cluster Serving with Graphene on Kubernetes # - -## Prerequisites ## -Prior to deploying PPML Cluster Serving, please make sure the following is setup -- Hardware that supports SGX -- A fully configured Kubernetes cluster -- Intel SGX Device Plugin to use SGX in K8S cluster (install following instructions [here](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-realtime-ml/scala/docker-graphene/kubernetes#deploy-the-intel-sgx-device-plugin-for-kubernetes "here")) -- Java - -## Deploy Trusted Realtime ML for Kubernetes ## -1. Pull docker image from dockerhub - ``` - $ docker pull intelanalytics/bigdl-ppml-trusted-realtime-ml-scala-graphene:2.1.0-SNAPSHOT - ``` -2. Pull the source code of BigDL and enter PPML graphene k8s directory - ``` - $ git clone https://github.com/intel-analytics/BigDL.git - $ cd BigDL/ppml/trusted-realtime-ml/scala/docker-graphene/kubernetes - ``` -3. Generate secure keys and passwords, and deploy as secrets (Refer [here](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-realtime-ml/scala/docker-graphene/kubernetes#secure-keys-and-password) for details) - 1. Generate keys and passwords - - Note: Make sure to add `${JAVA_HOME}/bin` to `$PATH` to avoid `keytool: command not found` error. - ``` - $ sudo ../../../../scripts/generate-keys.sh - $ openssl genrsa -3 -out enclave-key.pem 3072 - $ ../../../../scripts/generate-password.sh - ``` - 2. Deploy as secrets for Kubernetes - ``` - $ kubectl apply -f keys/keys.yaml - $ kubectl apply -f password/password.yaml - ``` - -4. In `values.yaml`, configure pulled image name, path of `enclave-key.pem` generated in step 3 and path of script `start-all-but-flink.sh`. -5. If kernel version is 5.11+ with built-in SGX support, create soft links for SGX device - ``` - $ sudo ln -s /dev/sgx_enclave /dev/sgx/enclave - $ sudo ln -s /dev/sgx_provision /dev/sgx/provision - ``` - -### Configure SGX mode ### -In `templates/flink-configuration-configmap.yaml`, configure `sgx.mode` to `sgx` or `nonsgx` to determine whether to run the workload with SGX. - -### Configure Resource for Components ### -1. Configure jobmanager resource allocation in `templates/jobmanager-deployment.yaml` - ``` - ... - env: - - name: SGX_MEM_SIZE - value: "16G" - ... - resources: - requests: - cpu: 2 - memory: 16Gi - sgx.intel.com/enclave: "1" - sgx.intel.com/epc: 16Gi - limits: - cpu: 2 - memory: 16Gi - sgx.intel.com/enclave: "1" - sgx.intel.com/epc: 16Gi - ... - ``` - -2. 
Configure Taskmanager resource allocation - - Memory allocation in `templates/flink-configuration-configmap.yaml` - ``` - taskmanager.memory.managed.size: 4gb - taskmanager.memory.task.heap.size: 5gb - xmx.size: 5g - ``` - - Pod resource allocation - - Use `taskmanager-deployment.yaml` instead of `taskmanager-statefulset.yaml` for functionality test - ``` - $ mv templates/taskmanager-statefulset.yaml ./ - $ mv taskmanager-deployment.yaml.back templates/taskmanager-deployment.yaml - ``` - Configure resource in `templates/taskmanager-deployment.yaml` (allocate 16 cores in this example, please configure according to scenario) - ``` - ... - env: - - name: CORE_NUM - value: "16" - - name: SGX_MEM_SIZE - value: "32G" - ... - resources: - requests: - cpu: 16 - memory: 32Gi - sgx.intel.com/enclave: "1" - sgx.intel.com/epc: 32Gi - limits: - cpu: 16 - memory: 32Gi - sgx.intel.com/enclave: "1" - sgx.intel.com/epc: 32Gi - ... - ``` -3. Configure Redis and client resource allocation - - SGX memory allocation in `start-all-but-flink.sh` - ``` - ... - cd /ppml/trusted-realtime-ml/java - export SGX_MEM_SIZE=16G - test "$SGX_MODE" = sgx && ./init.sh - echo "java initiated" - ... - ``` - - Pod resource allocation in `templates/master-deployment.yaml` - ``` - ... - env: - - name: CORE_NUM #batchsize per instance - value: "16" - ... - resources: - requests: - cpu: 12 - memory: 32Gi - sgx.intel.com/enclave: "1" - sgx.intel.com/epc: 32Gi - limits: - cpu: 12 - memory: 32Gi - sgx.intel.com/enclave: "1" - sgx.intel.com/epc: 32Gi - ... - ``` - -### Deploy Cluster Serving ### -1. Deploy all components and start job - 1. Download helm from [release page](https://github.com/helm/helm/releases) and install - 2. Deploy cluster serving - ``` - $ helm install ppml ./ - ``` -2. Port forwarding - - Set up port forwarding of jobmanager Rest port for access to Flink WebUI on host - 1. Run `kubectl port-forward --address 0.0.0.0 8081:8081` to forward jobmanager’s web UI port to 8081 on host. - 2. Navigate to `http://:8081` in web browser to check status of Flink cluster and job. -3. Performance benchmark - ``` - $ kubectl exec -it -- bash - $ cd /ppml/trusted-realtime-ml/java/work/benchmark/ - $ bash init-benchmark.sh - $ python3 e2e_throughput.py -n -i ../data/ILSVRC2012_val_00000001.JPEG - ``` - The `e2e_throughput.py` script pushes test image for `-n` times (default 1000 if not manually set), and time the process from push images (enqueue) to retrieve all inference results (dequeue), to calculate cluster serving end-to-end throughput. The output should look like `Served xxx images in xxx sec, e2e throughput is xxx images/sec` diff --git a/docs/readthedocs/source/doc/PPML/VFL/overview.md b/docs/readthedocs/source/doc/PPML/VFL/overview.md deleted file mode 100644 index b1345a25..00000000 --- a/docs/readthedocs/source/doc/PPML/VFL/overview.md +++ /dev/null @@ -1,23 +0,0 @@ -# Vertical Federated Learning -Vertical Federated Learning (VFL) is a federated machine learning case where multiple data sets share the same sample ID space but differ in feature space. - -VFL is supported in BigDL PPML. It allows users to train a federated machine learning model where data features are held by different parties. In BigDL PPML, the following VFL scenarios are supported. -* Private Set Intersection: To get data intersection of different VFL parties. -* Neural Network Model: To train common neural network model with Pytorch or Tensorflow backend across VFL parties. 
-* FGBoost Model: To train gradient boosted decision tree (GBDT) model across multiple VFL parties. - -## Quick Start Examples -For each scenario, an quick start example is available in following links. -* [Private Set Intersection](https://github.com/intel-analytics/BigDL/blob/main/python/ppml/example/psi/psi-tutorial.md): A PSI example of getting intersection of two parties -* [Pytorch Neural Network Model](https://github.com/intel-analytics/BigDL/blob/main/python/ppml/example/pytorch_nn_lr/pytorch-nn-lr-tutorial.md): An Pytorch based Logistic Regression application by two parties -* [Tensorflow Neural Network Model](https://github.com/intel-analytics/BigDL/blob/main/python/ppml/example/tensorflow_nn_lr/tensorflow-nn-lr-tutorial.md): An Tensorflow based Logistic Regression application by two parties -* [FGBoost Model](https://github.com/intel-analytics/BigDL/blob/main/python/ppml/example/fgboost_regression/fgboost-tutorial.md): An federated Gradient Boosted Regression Tree application by two parties - -## System Architecture -The high-level architecture is shown in the diagram below. This includes the components of the BigDL PPML FL and [SGX](https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/overview.html) for Privacy Preservation. - -![](../images/fl_architecture.png) - -## Next steps -For detailed usage of BigDL PPML VFL, please see [User Guide](user_guide.md) -For BigDL PPML VFL with Homomorphic Encryption, please see [VFL HE](vfl_he.md) diff --git a/docs/readthedocs/source/doc/PPML/VFL/user_guide.md b/docs/readthedocs/source/doc/PPML/VFL/user_guide.md deleted file mode 100644 index c49a9a20..00000000 --- a/docs/readthedocs/source/doc/PPML/VFL/user_guide.md +++ /dev/null @@ -1,29 +0,0 @@ -# BigDL PPML VFL User Guide -## Deployment -### SGX -FL Server is protected by SGX, please see [PPML Prerequisite](https://github.com/intel-analytics/BigDL/blob/main/docs/readthedocs/source/doc/PPML/Overview/ppml.md#21-prerequisite) to get SGX environment ready. - -### FL Server -You could set configurations of FL Server by editting `ppml-conf.yaml` -#### Configuration -##### clientNum -an integer, the total client number of this FL application -##### serverPort -an integer, the port used by FL Server -##### privateKeyFilePath -a string, the file path of TLS private key -##### certChainFilePath -a string, the file path of TLS certificate chain -#### Start -You can run FL Server in SGX with the following command: -```bash -docker exec -it YOUR_DOCKER bash /ppml/trusted-big-data-ml/work/start-scripts/start-python-fl-server-sgx.sh -p 8980 -c 2 -``` -You can also set port with `-p` and set client number with `-c` while the default settings are `port=8980` and `client-num=2`. - -## Programming Guide -Once the FL Server deployment is ready, you can write the client code and start your FL application. - -You could see the [examples](overview.md#quick-start-examples) in overview for basic usages of the APIs. - -You could check [API Doc]() for more details. diff --git a/docs/readthedocs/source/doc/PPML/VFL/vfl_he.md b/docs/readthedocs/source/doc/PPML/VFL/vfl_he.md deleted file mode 100644 index 7d80f71b..00000000 --- a/docs/readthedocs/source/doc/PPML/VFL/vfl_he.md +++ /dev/null @@ -1,16 +0,0 @@ -# Vertical Federated Learning with Homomorphic Encryption -Vertical Federated Learning (VFL) is a federated machine learning case where multiple data sets share the same sample ID space but differ in feature space. 
To protect user data, data(partial output) passed to server should be encrytped, and server should be trusted and running in SGX environment. See the diagram below: -![](../images/fl_architecture.png) - -In some cases, third party doesn't has a trusted computing environment, to run the BigDL FL server. So we introduce a new solution using Homomorphic Encryption to protect the data passed to FL server. - -## System Architecture -The high-level architecture is shown in the diagram below. -![](../images/fl_ckks.PNG) -Different from VFL with SGX, this solution will encrypt all the data passed to FL server, using CKKS encryptor. Server only holds computing secrets to compute loss and gradient with the cipher data, server has not secrets to see what's inside the cipher data. So the data passed to server is very safe, even FL server is not protected by SGX. - -## Quick Start Examples -* [VFL with Homomorphic Encryption](https://github.com/intel-analytics/BigDL/blob/main/scala/ppml/src/main/scala/com/intel/analytics/bigdl/ppml/fl/example/ckks/README.md): A example of VFL Logistic Regression with HE on census dataset. - - - diff --git a/docs/readthedocs/source/doc/PPML/images/fl_architecture.png b/docs/readthedocs/source/doc/PPML/images/fl_architecture.png deleted file mode 100644 index 2c8dcc67..00000000 Binary files a/docs/readthedocs/source/doc/PPML/images/fl_architecture.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/images/fl_ckks.PNG b/docs/readthedocs/source/doc/PPML/images/fl_ckks.PNG deleted file mode 100644 index 8e7a8f5a..00000000 Binary files a/docs/readthedocs/source/doc/PPML/images/fl_ckks.PNG and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/images/occlum_maa.png b/docs/readthedocs/source/doc/PPML/images/occlum_maa.png deleted file mode 100644 index e0954880..00000000 Binary files a/docs/readthedocs/source/doc/PPML/images/occlum_maa.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/images/ppml_azure_latest.png b/docs/readthedocs/source/doc/PPML/images/ppml_azure_latest.png deleted file mode 100644 index 8e67c387..00000000 Binary files a/docs/readthedocs/source/doc/PPML/images/ppml_azure_latest.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/images/ppml_azure_workflow.png b/docs/readthedocs/source/doc/PPML/images/ppml_azure_workflow.png deleted file mode 100644 index cc975ea4..00000000 Binary files a/docs/readthedocs/source/doc/PPML/images/ppml_azure_workflow.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/images/ppml_build_deploy.png b/docs/readthedocs/source/doc/PPML/images/ppml_build_deploy.png deleted file mode 100644 index 09bb6c29..00000000 Binary files a/docs/readthedocs/source/doc/PPML/images/ppml_build_deploy.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/images/ppml_dev_basic.png b/docs/readthedocs/source/doc/PPML/images/ppml_dev_basic.png deleted file mode 100644 index 53627ebe..00000000 Binary files a/docs/readthedocs/source/doc/PPML/images/ppml_dev_basic.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/images/ppml_scope.png b/docs/readthedocs/source/doc/PPML/images/ppml_scope.png deleted file mode 100644 index c0215c20..00000000 Binary files a/docs/readthedocs/source/doc/PPML/images/ppml_scope.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/images/ppml_sgx_enclave.png b/docs/readthedocs/source/doc/PPML/images/ppml_sgx_enclave.png deleted file mode 100644 index 5e43a0a6..00000000 Binary files 
a/docs/readthedocs/source/doc/PPML/images/ppml_sgx_enclave.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/images/ppml_test_dev.png b/docs/readthedocs/source/doc/PPML/images/ppml_test_dev.png deleted file mode 100644 index 4421af4b..00000000 Binary files a/docs/readthedocs/source/doc/PPML/images/ppml_test_dev.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/images/spark_sgx_azure.png b/docs/readthedocs/source/doc/PPML/images/spark_sgx_azure.png deleted file mode 100644 index 0796e46a..00000000 Binary files a/docs/readthedocs/source/doc/PPML/images/spark_sgx_azure.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/images/spark_sgx_occlum.png b/docs/readthedocs/source/doc/PPML/images/spark_sgx_occlum.png deleted file mode 100755 index 0142565e..00000000 Binary files a/docs/readthedocs/source/doc/PPML/images/spark_sgx_occlum.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/PPML/index.rst b/docs/readthedocs/source/doc/PPML/index.rst deleted file mode 100644 index a8e32ccc..00000000 --- a/docs/readthedocs/source/doc/PPML/index.rst +++ /dev/null @@ -1,71 +0,0 @@ -BigDL-PPML -========================= - -Protecting privacy and confidentiality is critical for large-scale data analysis and machine learning. BigDL PPML (BigDL Privacy Preserving Machine Learning) combines various low-level hardware and software security technologies (e.g., Intel® Software Guard Extensions (Intel® SGX), Security Key Management, Remote Attestation, Data Encryption, Federated Learning, etc.) so that users can continue applying standard Big Data and AI technologies (such as Apache Spark, Apache Flink, TensorFlow, PyTorch, etc.) without sacrificing privacy. - ----------------------- - - -.. grid:: 1 2 2 2 - :gutter: 2 - - .. grid-item-card:: - - **Get Started** - ^^^ - - Documents in these sections helps you getting started quickly with PPML. - - +++ - - :bdg-link:`Installation <./Overview/install.html>` | - :bdg-link:`Hello World Example <./Overview/quicktour.html>` - - - .. grid-item-card:: - - **User Guide** - ^^^ - - Provides you with in-depth information about PPML features and concepts and step-by-step guides. - - +++ - - :bdg-link:`Introduction <./Overview/intro.html>` | - :bdg-link:`Advanced Topics <./Overview/misc.html>` - - - .. grid-item-card:: - - **Tutorials** - ^^^ - - PPML Tutorials and Examples. - - +++ - - :bdg-link:`End-to-End Example <./Overview/examples.html>` | - :bdg-link:`More Examples ` - - .. grid-item-card:: - - **Videos** - ^^^ - - Videos and Demos helps you quick understand the architecture and start hands-on work. - - +++ - - :bdg-link:`Introduction <./Overview/intro.html#what-is-bigdl-ppml>` | - :bdg-link:`E2E Workflow <./QuickStart/end-to-end.html#e2e-architecture-overview>` | - :bdg-link:`E2E Demo <./QuickStart/end-to-end.html#video-demo>` - - - - - - -.. toctree:: - :hidden: - - BigDL-PPML Document \ No newline at end of file diff --git a/docs/readthedocs/source/doc/PythonAPI/LLM/index.rst b/docs/readthedocs/source/doc/PythonAPI/LLM/index.rst index 6d6e38e1..6b6f69dd 100644 --- a/docs/readthedocs/source/doc/PythonAPI/LLM/index.rst +++ b/docs/readthedocs/source/doc/PythonAPI/LLM/index.rst @@ -1,4 +1,4 @@ -BigDL-LLM API +IPEX-LLM API ================== .. 
toctree:: diff --git a/docs/readthedocs/source/doc/PythonAPI/LLM/langchain.rst b/docs/readthedocs/source/doc/PythonAPI/LLM/langchain.rst index bf0fa88d..307c6d14 100644 --- a/docs/readthedocs/source/doc/PythonAPI/LLM/langchain.rst +++ b/docs/readthedocs/source/doc/PythonAPI/LLM/langchain.rst @@ -1,4 +1,4 @@ -BigDL-LLM LangChain API +IPEX-LLM LangChain API ===================== LLM Wrapper of LangChain @@ -7,7 +7,7 @@ LLM Wrapper of LangChain Hugging Face ``transformers`` Format ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -BigDL-LLM provides ``TransformersLLM`` and ``TransformersPipelineLLM``, which implement the standard interface of LLM wrapper of LangChain. +IPEX-LLM provides ``TransformersLLM`` and ``TransformersPipelineLLM``, which implement the standard interface of LLM wrapper of LangChain. .. tabs:: @@ -37,7 +37,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, yo .. tab:: Llama - .. autoclass:: ipex_llm.langchain.llms.bigdlllm.LlamaLLM + .. autoclass:: ipex_llm.langchain.llms.ipexllm.LlamaLLM :members: :undoc-members: :show-inheritance: @@ -49,7 +49,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, yo .. tab:: ChatGLM - .. autoclass:: ipex_llm.langchain.llms.bigdlllm.ChatGLMLLM + .. autoclass:: ipex_llm.langchain.llms.ipexllm.ChatGLMLLM :members: :undoc-members: :show-inheritance: @@ -61,7 +61,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, yo .. tab:: Bloom - .. autoclass:: ipex_llm.langchain.llms.bigdlllm.BloomLLM + .. autoclass:: ipex_llm.langchain.llms.ipexllm.BloomLLM :members: :undoc-members: :show-inheritance: @@ -73,7 +73,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, yo .. tab:: Gptneox - .. autoclass:: ipex_llm.langchain.llms.bigdlllm.GptneoxLLM + .. autoclass:: ipex_llm.langchain.llms.ipexllm.GptneoxLLM :members: :undoc-members: :show-inheritance: @@ -85,7 +85,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, yo .. tab:: Starcoder - .. autoclass:: ipex_llm.langchain.llms.bigdlllm.StarcoderLLM + .. autoclass:: ipex_llm.langchain.llms.ipexllm.StarcoderLLM :members: :undoc-members: :show-inheritance: @@ -117,7 +117,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also .. tab:: Llama - .. autoclass:: ipex_llm.langchain.embeddings.bigdlllm.LlamaEmbeddings + .. autoclass:: ipex_llm.langchain.embeddings.ipexllm.LlamaEmbeddings :members: :undoc-members: :show-inheritance: @@ -129,7 +129,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also .. tab:: Bloom - .. autoclass:: ipex_llm.langchain.embeddings.bigdlllm.BloomEmbeddings + .. autoclass:: ipex_llm.langchain.embeddings.ipexllm.BloomEmbeddings :members: :undoc-members: :show-inheritance: @@ -141,7 +141,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also .. tab:: Gptneox - .. autoclass:: ipex_llm.langchain.embeddings.bigdlllm.GptneoxEmbeddings + .. autoclass:: ipex_llm.langchain.embeddings.ipexllm.GptneoxEmbeddings :members: :undoc-members: :show-inheritance: @@ -153,7 +153,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also .. tab:: Starcoder - .. autoclass:: ipex_llm.langchain.embeddings.bigdlllm.StarcoderEmbeddings + .. 
autoclass:: ipex_llm.langchain.embeddings.ipexllm.StarcoderEmbeddings :members: :undoc-members: :show-inheritance: diff --git a/docs/readthedocs/source/doc/PythonAPI/LLM/optimize.rst b/docs/readthedocs/source/doc/PythonAPI/LLM/optimize.rst index d979376e..fc861cf2 100644 --- a/docs/readthedocs/source/doc/PythonAPI/LLM/optimize.rst +++ b/docs/readthedocs/source/doc/PythonAPI/LLM/optimize.rst @@ -1,10 +1,10 @@ -BigDL-LLM PyTorch API +IPEX-LLM PyTorch API ===================== Optimize Model ---------------------------------------- -You can run any PyTorch model with ``optimize_model`` through only one-line code change to benefit from BigDL-LLM optimization, regardless of the library or API you are using. +You can run any PyTorch model with ``optimize_model`` through only one-line code change to benefit from IPEX-LLM optimization, regardless of the library or API you are using. .. automodule:: ipex_llm :members: optimize_model diff --git a/docs/readthedocs/source/doc/PythonAPI/LLM/transformers.rst b/docs/readthedocs/source/doc/PythonAPI/LLM/transformers.rst index 711f397a..e4f2d539 100644 --- a/docs/readthedocs/source/doc/PythonAPI/LLM/transformers.rst +++ b/docs/readthedocs/source/doc/PythonAPI/LLM/transformers.rst @@ -1,10 +1,10 @@ -BigDL-LLM ``transformers``-style API +IPEX-LLM ``transformers``-style API ==================================== Hugging Face ``transformers`` AutoModel ------------------------------------ -You can apply BigDL-LLM optimizations on any Hugging Face Transformers models by using the standard AutoModel APIs. +You can apply IPEX-LLM optimizations on any Hugging Face Transformers models by using the standard AutoModel APIs. AutoModelForCausalLM diff --git a/docs/readthedocs/source/doc/Serving/Example/cluster-serving-http-example.ipynb b/docs/readthedocs/source/doc/Serving/Example/cluster-serving-http-example.ipynb deleted file mode 100644 index 7f007c82..00000000 --- a/docs/readthedocs/source/doc/Serving/Example/cluster-serving-http-example.ipynb +++ /dev/null @@ -1,857 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this example, we will use tensorflow.keras package to create a keras image classification application using model MobileNetV2, and transfer the application to Cluster Serving step by step." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Original Keras application\n", - "We will first show an original Keras application, which download the data and preprocess it, then create the MobileNetV2 model to predict." 
- ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "import os\n", - "import PIL" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'2.4.1'" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "tf.__version__" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Found 1000 images belonging to 2 classes.\n" - ] - } - ], - "source": [ - "# Obtain data from url:\"https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip\"\n", - "zip_file = tf.keras.utils.get_file(origin=\"https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip\",\n", - " fname=\"cats_and_dogs_filtered.zip\", extract=True)\n", - "\n", - "# Find the directory of validation set\n", - "base_dir, _ = os.path.splitext(zip_file)\n", - "test_dir = os.path.join(base_dir, 'validation')\n", - "# Set images size to 160x160x3\n", - "image_size = 160\n", - "\n", - "# Rescale all images by 1./255 and apply image augmentation\n", - "test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)\n", - "\n", - "# Flow images using generator to the test_generator\n", - "test_generator = test_datagen.flow_from_directory(\n", - " test_dir,\n", - " target_size=(image_size, image_size),\n", - " batch_size=1,\n", - " class_mode='binary')" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "# Create the base model from the pre-trained model MobileNet V2\n", - "IMG_SHAPE=(160,160,3)\n", - "model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,\n", - " include_top=False,\n", - " weights='imagenet')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In keras, input could be ndarray, or generator. We could just use `model.predict(test_generator)`. But to simplify, here we just input the first record to model." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[[[[0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]]\n", - "\n", - " [[0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0.997349 0. 0. ... 0. 0.96874905\n", - " 0. ]\n", - " [1.8385804 0.3380084 2.4926844 ... 0. 0.14267397\n", - " 0. ]\n", - " [0. 0. 3.576158 ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]]\n", - "\n", - " [[0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0.0062952 0. ... 0. 0.15311003\n", - " 0. ]\n", - " [0. 1.7324333 1.1691046 ... 0. 0.9847245\n", - " 0. ]\n", - " [0. 0.84404707 3.2351522 ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]]\n", - "\n", - " [[0. 0. 0. ... 0. 0.\n", - " 0.3681116 ]\n", - " [0. 3.3440204 0.5372138 ... 0. 0.\n", - " 0.79515934]\n", - " [0. 3.0932055 3.5937624 ... 0. 0.\n", - " 0.66862965]\n", - " [0. 1.4007983 0. ... 0. 0.\n", - " 2.8901892 ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]]\n", - "\n", - " [[0. 0. 0. ... 0. 0.\n", - " 0.73307323]\n", - " [0. 0. 0. ... 0. 0.\n", - " 2.9129057 ]\n", - " [0. 0. 0.6134901 ... 0. 0.\n", - " 2.7102432 ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 1.8489733 ]\n", - " [0. 0. 0. ... 0. 
0.\n", - " 0.22623205]]]]\n" - ] - } - ], - "source": [ - "prediction=model.predict(test_generator.next()[0])\n", - "print(prediction)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Great! Now the Keras application is completed. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Export TensorFlow Saved Model\n", - "Next, we transfer the application to Cluster Serving. The first step is to save the model to SavedModel format." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /home/user/anaconda3/envs/rec/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "If using Keras pass *_constraint arguments to layers.\n", - "INFO:tensorflow:Assets written to: /tmp/transfer_learning_mobilenetv2/assets\n", - "assets\tsaved_model.pb\tvariables\n" - ] - } - ], - "source": [ - "# Save trained model to ./transfer_learning_mobilenetv2\n", - "model.save('/tmp/transfer_learning_mobilenetv2')\n", - "! ls /tmp/transfer_learning_mobilenetv2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy Cluster Serving\n", - "After model prepared, we start to deploy it on Cluster Serving.\n", - "\n", - "First install Cluster Serving" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: bigdl-serving in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (0.9.0)\n", - "Requirement already satisfied: opencv-python in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (4.5.1.48)\n", - "Requirement already satisfied: httpx in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (0.16.1)\n", - "Requirement already satisfied: pyarrow in /home/user/.local/lib/python3.6/site-packages (from bigdl-serving) (1.0.1)\n", - "Requirement already satisfied: redis in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (3.5.3)\n", - "Requirement already satisfied: pyyaml in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (5.4.1)\n", - "Requirement already satisfied: rfc3986[idna2008]<2,>=1.3 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (1.4.0)\n", - "Requirement already satisfied: certifi in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (2020.12.5)\n", - "Requirement already satisfied: httpcore==0.12.* in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (0.12.3)\n", - "Requirement already satisfied: sniffio in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (1.2.0)\n", - "Requirement already satisfied: h11==0.* in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpcore==0.12.*->httpx->bigdl-serving) (0.12.0)\n", - "Requirement already satisfied: contextvars>=2.1 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from sniffio->httpx->bigdl-serving) (2.4)\n", - "Requirement already satisfied: immutables>=0.9 in 
/home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from contextvars>=2.1->sniffio->httpx->bigdl-serving) (0.14)\n", - "Requirement already satisfied: idna in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from rfc3986[idna2008]<2,>=1.3->httpx->bigdl-serving) (2.10)\n", - "Requirement already satisfied: numpy>=1.13.3 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from opencv-python->bigdl-serving) (1.19.2)\n", - "\u001b[33mWARNING: You are using pip version 20.3.3; however, version 21.0.1 is available.\n", - "You should consider upgrading via the '/home/user/anaconda3/envs/rec/bin/python -m pip install --upgrade pip' command.\u001b[0m\n" - ] - } - ], - "source": [ - "! pip install bigdl-serving" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Cluster Serving has been properly set up.\n", - "You did not specify BIGDL_VERSION, will download 0.9.0\n", - "BIGDL_VERSION is 0.9.0\n", - "BIGDL_VERSION is 0.12.1\n", - "SPARK_VERSION is 2.4.3\n", - "2.4\n", - "--2021-02-07 10:01:46-- https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-spark_2.4.3/0.9.0/bigdl-spark_2.4.3-0.9.0-serving.jar\n", - "Resolving child-prc.intel.com (child-prc.intel.com)... You are installing Cluster Serving by pip, downloading...\n", - "\n", - "SIGHUP received.\n", - "Redirecting output to ‘wget-log.2’.\n" - ] - } - ], - "source": [ - "# we go to a new directory and initialize the environment\n", - "! mkdir cluster-serving\n", - "os.chdir('cluster-serving')\n", - "! cluster-serving-init" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 2150K .......... .......... .......... .......... .......... 0% 27.0K 5h37m\r\n", - " 2200K .......... .......... .......... .......... .......... 0% 33.6K 5h36m\r\n", - " 2250K .......... .......... .......... .......... .......... 0% 27.3K 5h37m\r\n", - " 2300K .......... .......... .......... .......... .......... 0% 30.3K 5h36m\r\n", - " 2350K .......... .......... .......... .......... .......... 0% 29.7K 5h36m\r\n", - " 2400K .......... .......... .......... .......... .......... 0% 23.7K 5h38m\r\n", - " 2450K .......... .......... .......... .......... .......... 0% 23.4K 5h39m\r\n", - " 2500K .......... .......... .......... .......... .......... 0% 23.4K 5h41m\r\n", - " 2550K .......... .......... .......... .......... .......... 0% 22.3K 5h43m\r\n", - " 2600K .......... .......... .......... ....." - ] - } - ], - "source": [ - "! tail wget-log.2" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# if you encounter slow download issue like above, you can just use following command to download\n", - "# ! wget https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-spark_2.4.3/0.9.0/bigdl-spark_2.4.3-0.9.0-serving.jar\n", - "\n", - "# if you are using wget to download, call mv *serving.jar bigdl.jar again after downloaded." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "config.yaml bigdl.jar\r\n" - ] - } - ], - "source": [ - "# After initialization finished, check the directory\n", - "! 
ls" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We config the model path in `config.yaml` to following (the detail of config is at [Cluster Serving Configuration](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#2-configuration))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "## BigDL Cluster Serving\n", - "\n", - "model:\n", - " # model path must be provided\n", - " path: /tmp/transfer_learning_mobilenetv2" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "## BigDL Cluster Serving\r\n", - "\r\n", - "model:\r\n", - " # model path must be provided\r\n", - " path: /tmp/transfer_learning_mobilenetv2\r\n", - " # name, default is serving_stream, you need to specify if running multiple servings\r\n", - " name:\r\n", - "data:\r\n", - " # default, localhost:6379\r\n", - " src:\r\n" - ] - } - ], - "source": [ - "! head config.yaml" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Start Cluster Serving\n", - "\n", - "Cluster Serving requires Flink and Redis installed, and corresponded environment variables set, check [Cluster Serving Installation Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#1-installation) for detail.\n", - "\n", - "Flink cluster should start before Cluster Serving starts, if Flink cluster is not started, call following to start a local Flink cluster." - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Starting cluster.\n", - "Starting standalonesession daemon on host my-PC.\n", - "Starting taskexecutor daemon on host my-PC.\n" - ] - } - ], - "source": [ - "! 
$FLINK_HOME/bin/start-cluster.sh" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "After configuration, start Cluster Serving by `cluster-serving-start` (the detail is at [Cluster Serving Programming Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#3-launching-service))" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "model_path=\"/tmp/transfer_learning_mobilenetv2\"\n", - "redis_timeout=\"5000\"\n", - "Redis maxmemory is not set, using default value 8G\n", - "redis server started, please check log in redis.log\n", - "OK\n", - "OK\n", - "OK\n", - "redis config maxmemory set to 8G\n", - "OK\n", - "OK\n", - "SLF4J: Class path contains multiple SLF4J bindings.\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/bigdl-spark_2.4.3-0.9.0-SNAPSHOT-serving.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\n", - "SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]\n", - "log4j:WARN No appenders could be found for logger (org.apache.flink.client.cli.CliFrontend).\n", - "log4j:WARN Please initialize the log4j system properly.\n", - "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n", - "Starting new Cluster Serving job.\n", - "Cluster Serving job submitted, check log in log-cluster_serving-serving_stream.txt\n", - "To list Cluster Serving job status, use cluster-serving-cli list\n", - "SLF4J: Class path contains multiple SLF4J bindings.\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/bigdl-spark_2.4.3-0.9.0-SNAPSHOT-serving.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\n", - "SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]\n", - "log4j:WARN No appenders could be found for logger (org.apache.flink.client.cli.CliFrontend).\n", - "log4j:WARN Please initialize the log4j system properly.\n", - "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n", - "[Full GC (Metadata GC Threshold) 32304K->20432K(1030144K), 0.0213821 secs]\n" - ] - } - ], - "source": [ - "! cluster-serving-start" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prediction using Cluster Serving\n", - "Next we start Cluster Serving code at python client." 
- ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "redis group exist, will not create new one\n", - "redis group exist, will not create new one\n" - ] - } - ], - "source": [ - "from bigdl.serving.client import InputQueue, OutputQueue\n", - "input_queue = InputQueue()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In Cluster Serving, only NdArray is supported as input. Thus, we first transform the generator to ndarray (If you do not know how to transform your input to NdArray, you may get help at [data transform guide](https://github.com/intel-analytics/bigdl/tree/master/docs/docs/ClusterServingGuide/OtherFrameworkUsers#data))" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([[[[0.41176474, 0.50980395, 0.5882353 ],\n", - " [0.42352945, 0.47450984, 0.50980395],\n", - " [0.4901961 , 0.5058824 , 0.5019608 ],\n", - " ...,\n", - " [0.5764706 , 0.6392157 , 0.7019608 ],\n", - " [0.454902 , 0.5176471 , 0.5803922 ],\n", - " [0.3647059 , 0.427451 , 0.4784314 ]],\n", - "\n", - " [[0.31764707, 0.38431376, 0.4156863 ],\n", - " [0.35686275, 0.38431376, 0.40784317],\n", - " [0.34509805, 0.34509805, 0.3529412 ],\n", - " ...,\n", - " [0.5803922 , 0.64705884, 0.6862745 ],\n", - " [0.48627454, 0.5529412 , 0.5921569 ],\n", - " [0.48235297, 0.54509807, 0.59607846]],\n", - "\n", - " [[0.4039216 , 0.4431373 , 0.44705886],\n", - " [0.35686275, 0.36078432, 0.37647063],\n", - " [0.46274513, 0.4431373 , 0.47058827],\n", - " ...,\n", - " [0.53333336, 0.6 , 0.6313726 ],\n", - " [0.47450984, 0.5411765 , 0.5686275 ],\n", - " [0.5137255 , 0.5764706 , 0.627451 ]],\n", - "\n", - " ...,\n", - "\n", - " [[0.44705886, 0.5019608 , 0.54509807],\n", - " [0.42352945, 0.48627454, 0.5372549 ],\n", - " [0.37647063, 0.43921572, 0.49803925],\n", - " ...,\n", - " [0.69411767, 0.69411767, 0.69411767],\n", - " [0.6745098 , 0.6745098 , 0.68235296],\n", - " [0.6392157 , 0.63529414, 0.6666667 ]],\n", - "\n", - " [[0.3647059 , 0.41960788, 0.454902 ],\n", - " [0.35686275, 0.427451 , 0.47450984],\n", - " [0.3254902 , 0.3921569 , 0.454902 ],\n", - " ...,\n", - " [0.5647059 , 0.5647059 , 0.5647059 ],\n", - " [0.627451 , 0.627451 , 0.63529414],\n", - " [0.7176471 , 0.70980394, 0.76470596]],\n", - "\n", - " [[0.34117648, 0.40784317, 0.43529415],\n", - " [0.29803923, 0.37254903, 0.427451 ],\n", - " [0.31764707, 0.3921569 , 0.45882356],\n", - " ...,\n", - " [0.454902 , 0.454902 , 0.46274513],\n", - " [0.5803922 , 0.57254905, 0.6156863 ],\n", - " [0.5137255 , 0.5019608 , 0.58431375]]]], dtype=float32)" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "arr = test_generator.next()[0]\n", - "arr" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Write to Redis successful\n", - "redis group exist, will not create new one\n", - "Write to Redis successful\n" - ] - } - ], - "source": [ - "# Use async api to put and get, you have pass a name arg and use the name to get\n", - "input_queue.enqueue('my-input', t=arr)\n", - "output_queue = OutputQueue()\n", - "prediction = output_queue.query('my-input')\n", - "# Use sync api to predict, this will block until the result is get or timeout\n", - "prediction = input_queue.predict(arr)" - ] - }, - { - "cell_type": "code", - 
"execution_count": 24, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([[[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 1.3543907 ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 4.1898136 ,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 3.286649 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 4.0817494 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 3.3224926 , 0. , ..., 1.4220613 ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 4.9100547 ,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 1.5577714 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 1.767426 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 2.3534465 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0.21401057,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0.34797698, 0. ],\n", - " [0. , 1.4496232 , 0. , ..., 0. ,\n", - " 1.6221215 , 0. ],\n", - " [0. , 0.6171873 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 1.192298 , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ]]], dtype=float32)" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "prediction" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If everything works well, the result `prediction` would be the exactly the same NdArray object with the output of original Keras model." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next is the way to use http service through python." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# start the http server via jar\n", - "# ! java -jar bigdl-spark_2.4.3-0.9.0-SNAPSHOT-http.jar" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you do not know how to find the jar or other http service, you may get help at [Cluster Serving http guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#3-launching-service)" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "welcome to BigDL web serving frontend" - ] - } - ], - "source": [ - "! curl http://localhost:10020" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Cluster Serving provides an Python util `http_response_to_ndarray` which let user parse http response directly to ndarray, as following." - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "import requests\n", - "import numpy as np\n", - "from bigdl.serving.client import http_response_to_ndarray" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([[[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. 
,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0.7070324 , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 1.9520156 , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0.45007578],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ]]])" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "url = 'http://localhost:10020/predict'\n", - "d = json.dumps({\"instances\":[{\"floatTensor\": arr.tolist()}]})\n", - "r = requests.post(url, data=d)\n", - "\n", - "http_prediction = http_response_to_ndarray(r)\n", - "http_prediction" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [], - "source": [ - "# don't forget to delete the model you save for this tutorial\n", - "! rm -rf /tmp/transfer_learning_mobilenetv2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is the end of this tutorial. If you have any question, you could raise an issue at [BigDL Github](https://github.com/intel-analytics/bigdl/issues)." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.12" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/docs/readthedocs/source/doc/Serving/Example/example.md b/docs/readthedocs/source/doc/Serving/Example/example.md deleted file mode 100644 index f6df5f97..00000000 --- a/docs/readthedocs/source/doc/Serving/Example/example.md +++ /dev/null @@ -1,124 +0,0 @@ -# Cluster Serving Example - -There are some examples provided for new user or existing Tensorflow user. -## Quick Start Example -Following is the recommended quick start example to transfer a local Keras application to Cluster Serving. 
- -[keras-to-cluster-serving-example](https://github.com/intel-analytics/BigDL/blob/branch-2.0/docs/readthedocs/source/doc/Serving/Example/keras-to-cluster-serving-example.ipynb) - -## End-to-end Example -### TFDataSet: -[l08c08_forecasting_with_lstm.py](https://github.com/intel-analytics/bigdl/blob/branch-2.0/docs/docs/ClusterServingGuide/OtherFrameworkUsers/l08c08_forecasting_with_lstm.py) -### Tokenizer: -[l10c03_nlp_constructing_text_generation_model.py](https://github.com/intel-analytics/bigdl/tree/master/blob/branch-2.0/ClusterServingGuide/OtherFrameworkUsers/l10c03_nlp_constructing_text_generation_model.py) -### ImageDataGenerator: -[transfer_learning.py](https://github.com/intel-analytics/bigdl/blob/branch-2.0/docs/docs/ClusterServingGuide/OtherFrameworkUsers/transfer_learning.py) - -## Model/Data Convert Guide -This guide is for users who: - -* have written local code of Tensorflow, Pytorch(to be added) -* have used specified data type of a specific framework, e.g. TFDataSet -* want to deploy the local code on Cluster Serving but do not know how to write client code (Cluster Serving takes Numpy Ndarray as input, other types need to transform in advance). - -**If you have the above needs but fail to find the solution below, please [create issue here](https://github.com/intel-analytics/bigdl/issues) - -## Tensorflow - -Model - includes savedModel, Frozen Graph (savedModel is recommended). - -Data - includes [TFDataSet](#tfdataset), [Tokenizer](#tokenizer), [ImageDataGenerator](#imagedatagenerator) - -Notes - includes tips to notice, includes [savedModel tips](#notes---use-savedmodel) - -### Model - ckpt to savedModel -#### tensorflow all version -This method works in all version of TF - -You need to create the graph, get the output layer, create place holder for input, load the ckpt then save the model -``` -# --- code you need to write -input_layer = tf.placeholder(...) -model = YourModel(...) -output_layer = model.your_output_layer() -# --- code you need to write -with tf.Session() as sess: - saver = tf.train.Saver() - saver.restore(sess, tf.train.latest_checkpoint(FLAGS.ckpt_path)) - tf.saved_model.simple_save(sess, - FLAGS.export_path, - inputs={ - 'input_layer': input_layer - }, - outputs={"output_layer": output_layer}) -``` - -#### tensorflow >= 1.15 -This method works if you are familiar with savedModel signature, and tensorflow >= 1.15 - -model graph could be load via `.meta`, and load ckpt then save the model, signature_def_map is required to provide -``` -# provide signature first -inputs = tf.placeholder(...) 
-outputs = tf.add(inputs, inputs) -tensor_info_input = tf.saved_model.utils.build_tensor_info(inputs) -tensor_info_output = tf.saved_model.utils.build_tensor_info(outputs) - -prediction_signature = ( - tf.saved_model.signature_def_utils.build_signature_def( - inputs={'x_input': tensor_info_input}, - outputs={'y_output': tensor_info_output}, - method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)) - - -# Your ckpt file is prefix.meta, prefix.index, etc -ckpt_prefix = 'model/model.ckpt-xxxx' -export_dir = 'saved_model' - -loaded_graph = tf.Graph() -with tf.Session(graph=loaded_graph) as sess: - # load - loader = tf.train.import_meta_graph(ckpt_prefix + '.meta') - loader.restore(sess, ckpt_prefix) - - # export - builder = tf.saved_model.builder.SavedModelBuilder(export_dir) - builder.add_meta_graph_and_variables(sess, - [tf.saved_model.tag_constants.TRAINING, tf.saved_model.tag_constants.SERVING],signature_def_map={ - tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: - prediction_signature - } - ) - builder.save() -``` -### Model - Keras to savedModel -#### tensorflow > 2.0 -``` -model = tf.keras.models.load_model("./model.h5") -tf.saved_model.save(model, "saved_model") -``` -### Model - ckpt to Frozen Graph -[freeze checkpoint example](https://github.com/intel-analytics/bigdl/tree/master/python/orca/example/freeze_checkpoint) -### Notes - Use SavedModel -If model has single tensor input, then nothing to notice. - -**If model has multiple input, please notice following.** - -When export, savedModel would store the inputs in alphabetical order. Use `saved_model_cli show --dir . --all` to see the order. e.g. -``` -signature_def['serving_default']: - The given SavedModel SignatureDef contains the following input(s): - inputs['id1'] tensor_info: - dtype: DT_INT32 - shape: (-1, 512) - name: id1:0 - inputs['id2'] tensor_info: - dtype: DT_INT32 - shape: (-1, 512) - name: id2:0 - -``` - -when enqueue to Cluster Serving, follow this order -### Data -To transform following data type to Numpy Ndarray, following examples are provided diff --git a/docs/readthedocs/source/doc/Serving/Example/keras-to-cluster-serving-example.ipynb b/docs/readthedocs/source/doc/Serving/Example/keras-to-cluster-serving-example.ipynb deleted file mode 100644 index d1eeb518..00000000 --- a/docs/readthedocs/source/doc/Serving/Example/keras-to-cluster-serving-example.ipynb +++ /dev/null @@ -1,719 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this example, we will use tensorflow.keras package to create a keras image classification application using model MobileNetV2, and transfer the application to Cluster Serving step by step." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Original Keras application\n", - "We will first show an original Keras application, which download the data and preprocess it, then create the MobileNetV2 model to predict." 
- ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "import os\n", - "import PIL" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'2.2.0'" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "tf.__version__" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Found 1000 images belonging to 2 classes.\n" - ] - } - ], - "source": [ - "# Obtain data from url:\"https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip\"\n", - "zip_file = tf.keras.utils.get_file(origin=\"https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip\",\n", - " fname=\"cats_and_dogs_filtered.zip\", extract=True)\n", - "\n", - "# Find the directory of validation set\n", - "base_dir, _ = os.path.splitext(zip_file)\n", - "test_dir = os.path.join(base_dir, 'validation')\n", - "# Set images size to 160x160x3\n", - "image_size = 160\n", - "\n", - "# Rescale all images by 1./255 and apply image augmentation\n", - "test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)\n", - "\n", - "# Flow images using generator to the test_generator\n", - "test_generator = test_datagen.flow_from_directory(\n", - " test_dir,\n", - " target_size=(image_size, image_size),\n", - " batch_size=1,\n", - " class_mode='binary')" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "# Create the base model from the pre-trained model MobileNet V2\n", - "IMG_SHAPE=(160,160,3)\n", - "model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,\n", - " include_top=False,\n", - " weights='imagenet')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In keras, input could be ndarray, or generator. We could just use `model.predict(test_generator)`. But to simplify, here we just input the first record to model." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[[[[0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0.8406992 ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]]\n", - "\n", - " [[0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.81465054\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.6572695\n", - " 0.23970175]\n", - " [0. 0. 0. ... 0. 1.2423501\n", - " 0.8024192 ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]]\n", - "\n", - " [[0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 5.185735\n", - " 0.21723604]\n", - " [0. 0. 0. ... 0. 4.6399093\n", - " 0.40124178]\n", - " [0.3284886 0. 0. ... 0. 5.295811\n", - " 3.4133787 ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]]\n", - "\n", - " [[0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.52712107\n", - " 0.20341969]\n", - " [0. 0. 0. ... 0. 0.8279238\n", - " 0.42696333]\n", - " [0. 0. 0. ... 0. 1.0344229\n", - " 1.5225778 ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]]\n", - "\n", - " [[0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 1.3237557 ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 1.3395147 ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. ]\n", - " [0. 0. 0. ... 0. 0.\n", - " 0. 
]]]]\n" - ] - } - ], - "source": [ - "prediction=model.predict(test_generator.next()[0])\n", - "print(prediction)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Great! Now the Keras application is completed. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Export TensorFlow SavedModel\n", - "Next, we transfer the application to Cluster Serving. The first step is to save the model to SavedModel format." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /home/user/anaconda3/envs/rec/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "If using Keras pass *_constraint arguments to layers.\n", - "INFO:tensorflow:Assets written to: /tmp/transfer_learning_mobilenetv2/assets\n", - "assets\tsaved_model.pb\tvariables\n" - ] - } - ], - "source": [ - "# Save trained model to ./transfer_learning_mobilenetv2\n", - "model.save('/tmp/transfer_learning_mobilenetv2')\n", - "! ls /tmp/transfer_learning_mobilenetv2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy Cluster Serving\n", - "After model prepared, we start to deploy it on Cluster Serving.\n", - "\n", - "First install Cluster Serving" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: bigdl-serving in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (0.9.0)\n", - "Requirement already satisfied: opencv-python in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (4.5.1.48)\n", - "Requirement already satisfied: httpx in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (0.16.1)\n", - "Requirement already satisfied: pyarrow in /home/user/.local/lib/python3.6/site-packages (from bigdl-serving) (1.0.1)\n", - "Requirement already satisfied: redis in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (3.5.3)\n", - "Requirement already satisfied: pyyaml in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from bigdl-serving) (5.4.1)\n", - "Requirement already satisfied: rfc3986[idna2008]<2,>=1.3 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (1.4.0)\n", - "Requirement already satisfied: certifi in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (2020.12.5)\n", - "Requirement already satisfied: httpcore==0.12.* in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (0.12.3)\n", - "Requirement already satisfied: sniffio in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpx->bigdl-serving) (1.2.0)\n", - "Requirement already satisfied: h11==0.* in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from httpcore==0.12.*->httpx->bigdl-serving) (0.12.0)\n", - "Requirement already satisfied: contextvars>=2.1 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from sniffio->httpx->bigdl-serving) (2.4)\n", - "Requirement already satisfied: immutables>=0.9 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from 
contextvars>=2.1->sniffio->httpx->bigdl-serving) (0.14)\n", - "Requirement already satisfied: idna in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from rfc3986[idna2008]<2,>=1.3->httpx->bigdl-serving) (2.10)\n", - "Requirement already satisfied: numpy>=1.13.3 in /home/user/anaconda3/envs/rec/lib/python3.6/site-packages (from opencv-python->bigdl-serving) (1.19.2)\n", - "\u001b[33mWARNING: You are using pip version 20.3.3; however, version 21.0.1 is available.\n", - "You should consider upgrading via the '/home/user/anaconda3/envs/rec/bin/python -m pip install --upgrade pip' command.\u001b[0m\n" - ] - } - ], - "source": [ - "! pip install bigdl-serving" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Cluster Serving has been properly set up.\n", - "You did not specify BIGDL_VERSION, will download 0.9.0\n", - "BIGDL_VERSION is 0.9.0\n", - "BIGDL_VERSION is 0.12.1\n", - "SPARK_VERSION is 2.4.3\n", - "2.4\n", - "--2021-02-07 10:01:46-- https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-spark_2.4.3/0.9.0/bigdl-spark_2.4.3-0.9.0-serving.jar\n", - "Resolving child-prc.intel.com (child-prc.intel.com)... You are installing Cluster Serving by pip, downloading...\n", - "\n", - "SIGHUP received.\n", - "Redirecting output to ‘wget-log.2’.\n" - ] - } - ], - "source": [ - "# we go to a new directory and initialize the environment\n", - "! mkdir cluster-serving\n", - "os.chdir('cluster-serving')\n", - "! cluster-serving-init" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 2150K .......... .......... .......... .......... .......... 0% 27.0K 5h37m\r\n", - " 2200K .......... .......... .......... .......... .......... 0% 33.6K 5h36m\r\n", - " 2250K .......... .......... .......... .......... .......... 0% 27.3K 5h37m\r\n", - " 2300K .......... .......... .......... .......... .......... 0% 30.3K 5h36m\r\n", - " 2350K .......... .......... .......... .......... .......... 0% 29.7K 5h36m\r\n", - " 2400K .......... .......... .......... .......... .......... 0% 23.7K 5h38m\r\n", - " 2450K .......... .......... .......... .......... .......... 0% 23.4K 5h39m\r\n", - " 2500K .......... .......... .......... .......... .......... 0% 23.4K 5h41m\r\n", - " 2550K .......... .......... .......... .......... .......... 0% 22.3K 5h43m\r\n", - " 2600K .......... .......... .......... ....." - ] - } - ], - "source": [ - "! tail wget-log.2" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# if you encounter slow download issue like above, you can just use following command to download\n", - "# ! wget https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-spark_2.4.3/0.9.0/bigdl-spark_2.4.3-0.9.0-serving.jar\n", - "\n", - "# if you are using wget to download, or get \"bigdl-xxx-serving.jar\" after \"ls\", please call mv *serving.jar bigdl.jar after downloaded." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "config.yaml bigdl.jar\r\n" - ] - } - ], - "source": [ - "# After initialization finished, check the directory\n", - "! 
ls" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We config the model path in `config.yaml` to following (the detail of config is at [Cluster Serving Configuration](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#2-configuration))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "## BigDL Cluster Serving\n", - "\n", - "model:\n", - " # model path must be provided\n", - " path: /tmp/transfer_learning_mobilenetv2" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "## BigDL Cluster Serving\r\n", - "\r\n", - "model:\r\n", - " # model path must be provided\r\n", - " path: /tmp/transfer_learning_mobilenetv2\r\n", - " # name, default is serving_stream, you need to specify if running multiple servings\r\n", - " name:\r\n", - "data:\r\n", - " # default, localhost:6379\r\n", - " src:\r\n" - ] - } - ], - "source": [ - "! head config.yaml" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Start Cluster Serving\n", - "\n", - "Cluster Serving requires Flink and Redis installed, and corresponded environment variables set, check [Cluster Serving Installation Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#1-installation) for detail.\n", - "\n", - "Flink cluster should start before Cluster Serving starts, if Flink cluster is not started, call following to start a local Flink cluster." - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Starting cluster.\n", - "Starting standalonesession daemon on host my-PC.\n", - "Starting taskexecutor daemon on host my-PC.\n" - ] - } - ], - "source": [ - "! 
$FLINK_HOME/bin/start-cluster.sh" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "After configuration, start Cluster Serving by `cluster-serving-start` (the detail is at [Cluster Serving Programming Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#3-launching-service))" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "model_path=\"/tmp/transfer_learning_mobilenetv2\"\n", - "redis_timeout=\"5000\"\n", - "Redis maxmemory is not set, using default value 8G\n", - "redis server started, please check log in redis.log\n", - "OK\n", - "OK\n", - "OK\n", - "redis config maxmemory set to 8G\n", - "OK\n", - "OK\n", - "SLF4J: Class path contains multiple SLF4J bindings.\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/bigdl-spark_2.4.3-0.9.0-SNAPSHOT-serving.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\n", - "SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]\n", - "log4j:WARN No appenders could be found for logger (org.apache.flink.client.cli.CliFrontend).\n", - "log4j:WARN Please initialize the log4j system properly.\n", - "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n", - "Starting new Cluster Serving job.\n", - "Cluster Serving job submitted, check log in log-cluster_serving-serving_stream.txt\n", - "To list Cluster Serving job status, use cluster-serving-cli list\n", - "SLF4J: Class path contains multiple SLF4J bindings.\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/bigdl-spark_2.4.3-0.9.0-SNAPSHOT-serving.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: Found binding in [jar:file:/home/user/dep/flink-1.11.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]\n", - "SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\n", - "SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]\n", - "log4j:WARN No appenders could be found for logger (org.apache.flink.client.cli.CliFrontend).\n", - "log4j:WARN Please initialize the log4j system properly.\n", - "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n", - "[Full GC (Metadata GC Threshold) 32304K->20432K(1030144K), 0.0213821 secs]\n" - ] - } - ], - "source": [ - "! cluster-serving-start" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prediction using Cluster Serving\n", - "Next we start Cluster Serving code at python client." 
- ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "redis group exist, will not create new one\n", - "redis group exist, will not create new one\n" - ] - } - ], - "source": [ - "from bigdl.serving.client import InputQueue, OutputQueue\n", - "input_queue = InputQueue()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In Cluster Serving, only NdArray is supported as input. Thus, we first transform the generator to ndarray (If you do not know how to transform your input to NdArray, you may get help at [data transform guide](https://github.com/intel-analytics/bigdl/tree/master/docs/docs/ClusterServingGuide/OtherFrameworkUsers#data))" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([[[[0.12156864, 0.11764707, 0.10980393],\n", - " [0.12156864, 0.11764707, 0.10980393],\n", - " [0.11764707, 0.1137255 , 0.10588236],\n", - " ...,\n", - " [0.28627452, 0.29803923, 0.22352943],\n", - " [0.24705884, 0.25882354, 0.18431373],\n", - " [0.24705884, 0.24705884, 0.20000002]],\n", - "\n", - " [[0.15686275, 0.15294118, 0.14509805],\n", - " [0.13725491, 0.13333334, 0.1254902 ],\n", - " [0.09803922, 0.09411766, 0.08627451],\n", - " ...,\n", - " [0.31764707, 0.3254902 , 0.27450982],\n", - " [0.31764707, 0.3254902 , 0.27058825],\n", - " [0.2784314 , 0.2784314 , 0.2392157 ]],\n", - "\n", - " [[0.21960786, 0.21568629, 0.20784315],\n", - " [0.23137257, 0.227451 , 0.21960786],\n", - " [0.24705884, 0.24313727, 0.23529413],\n", - " ...,\n", - " [0.29411766, 0.29803923, 0.27450982],\n", - " [0.26666668, 0.27058825, 0.2392157 ],\n", - " [0.30588236, 0.30588236, 0.26666668]],\n", - "\n", - " ...,\n", - "\n", - " [[0.35686275, 0.3019608 , 0.15686275],\n", - " [0.38431376, 0.29803923, 0.14509805],\n", - " [0.36862746, 0.25490198, 0.12156864],\n", - " ...,\n", - " [0.1764706 , 0.08627451, 0.01568628],\n", - " [0.16862746, 0.08627451, 0.00392157],\n", - " [0.1764706 , 0.08627451, 0.03137255]],\n", - "\n", - " [[0.30980393, 0.2784314 , 0.13333334],\n", - " [0.3529412 , 0.29411766, 0.14117648],\n", - " [0.3529412 , 0.26666668, 0.12156864],\n", - " ...,\n", - " [0.1764706 , 0.08627451, 0.01568628],\n", - " [0.17254902, 0.08235294, 0.01176471],\n", - " [0.18039216, 0.09019608, 0.03529412]],\n", - "\n", - " [[0.30588236, 0.27450982, 0.13333334],\n", - " [0.33333334, 0.28627452, 0.12941177],\n", - " [0.3372549 , 0.26666668, 0.11764707],\n", - " ...,\n", - " [0.19607845, 0.09411766, 0.03529412],\n", - " [0.18039216, 0.07843138, 0.02745098],\n", - " [0.1764706 , 0.08627451, 0.03137255]]]], dtype=float32)" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "arr = test_generator.next()[0]\n", - "arr" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Write to Redis successful\n", - "redis group exist, will not create new one\n", - "Write to Redis successful\n" - ] - } - ], - "source": [ - "# Use async api to put and get, you have pass a name arg and use the name to get\n", - "input_queue.enqueue('my-input', t=arr)\n", - "output_queue = OutputQueue()\n", - "prediction = output_queue.query('my-input')\n", - "# Use sync api to predict, this will block until the result is get or timeout\n", - "prediction = input_queue.predict(arr)" - ] - }, - { - 
"cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([[[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 1.3543907 ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 4.1898136 ,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 3.286649 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 4.0817494 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 3.3224926 , 0. , ..., 1.4220613 ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 4.9100547 ,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 1.5577714 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 1.767426 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 2.3534465 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0.21401057,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0.34797698, 0. ],\n", - " [0. , 1.4496232 , 0. , ..., 0. ,\n", - " 1.6221215 , 0. ],\n", - " [0. , 0.6171873 , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ]],\n", - "\n", - " [[0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 1.192298 , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ],\n", - " [0. , 0. , 0. , ..., 0. ,\n", - " 0. , 0. ]]], dtype=float32)" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "prediction" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If everything works well, the result `prediction` would be the exactly the same NdArray object with the output of original Keras model." - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [], - "source": [ - "# don't forget to delete the model you save for this tutorial\n", - "! rm -rf /tmp/transfer_learning_mobilenetv2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is the end of this tutorial. If you have any question, you could raise an issue at [BigDL Github](https://github.com/intel-analytics/bigdl/issues)." 
- ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.10" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/docs/readthedocs/source/doc/Serving/Example/l08c08_forecasting_with_lstm.py b/docs/readthedocs/source/doc/Serving/Example/l08c08_forecasting_with_lstm.py deleted file mode 100644 index 612017b8..00000000 --- a/docs/readthedocs/source/doc/Serving/Example/l08c08_forecasting_with_lstm.py +++ /dev/null @@ -1,75 +0,0 @@ -# Related url: https://github.com/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l08c08_forecasting_with_lstm.ipynb -# Forecasting with LSTM -import numpy as np -import tensorflow as tf -import tensorflow.keras as keras - -# Get the trend with time and slope -def trend(time, slope=0): - return slope * time - - -# Get a specific pattern, which can be customerized -def seasonal_pattern(season_time): - return np.where(season_time < 0.4, - np.cos(season_time * 2 * np.pi), - 1 / np.exp(3 * season_time)) - -# Repeats the same pattern at each period -def seasonality(time, period, amplitude=1, phase=0): - season_time = ((time + phase) % period) / period - return amplitude * seasonal_pattern(season_time) - -# Obtain a random white noise -def white_noise(time, noise_level=1, seed=None): - rnd = np.random.RandomState(seed) - return rnd.randn(len(time)) * noise_level - -# Convert the series to dataset form -def ndarray_to_dataset(ndarray): - return tf.data.Dataset.from_tensor_slices(ndarray) - -# Convert the series to dataset with some modifications -def sequential_window_dataset(series, window_size): - series = tf.expand_dims(series, axis=-1) - ds = ndarray_to_dataset(series) - ds = ds.window(window_size + 1, shift=window_size, drop_remainder=True) - ds = ds.flat_map(lambda window: window.batch(window_size + 1)) - ds = ds.map(lambda window: (window[:-1], window[1:])) - return ds.batch(1).prefetch(1) - -# Convert dataset form to ndarray -def dataset_to_ndarray(dataset): - array=list(dataset.as_numpy_iterator()) - return np.ndarray(array) - -# Generate some raw test data -time_range=4 * 365 + 1 -time = np.arange(time_range) - -slope = 0.05 -baseline = 10 -amplitude = 40 -series = baseline + trend(time, slope) + seasonality(time, period=365, amplitude=amplitude) - -noise_level = 5 -noise = white_noise(time, noise_level, seed=42) - -series += noise - -# Modify the raw test data with DataSet form -tf.random.set_seed(42) -np.random.seed(42) - -window_size = 30 -test_set = sequential_window_dataset(series, window_size) - -# Convert the DataSet form data to ndarry -#pre_in=series[np.newaxis, :, np.newaxis] -test_array=dataset_to_ndarray(test_set) - -# Load the saved LSTM model -model=tf.keras.models.load_model("path/to/model") - -# Predict with LSTM model -rnn_forecast_nd = model.predict(test_array) diff --git a/docs/readthedocs/source/doc/Serving/Example/l10c03_nlp_constructing_text_generation_model.py b/docs/readthedocs/source/doc/Serving/Example/l10c03_nlp_constructing_text_generation_model.py deleted file mode 100644 index 3d27b9a0..00000000 --- a/docs/readthedocs/source/doc/Serving/Example/l10c03_nlp_constructing_text_generation_model.py +++ /dev/null @@ -1,75 +0,0 @@ -# Related url: 
https://github.com/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l10c03_nlp_constructing_text_generation_model.ipynb -# Generating some new lyrics from the trained model - -import tensorflow as tf -from tensorflow.keras.preprocessing.text import Tokenizer -from tensorflow.keras.preprocessing.sequence import pad_sequences - -# Other imports for processing data -import string -import numpy as np -import pandas as pd - -# DATA PREPROCESSING -# First to get the dataset of the Song Lyrics dataset on Kaggle by: -# !wget --no-check-certificate \ -# https://drive.google.com/uc?id=1LiJFZd41ofrWoBtW-pMYsfz1w8Ny0Bj8 \ -# -O /tmp/songdata.csv - -# Then to generate a tokenizer with the songdata.csv -def tokenize_corpus(corpus, num_words=-1): - # Fit a Tokenizer on the corpus - if num_words > -1: - tokenizer = Tokenizer(num_words=num_words) - else: - tokenizer = Tokenizer() - tokenizer.fit_on_texts(corpus) - return tokenizer - -def create_lyrics_corpus(dataset, field): - # Remove all other punctuation - dataset[field] = dataset[field].str.replace('[{}]'.format(string.punctuation), '') - # Make it lowercase - dataset[field] = dataset[field].str.lower() - # Make it one long string to split by line - lyrics = dataset[field].str.cat() - corpus = lyrics.split('\n') - # Remove any trailing whitespace - for l in range(len(corpus)): - corpus[l] = corpus[l].rstrip() - # Remove any empty lines - corpus = [l for l in corpus if l != ''] - - return corpus - -# Read the dataset from csv -dataset = pd.read_csv('/tmp/songdata.csv', dtype=str) -# Create the corpus using the 'text' column containing lyrics -corpus = create_lyrics_corpus(dataset, 'text') -# Tokenize the corpus -tokenizer = tokenize_corpus(corpus) - -# Get the uniform input length (max_sequence_len) of the model -max_sequence_len=0 -for line in corpus: - token_list = tokenizer.texts_to_sequences([line])[0] - max_sequence_len=max(max_sequence_len,len(token_list)) - -# Load the saved model which is trained on the Song Lyrics dataset -model=tf.keras.models.load_model("path/to/model") - -# Generate new lyrics with some "seed text" -seed_text = "im feeling chills" # seed text can be customerized -next_words = 100 # this defined the length of the new lyrics - -for _ in range(next_words): - token_list = tokenizer.texts_to_sequences([seed_text])[0] # convert the seed text to ndarray - token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre') # pad the input for equal length - predicted = np.argmax(model.predict(token_list), axis=-1) # get the predicted word index - output_word = "" - for word, index in tokenizer.word_index.items(): - if index == predicted: - output_word = word - break - seed_text += " " + output_word # add the predicted word to the seed text -print(seed_text) diff --git a/docs/readthedocs/source/doc/Serving/Example/tf1-to-cluster-serving-example.ipynb b/docs/readthedocs/source/doc/Serving/Example/tf1-to-cluster-serving-example.ipynb deleted file mode 100644 index b6c2daf0..00000000 --- a/docs/readthedocs/source/doc/Serving/Example/tf1-to-cluster-serving-example.ipynb +++ /dev/null @@ -1,571 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "reported-geometry", - "metadata": {}, - "source": [ - "In this example, we will use tensorflow v1 (version 1.15) to create a simple MLP model, and transfer the application to Cluster Serving step by step.\n", - "\n", - "This tutorial is recommended for Tensorflow v1 user only. 
If you are not Tensorflow v1 user, the keras tutorial [here](#keras-to-cluster-serving-example.ipynb) is more recommended." - ] - }, - { - "cell_type": "markdown", - "id": "athletic-trance", - "metadata": {}, - "source": [ - "### Original Tensorflow v1 Application" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "olive-dutch", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'1.15.0'" - ] - }, - "execution_count": 1, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import tensorflow as tf\n", - "tf.__version__" - ] - }, - { - "cell_type": "markdown", - "id": "vertical-recall", - "metadata": {}, - "source": [ - "We first define the Tensorflow graph, and create some data." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "tropical-clinton", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From :24: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" - ] - } - ], - "source": [ - "g = tf.Graph()\n", - "with g.as_default():\n", - " \n", - " # Graph Inputs\n", - " features = tf.placeholder(dtype=tf.float32, \n", - " shape=[None, 2], name='features')\n", - " targets = tf.placeholder(dtype=tf.float32, \n", - " shape=[None, 1], name='targets')\n", - "\n", - " # Model Parameters\n", - " weights = tf.Variable(tf.zeros(shape=[2, 1], \n", - " dtype=tf.float32), name='weights')\n", - " bias = tf.Variable([[0.]], dtype=tf.float32, name='bias')\n", - " \n", - "\n", - " \n", - " # Forward Pass\n", - " linear = tf.add(tf.matmul(features, weights), bias, name='linear')\n", - " ones = tf.ones(shape=tf.shape(linear)) \n", - " zeros = tf.zeros(shape=tf.shape(linear))\n", - " prediction = tf.where(condition=tf.less(linear, 0.),\n", - " x=zeros, \n", - " y=ones, \n", - " name='prediction')\n", - " \n", - " # Backward Pass\n", - " errors = targets - prediction\n", - " weight_update = tf.assign_add(weights, \n", - " tf.reshape(errors * features, (2, 1)),\n", - " name='weight_update')\n", - " bias_update = tf.assign_add(bias, errors,\n", - " name='bias_update')\n", - " \n", - " train = tf.group(weight_update, bias_update, name='train')\n", - " \n", - " saver = tf.train.Saver(name='saver')\n" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "legislative-boutique", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "((3, 2), (3,))" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import numpy as np\n", - "x_train, y_train = np.array([[1,2],[3,4],[1,3]]), np.array([1,2,1])\n", - "x_train.shape, y_train.shape" - ] - }, - { - "cell_type": "markdown", - "id": "coated-grill", - "metadata": {}, - "source": [ - "### Export TensorFlow SavedModel\n", - "Then, we train the graph and in the `with tf.Session`, we save the graph to SavedModel. The detailed code is following, and we could see the prediction result is `[1]` with input `[1,2]`." 
- ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "detailed-message", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Model parameters:\n", - "\n", - "Weights:\n", - " [[15.]\n", - " [20.]]\n", - "Bias: [[5.]]\n", - "[[1.]\n", - " [1.]\n", - " [1.]]\n", - "WARNING:tensorflow:From :26: simple_save (from tensorflow.python.saved_model.simple_save) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.simple_save.\n", - "WARNING:tensorflow:From /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.\n", - "INFO:tensorflow:Assets added to graph.\n", - "INFO:tensorflow:No assets to write.\n", - "INFO:tensorflow:SavedModel written to: /tmp/mlp_tf1/saved_model.pb\n" - ] - } - ], - "source": [ - "with tf.Session(graph=g) as sess:\n", - " \n", - " sess.run(tf.global_variables_initializer())\n", - " \n", - " for epoch in range(5):\n", - " for example, target in zip(x_train, y_train):\n", - " feed_dict = {'features:0': example.reshape(-1, 2),\n", - " 'targets:0': target.reshape(-1, 1)}\n", - " _ = sess.run(['train'], feed_dict=feed_dict)\n", - "\n", - "\n", - " w, b = sess.run(['weights:0', 'bias:0']) \n", - " print('Model parameters:\\n')\n", - " print('Weights:\\n', w)\n", - " print('Bias:', b)\n", - "\n", - " saver.save(sess, save_path='perceptron')\n", - " \n", - " pred = sess.run('prediction:0', feed_dict={features: x_train})\n", - " print(pred)\n", - " \n", - " # in this session, save the model to savedModel format\n", - " inputs = dict([(features.name, features)])\n", - " outputs = dict([(prediction.name, prediction)])\n", - " inputs, outputs\n", - " tf.saved_model.simple_save(sess, \"/tmp/mlp_tf1\", inputs, outputs)" - ] - }, - { - "cell_type": "markdown", - "id": "consolidated-newport", - "metadata": {}, - "source": [ - "### Deploy Cluster Serving\n", - "After model prepared, we start to deploy it on Cluster Serving.\n", - "\n", - "First install Cluster Serving" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "inner-texas", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Collecting bigdl-serving\n", - "Requirement already satisfied: httpx in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from bigdl-serving) (0.17.1)\n", - "Requirement already satisfied: pyarrow in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from bigdl-serving) (3.0.0)\n", - "Requirement already satisfied: pyyaml in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from bigdl-serving) (5.4.1)\n", - "Requirement already satisfied: redis in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from bigdl-serving) (3.5.3)\n", - "Requirement already satisfied: opencv-python in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from bigdl-serving) (4.5.1.48)\n", - "Requirement already satisfied: certifi in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from httpx->bigdl-serving) (2020.12.5)\n", - 
"Requirement already satisfied: httpcore<0.13,>=0.12.1 in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from httpx->bigdl-serving) (0.12.3)\n", - "Requirement already satisfied: rfc3986[idna2008]<2,>=1.3 in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from httpx->bigdl-serving) (1.4.0)\n", - "Requirement already satisfied: sniffio in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from httpx->bigdl-serving) (1.2.0)\n", - "Requirement already satisfied: h11==0.* in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from httpcore<0.13,>=0.12.1->httpx->bigdl-serving) (0.12.0)\n", - "Requirement already satisfied: idna in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from rfc3986[idna2008]<2,>=1.3->httpx->bigdl-serving) (3.1)\n", - "Requirement already satisfied: numpy>=1.14.5 in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages (from opencv-python->bigdl-serving) (1.20.1)\n", - "Installing collected packages: bigdl-serving\n", - "Successfully installed bigdl-serving-0.9.0\n" - ] - } - ], - "source": [ - "! pip install bigdl-serving" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "working-terrorism", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Trying to find config file in /home/user/anaconda3/envs/tf1/lib/python3.7/site-packages/bigdl/conf/config.yaml\r\n", - "Config file found in pip package, copying...\r\n", - "Config file ready.\r\n", - "Cluster Serving has been properly set up.\r\n", - "You did not specify BIGDL_VERSION, will download 0.9.0\r\n", - "BIGDL_VERSION is 0.9.0\r\n", - "BIGDL_VERSION is 0.12.1\r\n", - "SPARK_VERSION is 2.4.3\r\n", - "2.4\r\n", - "You are installing Cluster Serving by pip, downloading...\r\n" - ] - } - ], - "source": [ - "import os\n", - "! mkdir cluster-serving\n", - "os.chdir('cluster-serving')\n", - "! cluster-serving-init" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "excited-exception", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 2800K .......... .......... .......... .......... .......... 0% 11.8M 19m20s\r\n", - " 2850K .......... .......... .......... .......... .......... 0% 11.3M 19m1s\r\n", - " 2900K .......... .......... .......... .......... .......... 0% 8.60M 18m43s\r\n", - " 2950K .......... .......... .......... .......... .......... 0% 11.9M 18m25s\r\n", - " 3000K .......... .......... .......... .......... .......... 0% 11.8M 18m7s\r\n", - " 3050K .......... .......... .......... .......... .......... 0% 674K 18m4s\r\n", - " 3100K .......... .......... .......... .......... .......... 0% 418K 18m9s\r\n", - " 3150K .......... .......... .......... .......... .......... 0% 1.05M 18m0s\r\n", - " 3200K .......... .......... .......... .......... .......... 0% 750K 17m56s\r\n", - " 3250K .......... .......... .......... ...." - ] - } - ], - "source": [ - "! tail wget-log" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "casual-premium", - "metadata": {}, - "outputs": [], - "source": [ - "# if you encounter slow download issue like above, you can just use following command to download\n", - "# ! wget https://repo1.maven.org/maven2/com/intel/analytics/bigdl/bigdl-spark_2.4.3/0.9.0/bigdl-spark_2.4.3-0.9.0-serving.jar\n", - "\n", - "# if you are using wget to download, or get \"bigdl-xxx-serving.jar\" after \"ls\", please call mv *serving.jar bigdl.jar after downloaded." 
- ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "ruled-bermuda", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "bigdl-spark_2.4.3-0.9.0-serving.jar config.yaml wget-log\r\n" - ] - } - ], - "source": [ - "# After initialization finished, check the directory\n", - "! ls" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "computational-rehabilitation", - "metadata": {}, - "outputs": [], - "source": [ - "# Call mv *serving.jar bigdl.jar as mentioned above\n", - "! mv *serving.jar bigdl.jar" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "personal-central", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "config.yaml wget-log bigdl.jar\r\n" - ] - } - ], - "source": [ - "! ls" - ] - }, - { - "cell_type": "markdown", - "id": "combined-stability", - "metadata": {}, - "source": [ - "We config the model path in `config.yaml` to following (the detail of config is at [Cluster Serving Configuration](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#2-configuration))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "received-hayes", - "metadata": {}, - "outputs": [], - "source": [ - "## BigDL Cluster Serving\n", - "\n", - "model:\n", - " # model path must be provided\n", - " path: /tmp/mlp_tf1" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "satellite-honey", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "## BigDL Cluster Serving\r\n", - "\r\n", - "model:\r\n", - " # model path must be provided\r\n", - " path: /tmp/mlp_tf1\r\n", - " # name, default is serving_stream, you need to specify if running multiple servings\r\n", - " name:\r\n", - "data:\r\n", - " # default, localhost:6379\r\n", - " src:\r\n" - ] - } - ], - "source": [ - "! head config.yaml" - ] - }, - { - "cell_type": "markdown", - "id": "planned-hometown", - "metadata": {}, - "source": [ - "### Start Cluster Serving\n", - "\n", - "Cluster Serving requires Flink and Redis installed, and corresponded environment variables set, check [Cluster Serving Installation Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#1-installation) for detail.\n", - "\n", - "Flink cluster should start before Cluster Serving starts, if Flink cluster is not started, call following to start a local Flink cluster." - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "antique-melbourne", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Starting cluster.\n", - "Starting standalonesession daemon on host user-PC.\n", - "Starting taskexecutor daemon on host user-PC.\n" - ] - } - ], - "source": [ - "! 
$FLINK_HOME/bin/start-cluster.sh" - ] - }, - { - "cell_type": "markdown", - "id": "interested-bench", - "metadata": {}, - "source": [ - "After configuration, start Cluster Serving by `cluster-serving-start` (the detail is at [Cluster Serving Programming Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#3-launching-service))" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "id": "modern-monster", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "model_path=\"/tmp/mlp_tf1\"\n", - "redis_timeout=\"5000\"\n", - "Redis maxmemory is not set, using default value 8G\n", - "redis server started, please check log in redis.log\n", - "OK\n", - "OK\n", - "OK\n", - "redis config maxmemory set to 8G\n", - "OK\n", - "OK\n", - "Starting new Cluster Serving job.\n", - "Cluster Serving job submitted, check log in log-cluster_serving-serving_stream.txt\n", - "To list Cluster Serving job status, use cluster-serving-cli list\n", - "{maxmem=null, timeout=5000}timeout getted: 5000\n" - ] - } - ], - "source": [ - "! cluster-serving-start" - ] - }, - { - "cell_type": "markdown", - "id": "improved-rough", - "metadata": {}, - "source": [ - "### Prediction using Cluster Serving\n", - "Next we start Cluster Serving code at python client." - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "id": "immune-madness", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "redis group exist, will not create new one\n", - "redis group exist, will not create new one\n", - "Write to Redis successful\n", - "redis group exist, will not create new one\n", - "Write to Redis successful\n" - ] - } - ], - "source": [ - "from bigdl.serving.client import InputQueue, OutputQueue\n", - "input_queue = InputQueue()\n", - "# Use async api to put and get, you have pass a name arg and use the name to get\n", - "arr = np.array([1,2])\n", - "input_queue.enqueue('my-input', t=arr)\n", - "output_queue = OutputQueue()\n", - "prediction = output_queue.query('my-input')\n", - "# Use sync api to predict, this will block until the result is get or timeout\n", - "prediction = input_queue.predict(arr)" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "signal-attention", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([1.], dtype=float32)" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "prediction" - ] - }, - { - "cell_type": "markdown", - "id": "suitable-selection", - "metadata": {}, - "source": [ - "The `prediction` result would be the same as using Tensorflow.\n", - "\n", - "This is the end of this tutorial. If you have any question, you could raise an issue at [BigDL Github](https://github.com/intel-analytics/bigdl/issues)." 
- ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.10" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/docs/readthedocs/source/doc/Serving/Example/transfer_learning.py b/docs/readthedocs/source/doc/Serving/Example/transfer_learning.py deleted file mode 100644 index 9777ea70..00000000 --- a/docs/readthedocs/source/doc/Serving/Example/transfer_learning.py +++ /dev/null @@ -1,40 +0,0 @@ -# Related url: https://github.com/tensorflow/docs/blob/master/site/en/r1/tutorials/images/transfer_learning.ipynb -# Categorize image to cat or dog -import os -import tensorflow.compat.v1 as tf -from tensorflow import keras - -# Obtain data from url:"https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip" -zip_file = tf.keras.utils.get_file(origin="https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip", - fname="cats_and_dogs_filtered.zip", extract=True) - -# Find the directory of validation set -base_dir, _ = os.path.splitext(zip_file) -test_dir = os.path.join(base_dir, 'validation') - -# Set images size to 160x160x3 -image_size = 160 - -# Rescale all images by 1./255 and apply image augmentation -test_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255) - -# Flow images using generator to the test_generator -test_generator = test_datagen.flow_from_directory( - test_dir, - target_size=(image_size, image_size), - batch_size=1, - class_mode='binary') - -# Convert the next data of ImageDataGenerator to ndarray -def convert_to_ndarray(ImageGenerator): - return ImageGenerator.next()[0] - -# Load model from its path -model=tf.keras.models.load_model("path/to/model") - -# Convert each image in test_generator to ndarray and predict with model -max_length=test_generator.__len__() -for i in range(max_length): # number of image to predict can be altered - test_input=convert_to_ndarray(test_generator) - prediction=model.predict(test_input) - diff --git a/docs/readthedocs/source/doc/Serving/FAQ/contribute-guide.md b/docs/readthedocs/source/doc/Serving/FAQ/contribute-guide.md deleted file mode 100644 index b9afe05a..00000000 --- a/docs/readthedocs/source/doc/Serving/FAQ/contribute-guide.md +++ /dev/null @@ -1,118 +0,0 @@ -# Contribute to Cluster Serving - -This is the guide to contribute your code to Cluster Serving. - -Cluster Serving takes advantage of BigDL core with integration of Deep Learning Frameworks, e.g. Tensorflow, OpenVINO, PyTorch, and implements the inference logic on top of it, and parallelize the computation with Flink and Redis by default. To contribute more features to Cluster Serving, you could refer to following sections accordingly. -## Dev Environment - -### Get Code and Prepare Branch -Go to BigDL main repo https://github.com/intel-analytics/bigdl, press Fork to your github repo, and git clone the forked repo to local. Use `git checkout -b your_branch_name` to create a new branch, and you could start to write code and pull request to BigDL from this branch. -### Environment Set up -You could refer to [BigDL Scala Developer Guide](https://bigdl.readthedocs.io/en/latest/doc/UserGuide/develop.html#scala) to set up develop environment. Cluster Serving is an BigDL Scala module. 
- -### Debug in IDE -Cluster Serving depends on Flink and Redis. To install Redis and start Redis server, -``` -$ export REDIS_VERSION=5.0.5 -$ wget http://download.redis.io/releases/redis-${REDIS_VERSION}.tar.gz && \ - tar xzf redis-${REDIS_VERSION}.tar.gz && \ - rm redis-${REDIS_VERSION}.tar.gz && \ - cd redis-${REDIS_VERSION} && \ - make -$ ./src/redis-server -``` -in IDE, embedded Flink would be used so that no dependency is needed. - -Once set up, you could copy the `/path/to/bigdl/scripts/cluster-serving/config.yaml` to `/path/to/bigdl/config.yaml`, and run `scala/serving/src/main/com/intel/analytics/bigdl/serving/ClusterServing.scala` in IDE. Since IDE consider `/path/to/bigdl/` as the current directory, it would read the config file in it. - -Run `scala/serving/src/main/com/intel/analytics/bigdl/serving/http/Frontend2.scala` if you use HTTP frontend. - -Once started, you could run python client code to finish an end-to-end test just as you run Cluster Serving in [Programming Guide](https://github.com/intel-analytics/bigdl/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#4-model-inference). -### Test Package -Once you write the code and complete the test in IDE, you can package the jar and test. - -To package, -``` -cd /path/to/bigdl/scala -./make-dist.sh -``` -Then, in `target` folder, copy `bigdl-xxx-flink-udf.jar` to your test directory, and rename it as `bigdl.jar`, and also copy the `config.yaml` to your test directory. - -You could copy `/path/to/bigdl/scripts/cluster-serving/cluster-serving-start` to start Cluster Serving, this scripts will start Redis server for you and submit Flink job. If you prefer not to control Redis, you could use the command in it `${FLINK_HOME}/bin/flink run -c com.intel.analytics.bigdl.serving.ClusterServing bigdl.jar` to start Cluster Serving. - -To run frontend, call `java -cp bigdl.jar com.intel.analytics.bigdl.serving.http.Frontend2`. - -The rest are the same with test in IDE. - -## Add Features -### Data Connector -Data connector is the producer of Cluster Serving. The remote clients put data into data pipeline -#### Scala code (The Server) - -To define a new data connector to, e.g. Kafka, Redis, or other database, you have to define a Flink Source first. - -You could refer to `com/intel/analytics/bigdl/serving/engine/FlinkRedisSource.scala` as an example. - -``` -class FlinkRedisSource(params: ClusterServingHelper) - extends RichParallelSourceFunction[List[(String, String)]] { - @volatile var isRunning = true - - override def open(parameters: Configuration): Unit = { - // initlalize the connector - } - - override def run(sourceContext: SourceFunction - .SourceContext[List[(String, String)]]): Unit = while (isRunning) { - // get data from data pipeline - } - - override def cancel(): Unit = { - // close the connector - } -} -``` -Then you could refer to `com/intel/analytics/bigdl/serving/engine/FlinkInference.scala` as the inference method to your new connector. Usually it could be directly used without new implementation. However, you could still define your new method if you need. - -Finally, you have to define a Flink Sink, to write data back to data pipeline. - -You could refer to `com/intel/analytics/bigdl/serving/engine/FlinkRedisSink.scala` as an example. 
- -``` -class FlinkRedisSink(params: ClusterServingHelper) - extends RichSinkFunction[List[(String, String)]] { - - override def open(parameters: Configuration): Unit = { - // initialize the connector - } - - override def close(): Unit = { - // close the connector - } - - override def invoke(value: List[(String, String)], context: SinkFunction.Context[_]): Unit = { - // write data to data pipeline - } -} -``` -Please note that normally you should do the space (memory or disk) control of your data pipeline in your code. - - -Please locate Flink Source and Flink Sink code to `com/intel/analytics/bigdl/serving/engine/` - -If you have some method which need to be wrapped as a class, you could locate them in `com/intel/analytics/bigdl/serving/pipeline/` -#### Python Code (The Client) -You could refer to `python/serving/src/bigdl/serving/client.py` to define your client code according to your data connector. - -Please locate this part of code in `python/serving/src/bigdl/serving/data_pipeline_name/`, e.g. `python/serving/src/bigdl/serving/kafka/` if you create a Kafka connector. -##### put to data pipeline -It is recommended to refer to `InputQueue.enqueue()` and `InputQueue.predict()` method. This method calls `self.data_to_b64` method first and add data to data pipeline. You could define a similar enqueue method to work with your data connector. -##### get from data pipeline -It is recommended to refer to `OutputQueue.query()` and `OutputQueue.dequeue()` method. This method gets result from data pipeline and calls `self.get_ndarray_from_b64` method to decode. You could define a similar dequeue method to work with your data connector. - -## Benchmark Test -You could use `scala/serving/src/main/com/intel/analytics/BIGDL/serving/engine/Operations.scala` to test the inference time of your model. - -The script takes two arguments, run it with `-m modelPath` and `-j jsonPath` to indicate the path to the model and the path to the prepared json format operation template of the model. - -The model will output the inference time stats of preprocessing, prediction and postprocessing processes, which varies with the different preprocessing/postprocessing time and thread numbers. diff --git a/docs/readthedocs/source/doc/Serving/FAQ/faq.md b/docs/readthedocs/source/doc/Serving/FAQ/faq.md deleted file mode 100644 index 916ad465..00000000 --- a/docs/readthedocs/source/doc/Serving/FAQ/faq.md +++ /dev/null @@ -1,53 +0,0 @@ -# Cluster Serving FAQ - -## General Debug Guide -You could use following guide to debug if serving is not working properly. - -### Check if Cluster Serving environment is ready -Run following commands in terminal -``` -echo $FLINK_HOME -echo $REDIS_HOME -``` -the output directory -``` -/path/to/flink-version -/path/to/redis-version -``` - -should be displayed, otherwise, go to [Programming Guide](ProgrammingGuide.md) **Installation** section. - -### Check if Flink Cluster is working -Run following commands in terminal -``` -netstat -tnlp -``` -output like following should be displayed, `6123,8081` is Flink default port usage. -``` -tcp6 0 0 :::6123 :::* LISTEN xxxxx/java -tcp6 0 0 :::8081 :::* LISTEN xxxxx/java -``` -if not, run `$FLINK_HOME/bin/start-cluster.sh` to start Flink cluster. - -After that, check Flink log in `$FLINK_HOME/log/`, check the log file of `flink-xxx-standalone-xxx.log` and `flink-xxx-taskexecutor-xxx.log` to make sure there is no error. 
- -If the port could not bind in this step, kill the program which use the port, and `$FLINK_HOME/bin/stop-cluster.sh && $FLINK_HOME/bin/start-cluster.sh` to restart Flink cluster. -### Check if Cluster Serving is running -``` -$FLINK_HOME/bin/flink list -``` -output of Cluster Serving job information should be displayed, if not, go to [Programming Guide](ProgrammingGuide.md) **Launching Service** section to make sure you call `cluster-serving-start` correctly. - - - -### Troubleshooting - -1. `Duplicate registration of device factory for type XLA_CPU with the same priority 50` - - This error is caused by Flink ClassLoader. Please put cluster serving related jars into `${FLINK_HOME}/lib`. - -2. `servable Manager config dir not exist` - - Check if `servables.yaml` exists in current directory. If not, download from [github](https://github.com/intel-analytics/bigdl/blob/master/ppml/trusted-realtime-ml/scala/docker-graphene/servables.yaml). -### Still, I get no result -If you still get empty result, raise issue [here](https://github.com/intel-analytics/bigdl/issues) and post the output/log of your serving job. diff --git a/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_overview.jpg b/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_overview.jpg deleted file mode 100644 index 6edbc9c9..00000000 Binary files a/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_overview.jpg and /dev/null differ diff --git a/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_steps.jpg b/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_steps.jpg deleted file mode 100644 index 74fb2752..00000000 Binary files a/docs/readthedocs/source/doc/Serving/Overview/cluster_serving_steps.jpg and /dev/null differ diff --git a/docs/readthedocs/source/doc/Serving/Overview/serving.md b/docs/readthedocs/source/doc/Serving/Overview/serving.md deleted file mode 100644 index 39326ae6..00000000 --- a/docs/readthedocs/source/doc/Serving/Overview/serving.md +++ /dev/null @@ -1,49 +0,0 @@ -# Cluster Serving User Guide -BigDL Cluster Serving is a lightweight distributed, real-time serving solution that supports a wide range of deep learning models (such as TensorFlow, PyTorch, Caffe, BigDL and OpenVINO models). It provides a simple pub/sub API, so that the users can easily send their inference requests to the input queue (using a simple Python API); Cluster Serving will then automatically manage the scale-out and real-time model inference across a large cluster (using distributed streaming frameworks such as Apache Spark Streaming, Apache Flink, etc.) - -The overall architecture of BigDL Cluster Serving solution is illustrated as below: - -![overview](cluster_serving_overview.jpg) - -## Workflow Overview -The figure below illustrates the simple 3-step "Prepare-Launch-Inference" workflow for Cluster Serving. - -![steps](cluster_serving_steps.jpg) - -#### 1. Install and prepare Cluster Serving environment on a local node: - -- Copy a previously trained model to the local node; currently TensorFlow, PyTorch, Caffe, BigDL and OpenVINO models are supported. -- Install BigDL Cluster Serving on the local node (e.g., using a single pip install command) -- Configure Cluster Server on the local node, including the file path to the trained model and the address of the cluster (such as Apache Hadoop YARN cluster, K8s cluster, etc.). -Please note that you only need to deploy the Cluster Serving solution on a single local node, and NO modifications are needed for the (YARN or K8s) cluster. 
- -#### 2. Launch the Cluster Serving service - -You can launch the Cluster Serving service by running the startup script on the local node. Under the hood, Cluster Serving will automatically deploy the trained model and serve the model inference requests across the cluster in a distributed fashion. You may monitor its runtime status (such as inference throughput) using TensorBoard. - -#### 3. Distributed, real-time (streaming) inference - -Cluster Serving provides a simple pub/sub API to the users, so that you can easily send the inference requests to an input queue (currently Redis Streams is used) using a simple Python API. - -Cluster Serving will then read the requests from the Redis stream, run the distributed real-time inference across the cluster (using Flink), and return the results back through Redis. As a result, you may get the inference results again using a simple Python API. - -## Next Steps -### Deploy Cluster Serving -To deploy Cluster Serving, follow steps below - -[1. Install Cluster Serving](https://bigdl.readthedocs.io/en/latest/doc/Serving/ProgrammingGuide/serving-installation.html) - -[2. Start Cluster Serving](https://bigdl.readthedocs.io/en/latest/doc/Serving/ProgrammingGuide/serving-start.html) - -[3. Inference by Cluster Serving](https://bigdl.readthedocs.io/en/latest/doc/Serving/ProgrammingGuide/serving-inference.html) - -### Examples -You could find some end-to-end examples about how to build a serving application from scratch or how to migrate an existed local application to serving. - -[Exammple link](https://bigdl.readthedocs.io/en/latest/doc/Serving/Example/example.html) -### Trouble Shooting -Some frequently asked questions are at [FAQ](https://bigdl.readthedocs.io/en/latest/doc/Serving/FAQ/faq.html) - - -### Contribute Guide -For contributors, check [Contribute Guide](https://bigdl.readthedocs.io/en/latest/doc/Serving/FAQ/contribute-guide.html) \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-inference.md b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-inference.md deleted file mode 100644 index da92b872..00000000 --- a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-inference.md +++ /dev/null @@ -1,185 +0,0 @@ -# Inference by Cluster Serving - -## Model Inference -Once you finish the installation and service launch, you could do inference using Cluster Serving client API. - -We support Python API and HTTP RESTful API for conducting inference with Data Pipeline in Cluster Serving. - -### Python API -For Python API, the requirements of python packages are `opencv-python`(for raw image only), `pyyaml`, `redis`. You can use `InputQueue` and `OutputQueue` to connect to data pipeline by providing the pipeline url, e.g. `my_input_queue = InputQueue(host, port)` and `my_output_queue = OutputQueue(host, port)`. If parameters are not provided, default url `localhost:6379` would be used. - -We provide some basic usages here, for more details, please see [API Guide](APIGuide.md). - -To input data to queue, you need a `InputQueue` instance, and using `enqueue` method, for each input, give a key correspond to your model or give arbitrary key if your model does not care about it. 
- -To enqueue an image -``` -from bigdl.serving.client import InputQueue -input_api = InputQueue() -input_api.enqueue('my-image1', user_define_key={"path: 'path/to/image1'}) -``` -To enqueue an instance containing 1 image and 2 ndarray -``` -from bigdl.serving.client import InputQueue -import numpy as np -input_api = InputQueue() -t1 = np.array([1,2]) -t2 = np.array([[1,2], [3,4]]) -input_api.enqueue('my-instance', img={"path": 'path/to/image'}, tensor1=t1, tensor2=t2) -``` -There are 4 types of inputs in total, string, image, tensor, sparse tensor, which could represents nearly all types of models. For more details of usage, go to [API Guide](APIGuide.md) - -To get data from queue, you need a `OutputQueue` instance, and using `query` or `dequeue` method. The `query` method takes image uri as parameter and returns the corresponding result. The `dequeue` method takes no parameter and just returns all results and also delete them in data queue. See following example. -``` -from bigdl.serving.client import OutputQueue -output_api = OutputQueue() -img1_result = output_api.query('img1') -all_result = output_api.dequeue() # the output queue is empty after this code -``` -Consider the code above, -``` -img1_result = output_api.query('img1') -``` -##### Sync API -Python API is a pub-sub schema async API. Specifically, thread would not block once you call `enqueue` method. If you want the thread to block, see this section. - -To use sync API, create a `InputQueue` instance with `sync=True` and `frontend_url=frontend_server_url` argument. -``` -from bigdl.serving.client import InputQueue -input_api = InputQueue(sync=True, frontend_url=frontend_server_url) -response = input_api.predict(request_json_string) -print(response.text) -``` -example of `request_json_string` is -``` -'{ - "instances" : [ { - "ids" : [ 100.0, 88.0 ] - }] -}' -``` -This API is also a python support of [Restful API](#restful-api) section, so for more details of input format, refer to it. -### RESTful API -RESTful API uses serving HTTP server. -This part describes API endpoints and end-to-end examples on usage. -The requests and responses are in JSON format. The composition of them depends on the requests type or verb. See the APIs for details. -In case of error, all APIs will return a JSON object in the response body with error as key and the error message as the value: -``` -{ - "error": -} -``` -#### Predict API -URL -``` -POST http://host:port/predict -``` -Request Example for images as inputs: -``` -curl -d \ -'{ - "instances": [ - { - "image": "/9j/4AAQSkZJRgABAQEASABIAAD/7RcEUGhvdG9za..." - }, - { - "image": "/9j/4AAQSkZJRgABAQEASABIAAD/7RcEUGhvdG9za..." - } - ] -}' \ --X POST http://host:port/predict -``` -Response Example -``` -{ - "predictions": [ - "{value=[[903,0.1306194]]}", - "{value=[[903,0.1306194]]}" - ] -} -``` -Request Example for tensor as inputs: -``` -curl -d \ -'{ - "instances" : [ { - "ids" : [ 100.0, 88.0 ] - }, { - "ids" : [ 100.0, 88.0 ] - } ] -}' \ --X POST http://host:port/predict -``` -Response Example -``` -{ - "predictions": [ - "{value=[[1,0.6427843]]}", - "{value=[[1,0.6427842]]}" - ] -} -``` -Another request example for composition of scalars and tensors. -``` -curl -d \ - '{ - "instances" : [ { - "intScalar" : 12345, - "floatScalar" : 3.14159, - "stringScalar" : "hello, world. 
hello, arrow.", - "intTensor" : [ 7756, 9549, 1094, 9808, 4959, 3831, 3926, 6578, 1870, 1741 ], - "floatTensor" : [ 0.6804766, 0.30136853, 0.17394465, 0.44770062, 0.20275897, 0.32762378, 0.45966738, 0.30405098, 0.62053126, 0.7037923 ], - "stringTensor" : [ "come", "on", "united" ], - "intTensor2" : [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ], - "floatTensor2" : [ [ [ 0.2, 0.3 ], [ 0.5, 0.6 ] ], [ [ 0.2, 0.3 ], [ 0.5, 0.6 ] ] ], - "stringTensor2" : [ [ [ [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ] ], [ [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ] ] ], [ [ [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ] ], [ [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ], [ "come", "on", "united" ] ] ] ] - }] -}' \ --X POST http://host:port/predict -``` -Another request example for composition of sparse and dense tensors. -``` -curl -d \ -'{ - "instances" : [ { - "sparseTensor" : { - "shape" : [ 100, 10000, 10 ], - "data" : [ 0.2, 0.5, 3.45, 6.78 ], - "indices" : [ [ 1, 1, 1 ], [ 2, 2, 2 ], [ 3, 3, 3 ], [ 4, 4, 4 ] ] - }, - "intTensor2" : [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ] - }] -}' \ --X POST http://host:port/predict -``` - - -#### Metrics API -URL -``` -GET http://host:port/metrics -``` -Response example: -``` -[ - { - name: "bigdl.serving.redis.get", - count: 810, - meanRate: 12.627772820651845, - min: 0, - max: 25, - mean: 0.9687099303718213, - median: 0.928579, - stdDev: 0.8150031623593447, - _75thPercentile: 1.000047, - _95thPercentile: 1.141443, - _98thPercentile: 1.268665, - _99thPercentile: 1.608387, - _999thPercentile: 25.874584 - } -] -``` -## Logs and Visualization -To see outputs/logs, go to FLink UI -> job -> taskmanager, (`localhost:8081` by default), or go to `${FLINK_HOME}/logs` - -To visualize the statistics, e.g. performance, go to Flink UI -> job -> metrics, and select the statistic to monitor diff --git a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-installation.md b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-installation.md deleted file mode 100644 index b6e165ba..00000000 --- a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-installation.md +++ /dev/null @@ -1,154 +0,0 @@ -# Install Cluster Serving - -## Installation -It is recommended to install Cluster Serving by pulling the pre-built Docker image to your local node, which have packaged all the required dependencies. Alternatively, you may also manually install Cluster Serving (through either pip or direct downloading), Redis on the local node. -#### Docker -``` -docker pull intelanalytics/bigdl-cluster-serving -``` -then, (or directly run `docker run`, it will pull the image if it does not exist) -``` -docker run --name cluster-serving -itd --net=host intelanalytics/bigdl-cluster-serving:0.9.0 -``` -Log into the container -``` -docker exec -it cluster-serving bash -``` -`cd ./cluster-serving`, you can see all the environments prepared. - -#### Manual installation - -##### Requirements -Non-Docker users need to install [Flink 1.10.0+](https://archive.apache.org/dist/flink/flink-1.10.0/), 1.10.0 by default, [Redis 5.0.0+](https://redis.io/topics/quickstart), 5.0.5 by default. - -For users do not have above dependencies, we provide following command to quickly set up. 
- -Redis -``` -$ export REDIS_VERSION=5.0.5 -$ wget http://download.redis.io/releases/redis-${REDIS_VERSION}.tar.gz && \ - tar xzf redis-${REDIS_VERSION}.tar.gz && \ - rm redis-${REDIS_VERSION}.tar.gz && \ - cd redis-${REDIS_VERSION} && \ - make -``` - -Flink -``` -$ export FLINK_VERSION=1.11.2 -$ wget https://archive.apache.org/dist/flink/flink-${FLINK_VERSION}/flink-${FLINK_VERSION}-bin-scala_2.11.tgz && \ - tar xzf flink-${FLINK_VERSION}-bin-scala_2.11.tgz && \ - rm flink-${FLINK_VERSION}-bin-scala_2.11.tgz.tgz -``` - -After preparing dependencies above, make sure the environment variable `$FLINK_HOME` (/path/to/flink-FLINK_VERSION-bin), `$REDIS_HOME`(/path/to/redis-REDIS_VERSION) is set before following steps. - -#### Install release version -``` -pip install bigdl-serving -``` -#### Install nightly version -Download package from [here](https://sourceforge.net/projects/bigdl/files/cluster-serving-py/), run following command to install Cluster Serving -``` -pip install bigdl_serving-*.whl -``` -For users who need to deploy and start Cluster Serving, run `cluster-serving-init` to download and prepare dependencies. - -For users who need to do inference, aka. predict data only, the environment is ready. - -## Configuration -### Set up cluster -Cluster Serving uses Flink cluster, make sure you have it according to [Installation](#1-installation). - -For docker user, the cluster should be already started. You could use `netstat -tnlp | grep 8081` to check if Flink REST port is working, if not, call `$FLINK_HOME/bin/start-cluster.sh` to start Flink cluster. - -If you need to start Flink on yarn, refer to [Flink on Yarn](https://ci.apache.org/projects/flink/flink-docs-stable/deployment/resource-providers/yarn.html), or K8s, refer to [Flink on K8s](https://ci.apache.org/projects/flink/flink-docs-stable/deployment/resource-providers/standalone/kubernetes.html) at Flink official documentation. - -If you use Flink standalone, call `$FLINK_HOME/bin/start-cluster.sh` to start Flink cluster. - - - -### Configuration file -After [Installation](#1-installation), you will see a config file `config.yaml` in your current working directory. This file contains all the configurations that you can customize for your Cluster Serving. See an example of `config.yaml` below. -``` -## BigDL Cluster Serving Config Example -# model path must be provided -modelPath: /path/to/model -``` - -### Preparing Model -Currently BigDL Cluster Serving supports TensorFlow, OpenVINO, PyTorch, BigDL, Caffe models. Supported types are listed below. - -You need to put your model file into a directory with layout like following according to model type, note that only one model is allowed in your directory. Then, set in `config.yaml` file with `modelPath:/path/to/dir`. 
- -**Tensorflow** -***Tensorflow SavedModel*** -``` -|-- model - |-- saved_model.pb - |-- variables - |-- variables.data-00000-of-00001 - |-- variables.index -``` -***Tensorflow Frozen Graph*** -``` -|-- model - |-- frozen_inference_graph.pb - |-- graph_meta.json -``` -**note:** `.pb` is the weight file which name must be `frozen_inference_graph.pb`, `.json` is the inputs and outputs definition file which name must be `graph_meta.json`, with contents like `{"input_names":["input:0"],"output_names":["output:0"]}` - -***Tensorflow Checkpoint*** -Please refer to [freeze checkpoint example](https://github.com/intel-analytics/bigdl/tree/master/python/orca/example/freeze_checkpoint) - -**Pytorch** - -``` -|-- model - |-- xx.pt -``` -Running Pytorch model needs extra dependency and config. Refer to [here](https://github.com/intel-analytics/bigdl/blob/master/python/orca/example/torchmodel/train/README.md) to install dependencies, and set environment variable `$PYTHONHOME` to your python, e.g. python could be run by `$PYTHONHOME/bin/python` and library is at `$PYTHONHOME/lib/`. - -**OpenVINO** - -``` -|-- model - |-- xx.xml - |-- xx.bin -``` -**BigDL** - -``` -|--model - |-- xx.model -``` -**Caffe** - -``` -|-- model - |-- xx.prototxt - |-- xx.caffemodel -``` - - -### Other Configuration -The field `params` contains your inference parameter configuration. - -* core_number: the **batch size** you use for model inference, usually the core number of your machine is recommended. Thus you could just provide your machine core number at this field. We recommend this value to be not smaller than 4 and not larger than 512. In general, using larger batch size means higher throughput, but also increase the latency between batches accordingly. - -### High Performance Configuration Recommended -#### Tensorflow, Pytorch -1 <= thread_per_model <= 8, in config -``` -# default: number of models used in serving -# modelParallelism: core_number of your machine / thread_per_model -``` -environment variable -``` -export OMP_NUM_THREADS=thread_per_model -``` -#### OpenVINO -environment variable -``` -export OMP_NUM_THREADS=core_number of your machine -``` diff --git a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-start.md b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-start.md deleted file mode 100644 index d8e8bef0..00000000 --- a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-start.md +++ /dev/null @@ -1,87 +0,0 @@ -# Start Cluster Serving - -## Launching Service of Serving - -Before do inference (predict), you have to start serving service. This section shows how to start/stop the service. - -### Start -You can use following command to start Cluster Serving. -``` -cluster-serving-start -``` - -Normally, when calling `cluster-serving-start`, your `config.yaml` should be in current directory. You can also use `cluster-serving-start -c config_path` to pass config path `config_path` to Cluster Serving manually. - -### Stop -You can use Flink UI in `localhost:8081` by default, to cancel your Cluster Serving job. - -Or you can use `${FLINK_HOME}/bin/flink list` to get serving job ID and call `${FLINK_HOME|/bin/flink cancel $ID`. - -### Shut Down -You can use following command to shutdown Cluster Serving. This operation will stop all Cluster Serving jobs and Redis server. Note that your data in Redis will be removed when you shutdown. -``` -cluster-serving-shutdown -``` -If you are using Docker, you could also run `docker rm` to shutdown Cluster Serving. 
-### Start Multiple Serving -To run multiple Cluster Serving job, e.g. the second job name is `serving2`, then use following configuration -``` -# model path must be provided -# modelPath: /path/to/model - -# name, default is serving_stream, you need to specify if running multiple servings -# jobName: serving2 -``` -then call `cluster-serving-start` in this directory would start another Cluster Serving job with this new configuration. - -Then, in Python API, pass `name=serving2` argument during creating object, e.g. -``` -input_queue=InputQueue(name=serving2) -output_queue=OutputQueue(name=serving2) -``` -Then the Python API would interact with job `serving2`. - -### HTTP Server -If you want to use sync API for inference, you should start a provided HTTP server first. User can submit HTTP requests to the HTTP server through RESTful APIs. The HTTP server will parse the input requests and pub them to Redis input queues, then retrieve the output results and render them as json results in HTTP responses. - -#### Prepare -User can download a bigdl-${VERSION}-http.jar from the Nexus Repository with GAVP: -``` -com.intel.analytics.bigdl -bigdl-bigdl_${BIGDL_VERSION}-spark_${SPARK_VERSION} -${BIGDL_VERSION} -``` -User can also build from the source code: -``` -mvn clean package -P spark_2.4+ -Dmaven.test.skip=true -``` -#### Start the HTTP Server -User can start the HTTP server with following command. -``` -java -jar bigdl-bigdl_${BIGDL_VERSION}-spark_${SPARK_VERSION}-${BIGDL_VERSION}-http.jar -``` -And check the status of the HTTP server with: -``` -curl http://${BINDED_HOST_IP}:${BINDED_HOST_PORT}/ -``` -If you get a response like "welcome to BigDL web serving frontend", that means the HTTP server is started successfully. -#### Start options -User can pass options to the HTTP server when start it: -``` -java -jar bigdl-bigdl_${BIGDL_VERSION}-spark_${SPARK_VERSION}-${BIGDL_VERSION}-http.jar --redisHost="172.16.0.109" -``` -All the supported parameter are listed here: -* **interface**: the binded server interface, default is "0.0.0.0" -* **port**: the binded server port, default is 10020 -* **redisHost**: the host IP of redis server, default is "localhost" -* **redisPort**: the host port of redis server, default is 6379 -* **redisInputQueue**: the input queue of redis server, default is "serving_stream" -* **redisOutputQueue**: the output queue of redis server, default is "result:" -* **parallelism**: the parallelism of requests processing, default is 1000 -* **timeWindow**: the timeWindow wait to pub inputs to redis, default is 0 -* **countWindow**: the timeWindow wait to ub inputs to redis, default is 56 -* **tokenBucketEnabled**: the switch to enable/disable RateLimiter, default is false -* **tokensPerSecond**: the rate of permits per second, default is 100 -* **tokenAcquireTimeout**: acquires a permit from this RateLimiter if it can be obtained without exceeding the specified timeout(ms), default is 100 - -**User can adjust these options to tune the performance of the HTTP server.** diff --git a/docs/readthedocs/source/doc/Serving/QuickStart/serving-quickstart.md b/docs/readthedocs/source/doc/Serving/QuickStart/serving-quickstart.md deleted file mode 100644 index 5edfde6a..00000000 --- a/docs/readthedocs/source/doc/Serving/QuickStart/serving-quickstart.md +++ /dev/null @@ -1,50 +0,0 @@ -# Cluster Serving Quick Start - -This section provides a quick start example for you to run BigDL Cluster Serving. To simplify the example, we use docker to run Cluster Serving. 
If you do not have docker installed, [install docker](https://docs.docker.com/install/) first. The quick start example contains all the necessary components so the first time users can get it up and running within minutes: - -* A docker image for BigDL Cluster Serving (with all dependencies installed) -* A sample configuration file -* A sample trained TensorFlow model, and sample data for inference -* A sample Python client program - -Use one command to run Cluster Serving container. (We provide quick start model in older version of docker image, for newest version, please refer to following sections and we remove the model to reduce the docker image size). -``` -(bigdl-cluster-serving publish is in progress, so use following for now) -docker run --name cluster-serving -itd --net=host intelanalytics/bigdl-cluster-serving:0.9.1 -``` -Log into the container using `docker exec -it cluster-serving bash`, and run -``` -cd cluster-serving -cluster-serving-init -``` -`bigdl.jar` and `config.yaml` is in your directory now. - -Also, you can see prepared TensorFlow frozen ResNet50 model in `resources/model` directory with following structure. - -``` -cluster-serving | - -- | model - -- frozen_graph.pb - -- graph_meta.json -``` -Modify `config.yaml` and add following to `model` config -``` -model: - path: resources/model -``` - -Start Cluster Serving using `cluster-serving-start`. - -Run python program `python3 image_classification_and_object_detection_quick_start.py -i resources/test_image` to push data into queue and get inference result. - -Then you can see the inference output in console. -``` -cat prediction layer shape: (1000,) -the class index of prediction of cat image result: 292 -cat prediction layer shape: (1000,) -``` -Wow! You made it! - -Note that the Cluster Serving quick start example will run on your local node only. Check the [Deploy Your Own Cluster Serving](#deploy-your-own-cluster-serving) section for how to configure and run Cluster Serving in a distributed fashion. - -For more details, refer to following sections. diff --git a/docs/readthedocs/source/doc/Serving/index.rst b/docs/readthedocs/source/doc/Serving/index.rst deleted file mode 100644 index de542e40..00000000 --- a/docs/readthedocs/source/doc/Serving/index.rst +++ /dev/null @@ -1,66 +0,0 @@ -Cluster Serving -========================= - -BigDL Cluster Serving is a lightweight distributed, real-time serving solution that supports a wide range of deep learning models (such as TensorFlow, PyTorch, Caffe, BigDL and OpenVINO models). It provides a simple pub/sub API, so that the users can easily send their inference requests to the input queue (using a simple Python API); Cluster Serving will then automatically manage the scale-out and real-time model inference across a large cluster (using distributed streaming frameworks such as Apache Spark Streaming, Apache Flink, etc.) - ----------------------- - - - -.. grid:: 1 2 2 2 - :gutter: 2 - - .. grid-item-card:: - - **Get Started** - ^^^ - - Documents in these sections helps you getting started quickly with Serving. - - +++ - - :bdg-link:`Serving in 5 minutes <./QuickStart/serving-quickstart.html>` | - :bdg-link:`Installation <./ProgrammingGuide/serving-installation.html>` - - .. grid-item-card:: - - **Key Features Guide** - ^^^ - - Each guide in this section provides you with in-depth information, concepts and knowledges about DLLib key features. 
- - +++ - - :bdg-link:`Start Serving <./ProgrammingGuide/serving-start.html>` | - :bdg-link:`Inference <./ProgrammingGuide/serving-inference.html>` - - - .. grid-item-card:: - - **Examples** - ^^^ - - Cluster Serving Examples and Tutorials. - - +++ - - :bdg-link:`Examples <./Example/example.html>` - - .. grid-item-card:: - - **MISC** - ^^^ - - Cluster Serving - - +++ - - :bdg-link:`FAQ <./FAQ/faq.html>` | - :bdg-link:`Contribute <./FAQ/contribute-guide.html>` - - - -.. toctree:: - :hidden: - - Cluster Serving Document \ No newline at end of file diff --git a/docs/readthedocs/source/doc/UseCase/tensorboard.md b/docs/readthedocs/source/doc/UseCase/tensorboard.md deleted file mode 100644 index e69de29b..00000000 diff --git a/docs/readthedocs/source/doc/UserGuide/colab.md b/docs/readthedocs/source/doc/UserGuide/colab.md deleted file mode 100644 index 4c59b2a4..00000000 --- a/docs/readthedocs/source/doc/UserGuide/colab.md +++ /dev/null @@ -1,61 +0,0 @@ -# Colab User Guide - ---- - -You can use BigDL without any installation by using [Google Colab](https://colab.research.google.com/). - -### 1. Open a Colab Notebook - -BigDL includes a collection of [notebooks](./notebooks.md) that can be directly opened and run in Colab. You can click 'Run in Google Colab' that opens the notebook on Colab directly. Click the "run" triangle on the left of each cell to run the notebook cell. When you run the first cell, you may face a pop-up saying 'Warning: This notebook was not authored by Google'; you should click on 'Run Anyway' to get rid of the warning. - -### 2. Notebook Setup - -The first few cells of the notebook contains the code necessary to set up BigDL and other libraries. - -**Install Java 8** - -Run the following command on the Google Colab to install jdk 1.8 - -```bash -# Install jdk8 -!apt-get install openjdk-8-jdk-headless -qq > /dev/null -# Set jdk environment path which enables you to run Pyspark in your Colab environment. -import os -os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64" -!update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -``` - -**Install Conda** - -Run the code bellow to install [conda](https://docs.conda.io/en/latest/) on Colab. - -```bash -# Install Miniconda -!wget https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh -!chmod +x Miniconda3-4.5.4-Linux-x86_64.sh -!./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local - -# Update Conda -!conda install --channel defaults conda python=3.7 --yes -!conda update --channel defaults --all --yes - -# Append to the sys.path -import sys -_ = (sys.path - .append("/usr/local/lib/python3.7/site-packages")) - -os.environ['PYTHONHOME']="/usr/local" -``` - -**Install BigDL** - -Install the latest pre-release version. -```bash -# Install latest pre-release version of BigDL -# Installing BigDL from pip will automatically install all BigDL modules and their dependencies. -!pip install --pre --upgrade bigdl -``` - -**Install Python Dependencies** - -As Colab python environment provides some built-in Python libraries, you should check if the library versions are compatible with your application. You may refer [compatibility](./python.md) to specify the python library version that BigDL supports. 
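A minimal sketch of such a compatibility check is shown below; it can be run in a Colab cell before installing BigDL. The package names and version pins are placeholders only, so substitute the versions that your application and the compatibility guide actually require.

```python
# A minimal sketch: verify that Colab's preinstalled packages match the
# versions your application expects. The pins below are placeholders;
# replace them with the versions from the compatibility guide.
import pkg_resources

expected = {
    "numpy": "1.21.6",    # placeholder pin
    "pyspark": "2.4.6",   # placeholder pin
}

for name, wanted in expected.items():
    try:
        installed = pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        print(f"{name}: not installed, expected {wanted}")
        continue
    status = "OK" if installed == wanted else "MISMATCH"
    print(f"{name}: installed {installed}, expected {wanted} -> {status}")
```

If a mismatch is reported, reinstall the affected package with the required version (for example via `pip install <package>==<version>`) before installing BigDL.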
diff --git a/docs/readthedocs/source/doc/UserGuide/contributor.rst b/docs/readthedocs/source/doc/UserGuide/contributor.rst deleted file mode 100644 index 92ad50af..00000000 --- a/docs/readthedocs/source/doc/UserGuide/contributor.rst +++ /dev/null @@ -1,5 +0,0 @@ -Contributor Guide -========================= - -* `Developer Guide <./develop.html>`_ provides instructions on how to build from source and contribute to BigDL source code. -* `Documentation Guide <./documentation.html>`_ provides tips and guidelines for adding/modifying BigDL documents. \ No newline at end of file diff --git a/docs/readthedocs/source/doc/UserGuide/databricks.md b/docs/readthedocs/source/doc/UserGuide/databricks.md deleted file mode 100644 index b0e7a4d3..00000000 --- a/docs/readthedocs/source/doc/UserGuide/databricks.md +++ /dev/null @@ -1,175 +0,0 @@ -# Databricks User Guide - ---- - -You can run BigDL program on the [Databricks](https://databricks.com/) cluster as follows. -### 1. Create a Databricks Cluster - -- Create either an [AWS Databricks](https://docs.databricks.com/getting-started/try-databricks.html) workspace or an [Azure Databricks](https://docs.microsoft.com/en-us/azure/azure-databricks/) workspace. -- Create a Databricks [cluster](https://docs.databricks.com/clusters/create.html) using the UI. Choose Databricks runtime version. This guide is tested on Runtime 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12). - -![](images/create-cluster.png) - -### 2. Generate initialization script - -[Init script](https://learn.microsoft.com/en-us/azure/databricks/clusters/init-scripts) is used to Install BigDL or other libraries. First, you need to put the **init script** into [DBFS](https://docs.databricks.com/dbfs/index.html), you can use one of the following ways. - -Note that as Python 3.8 is highly recommended by BigDL, please set the Databricks runtime version to 9.1 LTS or 10.4 LTS to use the Python 3.8 environment. - -**a. Generate init script in Databricks notebook** - -Create a Databricks notebook and execute - -```python -init_script = """ -#!/bin/bash - -# install bigdl-orca, add other bigdl modules if you need -/databricks/python/bin/pip install pip install --pre --upgrade bigdl-orca-spark3[ray] - -# install other necessary libraries, here we install libraries needed in this tutorial -/databricks/python/bin/pip install tensorflow==2.9.1 -/databricks/python/bin/pip install tqdm -/databricks/python/bin/pip install torch==1.11.0+cpu torchvision==0.12.0+cpu tensorboard -f https://download.pytorch.org/whl/torch_stable.html - -# copy bigdl jars to databricks -cp /databricks/python/lib/python3.8/site-packages/bigdl/share/*/lib/*.jar /databricks/jars -""" - -# Change the first parameter to your DBFS path -dbutils.fs.put("dbfs:/FileStore/scripts/init.sh", init_script, True) -``` - -To make sure the init script is in DBFS, in the left panel, click **Data > DBFS > check your script save path**. - -> if you do not see DBFS in your panel, see [Appendix A](#appendix-a). - -**b. 
Create init script in local and upload to DBFS** - -Create a file **init.sh**(or any other filename) in your computer, the file content is - -```bash -#!/bin/bash - -# install bigdl-orca, add other bigdl modules if you need -/databricks/python/bin/pip install pip install --pre --upgrade bigdl-orca-spark3[ray] - -# install other necessary libraries, here we install libraries needed in this tutorial -/databricks/python/bin/pip install tensorflow==2.9.1 -/databricks/python/bin/pip install tqdm -/databricks/python/bin/pip install torch==1.11.0+cpu torchvision==0.12.0+cpu tensorboard -f https://download.pytorch.org/whl/torch_stable.html - -# copy bigdl jars to databricks -cp /databricks/python/lib/python3.8/site-packages/bigdl/share/*/lib/*.jar /databricks/jars -``` - -Then upload **init.sh** to DBFS. In Databricks left panel, click **Data > DBFS > Choose or create upload directory > Right click > Upload here**. - -![](images/upload-init-script.png) - -Now the init script is in DBFS, right click the init.sh and choose **Copy path**, copy the **Spark API Format** path. - -![](images/copy-script-path.png) - -__Notes:__ -* If Databricks returns an init script failure, please check your Databricks runtime and Python version. -* If your Databricks runtime version is 11.2 or later, click the `Edit` button and change the version to 9.1 LTS or 10.4 LTS. - -### 3. Set Spark configuration - -In the left panel, click **Compute > Choose your cluster > edit > Advanced options > Spark > Confirm**. You can provide custom [Spark configuration properties](https://spark.apache.org/docs/latest/configuration.html) in a cluster configuration. Please set it according to your cluster resource and program needs. - -![](images/spark-config.png) - -See below for an example of Spark config setting **needed** by BigDL. Here it sets 2 core per executor. Note that "spark.cores.max" needs to be properly set below. - -``` -spark.executor.cores 2 -spark.cores.max 4 -``` - -### 4. Install BigDL Libraries - -Use the init script from [step 2](#2-generate-initialization-script) to install BigDL libraries. In the left panel, click **Compute > Choose your cluster > edit > Advanced options > Init Scripts > Paste init script path > Add > Confirm**. - -![](images/config-init-script.png) - -Then start or restart the cluster. After starting/restarting the cluster, the libraries specified in the init script are all installed. - -### 5. Run BigDL on Databricks - -Open a new notebook, and call `init_orca_context` at the beginning of your code (with `cluster_mode` set to "spark-submit"). - -```python -from bigdl.orca import init_orca_context, stop_orca_context -init_orca_context(cluster_mode="spark-submit") -``` - -Output on Databricks: - -![](images/init-orca-context.png) - -**Run Examples** - -- [Keras example on Databricks](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/databricks/tf_keras_ncf.ipynb) -- [Pytorch example on Databricks](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/databricks/pytorch_fashion_mnist.ipynb) -> Note -> * If you run Pytorch example with `spark` backend (i.e.`Estimator.from_torch(..., backend="spark")`) on multiple node cluster, you need to set GLOO_SOCKET_IFNAME="eth0" in cluster's configuration as below: -![](images/db-gloo-socket.png) -> * If you want to save model to DBFS, or load model from DBFS, the save/load path should be the **File API Format** on Databricks, which means your save/load path should start with `/dbfs`. - -### 6. 
Other ways to install third-party libraries on Databricks if necessary - -If you want to use other ways to install third-party libraries, check related Databricks documentation of [libraries for AWS Databricks](https://docs.databricks.com/libraries/index.html) and [libraries for Azure Databricks](https://docs.microsoft.com/en-us/azure/databricks/libraries/). - -### Appendix A - -If there is no DBFS in your panel, go to **User profile > Admin Console > Workspace settings > Advanced > Enabled DBFS File Browser** - -![](images/dbfs.png) - -### Appendix B - -Use **Databricks CLI** to upload file to DBFS. When you upload a large file to DBFS, using Databricks CLI could be faster than using the Databricks web UI. - -**Install and config Azure Databricks CLI** - -1. Install Python, need Python version 2.7.9 and above if you’re using Python 2 or Python 3.6 and above if you’re using Python 3. - -2. Run `pip install databricks-cli` - -3. Set authentication, Click **user profile icon > User Settings > Access tokens > Generate new token > generate > copy the token**, make sure to **copy** the token and store it in a secure location, **it won't show again**. - - ![](images/token.png) - -4. Copy the URL of Databricks host, the format is `https://adb-..azuredatabricks.net`, you can copy it from your Databricks web page URL. - - ![](images/url.png) - -5. In cmd run `dbfs config --token` as shown below: - - ``` - dbfs configure --token - Databricks Host (should begin with https://): https://your.url.from.step.4 - Token: your-token-from-step-3 - ``` - -6. Verify whether you are able to connect to DBFS, run "databricks fs ls". - - ![](images/verify-dbfs.png) - -**Upload through Databricks CLI** - -Now, we can use Databricks CLI to upload file to DBFS. run command: - -``` -dbfs cp /your/local/filepath/bigdl-assembly-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar dbfs:/FileStore/jars/stable/bigdl-assembly-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar -``` - -After command finished, check DBFS in Databricks, in left panel, click **Data > DBFS > your upload directory**, if you do not see DBFS in your panel, see [Appendix A](#appendix-a). - -**Install package from DBFS** - -In the left panel, click **Compute > choose your cluster > Libraries > Install new > Library Source(DBFS/ADLS) > Library Type(your package type)**. - -![](images/install-zip.png) \ No newline at end of file diff --git a/docs/readthedocs/source/doc/UserGuide/develop.md b/docs/readthedocs/source/doc/UserGuide/develop.md deleted file mode 100644 index 14ba04d5..00000000 --- a/docs/readthedocs/source/doc/UserGuide/develop.md +++ /dev/null @@ -1,173 +0,0 @@ -# Developer Guide - ---- - -BigDL source code is available at [GitHub](https://github.com/intel-analytics/BigDL): - -```bash -git clone https://github.com/intel-analytics/BigDL.git -``` - -By default, `git clone` will download the development version of BigDL. If you want a release version, you can use the command `git checkout` to change the specified version. - - -### 1. 
Python - -#### 1.1 Build - -To generate a new [whl](https://pythonwheels.com/) package for pip install, you can run the following script: - -```bash -cd BigDL/python/dev -bash release_default_linux_spark2.sh default false false false # build on Spark 2.4.6 for linux -# Use release_default_linux_spark3.sh to build on Spark 3.1.3 for linux -# Use release_default_mac_spark2.sh to build on Spark 2.4.6 for mac -# Use release_default_mac_spark3.sh to build on Spark 3.1.3 for mac -``` - -**Arguments:** - -- The first argument is the BigDL __version__ to build for. 'default' means the default version (`BigDL/python/version.txt`) for the current branch. You can also specify a different version if you wish, e.g., '0.14.0.dev1'. -- The second argument is whether to __quick build__ BigDL Scala dependencies. You need to set it to be 'false' for the first build. In later builds, if you don't make any changes in BigDL Scala, you can set it to be 'true' so that the Scala dependencies would not be re-built. -- The third argument is whether to __upload__ the packages to pypi. Set it to 'false' if you are simply developing BigDL for your own usage. -- The fourth argument is whether to add __spark suffix__ (i.e. -spark2 or -spark3) to BigDL package names. Just set this to be 'false' if you are simply developing BigDL for your own usage. -- You can also add other Maven profiles to build the package (if any) after the fourth argument, for example '-Ddata-store-url=..', etc. - - -After running the above command, you will find a `whl` file for each submodule of BigDL and you can then directly pip install them to your local Python environment: -```bash -# Install bigdl-nano -cd BigDL/python/nano/src/dist -pip install bigdl_nano-*.whl - -# Install bigdl-dllib -cd BigDL/python/dllib/src/dist -pip install bigdl_dllib-*.whl - -# Install bigdl-orca, which depends on bigdl-dllib and you need to install bigdl-dllib first -cd BigDL/python/orca/src/dist -pip install bigdl_orca-*.whl - -# Install bigdl-friesian, which depends on bigdl-orca and you need to install bigdl-dllib and bigdl-orca first -cd BigDL/python/friesian/src/dist -pip install bigdl_friesian-*.whl - -# Install bigdl-chronos, which depends on bigdl-orca and bigdl-nano. You need to install bigdl-dllib, bigdl-orca and bigdl-nano first -cd BigDL/python/chronos/src/dist -pip install bigdl_chronos-*.whl - -# Install bigdl-serving -cd BigDL/python/serving/src/dist -pip install bigdl_serving-*.whl -``` - -See [here](./python.md) for more instructions to run BigDL after pip install. - - -#### 1.2 IDE Setup -Any IDE that support Python should be able to run BigDL. PyCharm works fine for us. - -You need to do the following preparations before starting the IDE to successfully run a BigDL Python program in the IDE: - -- Build BigDL; see [here](#build) for more instructions. -- Prepare Spark environment by either setting `SPARK_HOME` as the environment variable or `pip install pyspark`. Note that the Spark version should match the one you build BigDL on. -- Check the jars under `BigDL/dist/lib` and set the environment variable `BIGDL_CLASSPATH`. 
Modify SPARKVERSION and BIGDLVERSION(Scala) as appropriate: - ```bash - export BIGDL_CLASSPATH=BigDL/dist/lib/bigdl-dllib-spark_SPARKVERSION-BIGDLVERSION-jar-with-dependencies.jar:BigDL/dist/lib/bigdl-orca-spark_SPARKVERSION-BIGDLVERSION-jar-with-dependencies.jar:BigDL/dist/lib/bigdl-friesian-spark_SPARKVERSION-BIGDLVERSION-jar-with-dependencies.jar - ``` -- Configure BigDL source files to the Python interpreter: - - You can easily do this after launching PyCharm by right clicking the folder `BigDL/python/dllib/src` -> __Mark Directory As__ -> __Sources Root__ (also do this for `BigDL/python/nano/src`, `BigDL/python/orca/src`, `BigDL/python/friesian/src`, `BigDL/python/chronos/src`, `BigDL/python/serving/src` if necessary). - - Alternatively, you can add BigDL source files to `PYTHONPATH`: - ```bash - export PYTHONPATH=BigDL/python/dllib/src:BigDL/python/nano/src:BigDL/python/orca/src:BigDL/python/friesian/src:BigDL/python/chronos/src:BigDL/python/serving/src:$PYTHONPATH - ``` - -- Add `spark-bigdl.conf` to `PYTHONPATH`: - ```bash - export PYTHONPATH=BigDL/python/dist/conf/spark-bigdl.conf:$PYTHONPATH - ``` - -- Install and add `tflibs` to `TF_LIBS_PATH`: - ```bash - # Install bigdl-tf and bigdl-math - pip install bigdl-tf bigdl-math - - # Configure TF_LIBS_PATH - export TF_LIBS_PATH=$(python -c 'import site; print(site.getsitepackages()[0])')/bigdl/share/tflibs - ``` - - -The above environment variables should be available when running or debugging code in the IDE. When running applications in PyCharm, you can add runtime environment variables by clicking __Run__ -> __Edit Configurations__; then in the __Run/Debug Configurations__ panel, you can add necessary environment variables to your applications. - - -#### 1.3 Terminal Setup - -Besides setting the environment variables mentioned above manually for Linux users, we also provide a solution to set them with a script: - -```bash -# Install bigdl-tf and bigdl-math -pip install bigdl-tf bigdl-math - -cd BigDL/python/friesian -source dev/prepare_env.sh -``` - -You can verify the BigDL environment by running the following example. - -```bash -python BigDL/python/dllib/examples/autograd/custom.py -``` - -Note that this approach will only work temporarily for this terminal. - - -### 2. Scala - -#### 2.1 Build - -Maven 3 is needed to build BigDL, you can download it from the [maven website](https://maven.apache.org/download.cgi). - -After installing Maven 3, please set the environment variable MAVEN_OPTS as follows: -```bash -$ export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m" -``` - -**Build using `make-dist.sh`** - -It is highly recommended that you build BigDL using the [make-dist.sh script](https://github.com/intel-analytics/BigDL/blob/branch-2.0/scala/make-dist.sh) with **Java 8**. - -You can build BigDL with the following commands: -```bash -$ cd scala -$ bash make-dist.sh -``` -After that, you can find a `dist` folder, which contains all the needed files to run a BigDL program. The files in `dist` include: - -* **dist/lib/bigdl-VERSION-jar-with-dependencies.jar**: This jar package contains all dependencies except Spark classes. -* **dist/lib/bigdl-VERSION-python-api.zip**: This zip package contains all Python files of BigDL. - -The instructions above will build BigDL with Spark 2.4.6. To build with other spark versions, for example building analytics-zoo with spark 2.2.0, you can use `bash make-dist.sh -Dspark.version=2.2.0`. - -**Build with JDK 11** - -Spark starts to supports JDK 11 and Scala 2.12 at Spark 3.0. 
You can use `-P spark_3.x` to specify Spark3 and scala 2.12. Additionally, `make-dist.sh` default uses Java 8. To compile with Java 11, it is required to specify building opts `-Djava.version=11 -Djavac.version=11`. You can build with `make-dist.sh`. - -It's recommended to download [Oracle JDK 11](https://www.oracle.com/java/technologies/javase-jdk11-downloads.html). This will avoid possible incompatibilities with maven plugins. You should update `PATH` and make sure your `JAVA_HOME` environment variable is set to Java 11 if you're running from the command line. If you're running from an IDE, you need to make sure it is set to run maven with your current JDK. - -Build with `make-dist.sh`: - -```bash -$ bash make-dist.sh -P spark_3.x -Djava.version=11 -Djavac.version=11 -``` - -#### 2.2 IDE Setup - -BigDL uses maven to organize project. You should choose an IDE that supports Maven project and scala language. IntelliJ IDEA works fine for us. - -In IntelliJ, you can open BigDL project root directly, and the IDE will import the project automatically. If not imported automatically, right click `scala/pom.xml` and choose `Add as Maven Project`. - -We set the scopes of spark related libraries to `provided` in the maven pom.xml, which, however, will cause a problem in IDE (throwing `NoClassDefFoundError` when you run applications). You can easily change the scopes using the `all-in-one` profile. - -* In Intellij, go to View -> Tools Windows -> Maven Projects. Then in the Maven Projects panel, Profiles -> click "all-in-one". diff --git a/docs/readthedocs/source/doc/UserGuide/docker.md b/docs/readthedocs/source/doc/UserGuide/docker.md deleted file mode 100644 index f07affba..00000000 --- a/docs/readthedocs/source/doc/UserGuide/docker.md +++ /dev/null @@ -1,141 +0,0 @@ -# Docker User Guide - ---- - -### 1. Pull Docker Image - -You may pull a Docker image from the [Docker Hub](https://hub.docker.com/r/intelanalytics/bigdl/tags). - -To pull the nightly build version, use -```bash -sudo docker pull intelanalytics/bigdl:2.1.0-SNAPSHOT -``` - -To pull other versions, please refer to [BigDL Docker Hub Tags](https://hub.docker.com/r/intelanalytics/bigdl/tags?page=1&ordering=last_updated), select a tag and use -```bash -sudo docker pull intelanalytics/bigdl:tag_name -``` - -**Configuring resources** - -For Docker Desktop users, the default resources (2 CPUs and 2GB memory) are relatively small, and you may want to change them to larger values (8GB memory and 4 CPUs should be a good estimate for most examples, and the exact memory requirements vary for different applications). For more information, view the Docker documentation for [MacOS](https://docs.docker.com/docker-for-mac/#resources) and [Windows](https://docs.docker.com/docker-for-windows/#resources). - -**Speed up pulling image by adding mirrors** - -To speed up pulling the image from DockerHub, you may add the registry-mirrors key and value by editing `daemon.json` (located in `/etc/docker/` folder on Linux): -``` -{ - "registry-mirrors": ["https://"] -} -``` -For instance, users in China may add the USTC mirror as follows: -``` -{ - "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"] -} -``` - - -After that, flush changes and restart docker: - -``` -sudo systemctl daemon-reload -sudo systemctl restart docker -``` - -### 2. 
Launch Docker Container - -After pulling the BigDL Docker image, you can launch an BigDL Docker container: -``` -sudo docker run -it --rm --net=host \ - -e http_proxy=http://your-proxy-host:your-proxy-port \ - -e https_proxy=https://your-proxy-host:your-proxy-port \ - intelanalytics/bigdl:2.1.0-SNAPSHOT bash -``` - -* The value 12345 is a user specified port number. -* The value "your-token" is a user specified string. -* If you need to use http/https proxy, please use -e http_proxy/https_proxy - -Once the container is successfully launched, you will automatically login into the container and see this as the output: -``` -root@[hostname]:/opt/work# -``` - -The /opt/work directory contains: - -* start-notebook.sh is used for starting the jupyter notebook. You can specify the environment settings and spark settings to start a specified jupyter notebook. -* bigdl-${BigDL_VERSION} is the BigDL home of BigDL distribution. -* spark-${SPARK_VERSION} is the Spark home. -* BigDL is cloned from https://github.com/intel-analytics/BigDL.git, contains apps, examples using BigDL. -* opt/download-bigdl.sh is used for downloading BigDL distributions. - -### 3. Run Jupyter Notebook Examples in the Container - -After a Docker container is launched and user login into the container, you can start the Jupyter Notebook service inside the container. - -#### 3.1 Start the Jupyter Notebook services - -In the `/opt/work` directory, run this command line to start the Jupyter Notebook service: -``` -./start-notebook.sh -``` - -You will see the output message like below. This means the Jupyter Notebook service has started successfully within the container. -``` -[I 07:40:39.354 NotebookApp] Serving notebooks from local directory: /opt/work/bigdl-2.1.0-SNAPSHOT/apps -[I 07:40:39.355 NotebookApp] Jupyter Notebook 6.4.6 is running at: -[I 07:40:39.355 NotebookApp] http://(the-host-name):12345/?token=... -[I 07:40:39.355 NotebookApp] or http://127.0.0.1:12345/?token=... -[I 07:40:39.355 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation). -``` - -#### 3.2 Connect to Jupyter Notebook service from a browser - -After the Jupyter Notebook service is successfully started, you can connect to the Jupyter Notebook service from a browser. - -1. Get the IP address of the container -2. Launch a browser, and connect to the Jupyter Notebook service with the URL: https://container-ip-address:port-number/?token=your-token -As a result, you will see the Jupyter Notebook like this: - -![](images/notebook1.jpg) - -#### 3.3 Run BigDL Jupyter Notebooks - -After connecting to the Jupyter Notebook in the browser, you can run multiple BigDL Jupyter Notebook examples. The example shown below is the “dogs-vs-cats”. - -* Click into the "dogs-vs-cats" folder: - -![](images/notebook2.jpg) - -* Open the notebook file: - -![](images/notebook3.jpg) - -* Start to run the "dogs-vs-cats" notebook: - -![](images/notebook4.jpg) - -* Run through the example and check the prediction: - -![](images/notebook5.jpg) - -### 4. Shut Down Docker Container - -You should shut down the BigDL Docker container after using it. - -1. You can list all the active Docker containers by command line: - ``` - sudo docker ps - ``` - -2. You will see your docker containers: - ``` - CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES - 40de2cdad025 intelanalytics/bigdl:2.1.0-SNAPSHOT "/opt/work/start-n..." 3 hours ago Up 3 hours upbeat_al - ``` - -3. 
Shut down the corresponding docker container by its ID: - ``` - $sudo docker rm -f 40de2cdad025 - ``` diff --git a/docs/readthedocs/source/doc/UserGuide/documentation.md b/docs/readthedocs/source/doc/UserGuide/documentation.md deleted file mode 100644 index ec151088..00000000 --- a/docs/readthedocs/source/doc/UserGuide/documentation.md +++ /dev/null @@ -1,642 +0,0 @@ -# Documentation Guide - -Here list several writing tips and guidelines you could refer to if you want to add/modify documents for BigDL documentation. The source code of our documentation is available [here](https://github.com/intel-analytics/BigDL/tree/main/docs/readthedocs). - -```eval_rst -.. tip:: - - You could refer `here `_ if you would like to test your local changes to BigDL documentation. -``` - -## 1. How to add a new document -### 1.1 Decide whether to add a reStructuredText (`.rst`) file or a CommonMark (`.md`) file -In our documentation, both reStructuredText (`.rst`) and CommonMark (`.md`) files are allowed to use. In convension, we use `.rst` file in index pages, and `.md` files for other pages. - -Here shows an overview of our documentation structure tree: - -```eval_rst -.. graphviz:: - - digraph DocStructure { - graph [tooltip=" " splines=ortho] - node [color="#0171c3" shape=box fontname="Arial" fontsize=12 tooltip=" "] - edge [tooltip=" "] - - N1 [label="BigDL (.rst)" style=filled fontcolor="#ffffff"] - - N1_1 [label="User guide (.rst)" style=filled fontcolor="#ffffff"] - N1_2 [label="Powered by (.md)" style=rounded] - N1_3 [label="Orca (.rst)" style=filled fontcolor="#ffffff"] - N1_4 [label="Nano (.rst)" style=filled fontcolor="#ffffff"] - N1_5 [label="DLlib (.rst)" style=filled fontcolor="#ffffff"] - N1_6 [label="Chronos (.rst)" style=filled fontcolor="#ffffff"] - N1_7 [label="Fresian (.rst)" style=filled fontcolor="#ffffff"] - N1_8 [label="PPML (.rst)" style=filled fontcolor="#ffffff"] - N1_9 [label="..." shape=plaintext] - - N1_1_1 [label="Python (.md)" style=rounded] - N1_1_2 [label="Scala (.md)" style=rounded] - N1_1_3 [label="..." shape=plaintext] - - N1_8_1 [label="PPML Intro. (.md)" style=rounded] - N1_8_2 [label="User Guide (.md)" style=rounded] - N1_8_3 [label="Tutorials (.rst)" style="filled" fontcolor="#ffffff"] - N1_8_4 [label="..." shape=plaintext] - - - N1_8_3_1 [label="..." shape=plaintext] - - N1_3_1 [label="..." shape=plaintext] - N1_4_1 [label="..." shape=plaintext] - N1_5_1 [label="..." shape=plaintext] - N1_6_1 [label="..." shape=plaintext] - N1_7_1 [label="..." shape=plaintext] - - N1 -> N1_1 - N1 -> N1_2 - N1 -> N1_3 -> N1_3_1 - N1 -> N1_4 -> N1_4_1 - N1 -> N1_5 -> N1_5_1 - N1 -> N1_6 -> N1_6_1 - N1 -> N1_7 -> N1_7_1 - N1 -> N1_8 - N1 -> N1_9 - - N1_1 -> N1_1_1 - N1_1 -> N1_1_2 - N1_1 -> N1_1_3 - - N1_8 -> N1_8_1 - N1_8 -> N1_8_2 - N1_8 -> N1_8_3 -> N1_8_3_1 - N1_8 -> N1_8_4 - } -``` - -Index pages (nodes filled with blue) are the ones supposed to lead to further pages. In the structure above, they are nodes with descendants. - -```eval_rst -.. note:: - - In convension, we use ``.rst`` file for index pages becuase various web components (such as cards, note boxes, tabs, etc.) are more straightforward to be inserted in our documentation through reStructuredText. And it is a common case in our documentation that index pages include various web components. -``` - -### 1.2 Add the new document to the table of contents (ToC) -For clear navigation purposes, it is recommended to put the document in the ToC. 
To do this, you need to insert the relative path to the newly-added file into the [`_toc.yml`](https://github.com/intel-analytics/BigDL/blob/main/docs/readthedocs/source/_toc.yml) file, according to its position in the structure tree. - -```eval_rst -.. tip:: - - When adding a new document, you should always check whether to put relative link directing to it inside its parent index page, or inside any other related pages. - -.. warning:: - - According to `sphinx-external-toc `_ document, "each document file can only occur once in the ToC". -``` - -For API related documents, we still use in-file `.. toctree::` directives instead of putting them inside `_toc.yml`. You could refer [here](https://github.com/intel-analytics/BigDL/tree/main/docs/readthedocs/source/doc/PythonAPI) for example usages. - -## 2. Differentiate the syntax of reStructuredText and CommonMark -As mentioned above, our documentation includes both `.rst` and `.md` files. They have different syntax, please make sure you do not mix the usage of them. - -```eval_rst -.. seealso:: - - You could refer `here `_ for reStructuredText syntax examples, and `here `_ for CommonMark specifications. -``` - -Here list several use cases where syntax in `.rst` and `.md` are often confused: - - - - - - - - - - - - - - - -
reStructuredText | CommonMark
Inline code - -```rst -``inline code`` -``` - - -```md -`inline code` -``` - -
Hyperlinks - -```rst -`Relative link text <relative/path/to/the/file>`_ -`Absolute link text <https://www.example.com/>`_ -``` - - - -```md -[Relative link text](relative/path/to/the/file) -[Absolute link text](https://www.example.com/) -``` - -
Italic - -```rst -`italicized text` -*italicized text* -``` - - -```md -*italicized text* - -``` - -
Italic & bold - -Not supported directly; needs help from CSS - - - -```md -***italicized & bold text*** -``` - -
- -```eval_rst -.. note:: - - When linking to a ``.rst`` file in a ``.md`` file, replace the ``.rst`` with ``.html`` in the relative path to avoid errors. - That is, if you want to link to the ``example.rst`` in a ``.md`` file, use - - .. code-block:: md - - [Example](relatve/path/to/example.html) -``` - -### 2.1 Tips when adding docstrings in source code for API documentation -According to the [`sphinx.ext.autodoc`](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#module-sphinx.ext.autodoc) document, docstrings should be written in reStructuredText. We need to make sure that we are using reStructuredText syntax in the source code docstrings for API documentation. - -There are two [field lists](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#field-lists) syntax often used in API documentation for parameter definition and return values. Let us take a snippet from [`bigdl.nano.pytorch.InferenceOptimizer.get_best_model`](../PythonAPI/Nano/pytorch.html#bigdl.nano.pytorch.InferenceOptimizer.get_best_model) as an example: -```rst -:param use_ipex: (optional) if not None, then will only find the - model with this specific ipex setting. -:param accuracy_criterion: (optional) a float represents tolerable - accuracy drop percentage, defaults to None meaning no accuracy control. -:return: best model, corresponding acceleration option -``` - -```eval_rst -.. important:: - - The following lines of one parameter/return definition should be indented to be rendered correctly. - -.. tip:: - - Please always check whether corresponding API documentation is correctly rendered when changes made to the docstrings. -``` -## 3. Common components in `.rst` files - - - - - - - - - - - - - - - - - - - - - - -
Headers - -```rst -Header Level 1 -========================= - -Header Level 2 -------------------------- - -Header Level 3 -~~~~~~~~~~~~~~~~~~~~~~~~~ - -Header Level 4 -^^^^^^^^^^^^^^^^^^^^^^^^^ -``` - - - -Note that the underline symbols should be at least as long as the header texts. - -Also, **we do not expect manually-added styles to headers.** - -You could refer [here](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#sections) for more information on reStructuredText sections. - -
Lists - -```rst -* An unordered list -* The second item of the unordered list - with two lines - -#. A numbered list - - 1. A nested numbered list - 2. The second nested numbered list - -#. The second item of - the numbered list -``` - - - -Note that the number of spaces indented depends on the markup. That is, if we use '* '/'#. '/'10. ' for the list, the contents belonging to the list, or to any nested list after it, should be indented by 2/3/4 spaces. - -Also note that blank lines are needed around the nested list. - -You could refer [here](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks) for more information on reStructuredText lists. - -
- -Note, Warning, Danger, Tip, Important, See Also boxes -
- -```rst -.. note:: - - This is a note box. - -.. warning:: - - This is a warning box. - -.. danger:: - - This is a danger box. - -.. tip:: - - This is a tip box. - -.. important:: - - This is an important box. - -.. seealso:: - - This is a see also box. -``` - - - -```eval_rst -.. note:: - - This is a note box. - -.. warning:: - - This is a warning box. - -.. danger:: - - This is a danger box. - -.. tip:: - - This is a tip box. - -.. important:: - - This is an important box. - -.. seealso:: - - This is a see also box. -``` - -
Code blocks - -```rst -.. code-block:: [language] - - some code in this language - -.. code-block:: python - - some python code -``` - - - -All the supported language arguments for syntax highlighting can be found [here](https://pygments.org/docs/lexers/). - -
Tabs - -```rst -.. tabs:: - - .. tab:: Title 1 - - Contents for tab 1 - - .. tab:: Title 2 - - Contents for tab 2 - - .. code-block:: python - - some python code -``` - - - -```eval_rst -.. tabs:: - - .. tab:: Title 1 - - Contents for tab 1 - - .. tab:: Title 2 - - Contents for tab 2 - - .. code-block:: python - - some python code -``` - -You could refer [here](https://sphinx-tabs.readthedocs.io/en/v3.4.0/) for more information on the usage of tabs. - -
Cards in grids - -```rst -.. grid:: 1 2 2 2 - :gutter: 2 - - .. grid-item-card:: - - **Header** - ^^^ - A normal card. - +++ - :bdg-link:`Footer ` - - .. grid-item-card:: - :link: https://www.example.com/ - :class-card: bigdl-link-card - - **Header** - ^^^ - A link card. - +++ - Footer -``` - - - -```eval_rst -.. grid:: 1 2 2 2 - :gutter: 2 - - .. grid-item-card:: - - **Header** - ^^^ - A normal card. - +++ - :bdg-link:`Footer ` - - .. grid-item-card:: - :link: https://www.example.com/ - :class-card: bigdl-link-card - - **Header** - ^^^ - A link card. - +++ - Footer -``` - -You could refer [here](https://sphinx-design.readthedocs.io/en/furo-theme/cards.html) for more information on the usage of cards, and [here](https://sphinx-design.readthedocs.io/en/furo-theme/grids.html#placing-a-card-in-a-grid) for cards in grids. - -Note that `1 2 2 2` defines the number of cards per row in different screen sizes (from extra-small to large). - -
- -[Mermaid](https://mermaid-js.github.io/) diagrams - - - -```rst -.. mermaid:: - - flowchart LR - A(Node A) - B([Node B]) - - A -- points to --> B - A --> C{{Node C}} - - classDef blue color:#0171c3; - class B,C blue; -``` - - - -```eval_rst -.. mermaid:: - - flowchart LR - A(Node A) - B([Node B]) - - A -- points to --> B - A --> C{{Node C}} - - classDef blue color:#0171c3; - class B,C blue; -``` - -Mermaid is a charting tool for dynamically creating/modifying diagrams. Refer [here](https://mermaid-js.github.io/) for more Mermaid syntax. - -
- -### 3.1 Use reStructuredText in `.md` files -You could embed reStructuredText into `.md` files by putting the reStructuredText code into an `eval_rst` code block. This is useful when you want to use components such as special boxes, tabs, cards, Mermaid diagrams, etc. in your `.md` file. -~~~md -```eval_rst -any contents in reStructuredText syntax -``` - -```eval_rst -.. note:: - - This is a note box. - -.. mermaid:: - - flowchart LR - A --> B -``` -~~~ - -```eval_rst -.. important:: - - Any contents inside an ``eval_rst`` code block should follow the reStructuredText syntax. -``` - -## 4. Common components in `.md` files - - - - - - - - - - - -
Headers - -```md -# Header Level 1 - -## Header Level 2 - -### Header Level 3 - -#### Header Level 4 -``` - - - -Note that **we do not expect manually-added styles to headers.** - -
Lists - -```md -- An unordered list -- The second item of the unordered list - with two lines - -1. A numbered list - * A nested unordered list - * The second nested unordered list -2. The second item of - the numbered list -``` - - - -Note that the number of spaces indented depends on the markup. That is, if we use '- '/'1. '/'10. ' for the list, the contents belonging to the list, or to any nested list after it, should be indented by 2/3/4 spaces. - -
Code blocks - -~~~md -```[language] -some code in this language -``` - -```python -some python code -``` -~~~ - - - -All the supported language arguments for syntax highlighting can be found [here](https://pygments.org/docs/lexers/). - -
- -## 5. How to include Jupyter notebooks directly inside our documentation -If you want to include a Jupyter notebook into our documentation as an example, a tutorial, a how-to guide, etc., you could just put it anywhere inside [`BigDL/docs/readthedocs/source`](https://github.com/intel-analytics/BigDL/tree/main/docs/readthedocs/source) dictionary, and link it into `_toc.yml` file. - -However, if you want to render a Jupyter notebook located out of `BigDL/docs/readthedocs/source` dictionary into our documentation, the case is a little bit complicated. To do this, you need to add a file with `.nblink` extension into `BigDL/docs/readthedocs/source` , and link the `.nblink` file into `_toc.yml`. - -The `.nblink` file should have the following structure: -```json -{ - "path": "relative/path/to/the/notebook/you/want/to/include" -} -``` - -```eval_rst -.. seealso:: - - You could find `here `_ for an example of ``.nblink`` usage inside our documentation. -``` - -### 5.1 How to hide a cell from rendering - -If you want to hide a notebook markdown/code cell from rendering into our documentation, you could simply add `"nbsphinx": "hidden"` into the cell's `metadata`. - -Here shows an examlpe of a markdown cell hidden from rendering: - -```json -{ -"cell_type": "markdown", -"metadata": { - "nbsphinx": "hidden" -}, -"source": [ - ... -] -} -``` - -```eval_rst -.. tip:: - - You could simply open the notebook through text editor to edit the ``metadata`` of each cell. - -.. note:: - - Currently we could not hide the output/input code cell individually from rendering as they have the same ``metadata``. -``` - -### 5.2 Note/Warning/Related Readings boxes -In convension, in the markdown cell of notebooks, we create note/warning/related reading boxes with the help of quote blocks and emoji: - -```md -> 📝 **Note** -> -> This is a note box in notebooks. - -> ⚠️ **Warning** -> -> This is a warning box in notebooks. - -> 📚 **Related Readings** -> -> This is a related readings box in notebooks. - -``` \ No newline at end of file diff --git a/docs/readthedocs/source/doc/UserGuide/hadoop.md b/docs/readthedocs/source/doc/UserGuide/hadoop.md deleted file mode 100644 index e415a22a..00000000 --- a/docs/readthedocs/source/doc/UserGuide/hadoop.md +++ /dev/null @@ -1,202 +0,0 @@ -# Hadoop/YARN User Guide - -Hadoop version: Apache Hadoop >= 2.7 (3.X included) or [CDH](https://www.cloudera.com/products/open-source/apache-hadoop/key-cdh-components.html) 5.X. CDH 6.X have not been tested and thus currently not supported. - ---- - -For _**Scala users**_, please see [Scala User Guide](./scala.md) for how to run BigDL on Hadoop/YARN clusters. - -For _**Python users**_, you can run BigDL programs on standard Hadoop/YARN clusters without any changes to the cluster (i.e., no need to pre-install BigDL or other Python libraries on all nodes in the cluster). - -### 1. Prepare Python Environment - -- You need to first use [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the Python environment _**on the local machine**_ where you submit your application. Create a conda environment, install BigDL and all the needed Python libraries in the created conda environment: - - ```bash - conda create -n bigdl python=3.7 # "bigdl" is conda environment name, you can use any name you like. - conda activate bigdl - - pip install bigdl - - # Use conda or pip to install all the needed Python dependencies in the created conda environment. 
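-  # (Hypothetical examples) install whatever extra packages your own application needs, e.g.:
-  # pip install tensorflow==1.15.0 pandas scikit-learn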
- ``` - View the [Python User Guide](./python.md) for more details for BigDL installation. - -- You need to download and install JDK in the environment, and properly set the environment variable `JAVA_HOME`, which is required by Spark. __JDK8__ is highly recommended. - - You may take the following commands as a reference for installing [OpenJDK](https://openjdk.java.net/install/): - - ```bash - # For Ubuntu - sudo apt-get install openjdk-8-jre - export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ - - # For CentOS - su -c "yum install java-1.8.0-openjdk" - export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.el7_9.x86_64/jre - - export PATH=$PATH:$JAVA_HOME/bin - java -version # Verify the version of JDK. - ``` - -- Check the Hadoop setup and configurations of your cluster. Make sure you properly set the environment variable `HADOOP_CONF_DIR`, which is needed to initialize Spark on YARN: - - ```bash - export HADOOP_CONF_DIR=the directory of the hadoop and yarn configurations - ``` - -- **For CDH users** - - If your CDH cluster has already installed Spark, the CDH's Spark might be conflict with the pyspark installed by pip required by BigDL. - - Thus before running BigDL applications, you should unset all the Spark related environment variables. You can use `env | grep SPARK` to find all the existing Spark environment variables. - - Also, a CDH cluster's `HADOOP_CONF_DIR` should be `/etc/hadoop/conf` on CDH by default. - ---- -### 2. Run on YARN with built-in function - -_**This is the easiest and most recommended way to run BigDL on YARN,**_ as you don't need to care about environment preparation and Spark related commands. In this way, you can easily switch your job between local (for test) and YARN (for production) by changing the "cluster_mode". - -- Call `init_orca_context` at the very beginning of your code to initiate and run BigDL on standard [Hadoop/YARN clusters](https://spark.apache.org/docs/latest/running-on-yarn.html#launching-spark-on-yarn): - - ```python - from bigdl.orca import init_orca_context - - sc = init_orca_context(cluster_mode="yarn-client", cores=4, memory="10g", num_nodes=2) - ``` - - `init_orca_context` would automatically prepare the runtime Python environment, detect the current Hadoop configurations from `HADOOP_CONF_DIR` and initiate the distributed execution engine on the underlying YARN cluster. View [Orca Context](../Orca/Overview/orca-context.md) for more details. - - By specifying "cluster_mode" to be `yarn-client` or `yarn-cluster`, `init_orca_context` will submit the job to YARN with client and cluster mode respectively. - - The difference between `yarn-client` and `yarn-cluster` is where you run your Spark driver. For `yarn-client`, the Spark driver will run on the node where you start Python, while for `yarn-cluster` the Spark driver will run on a random node in the YARN cluster. So if you are running with `yarn-cluster`, you should change the application's data loading from local file to a network file system (e.g. HDFS). - -- You can then simply run your BigDL program in a Jupyter notebook. Note that _**jupyter cannot run on yarn-cluster**_, as the driver is not running on the local node. - - ```bash - jupyter notebook --notebook-dir=./ --ip=* --no-browser - ``` - - Or you can run your BigDL program as a normal Python script (e.g. script.py) and in this case both `yarn-client` and `yarn-cluster` are supported. - - ```bash - python script.py - ``` - ---- -### 3. 
Run on YARN with spark-submit - -Follow the steps below if you need to run BigDL with [spark-submit](https://spark.apache.org/docs/latest/running-on-yarn.html#launching-spark-on-yarn). - -- Pack the current active conda environment to `environment.tar.gz` (you can use any name you like) in the current working directory: - - ```bash - conda pack -o environment.tar.gz - ``` - -- _**You need to write your BigDL program as a Python script.**_ In the script, you need to call `init_orca_context` at the very beginning of your code and specify "cluster_mode" to be `spark-submit`: - - ```python - from bigdl.orca import init_orca_context - - sc = init_orca_context(cluster_mode="spark-submit") - ``` - -- Use `spark-submit` to submit your BigDL program (e.g. script.py). You can adjust the configurations according to your cluster settings. Note that if `environment.tar.gz` is not under the same directory with `script.py`, you may need to modify its path in `--archives` in the running command below. - - Setup environment variables: - ```bash - export SPARK_HOME=/path/to/spark # the folder path where you extract the Spark package - export SPARK_VERSION="downloaded spark version" - - export BIGDL_HOME=/path/to/unzipped_BigDL - export BIGDL_VERSION="downloaded BigDL version" - ``` - - For `yarn-cluster` mode: - ```bash - ${SPARK_HOME}/bin/spark-submit \ - --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \ - --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \ - --jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \ - --master yarn \ - --deploy-mode cluster \ - --executor-memory 10g \ - --driver-memory 10g \ - --executor-cores 8 \ - --num-executors 2 \ - --archives environment.tar.gz#environment \ - script.py - ``` - Note: For `yarn-cluster`, the Spark driver is running in a YARN container as well and thus both the driver and executors will use the Python interpreter in `environment.tar.gz`. If you want to operate HDFS as some certain user, you can add `spark.yarn.appMasterEnv.HADOOP_USER_NAME=username` to SparkConf. - - - For `yarn-client` mode: - ```bash - ${SPARK_HOME}/bin/spark-submit \ - --conf spark.pyspark.driver.python=/path/to/python \ - --conf spark.pyspark.python=environment/bin/python \ - --jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \ - --master yarn \ - --deploy-mode client \ - --executor-memory 10g \ - --driver-memory 10g \ - --executor-cores 8 \ - --num-executors 2 \ - --archives environment.tar.gz#environment \ - script.py - ``` - Note: For `yarn-client`, the Spark driver is running on local and it will use the Python interpreter in the current active conda environment while the executors will use the Python interpreter in `environment.tar.gz`. - ---- -### 4. Run on YARN with bigdl-submit - -Follow the steps below if you need to run BigDL with bigdl-submit. - -- Pack the current active conda environment to `environment.tar.gz` (you can use any name you like) in the current working directory: - - ```bash - conda pack -o environment.tar.gz - ``` - -- _**You need to write your BigDL program as a Python script.**_ In the script, you need to call `init_orca_context` at the very beginning of your code and specify "cluster_mode" to be `bigdl-submit`: - - ```python - from bigdl.orca import init_orca_context - - sc = init_orca_context(cluster_mode="bigdl-submit") - ``` - -- Use `bigdl-submit` to submit your BigDL program (e.g. script.py). 
You can adjust the configurations according to your cluster settings. Note that if `environment.tar.gz` is not under the same directory with `script.py`, you may need to modify its path in `--archives` in the running command below. - - For `yarn-cluster` mode: - ```bash - bigdl-submit \ - --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \ - --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \ - --master yarn \ - --deploy-mode cluster \ - --executor-memory 10g \ - --driver-memory 10g \ - --executor-cores 8 \ - --num-executors 2 \ - --archives environment.tar.gz#environment \ - script.py - ``` - Note: For `yarn-cluster`, the Spark driver is running in a YARN container as well and thus both the driver and executors will use the Python interpreter in `environment.tar.gz`. If you want to operate HDFS as some certain user, you can add `spark.yarn.appMasterEnv.HADOOP_USER_NAME=username` to SparkConf. - - - For `yarn-client` mode: - ```bash - PYSPARK_PYTHON=environment/bin/python bigdl-submit \ - --master yarn \ - --deploy-mode client \ - --executor-memory 10g \ - --driver-memory 10g \ - --executor-cores 8 \ - --num-executors 2 \ - --archives environment.tar.gz#environment \ - script.py - ``` - Note: For `yarn-client`, the Spark driver is running on local and it will use the Python interpreter in the current active conda environment while the executors will use the Python interpreter in `environment.tar.gz`. diff --git a/docs/readthedocs/source/doc/UserGuide/images/Databricks5.PNG b/docs/readthedocs/source/doc/UserGuide/images/Databricks5.PNG deleted file mode 100644 index 62167b6d..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/Databricks5.PNG and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/apply-all.png b/docs/readthedocs/source/doc/UserGuide/images/apply-all.png deleted file mode 100644 index b2fc0182..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/apply-all.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/cluster.png b/docs/readthedocs/source/doc/UserGuide/images/cluster.png deleted file mode 100644 index 848b86e5..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/cluster.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/config-init-script.png b/docs/readthedocs/source/doc/UserGuide/images/config-init-script.png deleted file mode 100644 index 9c221361..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/config-init-script.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/copy-script-path.png b/docs/readthedocs/source/doc/UserGuide/images/copy-script-path.png deleted file mode 100644 index d2dcfe2b..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/copy-script-path.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/create-cluster.png b/docs/readthedocs/source/doc/UserGuide/images/create-cluster.png deleted file mode 100644 index b80fcd05..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/create-cluster.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/db-gloo-socket.png b/docs/readthedocs/source/doc/UserGuide/images/db-gloo-socket.png deleted file mode 100644 index 254d7cd1..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/db-gloo-socket.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/dbfs.png 
b/docs/readthedocs/source/doc/UserGuide/images/dbfs.png deleted file mode 100644 index ebc86d9d..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/dbfs.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/dllib-jar.png b/docs/readthedocs/source/doc/UserGuide/images/dllib-jar.png deleted file mode 100644 index 5e4f1718..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/dllib-jar.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/dllib-whl.png b/docs/readthedocs/source/doc/UserGuide/images/dllib-whl.png deleted file mode 100644 index 18efdacc..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/dllib-whl.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/init-orca-context.png b/docs/readthedocs/source/doc/UserGuide/images/init-orca-context.png deleted file mode 100644 index 82de380c..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/init-orca-context.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/install-zip.png b/docs/readthedocs/source/doc/UserGuide/images/install-zip.png deleted file mode 100644 index 9777492c..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/install-zip.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/notebook1.jpg b/docs/readthedocs/source/doc/UserGuide/images/notebook1.jpg deleted file mode 100644 index 578d025b..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/notebook1.jpg and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/notebook2.jpg b/docs/readthedocs/source/doc/UserGuide/images/notebook2.jpg deleted file mode 100644 index bc274a9c..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/notebook2.jpg and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/notebook3.jpg b/docs/readthedocs/source/doc/UserGuide/images/notebook3.jpg deleted file mode 100644 index 940b353d..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/notebook3.jpg and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/notebook4.jpg b/docs/readthedocs/source/doc/UserGuide/images/notebook4.jpg deleted file mode 100644 index e83dc609..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/notebook4.jpg and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/notebook5.jpg b/docs/readthedocs/source/doc/UserGuide/images/notebook5.jpg deleted file mode 100644 index d8297c3e..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/notebook5.jpg and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/orca-jar.png b/docs/readthedocs/source/doc/UserGuide/images/orca-jar.png deleted file mode 100644 index 6aace893..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/orca-jar.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/orca-whl.png b/docs/readthedocs/source/doc/UserGuide/images/orca-whl.png deleted file mode 100644 index c643643f..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/orca-whl.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/spark-config.png b/docs/readthedocs/source/doc/UserGuide/images/spark-config.png deleted file mode 100644 index c4003b6f..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/spark-config.png and 
/dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/spark-context.png b/docs/readthedocs/source/doc/UserGuide/images/spark-context.png deleted file mode 100644 index 89fedc90..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/spark-context.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/token.png b/docs/readthedocs/source/doc/UserGuide/images/token.png deleted file mode 100644 index a035333c..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/token.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/upload-init-script.png b/docs/readthedocs/source/doc/UserGuide/images/upload-init-script.png deleted file mode 100644 index 9c24eb44..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/upload-init-script.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/url.png b/docs/readthedocs/source/doc/UserGuide/images/url.png deleted file mode 100644 index e483bfaa..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/url.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/images/verify-dbfs.png b/docs/readthedocs/source/doc/UserGuide/images/verify-dbfs.png deleted file mode 100644 index 5ba9fc23..00000000 Binary files a/docs/readthedocs/source/doc/UserGuide/images/verify-dbfs.png and /dev/null differ diff --git a/docs/readthedocs/source/doc/UserGuide/index.rst b/docs/readthedocs/source/doc/UserGuide/index.rst deleted file mode 100644 index bb8eb2b3..00000000 --- a/docs/readthedocs/source/doc/UserGuide/index.rst +++ /dev/null @@ -1,60 +0,0 @@ -.. |scala-logo| image:: ../../../image/scala_logo.png - :height: 20 - :alt: Scala Logo - -User Guide -========================= - -.. grid:: 1 2 2 2 - :gutter: 2 - - .. grid-item-card:: :fab:`python` Python User Guide - :link: python.html - :class-card: bigdl-link-card - - Python Environment Setup Guide (Linux), applicable to Orca, Nano, DLlib, Chronos, Friesian. - - .. grid-item-card:: |scala-logo| Scala User Guide - :link: scala.html - :class-card: bigdl-link-card - - Scala Environment Setup Guide (Linux), applicable to DLLib. - - .. grid-item-card:: :fab:`windows` Windows User Guide - :link: win.html - :class-card: bigdl-link-card - - Use BigDL on Windows. - - .. grid-item-card:: :fas:`desktop` Docker User Guide - :link: docker.html - :class-card: bigdl-link-card - - Use BigDL in docker Environment. - - .. grid-item-card:: :fas:`cloud-arrow-up` Colab User Guide - :link: colab.html - :class-card: bigdl-link-card - - Use BigDL in Google Colab Environment. - - .. grid-item-card:: :fas:`cloud` Databricks User Guide - :link: databricks.html - :class-card: bigdl-link-card - - Use BigDL in Databricks Environment. - - .. grid-item-card:: :fas:`cloud` Hadoop/YARN User Guide - :link: hadoop.html - :class-card: bigdl-link-card - - Use BigDL in Hadoop/YARN Environment. - - .. grid-item-card:: :fas:`cloud` k8s User Guide - :link: k8s.html - :class-card: bigdl-link-card - - Use BigDL in K8s Environment. - - - diff --git a/docs/readthedocs/source/doc/UserGuide/k8s.md b/docs/readthedocs/source/doc/UserGuide/k8s.md deleted file mode 100644 index 73815c76..00000000 --- a/docs/readthedocs/source/doc/UserGuide/k8s.md +++ /dev/null @@ -1,346 +0,0 @@ -# K8s User Guide - ---- - -### 1. 
Pull `bigdl-k8s` Docker Image - -You may pull the prebuilt BigDL `bigdl-k8s` Image from [Docker Hub](https://hub.docker.com/r/intelanalytics/bigdl-k8s/tags) as follows: - -```bash -sudo docker pull intelanalytics/bigdl-k8s:latest -``` - -**Speed up pulling image by adding mirrors** - -To speed up pulling the image from DockerHub, you may add the registry-mirrors key and value by editing `daemon.json` (located in `/etc/docker/` folder on Linux): -``` -{ - "registry-mirrors": ["https://"] -} -``` -For instance, users in China may add the USTC mirror as follows: -``` -{ - "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"] -} -``` - -After that, flush changes and restart docker: - -``` -sudo systemctl daemon-reload -sudo systemctl restart docker -``` - -### 2. Launch a Client Container - -You can submit BigDL application from a client container that provides the required environment. - -```bash -sudo docker run -itd --net=host \ - -v /etc/kubernetes:/etc/kubernetes \ - -v /root/.kube:/root/.kube \ - intelanalytics/bigdl-k8s:latest bash -``` - -**Note:** to create the client container, `-v /etc/kubernetes:/etc/kubernetes:` and `-v /root/.kube:/root/.kube` are required to specify the path of kube config and installation. - -You can specify more arguments: - -```bash -sudo docker run -itd --net=host \ - -v /etc/kubernetes:/etc/kubernetes \ - -v /root/.kube:/root/.kube \ - -e http_proxy=http://your-proxy-host:your-proxy-port \ - -e https_proxy=https://your-proxy-host:your-proxy-port \ - -e RUNTIME_SPARK_MASTER=k8s://https://: \ - -e RUNTIME_K8S_SERVICE_ACCOUNT=account \ - -e RUNTIME_K8S_SPARK_IMAGE=intelanalytics/bigdl-k8s:latest \ - -e RUNTIME_PERSISTENT_VOLUME_CLAIM=myvolumeclaim \ - -e RUNTIME_DRIVER_HOST=x.x.x.x \ - -e RUNTIME_DRIVER_PORT=54321 \ - -e RUNTIME_EXECUTOR_INSTANCES=1 \ - -e RUNTIME_EXECUTOR_CORES=4 \ - -e RUNTIME_EXECUTOR_MEMORY=20g \ - -e RUNTIME_TOTAL_EXECUTOR_CORES=4 \ - -e RUNTIME_DRIVER_CORES=4 \ - -e RUNTIME_DRIVER_MEMORY=10g \ - intelanalytics/bigdl-k8s:latest bash -``` - -- http_proxy/https_proxy is to specify http proxy/https_proxy. -- RUNTIME_SPARK_MASTER is to specify spark master, which should be `k8s://https://:` or `spark://:`. -- RUNTIME_K8S_SERVICE_ACCOUNT is service account for driver pod. Please refer to k8s [RBAC](https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac). -- RUNTIME_K8S_SPARK_IMAGE is the k8s image. -- RUNTIME_PERSISTENT_VOLUME_CLAIM is to specify [Kubernetes volume](https://spark.apache.org/docs/latest/running-on-kubernetes.html#volume-mounts) mount. We are supposed to use volume mount to store or receive data. -- RUNTIME_DRIVER_HOST/RUNTIME_DRIVER_PORT is to specify driver localhost and port number (only required when submitting jobs via kubernetes client mode). -- Other environment variables are for spark configuration setting. The default values in this image are listed above. Replace the values as you need. - -Once the container is created, execute the container: - -```bash -sudo docker exec -it bash -``` - -You will login into the container and see this as the output: - -``` -root@[hostname]:/opt/spark/work-dir# -``` - -`/opt/spark/work-dir` is the spark work path. - -The `/opt` directory contains: - -- download-bigdl.sh is used for downloading BigDL distributions. -- start-notebook-spark.sh is used for starting the jupyter notebook on standard spark cluster. -- start-notebook-k8s.sh is used for starting the jupyter notebook on k8s cluster. -- bigdl-x.x-SNAPSHOT is `BIGDL_HOME`, which is the home of BigDL distribution. 
-- bigdl-examples directory contains downloaded python example code. -- install-conda-env.sh is displayed that conda env and python dependencies are installed. -- jdk is the jdk home. -- spark is the spark home. -- redis is the redis home. - -### 3. Submit to k8s from remote - -Instead of lanuching a client container, you can also submit BigDL application from a remote node with the following steps: - -1. Check the [prerequisites](https://spark.apache.org/docs/latest/running-on-kubernetes.html#prerequisites) of running Spark on Kubernetes. - - - The remote node needs to properly setup the configurations and authentications of the k8s cluster (e.g. the `config` file under `~/.kube`, especially the server address in the `config`). - - - Install `kubectl` on the remote node and run some sample commands for verification, for example `kubectl auth can-i pods`. - Note that the installation of `kubectl` is not a must for the remote node, but it is a useful tool to verify whether the remote node has access to the k8s cluster. - - - The environment variables `http_proxy` and `https_proxy` may affect the connection using `kubectl`. You may check and unset these environment variables in case you get errors when executing the `kubectl` commands on the remote node. - -2. Follow the steps in the [Python User Guide](./python.html#install) to install BigDL in a conda environment. - - -### 4. Run BigDL on k8s - -_**Note**: Please make sure `kubectl` has appropriate permission to create, list and delete pod._ - -You may refer to [Section 5](#known-issues) for some known issues when running BigDL on k8s. - -#### 4.1 K8s client mode - -We recommend using `init_orca_context` at the very beginning of your code (e.g. in script.py) to initiate and run BigDL on standard K8s clusters in [client mode](http://spark.apache.org/docs/latest/running-on-kubernetes.html#client-mode). - -```python -from bigdl.orca import init_orca_context - -init_orca_context(cluster_mode="k8s", master="k8s://https://:", - container_image="intelanalytics/bigdl-k8s:latest", - num_nodes=2, cores=2, memory="2g") -``` - -Remark: You may need to specify Spark driver host and port if necessary by adding the argument: `conf={"spark.driver.host": "x.x.x.x", "spark.driver.port": "x"}`. - -Execute `python script.py` to run your program on k8s cluster directly. - -#### 4.2 K8s cluster mode - -For k8s [cluster mode](https://spark.apache.org/docs/3.1.2/running-on-kubernetes.html#cluster-mode), you can call `init_orca_context` and specify cluster_mode to be "spark-submit" in your python script (e.g. 
in script.py): - -```python -from bigdl.orca import init_orca_context - -init_orca_context(cluster_mode="spark-submit") -``` - -Use spark-submit to submit your BigDL program: - -```bash -${SPARK_HOME}/bin/spark-submit \ - --master k8s://https://: \ - --deploy-mode cluster \ - --conf spark.kubernetes.authenticate.driver.serviceAccountName=account \ - --name bigdl \ - --conf spark.kubernetes.container.image="intelanalytics/bigdl-k8s:latest" \ - --conf spark.kubernetes.container.image.pullPolicy=Always \ - --conf spark.pyspark.driver.python=./env/bin/python \ - --conf spark.pyspark.python=./env/bin/python \ - --archives path/to/environment.tar.gz#env \ - --conf spark.executor.instances=1 \ - --executor-memory 10g \ - --driver-memory 10g \ - --executor-cores 8 \ - --num-executors 2 \ - --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \ - --py-files local://${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///path/script.py - --conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/* \ - --conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/* \ - local:///path/script.py -``` - -#### 4.3 Run Jupyter Notebooks - -After a Docker container is launched and user login into the container, you can start the Jupyter Notebook service inside the container. - -In the `/opt` directory, run this command line to start the Jupyter Notebook service: -``` -./start-notebook-k8s.sh -``` - -You will see the output message like below. This means the Jupyter Notebook service has started successfully within the container. -``` -[I 23:51:08.456 NotebookApp] Serving notebooks from local directory: /opt/bigdl-2.1.0-SNAPSHOT/apps -[I 23:51:08.456 NotebookApp] Jupyter Notebook 6.2.0 is running at: -[I 23:51:08.456 NotebookApp] http://xxxx:12345/?token=... -[I 23:51:08.457 NotebookApp] or http://127.0.0.1:12345/?token=... -[I 23:51:08.457 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation). -``` - -Then, refer [docker guide](./docker.md) to open Jupyter Notebook service from a browser and run notebook. - -#### 4.4 Run Scala programs - -Use spark-submit to submit your BigDL program. 
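-The spark-submit command in the example below relies on the `RUNTIME_*` environment variables described in Section 2. They are already set if you launched the client container with the `-e` options shown there; if you submit from somewhere else (e.g. a remote node as in Section 3), export them yourself first. A minimal sketch with placeholder values, to be adjusted for your own cluster:
-```bash
-export RUNTIME_SPARK_MASTER=k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>
-export RUNTIME_K8S_SERVICE_ACCOUNT=account
-export RUNTIME_K8S_SPARK_IMAGE=intelanalytics/bigdl-k8s:latest
-export RUNTIME_DRIVER_HOST=x.x.x.x
-export RUNTIME_DRIVER_PORT=54321
-export RUNTIME_EXECUTOR_INSTANCES=1
-export RUNTIME_EXECUTOR_CORES=4
-export RUNTIME_EXECUTOR_MEMORY=20g
-export RUNTIME_TOTAL_EXECUTOR_CORES=4
-export RUNTIME_DRIVER_CORES=4
-export RUNTIME_DRIVER_MEMORY=10g
-export RUNTIME_PERSISTENT_VOLUME_CLAIM=myvolumeclaim
-```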
e.g., run [nnframes imageInference](../../../../../../scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/nnframes/imageInference) example (running in either local mode or cluster mode) as follows: - -```bash -${SPARK_HOME}/bin/spark-submit \ - --master ${RUNTIME_SPARK_MASTER} \ - --deploy-mode client \ - --conf spark.driver.host=${RUNTIME_DRIVER_HOST} \ - --conf spark.driver.port=${RUNTIME_DRIVER_PORT} \ - --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \ - --name bigdl \ - --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \ - --conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \ - --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \ - --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/path \ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/path \ - --conf spark.kubernetes.driver.label.=true \ - --conf spark.kubernetes.executor.label.=true \ - --executor-cores ${RUNTIME_EXECUTOR_CORES} \ - --executor-memory ${RUNTIME_EXECUTOR_MEMORY} \ - --total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \ - --driver-cores ${RUNTIME_DRIVER_CORES} \ - --driver-memory ${RUNTIME_DRIVER_MEMORY} \ - --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \ - --conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \ - --conf spark.sql.catalogImplementation='in-memory' \ - --conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/* \ - --conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/* \ - --class com.intel.analytics.bigdl.dllib.examples.nnframes.imageInference.ImageTransferLearning \ - ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip \ - --inputDir /path -``` - -Options: - -- --master: the spark mater, must be a URL with the format `k8s://https://:`. -- --deploy-mode: submit application in client/cluster mode. -- --name: the Spark application name. -- --conf: to specify k8s service account, container image to use for the Spark application, driver volumes name and path, label of pods, spark driver and executor configuration, etc. You can refer to [spark configuration](https://spark.apache.org/docs/latest/configuration.html) and [spark on k8s configuration](https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration) for more details. -- --properties-file: the customized conf properties. -- --py-files: the extra python packages is needed. -- --class: scala example class name. -- --inputDir: input data path of the nnframe example. The data path is the mounted filesystem of the host. Refer to more details by [Kubernetes Volumes](https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes). - -### 5 Known Issues - -This section shows some common topics for both client mode and cluster mode. - -#### 5.1 How to specify the Python environment? - -In client mode, follow [python user guide](./python.md) to install conda and BigDL and run application: -```python -python script.py -``` -In cluster mode, install conda, pack environment and use on both the driver and executor. 
-- Pack the current conda environment to `environment.tar.gz` (you can use any name you like): - ```bash - conda pack -o environment.tar.gz - ``` -- spark-submit with "--archives" and specify python stores for dirver and executor - ```bash - --conf spark.pyspark.driver.python=./env/bin/python \ - --conf spark.pyspark.python=./env/bin/python \ - --archives local:///bigdl2.0/data/environment.tar.gz#env \ # this path shoud be that k8s pod can access - ``` - -#### 5.2 How to retain executor logs for debugging? - -The k8s would delete the pod once the executor failed in client mode and cluster mode. If you want to get the content of executor log, you could set "temp-dir" to a mounted network file system (NFS) storage to change the log dir to replace the former one. In this case, you may meet `JSONDecodeError` because multiple executors would write logs to the same physical folder and cause conflicts. The solutions are in the next section. - -```python -init_orca_context(..., extra_params = {"temp-dir": "/bigdl/"}) -``` - -#### 5.3 How to deal with "JSONDecodeError"? - -If you set `temp-dir` to a mounted nfs storage and use multiple executors , you may meet `JSONDecodeError` since multiple executors would write to the same physical folder and cause conflicts. Do not mount `temp-dir` to shared storage is one option to avoid conflicts. But if you debug ray on k8s, you need to output logs to a shared storage. In this case, you could set num-nodes to 1. After testing, you can remove `temp-dir` setting and run multiple executors. - -#### 5.4 How to use NFS? - -If you want to save some files out of pod's lifecycle, such as logging callbacks or tensorboard callbacks, you need to set the output dir to a mounted persistent volume dir. Let NFS be a simple example. - -Use NFS in client mode: - -```python -init_orca_context(cluster_mode="k8s", ..., - conf={..., - "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName":"nfsvolumeclaim", - "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path": "/bigdl" - }) -``` - -Use NFS in cluster mode: - -```bash -${SPARK_HOME}/bin/spark-submit \ - --... ...\ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName="nfsvolumeclaim" \ - --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path="/bigdl" \ - --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName="nfsvolumeclaim" \ - --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path="/bigdl" \ - file:///path/script.py -``` - -#### 5.5 How to deal with "RayActorError"? - -"RayActorError" may caused by running out of the ray memory. If you meet this error, try to increase the memory for ray. - -```python -init_orca_context(..., extra_executor_memory_for_ray="100g") -``` - -#### 5.6 How to set proper "steps_per_epoch" and "validation steps"? - -The `steps_per_epoch` and `validation_steps` should equal to numbers of dataset divided by batch size if you want to train all dataset. The `steps_per_epoch` and `validation_steps` do not relate to the `num_nodes` when total dataset and batch size are fixed. For example, you set `num_nodes` to 1, and set `steps_per_epoch` to 6. If you change the `num_nodes` to 3, the `steps_per_epoch` should still be 6. - -#### 5.7 Others - -`spark.kubernetes.container.image.pullPolicy` needs to be specified as `always` if you need to update your spark executor image for k8s. - -### 6. 
Access logs and clear pods - -When application is running, it’s possible to stream logs on the driver pod: - -```bash -$ kubectl logs -``` - -To check pod status or to get some basic information around pod using: - -```bash -$ kubectl describe pod -``` - -You can also check other pods using the similar way. - -After finishing running the application, deleting the driver pod: - -```bash -$ kubectl delete -``` - -Or clean up the entire spark application by pod label: - -```bash -$ kubectl delete pod -l -``` diff --git a/docs/readthedocs/source/doc/UserGuide/known_issues.md b/docs/readthedocs/source/doc/UserGuide/known_issues.md deleted file mode 100644 index a1a77ca8..00000000 --- a/docs/readthedocs/source/doc/UserGuide/known_issues.md +++ /dev/null @@ -1,40 +0,0 @@ -# BigDL Known Issues - -## Spark Dynamic Allocation - -By design, BigDL does not support Spark Dynamic Allocation mode, and needs to allocate fixed resources for deep learning model training. Thus if your environment has already configured Spark Dynamic Allocation, or stipulated that Spark Dynamic Allocation must be used, you may encounter the following error: - -> **requirement failed: Engine.init: spark.dynamicAllocation.maxExecutors and spark.dynamicAllocation.minExecutors must be identical in dynamic allocation for BigDL** -> - -Here we provide a workaround for running BigDL under Spark Dynamic Allocation mode. - -For `spark-submit` cluster mode, the first solution is to disable the Spark Dynamic Allocation mode in `SparkConf` when you submit your application as follows: - -```bash -spark-submit --conf spark.dynamicAllocation.enabled=false -``` - -Otherwise, if you can not set this configuration due to your cluster settings, you can set `spark.dynamicAllocation.minExecutors` to be equal to `spark.dynamicAllocation.maxExecutors` as follows: - -```bash -spark-submit --conf spark.dynamicAllocation.enabled=true \ - --conf spark.dynamicAllocation.minExecutors 2 \ - --conf spark.dynamicAllocation.maxExecutors 2 -``` - -For other cluster modes, such as `yarn` and `k8s`, our program will initiate `SparkContext` for you, and the Spark Dynamic Allocation mode is disabled by default. Thus, generally you wouldn't encounter such problem. - -If you are using Spark Dynamic Allocation, you have to disable barrier execution mode at the very beginning of your application as follows: - -```python -from bigdl.orca import OrcaContext - -OrcaContext.barrier_mode = False -``` - -For Spark Dynamic Allocation mode, you are also recommended to manually set `num_ray_nodes` and `ray_node_cpu_cores` equal to `spark.dynamicAllocation.minExecutors` and `spark.executor.cores` respectively. 
You can specify `num_ray_nodes` and `ray_node_cpu_cores` in `init_orca_context` as follows: - -```python -init_orca_context(..., num_ray_nodes=2, ray_node_cpu_cores=4) -``` diff --git a/docs/readthedocs/source/doc/UserGuide/notebooks.md b/docs/readthedocs/source/doc/UserGuide/notebooks.md deleted file mode 100644 index eb24d3af..00000000 --- a/docs/readthedocs/source/doc/UserGuide/notebooks.md +++ /dev/null @@ -1,72 +0,0 @@ -# Colab notebooks - ---- - -## Quick Start - -- **TensorFlow 1.15 Quickstart** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/tf_lenet_mnist.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/tf_lenet_mnist.ipynb) - -- **Keras 2.3 Quickstart** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/keras_lenet_mnist.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/keras_lenet_mnist.ipynb) - -- **TensorFlow 2 Quickstart** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/tf2_keras_lenet_mnist.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/tf2_keras_lenet_mnist.ipynb) - -- **PyTorch Quickstart** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/pytorch_lenet_mnist.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/pytorch_lenet_mnist.ipynb) - - -## Common Use Case - -- **Use `torch.distributed` in Orca** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/pytorch_distributed_lenet_mnist.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/pytorch_distributed_lenet_mnist.ipynb) - - -- **Use Spark Dataframe for Deep Learning** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ncf_dataframe.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ncf_dataframe.ipynb) - -- **Use Distributed Pandas for Deep Learning** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ncf_xshards_pandas.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ncf_xshards_pandas.ipynb) - -- **Use AutoML for 
Time-Series Forecasting** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/chronos/colab-notebook/chronos_autots_nyc_taxi.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/chronos/colab-notebook/chronos_autots_nyc_taxi.ipynb) - -- **Use TSDataset and Forecaster for Time-Series Forecasting** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/chronos/colab-notebook/chronos_nyc_taxi_tsdataset_forecaster.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/chronos/colab-notebook/chronos_nyc_taxi_tsdataset_forecaster.ipynb) - -- **Use Anomaly Detector for Unsupervised Anomaly Detection** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/chronos/colab-notebook/chronos_minn_traffic_anomaly_detector.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/chronos/colab-notebook/chronos_minn_traffic_anomaly_detector.ipynb) - -- **Enable AutoML for PyTorch** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/autoestimator_pytorch_lenet_mnist.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/autoestimator_pytorch_lenet_mnist.ipynb) - -- **Use AutoXGBoost to auto-tune XGBoost parameters** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/autoxgboost_regressor_sklearn_boston.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/autoxgboost_regressor_sklearn_boston.ipynb) - - -## AI Application Case - -- **Use Pytorch for Fashion MNIST Image Classification** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/examples/fashion_mnist_bigdl.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/examples/fashion_mnist_bigdl.ipynb) - -- **Use Keras for Text Classification** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/examples/basic_text_classification.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/examples/basic_text_classification.ipynb) - -- **Use Pytorch for Image Super Resolution** - -![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/examples/super_resolution.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on 
GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/examples/super_resolution.ipynb) diff --git a/docs/readthedocs/source/doc/UserGuide/python.md b/docs/readthedocs/source/doc/UserGuide/python.md deleted file mode 100644 index 3848b5c0..00000000 --- a/docs/readthedocs/source/doc/UserGuide/python.md +++ /dev/null @@ -1,172 +0,0 @@ -# Python User Guide - ---- -Supported Platforms: Linux and macOS. For Windows, Refer to [Windows User Guide](./win.md). - -### 1. Install -- We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the Python environment as follows: - - ```bash - conda create -n bigdl python=3.7 # "bigdl" is conda environment name, you can use any name you like. - conda activate bigdl - ``` - -- You need to install JDK in the environment, and properly set the environment variable `JAVA_HOME`. __JDK8__ is highly recommended. - - You may take the following commands as a reference for installing [OpenJDK](https://openjdk.java.net/install/): - - ```bash - # For Ubuntu - sudo apt-get install openjdk-8-jre - export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ - - # For CentOS - su -c "yum install java-1.8.0-openjdk" - export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.el7_9.x86_64/jre - - export PATH=$PATH:$JAVA_HOME/bin - java -version # Verify the version of JDK. - ``` - -#### 1.1 Official Release - -You can install the latest release version of BigDL (built on top of Spark 2.4.6 by default) as follows: -```bash -pip install bigdl -``` -_**Note:** Installing BigDL will automatically install all the BigDL packages including -`bigdl-nano`, `bigdl-dllib`, `bigdl-orca`, `bigdl-chronos`, `bigdl-friesian`, `bigdl-serving` and their dependencies if they haven't been detected in your conda environment._ - -#### 1.2 Nightly Build - -You can install the latest nightly build of BigDL as follows: - -```bash -pip install --pre --upgrade bigdl -``` - -Alternatively, you can find the list of the nightly build versions [here](https://pypi.org/project/BigDL/#history), and install a specific version as follows: - -```bash -pip install bigdl==version -``` - -_**Note:** If you are using a custom URL of Python Package Index, you may need to check whether the latest packages have been sync'ed with pypi. -Or you can add the option `-i https://pypi.python.org/simple` when pip install to use pypi as the index-url._ - -You could uninstall all the packages of BigDL as follows: - -```bash -pip uninstall bigdl-dllib bigdl-core bigdl-tf bigdl-math bigdl-orca bigdl-chronos bigdl-friesian bigdl-nano bigdl-serving bigdl -``` - -#### 1.3 BigDL on Spark 3 - -You can install BigDL built on top of Spark 3.1.3 as follows: -```bash -pip install bigdl-spark3 # Install the latest release version -pip install --pre --upgrade bigdl-spark3 # Install the latest nightly build version -``` -You can find the list of the nightly build versions built on top of Spark 3.1.3 [here](https://pypi.org/project/bigdl-spark3/#history). - -You could uninstall all the packages of BigDL on Spark3 as follows: - -```bash -pip uninstall bigdl-dllib-spark3 bigdl-core bigdl-tf bigdl-math bigdl-orca-spark3 bigdl-chronos-spark3 bigdl-friesian-spark3 bigdl-nano bigdl-serving bigdl-spark3 -``` - ---- -### 2. Run - -_**Note:** Installing BigDL from pip will automatically install `pyspark`. 
To avoid possible conflicts, you are highly recommended to **unset the environment variable `SPARK_HOME`** if it exists in your environment._ - - -#### 2.1 Interactive Shell - -You may test if the installation is successful using the interactive Python shell as follows: - -* Type `python` in the command line to start a REPL. -* Try to run the example code below to verify the installation: - - ```python - from bigdl.orca import init_orca_context - - sc = init_orca_context() # Initiation of bigdl on the underlying cluster. - ``` - -#### 2.2 Jupyter Notebook - -You can start the Jupyter notebook as you normally do using the following command and run BigDL programs directly in a Jupyter notebook: - -```bash -jupyter notebook --notebook-dir=./ --ip=* --no-browser -``` - -#### 2.3 Python Script - -You can directly write BigDL programs in a Python file (e.g. script.py) and run in the command line as a normal Python program: - -```bash -python script.py -``` - ---- -### 3. Python Dependencies - -We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to manage your Python dependencies. Libraries installed in the current conda environment will be automatically distributed to the cluster when calling `init_orca_context`. You can also add extra dependencies as `.py`, `.zip` and `.egg` files by specifying `extra_python_lib` argument in `init_orca_context`. - -For more details, please refer to [Orca Context](../Orca/Overview/orca-context.md). - ---- -### 4. Compatibility - -BigDL has been tested on __Python 3.6 and 3.7__ with the following library versions: - -```bash -pyspark==2.4.6 or 3.1.3 -ray==1.9.2 -tensorflow==1.15.0 or >2.0 -pytorch>=1.5.0 -torchvision>=0.6.0 -horovod==0.19.2 -mxnet>=1.6.0 -bayesian-optimization==1.1.0 -dask==2.14.0 -h5py==2.10.0 -numpy==1.18.1 -opencv-python==4.2.0.34 -pandas==1.0.3 -Pillow==7.1.1 -protobuf==3.12.0 -psutil==5.7.0 -py4j==0.10.7 -redis==3.4.1 -scikit-learn==0.22.2.post1 -scipy==1.4.1 -tensorboard==1.15.0 -tensorboardX>=2.1 -tensorflow-datasets==3.2.0 -tensorflow-estimator==1.15.1 -tensorflow-gan==2.0.0 -tensorflow-hub==0.8.0 -tensorflow-metadata==0.21.1 -tensorflow-probability==0.7.0 -Theano==1.0.4 -``` - ---- -### 5. Known Issues - -- If you meet the following error when `pip install bigdl`: - ``` - ERROR: Could not find a version that satisfies the requirement pypandoc (from versions: none) - ERROR: No matching distribution found for pypandoc - Could not import pypandoc - required to package PySpark - Traceback (most recent call last): - File "/root/anaconda3/lib/python3.8/site-packages/setuptools/installer.py", line 126, in fetch_build_egg - subprocess.check_call(cmd) - File "/root/anaconda3/lib/python3.8/subprocess.py", line 364, in check_call - raise CalledProcessError(retcode, cmd) - subprocess.CalledProcessError: Command '['/root/anaconda3/bin/python', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmprefr87ue', '--quiet', 'pypandoc']' returned non-zero exit status 1. - ``` - This is actually caused by `pip install pyspark` in your Python environment. You can fix it by running `pip install pypandoc` first and then `pip install bigdl`. diff --git a/docs/readthedocs/source/doc/UserGuide/scala.md b/docs/readthedocs/source/doc/UserGuide/scala.md deleted file mode 100644 index f4c7446e..00000000 --- a/docs/readthedocs/source/doc/UserGuide/scala.md +++ /dev/null @@ -1,199 +0,0 @@ -# Scala User Guide - ---- -Supported Platforms: Linux and macOS. 
_**Note:** Windows is currently not supported._ - -### 1. Try BigDL Examples -This section will show you how to download BigDL prebuild packages and run the build-in examples. - -#### 1.1 Download and config -You can download the BigDL official releases and nightly build from the [Release Page](../release.md). After extracting the prebuild package, you need to set environment variables **BIGDL_HOME** and **SPARK_HOME** as follows: - -```bash -export SPARK_HOME=folder path where you extract the Spark package -export BIGDL_HOME=folder path where you extract the BigDL package -``` - -#### 1.2 Use Spark interactive shell -You can try BigDL using the Spark interactive shell as follows: - -```bash -${BIGDL_HOME}/bin/spark-shell-with-dllib.sh -``` - -You will then see a welcome message like below: - -``` -Welcome to - ____ __ - / __/__ ___ _____/ /__ - _\ \/ _ \/ _ `/ __/ '_/ - /___/ .__/\_,_/_/ /_/\_\ version 2.4.6 - /_/ - -Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112) -Type in expressions to have them evaluated. -Type :help for more information. -``` - -Before you try BigDL APIs, you should use `initNNcontext` to verify your environment: - -```scala -scala> import com.intel.analytics.bigdl.dllib.NNContext -import com.intel.analytics.bigdl.dllib.NNContext - -scala> val sc = NNContext.initNNContext("Run Example") -2021-01-26 10:19:52 WARN SparkContext:66 - Using an existing SparkContext; some configuration may not take effect. -2021-01-26 10:19:53 WARN SparkContext:66 - Using an existing SparkContext; some configuration may not take effect. -sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@487f025 -``` -Once the environment is successfully initiated, you'll be able to play with dllib API's. -For instance, to experiment with the ````dllib.keras```` APIs in dllib, you may try below code: -```scala -scala> import com.intel.analytics.bigdl.dllib.keras.layers._ -scala> import com.intel.analytics.bigdl.numeric.NumericFloat -scala> import com.intel.analytics.bigdl.dllib.utils.Shape - -scala> val seq = Sequential() - val layer = ConvLSTM2D(32, 4, returnSequences = true, borderMode = "same", - inputShape = Shape(8, 40, 40, 32)) - seq.add(layer) -``` - -#### 1.3 Run BigDL examples - -You can run a bigdl-dllib program, e.g., the [Language Model](https://github.com/intel-analytics/BigDL/tree/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/languagemodel), as a standard Spark program (running on either a local machine or a distributed cluster) as follows: - -1. Prepare the dataset, please refer [Prepare PTB Data](https://github.com/intel-analytics/BigDL/tree/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/languagemodel) for details - -2. Run the following command: -```bash -# Spark local mode -${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \ - --master local[2] \ - --class com.intel.analytics.bigdl.dllib.example.languagemodel.PTBWordLM \ - ${BIGDL_HOME}/jars/bigdl-dllib-2.1.0-SNAPSHOT-jar-with-dependencies.jar \ #change to your jar file if your download is not spark_2.4.3-2.0.0 - -f DATA_PATH \ - -b 4 \ - --numLayers 2 --vocab 100 --hidden 6 \ - --numSteps 3 --learningRate 0.005 -e 1 \ - --learningRateDecay 0.001 --keepProb 0.5 - -# Spark standalone mode -## ${SPARK_HOME}/sbin/start-master.sh -## check master URL from http://localhost:8080 -${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \ - --master spark://... 
\ - --executor-cores cores_per_executor \ - --total-executor-cores total_cores_for_the_job \ - --class com.intel.analytics.bigdl.dllib.example.languagemodel.PTBWordLM \ - ${BIGDL_HOME}/jars/bigdl-dllib-2.1.0-SNAPSHOT-jar-with-dependencies.jar \ #change to your jar file if your download is not spark_2.4.3-2.0.0 - -f DATA_PATH \ - -b 4 \ - --numLayers 2 --vocab 100 --hidden 6 \ - --numSteps 3 --learningRate 0.005 -e 1 \ - --learningRateDecay 0.001 --keepProb 0.5 - -# Spark yarn client mode -${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \ - --master yarn \ - --deploy-mode client \ - --executor-cores cores_per_executor \ - --num-executors executors_number \ - --class com.intel.analytics.bigdl.dllib.example.languagemodel.PTBWordLM \ - ${BIGDL_HOME}/jars/bigdl-dllib-2.1.0-SNAPSHOT-jar-with-dependencies.jar \ #change to your jar file if your download is not spark_2.4.3-2.0.0 - -f DATA_PATH \ - -b 4 \ - --numLayers 2 --vocab 100 --hidden 6 \ - --numSteps 3 --learningRate 0.005 -e 1 \ - --learningRateDecay 0.001 --keepProb 0.5 - -# Spark yarn cluster mode -${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \ - --master yarn \ - --deploy-mode cluster \ - --executor-cores cores_per_executor \ - --num-executors executors_number \ - --class com.intel.analytics.bigdl.dllib.example.languagemodel.PTBWordLM \ - ${BIGDL_HOME}/jars/bigdl-dllib-2.1.0-SNAPSHOT-jar-with-dependencies.jar \ #change to your jar file if your download is not spark_2.4.3-2.0.0 - -f DATA_PATH \ - -b 4 \ - --numLayers 2 --vocab 100 --hidden 6 \ - --numSteps 3 --learningRate 0.005 -e 1 \ - --learningRateDecay 0.001 --keepProb 0.5 -``` - - The parameters used in the above command are: - - * -f: The path where you put your PTB data. - * -b: The mini-batch size. The mini-batch size is expected to be a multiple of *total cores* used in the job. In this example, the mini-batch size is suggested to be set to *total cores * 4* - * --learningRate: learning rate for adagrad - * --learningRateDecay: learning rate decay for adagrad - * --hidden: hiddensize for lstm - * --vocabSize: vocabulary size, default 10000 - * --numLayers: numbers of lstm cell, default 2 lstm cells - * --numSteps: number of words per record in LM - * --keepProb: the probability to do dropout - -If you are to run your own program, do remember to do the initialize before call other bigdl-dllib API's, as shown below. -```scala - // Scala code example - import com.intel.analytics.bigdl.dllib.NNContext - NNContext.initNNContext() -``` ---- - -### 2. Build BigDL Applications - -This section will show you how to build your own deep learning project with BigDL. - -#### 2.1 Add BigDL dependency -##### 2.1.1 official Release -Currently, BigDL releases are hosted on maven central; below is an example to add the BigDL dllib dependency to your own project: - -```xml - - com.intel.analytics.bigdl - bigdl-dllib-spark_2.4.6 - 0.14.0 - -``` - -You can find the other SPARK version [here](https://search.maven.org/search?q=bigdl-dllib), such as `spark_3.1.2`. - - -SBT developers can use -```sbt -libraryDependencies += "com.intel.analytics.bigdl" % "bigdl-dllib-spark_2.4.6" % "0.14.0" -``` - -##### 2.1.2 Nightly Build - -Currently, BigDL nightly build is hosted on [SonaType](https://oss.sonatype.org/content/groups/public/com/intel/analytics/bigdl/). - -To link your application with the latest BigDL nightly build, you should add some dependencies like [official releases](#11-official-release), but change `2.0.0` to the snapshot version (such as 0.14.0-snapshot), and add below repository to your pom.xml. 
- - -```xml - - sonatype - sonatype repository - https://oss.sonatype.org/content/groups/public/ - - true - - - true - - -``` - -SBT developers can use -```sbt -resolvers += "ossrh repository" at "https://oss.sonatype.org/content/repositories/snapshots/" -``` - - -#### 2.2 Build a Scala project -To enable BigDL in project, you should add BigDL to your project's dependencies using maven or sbt. Here is a [simple MLP example](https://github.com/intel-analytics/BigDL/tree/branch-2.0/apps/SimpleMlp) to show you how to use BigDL to build your own deep learning project using maven or sbt, and how to run the simple example in IDEA and spark-submit. - diff --git a/docs/readthedocs/source/doc/UserGuide/win.md b/docs/readthedocs/source/doc/UserGuide/win.md deleted file mode 100644 index 2ddd1030..00000000 --- a/docs/readthedocs/source/doc/UserGuide/win.md +++ /dev/null @@ -1,111 +0,0 @@ -# Windows User Guide -## Prerequisite - - -### Confirm your windows version - -To use BigDL on Windows, we recommend using [Windows Subsystem for Linux 2 (WSL2)](https://learn.microsoft.com/en-us/windows/wsl/about#what-is-wsl-2). The recommended Windows versions are Windows 10 version 2004 or higher (Build 19041 and higher), or Windows 11. - - -### Install WSL2 - -To install WSL2, simply open a PowerShell or Windows Command Prompt as **administrator** and enter the below command. Restart your machine and wait until WSL2 is successfully installed. - -```powershell -wsl --install -``` - -```eval_rst -.. note:: - By default, the above command installs the latest required components for WSL2 and **Ubuntu** as default Linux distribution, and it requires Windows 10 version 2004 or higher. If you're using older versions of Windows or need customization, please refer to `WSL installation guide `_. -``` - -## Installation Guide - -You can treat WSL2 shell as a normal Linux shell and run normal bash commands in it. If you're using WSL2 shell for the first time, it may require you to set up some user information. Using WSL2, you can install BigDL the same way as you do on a Linux system. - - -### Install Conda - -Conda is the recommend way to manage the BigDL environment. Download and install conda using below commands. - -```bash -wget https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh -chmod +x Miniconda3-4.5.4-Linux-x86_64.sh -./Miniconda3-4.5.4-Linux-x86_64.sh -``` - -```eval_rst -.. note:: - On WSL2, you need to use a Linux version of Conda intead of a Windows version. For other available conda versions, refer to `conda install `_, or `miniconda install `_. -``` - -### Install BigDL - -After installing conda, use conda to create and activate an environment for bigdl. - -```bash -conda create -n bigdl-env -conda activate bigdl-env -``` - -Then install BigDL as a whole, or specific bigdl library the same way as you do on a Linux system. For example, - -```bash -pip install bigdl -``` - -```eval_rst -.. card:: - - **Related Readings** - ^^^ - * `BigDL Installation Guide <./python.html>`_ - * `Nano Installation Guide <../Nano/Overview/install.html>`_ - * `Chronos Installation Guide <../Chronos/Overview/install.html>`_ -``` - -### Setup Jupyter Notebook Environment - -Fist, install JupyterLab using pip: - -```bash -pip install jupyterlab -``` - -Then start JupyterLab using: - -```bash -jupyter lab -``` - -```eval_rst -.. note:: - Once you started Juypterlab, it will open automatically in your browser. 
If it does not open automatically, you can manually enter the notebook server’s URL into the browser (The URL is shown on the terminal where you run the command). The default workspace of jupyter is located at the directory where you start the jupyterlab. For more information about JupyterLab installation and usage, refer to `JupyterLab User Guide `_. -``` - -## Tips and Known Issues - -### 1. ImportError: libgomp.so.1: cannot open shared object file: No such file or directory - -This error may appear when you try to import torch. This is caused by Ubuntu 14.04 or later not installing libgomp1 by default. Just install libgomp1 to resolve it: - -```bash -sudo apt-get install libgomp1 -``` - -### 2. ERROR: Could not build wheels for pycocotools, which is required to install pyproject.toml-based projects - -pycocotools is a dependency of Intel neural-compressor which is used for inference quantization in BigDL-Nano. This error is usually caused by GCC library not installed in system. Just install gcc to resolve it: - -```bash -sudo apt-get install gcc -``` - -### 3. ValueError: After taking into account object store and redis memory usage, the amount of memory on this node available for tasks and actors is less than -75% of total. - -When running ray applications, you need to set the `memory` and `object_store_memory` properly according to your system memory capacity. This error indicates you have used too large memory configurations and you need to decrease them. For example on a laptop with 8G memory, you may set the memory configurations as below: - -```bash -python yoloV3.py --memory 2g --object_store_memory 1g -``` diff --git a/docs/readthedocs/source/index.rst b/docs/readthedocs/source/index.rst index 958cd6d9..90d9292a 100644 --- a/docs/readthedocs/source/index.rst +++ b/docs/readthedocs/source/index.rst @@ -2,19 +2,19 @@ :google-site-verification: S66K6GAclKw1RroxU0Rka_2d1LZFVe27M0gRneEsIVI ################################################ -The BigDL Project +The IPEX Project ################################################ ------ ************************************************ -BigDL-LLM +IPEX-LLM ************************************************ .. raw:: html

- bigdl-llm is a library for running LLM (large language model) on Intel XPU (from Laptop to GPU to Cloud) using INT4/FP4/INT8/FP8 with very low latency [1] (for any PyTorch model). + ipex-llm is a library for running LLM (large language model) on Intel XPU (from Laptop to GPU to Cloud) using INT4/FP4/INT8/FP8 with very low latency [1] (for any PyTorch model).

.. note:: @@ -24,28 +24,28 @@ BigDL-LLM ============================================ Latest update 🔥 ============================================ -- [2024/03] **LangChain** added support for ``bigdl-llm``; see the details `here `_. -- [2024/02] ``bigdl-llm`` now supports directly loading model from `ModelScope `_ (`魔搭 `_). -- [2024/02] ``bigdl-llm`` added inital **INT2** support (based on llama.cpp `IQ2 `_ mechanism), which makes it possible to run large-size LLM (e.g., Mixtral-8x7B) on Intel GPU with 16GB VRAM. -- [2024/02] Users can now use ``bigdl-llm`` through `Text-Generation-WebUI `_ GUI. -- [2024/02] ``bigdl-llm`` now supports `Self-Speculative Decoding `_, which in practice brings **~30% speedup** for FP16 and BF16 inference latency on Intel `GPU `_ and `CPU `_ respectively. -- [2024/02] ``bigdl-llm`` now supports a comprehensive list of LLM finetuning on Intel GPU (including `LoRA `_, `QLoRA `_, `DPO `_, `QA-LoRA `_ and `ReLoRA `_). -- [2024/01] Using ``bigdl-llm`` `QLoRA `_, we managed to finetune LLaMA2-7B in **21 minutes** and LLaMA2-70B in **3.14 hours** on 8 Intel Max 1550 GPU for `Standford-Alpaca `_ (see the blog `here `_). -- [2024/01] 🔔🔔🔔 **The default** ``bigdl-llm`` **GPU Linux installation has switched from PyTorch 2.0 to PyTorch 2.1, which requires new oneAPI and GPU driver versions. (See the** `GPU installation guide `_ **for more details.)** -- [2023/12] ``bigdl-llm`` now supports `ReLoRA `_ (see `"ReLoRA: High-Rank Training Through Low-Rank Updates" `_). -- [2023/12] ``bigdl-llm`` now supports `Mixtral-8x7B `_ on both Intel `GPU `_ and `CPU `_. -- [2023/12] ``bigdl-llm`` now supports `QA-LoRA `_ (see `"QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models" `_). -- [2023/12] ``bigdl-llm`` now supports `FP8 and FP4 inference `_ on Intel **GPU**. -- [2023/11] Initial support for directly loading `GGUF `_, `AWQ `_ and `GPTQ `_ models in to ``bigdl-llm`` is available. -- [2023/11] ``bigdl-llm`` now supports `vLLM continuous batching `_ on both Intel `GPU `_ and `CPU `_. -- [2023/10] ``bigdl-llm`` now supports `QLoRA finetuning `_ on both Intel `GPU `_ and `CPU `_. -- [2023/10] ``bigdl-llm`` now supports `FastChat serving `_ on on both Intel CPU and GPU. -- [2023/09] ``bigdl-llm`` now supports `Intel GPU `_ (including Arc, Flex and MAX) -- [2023/09] ``bigdl-llm`` `tutorial `_ is released. -- Over 30 models have been verified on ``bigdl-llm``, including *LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, WizardCoder, Dolly, Whisper, Baichuan/Baichuan2, InternLM, Skywork, QWen/Qwen-VL, Aquila, MOSS* and more; see the complete list `here `_. +- [2024/03] **LangChain** added support for ``ipex-llm``; see the details `here `_. +- [2024/02] ``ipex-llm`` now supports directly loading model from `ModelScope `_ (`魔搭 `_). +- [2024/02] ``ipex-llm`` added inital **INT2** support (based on llama.cpp `IQ2 `_ mechanism), which makes it possible to run large-size LLM (e.g., Mixtral-8x7B) on Intel GPU with 16GB VRAM. +- [2024/02] Users can now use ``ipex-llm`` through `Text-Generation-WebUI `_ GUI. +- [2024/02] ``ipex-llm`` now supports `Self-Speculative Decoding `_, which in practice brings **~30% speedup** for FP16 and BF16 inference latency on Intel `GPU `_ and `CPU `_ respectively. +- [2024/02] ``ipex-llm`` now supports a comprehensive list of LLM finetuning on Intel GPU (including `LoRA `_, `QLoRA `_, `DPO `_, `QA-LoRA `_ and `ReLoRA `_). 
+- [2024/01] Using ``ipex-llm`` `QLoRA `_, we managed to finetune LLaMA2-7B in **21 minutes** and LLaMA2-70B in **3.14 hours** on 8 Intel Max 1550 GPU for `Standford-Alpaca `_ (see the blog `here `_). +- [2024/01] 🔔🔔🔔 **The default** ``ipex-llm`` **GPU Linux installation has switched from PyTorch 2.0 to PyTorch 2.1, which requires new oneAPI and GPU driver versions. (See the** `GPU installation guide `_ **for more details.)** +- [2023/12] ``ipex-llm`` now supports `ReLoRA `_ (see `"ReLoRA: High-Rank Training Through Low-Rank Updates" `_). +- [2023/12] ``ipex-llm`` now supports `Mixtral-8x7B `_ on both Intel `GPU `_ and `CPU `_. +- [2023/12] ``ipex-llm`` now supports `QA-LoRA `_ (see `"QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models" `_). +- [2023/12] ``ipex-llm`` now supports `FP8 and FP4 inference `_ on Intel **GPU**. +- [2023/11] Initial support for directly loading `GGUF `_, `AWQ `_ and `GPTQ `_ models in to ``ipex-llm`` is available. +- [2023/11] ``ipex-llm`` now supports `vLLM continuous batching `_ on both Intel `GPU `_ and `CPU `_. +- [2023/10] ``ipex-llm`` now supports `QLoRA finetuning `_ on both Intel `GPU `_ and `CPU `_. +- [2023/10] ``ipex-llm`` now supports `FastChat serving `_ on on both Intel CPU and GPU. +- [2023/09] ``ipex-llm`` now supports `Intel GPU `_ (including Arc, Flex and MAX) +- [2023/09] ``ipex-llm`` `tutorial `_ is released. +- Over 30 models have been verified on ``ipex-llm``, including *LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, WizardCoder, Dolly, Whisper, Baichuan/Baichuan2, InternLM, Skywork, QWen/Qwen-VL, Aquila, MOSS* and more; see the complete list `here `_. ============================================ -``bigdl-llm`` demos +``ipex-llm`` demos ============================================ See the **optimized performance** of ``chatglm2-6b`` and ``llama-2-13b-chat`` models on 12th Gen Intel Core CPU and Intel Arc GPU below. @@ -80,12 +80,12 @@ See the **optimized performance** of ``chatglm2-6b`` and ``llama-2-13b-chat`` mo ============================================ -``bigdl-llm`` quickstart +``ipex-llm`` quickstart ============================================ - `Windows GPU installation `_ -- `Run BigDL-LLM in Text-Generation-WebUI `_ -- `Run BigDL-LLM using Docker `_ +- `Run IPEX-LLM in Text-Generation-WebUI `_ +- `Run IPEX-LLM using Docker `_ - `CPU quickstart <#cpu-quickstart>`_ - `GPU quickstart <#gpu-quickstart>`_ @@ -93,7 +93,7 @@ See the **optimized performance** of ``chatglm2-6b`` and ``llama-2-13b-chat`` mo CPU Quickstart -------------------------------------------- -You may install ``bigdl-llm`` on Intel CPU as follows as follows: +You may install ``ipex-llm`` on Intel CPU as follows as follows: .. note:: @@ -101,11 +101,11 @@ You may install ``bigdl-llm`` on Intel CPU as follows as follows: .. code-block:: console - pip install --pre --upgrade bigdl-llm[all] + pip install --pre --upgrade ipex-llm[all] .. note:: - ``bigdl-llm`` has been tested on Python 3.9, 3.10 and 3.11 + ``ipex-llm`` has been tested on Python 3.9, 3.10 and 3.11 You can then apply INT4 optimizations to any Hugging Face *Transformers* models as follows. @@ -126,7 +126,7 @@ You can then apply INT4 optimizations to any Hugging Face *Transformers* models GPU Quickstart -------------------------------------------- -You may install ``bigdl-llm`` on Intel GPU as follows as follows: +You may install ``ipex-llm`` on Intel GPU as follows as follows: .. 
note:: @@ -135,11 +135,11 @@ You may install ``bigdl-llm`` on Intel GPU as follows as follows: .. code-block:: console # below command will install intel_extension_for_pytorch==2.1.10+xpu as default - pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu + pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu .. note:: - ``bigdl-llm`` has been tested on Python 3.9, 3.10 and 3.11 + ``ipex-llm`` has been tested on Python 3.9, 3.10 and 3.11 You can then apply INT4 optimizations to any Hugging Face *Transformers* models on Intel GPU as follows. @@ -158,97 +158,4 @@ You can then apply INT4 optimizations to any Hugging Face *Transformers* models output_ids = model.generate(input_ids, ...) output = tokenizer.batch_decode(output_ids.cpu()) -**For more details, please refer to the bigdl-llm** `Document `_, `Readme `_, `Tutorial `_ and `API Doc `_. - ------- - -************************************************ -Overview of the complete BigDL project -************************************************ -`BigDL `_ seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries: - -- `LLM `_: Low-bit (INT3/INT4/INT5/INT8) large language model library for Intel CPU/GPU -- `Orca `_: Distributed Big Data & AI (TF & PyTorch) Pipeline on Spark and Ray -- `Nano `_: Transparent Acceleration of Tensorflow & PyTorch Programs on Intel CPU/GPU -- `DLlib `_: "Equivalent of Spark MLlib" for Deep Learning -- `Chronos `_: Scalable Time Series Analysis using AutoML -- `Friesian `_: End-to-End Recommendation Systems -- `PPML `_: Secure Big Data and AI (with SGX Hardware Security) - ------- - -************************************************ -Choosing the right BigDL library -************************************************ - -.. graphviz:: - - digraph BigDLDecisionTree { - graph [pad=0.1 ranksep=0.3 tooltip=" "] - node [color="#0171c3" shape=box fontname="Arial" fontsize=14 tooltip=" "] - edge [tooltip=" "] - - Feature1 [label="Hardware Secured Big Data & AI?"] - Feature2 [label="Python vs. 
Scala/Java?"] - Feature3 [label="What type of application?"] - Feature4 [label="Domain?"] - - LLM[href="https://github.com/intel-analytics/BigDL/blob/main/python/llm" target="_blank" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-LLM document"] - Orca[href="../doc/Orca/index.html" target="_blank" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Orca document"] - Nano[href="../doc/Nano/index.html" target="_blank" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Nano document"] - DLlib1[label="DLlib" href="../doc/DLlib/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-DLlib document"] - DLlib2[label="DLlib" href="../doc/DLlib/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-DLlib document"] - Chronos[href="../doc/Chronos/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Chronos document"] - Friesian[href="../doc/Friesian/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Friesian document"] - PPML[href="../doc/PPML/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-PPML document"] - - ArrowLabel1[label="No" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"] - ArrowLabel2[label="Yes" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"] - ArrowLabel3[label="Python" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"] - ArrowLabel4[label="Scala/Java" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"] - ArrowLabel5[label="Large Language Model" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"] - ArrowLabel6[label="Big Data + \n AI (TF/PyTorch)" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"] - ArrowLabel7[label="Accelerate \n TensorFlow / PyTorch" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"] - ArrowLabel8[label="DL for Spark MLlib" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"] - ArrowLabel9[label="High Level App Framework" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"] - ArrowLabel10[label="Time Series" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"] - ArrowLabel11[label="Recommender System" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"] - - Feature1 -> ArrowLabel1[dir=none] - ArrowLabel1 -> Feature2 - Feature1 -> ArrowLabel2[dir=none] - ArrowLabel2 -> PPML - - Feature2 -> ArrowLabel3[dir=none] - ArrowLabel3 -> Feature3 - Feature2 -> ArrowLabel4[dir=none] - ArrowLabel4 -> DLlib1 - - Feature3 -> ArrowLabel5[dir=none] - ArrowLabel5 -> LLM - Feature3 -> ArrowLabel6[dir=none] - ArrowLabel6 -> Orca - Feature3 -> ArrowLabel7[dir=none] - ArrowLabel7 -> Nano - Feature3 -> ArrowLabel8[dir=none] - ArrowLabel8 -> DLlib2 - Feature3 -> ArrowLabel9[dir=none] - ArrowLabel9 -> Feature4 - - Feature4 -> ArrowLabel10[dir=none] - ArrowLabel10 -> Chronos - Feature4 -> ArrowLabel11[dir=none] - ArrowLabel11 -> Friesian - } - ------- - -.. raw:: html - -
-

- [1] - Performance varies by use, configuration and other factors. bigdl-llm may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex. - -

-
+**For more details, please refer to the ipex-llm** `Document `_, `Readme `_, `Tutorial `_ and `API Doc `_. \ No newline at end of file diff --git a/python/llm/README.md b/python/llm/README.md index dc7df9ff..0a23874d 100644 --- a/python/llm/README.md +++ b/python/llm/README.md @@ -1,5 +1,5 @@ -## BigDL-LLM -**[`bigdl-llm`](https://bigdl.readthedocs.io/en/latest/doc/LLM/index.html)** is a library for running **LLM** (large language model) on Intel **XPU** (from *Laptop* to *GPU* to *Cloud*) using **INT4** with very low latency[^1] (for any **PyTorch** model). +## IPEX-LLM +**[`ipex-llm`](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/index.html)** is a library for running **LLM** (large language model) on Intel **XPU** (from *Laptop* to *GPU* to *Cloud*) using **INT4** with very low latency[^1] (for any **PyTorch** model). > *It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.* @@ -34,7 +34,7 @@ See the ***optimized performance*** of `chatglm2-6b` and `llama-2-13b-chat` mode ### Verified models -Over 20 models have been optimized/verified on `bigdl-llm`, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, Mistral, Falcon, MPT, Dolly, StarCoder, Whisper, Baichuan, InternLM, QWen, Aquila, MOSS,* and more; see the complete list below. +Over 20 models have been optimized/verified on `ipex-llm`, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, Mistral, Falcon, MPT, Dolly, StarCoder, Whisper, Baichuan, InternLM, QWen, Aquila, MOSS,* and more; see the complete list below. | Model | CPU Example | GPU Example | |------------|----------------------------------------------------------------|-----------------------------------------------------------------| @@ -89,14 +89,14 @@ Over 20 models have been optimized/verified on `bigdl-llm`, including *LLaMA/LLa | DeciLM-7B | [link](example/CPU/HF-Transformers-AutoModels/Model/deciLM-7b) | [link](example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b) | | Deepseek | [link](example/CPU/HF-Transformers-AutoModels/Model/deepseek) | [link](example/GPU/HF-Transformers-AutoModels/Model/deepseek) | -### Working with `bigdl-llm` +### Working with `ipex-llm`
Table of Contents -- [BigDL-LLM](#bigdl-llm) +- [IPEX-LLM](#ipex-llm) - [Demos](#demos) - [Verified models](#verified-models) - - [Working with `bigdl-llm`](#working-with-bigdl-llm) + - [Working with `ipex-llm`](#working-with-ipex-llm) - [Install](#install) - [CPU](#cpu) - [GPU](#gpu) @@ -108,31 +108,31 @@ Over 20 models have been optimized/verified on `bigdl-llm`, including *LLaMA/LLa - [2. Native INT4 model](#2-native-int4-model) - [3. LangChain API](#3-langchain-api) - [4. CLI Tool](#4-cli-tool) - - [`bigdl-llm` API Doc](#bigdl-llm-api-doc) - - [`bigdl-llm` Dependency](#bigdl-llm-dependency) + - [`ipex-llm` API Doc](#ipex-llm-api-doc) + - [`ipex-llm` Dependency](#ipex-llm-dependency)
#### Install ##### CPU -You may install **`bigdl-llm`** on Intel CPU as follows: +You may install **`ipex-llm`** on Intel CPU as follows: ```bash -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] ``` -> Note: `bigdl-llm` has been tested on Python 3.9 +> Note: `ipex-llm` has been tested on Python 3.9 ##### GPU -You may install **`bigdl-llm`** on Intel GPU as follows: +You may install **`ipex-llm`** on Intel GPU as follows: ```bash # below command will install intel_extension_for_pytorch==2.0.110+xpu as default # you can install specific ipex/torch version for your need -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` -> Note: `bigdl-llm` has been tested on Python 3.9 +> Note: `ipex-llm` has been tested on Python 3.9 #### Run Model -You may run the models using **`bigdl-llm`** through one of the following APIs: +You may run the models using **`ipex-llm`** through one of the following APIs: 1. [Hugging Face `transformers` API](#1-hugging-face-transformers-api) 2. [Native INT4 Model](#2-native-int4-model) 3. [LangChain API](#3-langchain-api) @@ -182,7 +182,7 @@ See the complete examples [here](example/GPU). ###### More Low-Bit Support - Save and load - After the model is optimized using `bigdl-llm`, you may save and load the model as follows: + After the model is optimized using `ipex-llm`, you may save and load the model as follows: ```python model.save_low_bit(model_path) new_model = AutoModelForCausalLM.load_low_bit(model_path) @@ -207,7 +207,7 @@ You may also convert Hugging Face *Transformers* models into native INT4 model f ```python #convert the model from ipex_llm import llm_convert -bigdl_llm_path = llm_convert(model='/path/to/model/', +ipex_llm_path = llm_convert(model='/path/to/model/', outfile='/path/to/output/', outtype='int4', model_family="llama") #load the converted model @@ -224,7 +224,7 @@ output = llm.batch_decode(output_ids) See the complete example [here](example/CPU/Native-Models/native_int4_pipeline.py). ##### 3. LangChain API -You may run the models using the LangChain API in `bigdl-llm`. +You may run the models using the LangChain API in `ipex-llm`. - **Using Hugging Face `transformers` model** @@ -236,9 +236,9 @@ You may run the models using the LangChain API in `bigdl-llm`. from langchain.chains.question_answering import load_qa_chain embeddings = TransformersEmbeddings.from_model_id(model_id=model_path) - bigdl_llm = TransformersLLM.from_model_id(model_id=model_path, ...) + ipex_llm = TransformersLLM.from_model_id(model_id=model_path, ...) - doc_chain = load_qa_chain(bigdl_llm, ...) + doc_chain = load_qa_chain(ipex_llm, ...) output = doc_chain.run(...) ``` See the examples [here](example/CPU/LangChain/transformers_int4). @@ -257,16 +257,16 @@ You may run the models using the LangChain API in `bigdl-llm`. #switch to ChatGLMEmbeddings/GptneoxEmbeddings/BloomEmbeddings/StarcoderEmbeddings to load other models embeddings = LlamaEmbeddings(model_path='/path/to/converted/model.bin') #switch to ChatGLMLLM/GptneoxLLM/BloomLLM/StarcoderLLM to load other models - bigdl_llm = LlamaLLM(model_path='/path/to/converted/model.bin') + ipex_llm = LlamaLLM(model_path='/path/to/converted/model.bin') - doc_chain = load_qa_chain(bigdl_llm, ...) + doc_chain = load_qa_chain(ipex_llm, ...) doc_chain.run(...) ``` See the examples [here](example/CPU/LangChain/native_int4). ##### 4. 
CLI Tool ->**Note**: Currently `bigdl-llm` CLI supports *LLaMA* (e.g., *vicuna*), *GPT-NeoX* (e.g., *redpajama*), *BLOOM* (e.g., *pheonix*) and *GPT2* (e.g., *starcoder*) model architecture; for other models, you may use the Hugging Face `transformers` or LangChain APIs. +>**Note**: Currently `ipex-llm` CLI supports *LLaMA* (e.g., *vicuna*), *GPT-NeoX* (e.g., *redpajama*), *BLOOM* (e.g., *pheonix*) and *GPT2* (e.g., *starcoder*) model architecture; for other models, you may use the Hugging Face `transformers` or LangChain APIs. - ##### Convert model @@ -300,14 +300,14 @@ You may run the models using the LangChain API in `bigdl-llm`. llm-chat -m "/path/to/output/model.bin" -x llama ``` -### `bigdl-llm` API Doc -See the inital `bigdl-llm` API Doc [here](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/LLM/index.html). +### `ipex-llm` API Doc +See the inital `ipex-llm` API Doc [here](https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/index.html). -[^1]: Performance varies by use, configuration and other factors. `bigdl-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex. +[^1]: Performance varies by use, configuration and other factors. `ipex-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex. -### `bigdl-llm` Dependency -The native code/lib in `bigdl-llm` has been built using the following tools. -Note that lower `LIBC` version on your Linux system may be incompatible with `bigdl-llm`. +### `ipex-llm` Dependency +The native code/lib in `ipex-llm` has been built using the following tools. +Note that lower `LIBC` version on your Linux system may be incompatible with `ipex-llm`. | Model family | Platform | Compiler | GLIBC | | ------------ | -------- | ------------------ | ----- | diff --git a/python/llm/dev/benchmark/README.md b/python/llm/dev/benchmark/README.md index c44a7c7a..4e5104fb 100644 --- a/python/llm/dev/benchmark/README.md +++ b/python/llm/dev/benchmark/README.md @@ -61,7 +61,7 @@ with torch.inference_mode(): ### Inference on multi GPUs Similarly, put this file into your benchmark directory, and then wrap your optimized model with `BenchmarkWrapper` (`model = BenchmarkWrapper(model)`). 
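As a quick orientation, the basic wrapping pattern is the same as in the single-GPU case above. The snippet below is only a minimal sketch: it assumes the helper file is importable as `benchmark_util`, that the model was optimized with the `ipex_llm.transformers` AutoModel API used elsewhere in this diff, and that the model id is a placeholder.

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # assumed module path
from benchmark_util import BenchmarkWrapper  # assumed import for the helper file

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

# Load with low-bit optimization, then wrap the model so that generate()
# reports first-token and rest-token latency.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
model = BenchmarkWrapper(model)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.batch_decode(output_ids))
```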
-For example, you just need to apply the following code patch on [Deepspeed Autotp example code](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py) to calculate the 1st-token and rest-token performance: +For example, you just need to apply the following code patch on [Deepspeed Autotp example code](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py) to calculate the 1st-token and rest-token performance: ```python import torch import transformers diff --git a/python/llm/dev/benchmark/all-in-one/run-deepspeed-spr.sh b/python/llm/dev/benchmark/all-in-one/run-deepspeed-spr.sh index 25bded0b..06f0dc2e 100644 --- a/python/llm/dev/benchmark/all-in-one/run-deepspeed-spr.sh +++ b/python/llm/dev/benchmark/all-in-one/run-deepspeed-spr.sh @@ -1,5 +1,5 @@ #!/bin/bash -source bigdl-llm-init +source ipex-llm-init unset OMP_NUM_THREADS # deepspeed will set it for each instance automatically source /opt/intel/oneccl/env/setvars.sh export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib diff --git a/python/llm/dev/benchmark/all-in-one/run-hbm.sh b/python/llm/dev/benchmark/all-in-one/run-hbm.sh index d57b5ec3..96cdcb12 100644 --- a/python/llm/dev/benchmark/all-in-one/run-hbm.sh +++ b/python/llm/dev/benchmark/all-in-one/run-hbm.sh @@ -1,5 +1,5 @@ #!/bin/bash -source bigdl-llm-init +source ipex-llm-init sockets_num=$(lscpu | grep "Socket(s)" | awk -F ':' '{print $2}') cores_per_socket=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}') diff --git a/python/llm/dev/benchmark/all-in-one/run-spr.sh b/python/llm/dev/benchmark/all-in-one/run-spr.sh index f86de3b9..964c46b7 100644 --- a/python/llm/dev/benchmark/all-in-one/run-spr.sh +++ b/python/llm/dev/benchmark/all-in-one/run-spr.sh @@ -1,5 +1,5 @@ #!/bin/bash -source bigdl-llm-init +source ipex-llm-init export OMP_NUM_THREADS=48 # set following parameters according to the actual specs of the test machine diff --git a/python/llm/dev/benchmark/harness/README.md b/python/llm/dev/benchmark/harness/README.md index 46b39865..4dfcf09a 100644 --- a/python/llm/dev/benchmark/harness/README.md +++ b/python/llm/dev/benchmark/harness/README.md @@ -1,7 +1,7 @@ # Harness Evaluation -[Harness evaluation](https://github.com/EleutherAI/lm-evaluation-harness) allows users to easily get accuracy on various datasets. Here we have enabled harness evaluation with BigDL-LLM under +[Harness evaluation](https://github.com/EleutherAI/lm-evaluation-harness) allows users to easily get accuracy on various datasets. Here we have enabled harness evaluation with IPEX-LLM under [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) settings. -Before running, make sure to have [bigdl-llm](../../../README.md) installed. +Before running, make sure to have [ipex-llm](../../../README.md) installed. ## Install Harness ```bash @@ -16,15 +16,15 @@ run `python run_llb.py`. 
`run_llb.py` combines some arguments in `main.py` to ma ### Evaluation on CPU ```python -python run_llb.py --model bigdl-llm --pretrained /path/to/model --precision nf3 sym_int4 nf4 --device cpu --tasks hellaswag arc mmlu truthfulqa --batch 1 --no_cache +python run_llb.py --model ipex-llm --pretrained /path/to/model --precision nf3 sym_int4 nf4 --device cpu --tasks hellaswag arc mmlu truthfulqa --batch 1 --no_cache ``` ### Evaluation on Intel GPU ```python -python run_llb.py --model bigdl-llm --pretrained /path/to/model --precision nf3 sym_int4 nf4 --device xpu --tasks hellaswag arc mmlu truthfulqa --batch 1 --no_cache +python run_llb.py --model ipex-llm --pretrained /path/to/model --precision nf3 sym_int4 nf4 --device xpu --tasks hellaswag arc mmlu truthfulqa --batch 1 --no_cache ``` ### Evaluation using multiple Intel GPU ```python -python run_multi_llb.py --model bigdl-llm --pretrained /path/to/model --precision nf3 sym_int4 nf4 --device xpu:0,2,3 --tasks hellaswag arc mmlu truthfulqa --batch 1 --no_cache +python run_multi_llb.py --model ipex-llm --pretrained /path/to/model --precision nf3 sym_int4 nf4 --device xpu:0,2,3 --tasks hellaswag arc mmlu truthfulqa --batch 1 --no_cache ``` Taking example above, the script will fork 3 processes, each for one xpu, to execute the tasks. ## Results diff --git a/python/llm/dev/benchmark/whisper/README.md b/python/llm/dev/benchmark/whisper/README.md index f9c023f5..189435db 100644 --- a/python/llm/dev/benchmark/whisper/README.md +++ b/python/llm/dev/benchmark/whisper/README.md @@ -1,7 +1,7 @@ # Whisper Test The Whisper Test allows users to evaluate the performance and accuracy of [Whisper](https://huggingface.co/openai/whisper-base) speech-to-text models. For accuracy, the model is tested on the [LibriSpeech](https://huggingface.co/datasets/librispeech_asr) dataset using [Word Error Rate (WER)](https://github.com/huggingface/evaluate/tree/main/metrics/wer) metric. -Before running, make sure to have [bigdl-llm](../../../README.md) installed. +Before running, make sure to have [ipex-llm](../../../README.md) installed. ## Install Dependencies ```bash @@ -17,7 +17,7 @@ The LibriSpeech dataset contains 'clean' and 'other' splits. You can specify the split to evaluate with ```--data_type```. By default, we set it to ```other```. You can specify the device to run the test on with ```--device```. -To run on Intel GPU, set it to ```xpu```, and refer to [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for details on installation and optimal configuration. +To run on Intel GPU, set it to ```xpu```, and refer to [GPU installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for details on installation and optimal configuration. 
> **Note** diff --git a/python/llm/dev/release.sh b/python/llm/dev/release.sh index 9444febf..4d124136 100644 --- a/python/llm/dev/release.sh +++ b/python/llm/dev/release.sh @@ -38,8 +38,8 @@ if [ "${version}" != "default" ]; then echo $version > $BIGDL_DIR/python/version.txt fi -bigdl_version=$(cat $BIGDL_DIR/python/version.txt | head -1) -echo "The effective version is: ${bigdl_version}" +ipex_llm_version=$(cat $BIGDL_DIR/python/version.txt | head -1) +echo "The effective version is: ${ipex_llm_version}" if [ "$platform" == "linux" ]; then verbose_pname="manylinux2010_x86_64" @@ -72,14 +72,14 @@ fi if [ ${upload} == true ]; then # upload to pypi - upload_to_pypi_command="twine upload dist/bigdl_llm-${bigdl_version}-*-${verbose_pname}.whl" + upload_to_pypi_command="twine upload dist/ipex_llm-${ipex_llm_version}-*-${verbose_pname}.whl" echo "Please manually upload with this command: $upload_to_pypi_command" $upload_to_pypi_command # upload to sourceforge rsync -avzr -e \ "sshpass -p '${SOURCEFORGE_PW}' ssh -o StrictHostKeyChecking=no" \ - ./dist/bigdl_llm-${bigdl_version}-*-${verbose_pname}.whl \ - intelanalytics@frs.sourceforge.net:/home/frs/project/analytics-zoo/bigdl-llm-whl/bigdl-llm/${bigdl_version}/ + ./dist/ipex_llm-${ipex_llm_version}-*-${verbose_pname}.whl \ + intelanalytics@frs.sourceforge.net:/home/frs/project/analytics-zoo/ipex-llm-whl/ipex-llm/${ipex_llm_version}/ fi diff --git a/python/llm/dev/release_default_linux.sh b/python/llm/dev/release_default_linux.sh index 9c5c1214..df23bf47 100644 --- a/python/llm/dev/release_default_linux.sh +++ b/python/llm/dev/release_default_linux.sh @@ -16,8 +16,8 @@ # limitations under the License. # -# This is the default script with maven parameters to release bigdl-math for linux. -# Note that if the maven parameters to build bigdl-math need to be changed, +# This is the default script with maven parameters to release ipex-math for linux. +# Note that if the maven parameters to build ipex-math need to be changed, # make sure to change this file accordingly. # If you want to customize the release, please use release.sh and specify maven parameters instead. diff --git a/python/llm/dev/release_default_windows.sh b/python/llm/dev/release_default_windows.sh index 878bf35c..4f3aa50c 100644 --- a/python/llm/dev/release_default_windows.sh +++ b/python/llm/dev/release_default_windows.sh @@ -16,8 +16,8 @@ # limitations under the License. # -# This is the default script with maven parameters to release bigdl-math for linux. -# Note that if the maven parameters to build bigdl-math need to be changed, +# This is the default script with maven parameters to release ipex-math for linux. +# Note that if the maven parameters to build ipex-math need to be changed, # make sure to change this file accordingly. # If you want to customize the release, please use release.sh and specify maven parameters instead. diff --git a/python/llm/example/CPU/Applications/autogen/README.md b/python/llm/example/CPU/Applications/autogen/README.md index 41e39727..ceb9fd7a 100644 --- a/python/llm/example/CPU/Applications/autogen/README.md +++ b/python/llm/example/CPU/Applications/autogen/README.md @@ -1,10 +1,10 @@ -## Running AutoGen Agent Chat with BigDL-LLM on Local Models -This example is adapted from the [Official AutoGen Teachablility tutorial](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_teachability.ipynb). 
We use a version of FastChat modified for BigDL to create a teachable chat agent with [AutoGen](https://microsoft.github.io/autogen/) that works with locally deployed LLMs. This special agent can remember things you tell it over time, unlike regular chatbots that forget after each conversation. It does this by saving what it learns on disk, and then bring up the learnt information in future chats. This means you can teach it lots of new things—like facts, new skills, preferences, etc. +## Running AutoGen Agent Chat with IPEX-LLM on Local Models +This example is adapted from the [Official AutoGen Teachablility tutorial](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_teachability.ipynb). We use a version of FastChat modified for IPEX-LLM to create a teachable chat agent with [AutoGen](https://microsoft.github.io/autogen/) that works with locally deployed LLMs. This special agent can remember things you tell it over time, unlike regular chatbots that forget after each conversation. It does this by saving what it learns on disk, and then bring up the learnt information in future chats. This means you can teach it lots of new things—like facts, new skills, preferences, etc. In this example, we illustrate teaching the agent something it doesn't initially know. When we ask, `What is the Vicuna model?`, it doesn't have the answer. We then inform it, `Vicuna is a 13B-parameter language model released by Meta.` We repeat the process for the Orca model, telling the agent, `Orca is a 13B-parameter language model developed by Microsoft. It outperforms Vicuna on most tasks.` Finally, we test if the agent has learned by asking, `How does the Vicuna model compare to the Orca model?` The agent's response confirms it has retained and can use the information we taught it. -### 1. Setup BigDL-LLM Environment +### 1. Setup IPEX-LLM Environment ```bash # create autogen running directory mkdir autogen @@ -14,9 +14,9 @@ cd autogen conda create -n autogen python=3.9 conda activate autogen -# install fastchat-adapted bigdl-llm -# we recommend using bigdl-llm version >= 2.5.0b20240110 -pip install --pre --upgrade bigdl-llm[serving] +# install fastchat-adapted ipex-llm +# we recommend using ipex-llm version >= 2.5.0b20240110 +pip install --pre --upgrade ipex-llm[serving] # install recommend transformers version pip install transformers==4.36.2 @@ -73,7 +73,7 @@ python -m ipex_llm.serving.model_worker --model-path ... --device cpu ``` Change the Model Name: -> Assume you use the model `Mistral-7B-Instruct-v0.2` and your model is downloaded to `autogen/model/Mistral-7B-Instruct-v0.2`. You should rename the model to `autogen/model/bigdl` and run `python -m ipex_llm.serving.model_worker --model-path ... --device cpu`. This ensures the proper usage of the BigDL-adapted FastChat. +> Assume you use the model `Mistral-7B-Instruct-v0.2` and your model is downloaded to `autogen/model/Mistral-7B-Instruct-v0.2`. You should rename the model to `autogen/model/ipex-llm` and run `python -m ipex_llm.serving.model_worker --model-path ... --device cpu`. This ensures the proper usage of the IPEX-LLM-adapted FastChat. Potential Error Note: > If you get `RuntimeError: Error register to Controller` in the worker terminal, please set `export no_proxy='localhost'` to ensure the registration @@ -175,4 +175,4 @@ teachable_agent (to user): Based on the given memories, the Vicuna model and the Orca model are both 13B-parameter language models, meaning they have similar capacity and architecture. 
However, the text states that the Orca model, developed by Microsoft, outperforms the Vicuna model on most tasks. Therefore, the Orca model can be considered more advanced or effective than the Vicuna model based on the provided information. It's important to note that this comparison is based on the specific task or set of tasks mentioned in the text, and the performance of the models may vary depending on the specific use case or dataset. -------------------------------------------------------------------------------- -``` \ No newline at end of file +``` diff --git a/python/llm/example/CPU/Applications/hf-agent/README.md b/python/llm/example/CPU/Applications/hf-agent/README.md index 2bfe5d1e..edbae072 100644 --- a/python/llm/example/CPU/Applications/hf-agent/README.md +++ b/python/llm/example/CPU/Applications/hf-agent/README.md @@ -1,10 +1,10 @@ -# BigDL-LLM Transformers INT4 Optimization for HuggingFace Transformers Agent -In this example, we apply low-bit optimizations to [HuggingFace Transformers Agents](https://huggingface.co/docs/transformers/transformers_agents) using BigDL-LLM, which allows LLMs to use tools such as image generation, image captioning, text summarization, etc. +# IPEX-LLM Transformers INT4 Optimization for HuggingFace Transformers Agent +In this example, we apply low-bit optimizations to [HuggingFace Transformers Agents](https://huggingface.co/docs/transformers/transformers_agents) using IPEX-LLM, which allows LLMs to use tools such as image generation, image captioning, text summarization, etc. For illustration purposes, we utilize the [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) as the reference model. We use [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) to create an agent, and then ask the agent to generate the caption for an image from coco dataset, i.e. [demo.jpg](https://cocodataset.org/#explore?id=264959) ## 0. Requirements -To run this example with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model#recommended-requirements) for more information. +To run this example with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model#recommended-requirements) for more information. ### 1. Install @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option pip install pillow # additional package required for opening images ``` @@ -26,7 +26,7 @@ Arguments info: - `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the Vicuna model (e.g. `lmsys/vicuna-7b-v1.5`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'lmsys/vicuna-7b-v1.5'`. - `--image-path IMAGE_PATH`: argument defining the image to be infered. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. 
In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Vicuna model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/Applications/streaming-llm/README.md b/python/llm/example/CPU/Applications/streaming-llm/README.md index 0bc1a627..a008b1d2 100644 --- a/python/llm/example/CPU/Applications/streaming-llm/README.md +++ b/python/llm/example/CPU/Applications/streaming-llm/README.md @@ -1,7 +1,7 @@ -# Low-Bit Streaming LLM using BigDL-LLM +# Low-Bit Streaming LLM using IPEX-LLM -In this example, we apply low-bit optimizations to [Streaming-LLM](https://github.com/mit-han-lab/streaming-llm/tree/main#efficient-streaming-language-models-with-attention-sinks) using BigDL-LLM, which can deploy low-bit(including FP4/INT4/FP8/INT8) LLMs for infinite-length inputs. -Only one code change is needed to load the model using bigdl-llm as follows: +In this example, we apply low-bit optimizations to [Streaming-LLM](https://github.com/mit-han-lab/streaming-llm/tree/main#efficient-streaming-language-models-with-attention-sinks) using IPEX-LLM, which can deploy low-bit(including FP4/INT4/FP8/INT8) LLMs for infinite-length inputs. +Only one code change is needed to load the model using ipex-llm as follows: ```python from ipex_llm.transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained(model_name_or_path, load_in_4bit=True, trust_remote_code=True, optimize_model=False) @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] ``` ## Run Example diff --git a/python/llm/example/CPU/Deepspeed-AutoTP/README.md b/python/llm/example/CPU/Deepspeed-AutoTP/README.md index 8cc4d7ab..ed738567 100644 --- a/python/llm/example/CPU/Deepspeed-AutoTP/README.md +++ b/python/llm/example/CPU/Deepspeed-AutoTP/README.md @@ -1,4 +1,4 @@ -### Run Tensor-Parallel BigDL Transformers INT4 Inference with Deepspeed +### Run Tensor-Parallel IPEX-LLM Transformers INT4 Inference with Deepspeed #### 1. Install Dependencies @@ -33,23 +33,23 @@ model = deepspeed.init_inference( Then, returned model is converted into a deepspeed InferenceEnginee type. -#### 3. Optimize Model with BigDL-LLM Low Bit +#### 3. Optimize Model with IPEX-LLM Low Bit -Distributed model managed by deepspeed can be further optimized with BigDL low-bit Python API, e.g. sym_int4: +Distributed model managed by deepspeed can be further optimized with IPEX low-bit Python API, e.g. sym_int4: ```python -# Apply BigDL-LLM INT4 optimizations on transformers +# Apply IPEX-LLM INT4 optimizations on transformers from ipex_llm import optimize_model model = optimize_model(model.module.to(f'cpu'), low_bit='sym_int4') model = model.to(f'cpu:{local_rank}') # move partial model to local rank ``` -Then, a bigdl-llm transformers is returned, which in the following, can serve in parallel with native APIs. +Then, a ipex-llm transformers is returned, which in the following, can serve in parallel with native APIs. #### 4. 
Start Python Code -You can try deepspeed with BigDL LLM by: +You can try deepspeed with IPEX LLM by: ```bash bash run.sh @@ -59,7 +59,7 @@ If you want to run your own application, there are **necessary configurations in ```bash # run.sh -source bigdl-llm-init +source ipex-llm-init unset OMP_NUM_THREADS # deepspeed will set it for each instance automatically source /opt/intel/oneccl/env/setvars.sh ...... diff --git a/python/llm/example/CPU/Deepspeed-AutoTP/install.sh b/python/llm/example/CPU/Deepspeed-AutoTP/install.sh index fa2b0910..7cb48587 100644 --- a/python/llm/example/CPU/Deepspeed-AutoTP/install.sh +++ b/python/llm/example/CPU/Deepspeed-AutoTP/install.sh @@ -19,5 +19,5 @@ pip install https://intel-extension-for-pytorch.s3.amazonaws.com/torch_ccl/cpu/o pip install deepspeed==0.11.1 # 4. exclude intel deepspeed extension, which is only for XPU pip uninstall intel-extension-for-deepspeed -# 5. install bigdl-llm -pip install --pre --upgrade bigdl-llm[all] +# 5. install ipex-llm +pip install --pre --upgrade ipex-llm[all] diff --git a/python/llm/example/CPU/Deepspeed-AutoTP/run.sh b/python/llm/example/CPU/Deepspeed-AutoTP/run.sh index 1eb07388..73fb34a2 100644 --- a/python/llm/example/CPU/Deepspeed-AutoTP/run.sh +++ b/python/llm/example/CPU/Deepspeed-AutoTP/run.sh @@ -1,5 +1,5 @@ #/bin/bash -source bigdl-llm-init +source ipex-llm-init unset OMP_NUM_THREADS # deepspeed will set it for each instance automatically source /opt/intel/oneccl/env/setvars.sh export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ/README.md index cb60ae65..cecbe84a 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ/README.md @@ -1,6 +1,6 @@ # AWQ -This example shows how to directly run 4-bit AWQ models using BigDL-LLM on Intel CPU. +This example shows how to directly run 4-bit AWQ models using IPEX-LLM on Intel CPU. ## Verified Models @@ -23,11 +23,11 @@ This example shows how to directly run 4-bit AWQ models using BigDL-LLM on Intel ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#system-support) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#system-support) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a AWQ model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a AWQ model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install @@ -38,7 +38,7 @@ conda create -n llm python=3.9 conda activate llm pip install autoawq==0.1.8 --no-deps -pip install --pre --upgrade bigdl-llm[all] # install bigdl-llm with 'all' option +pip install --pre --upgrade ipex-llm[all] # install ipex-llm with 'all' option pip install transformers==4.35.0 pip install accelerate==0.25.0 pip install einops @@ -60,7 +60,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). 
It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the model based on the capabilities of your machine. @@ -79,8 +79,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF/README.md index 9eec31d5..33c28850 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF/README.md @@ -1,5 +1,5 @@ # Loading GGUF models -In this directory, you will find examples on how to load GGUF model into `bigdl-llm`. +In this directory, you will find examples on how to load GGUF model into `ipex-llm`. ## Verified Models(Q4_0) - [Llama-2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main) @@ -12,23 +12,23 @@ In this directory, you will find examples on how to load GGUF model into `bigdl- - [Yuan2-2B-Februa-hf-GGUF](https://huggingface.co/IEITYuan/Yuan2-2B-Februa-hf-GGUF/tree/main) ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#system-support) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#system-support) for more information. **Important: Please make sure you have installed `transformers==4.36.0` to run the example.** ## Example: Load gguf model using `from_gguf()` API -In the example [generate.py](./generate.py), we show a basic use case to load a GGUF LLaMA2 model into `bigdl-llm` using `from_gguf()` API, with BigDL-LLM optimizations. +In the example [generate.py](./generate.py), we show a basic use case to load a GGUF LLaMA2 model into `ipex-llm` using `from_gguf()` API, with IPEX-LLM optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install transformers==4.36.0 # upgrade transformers ``` ### 2. 
Run @@ -46,8 +46,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ/README.md index 52b05c3c..d91f997e 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ/README.md @@ -1,18 +1,18 @@ # GPTQ -This example shows how to directly run 4-bit GPTQ models using BigDL-LLM on Intel CPU. For illustration purposes, we utilize the ["TheBloke/Llama-2-7B-GPTQ"](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) as a reference. +This example shows how to directly run 4-bit GPTQ models using IPEX-LLM on Intel CPU. For illustration purposes, we utilize the ["TheBloke/Llama-2-7B-GPTQ"](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) as a reference. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option pip install transformers==4.34.0 BUILD_CUDA_EXT=0 pip install git+https://github.com/PanQiWei/AutoGPTQ.git@1de9ab6 pip install optimum==0.14.0 @@ -28,7 +28,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Llama2 model based on the capabilities of your machine. @@ -43,8 +43,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/README.md index 4509079c..618f1902 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/README.md @@ -1,14 +1,14 @@ -# BigDL-LLM Transformers INT4 Optimization for Large Language Model -You can use BigDL-LLM to run any Huggingface Transformer models with INT4 optimizations on either servers or laptops. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. +# IPEX-LLM Transformers INT4 Optimization for Large Language Model +You can use IPEX-LLM to run any Huggingface Transformer models with INT4 optimizations on either servers or laptops. This directory contains example scripts to help you quickly get started using IPEX-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. ## Recommended Requirements To run the examples, we recommend using Intel® Xeon® processors (server), or >= 12th Gen Intel® Core™ processor (client). -For OS, BigDL-LLM supports Ubuntu 20.04 or later (glibc>=2.17), CentOS 7 or later (glibc>=2.17), and Windows 10/11. +For OS, IPEX-LLM supports Ubuntu 20.04 or later (glibc>=2.17), CentOS 7 or later (glibc>=2.17), and Windows 10/11. ## Best Known Configuration on Linux -For better performance, it is recommended to set environment variables on Linux with the help of BigDL-LLM: +For better performance, it is recommended to set environment variables on Linux with the help of IPEX-LLM: ```bash -pip install bigdl-llm -source bigdl-llm-init +pip install ipex-llm +source ipex-llm-init ``` diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila/README.md index 0a5914ed..63468b19 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila/README.md @@ -1,31 +1,31 @@ # Aquila -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Aquila models. For illustration purposes, we utilize the [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B) as a reference Aquila model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Aquila models. For illustration purposes, we utilize the [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B) as a reference Aquila model. > **Note**: If you want to download the Hugging Face *Transformers* model, please refer to [here](https://huggingface.co/docs/hub/models-downloading#using-git). > -> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. +> IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
+To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Aquila model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Aquila model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Aquila model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila2/README.md index 07bb91b5..50e7b83d 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila2/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila2/README.md @@ -1,31 +1,31 @@ # Aquila2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Aquila2 models. For illustration purposes, we utilize the [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B) as a reference Aquila2 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Aquila2 models. For illustration purposes, we utilize the [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B) as a reference Aquila2 model. > **Note**: If you want to download the Hugging Face *Transformers* model, please refer to [here](https://huggingface.co/docs/hub/models-downloading#using-git). > -> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. +> IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. 
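The note above ("optimized in INT4 precision at runtime, no explicit conversion needed") comes down to a single argument at load time. The sketch below illustrates the pattern these model examples share, assuming `ipex-llm[all]` is installed; the model path is a placeholder and the per-model `generate.py` scripts may differ in details.

```python
# Minimal sketch: load a Hugging Face checkpoint with runtime INT4 optimization.
# "path/to/model" is a placeholder; no offline conversion step is required.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "path/to/model",        # local checkpoint folder or Hugging Face repo id
    load_in_4bit=True,      # linear layers are converted to INT4 at load time
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("path/to/model", trust_remote_code=True)
```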
## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Aquila2 model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan/README.md index 4f15618b..b7ed859e 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan/README.md @@ -1,18 +1,18 @@ # Baichuan -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan models. For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as a reference Baichuan model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Baichuan models. For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as a reference Baichuan model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
+To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation ``` @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Baichuan model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2/README.md index 51495d3d..e5d9a1aa 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2/README.md @@ -1,18 +1,18 @@ # Baichuan2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan2 models. For illustration purposes, we utilize the [baichuan-inc/Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) as a reference Baichuan model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Baichuan2 models. For illustration purposes, we utilize the [baichuan-inc/Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) as a reference Baichuan model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation ``` @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Baichuan model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/bluelm/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/bluelm/README.md index 492b0208..addec52f 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/bluelm/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/bluelm/README.md @@ -1,18 +1,18 @@ # BlueLM -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on BlueLM models. For illustration purposes, we utilize the [vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) as a reference BlueLM model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on BlueLM models. For illustration purposes, we utilize the [vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) as a reference BlueLM model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. 
+In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -25,7 +25,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the BlueLM model based on the capabilities of your machine. @@ -40,8 +40,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm/README.md index f549ae78..d56d070e 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm/README.md @@ -1,31 +1,31 @@ # ChatGLM -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM models. For illustration purposes, we utilize the [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b) as a reference ChatGLM model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on ChatGLM models. For illustration purposes, we utilize the [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b) as a reference ChatGLM model. > **Note**: If you want to download the Hugging Face *Transformers* model, please refer to [here](https://huggingface.co/docs/hub/models-downloading#using-git). > -> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. +> IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the ChatGLM model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2/README.md index 969db1f8..54acc3b6 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2/README.md @@ -1,19 +1,19 @@ # ChatGLM2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM2 models. For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as a reference ChatGLM2 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on ChatGLM2 models. For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as a reference ChatGLM2 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example 1: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. 
+In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the ChatGLM2 model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 @@ -76,14 +76,14 @@ Inference time: xxxx s ``` ## Example 2: Stream Chat using `stream_chat()` API -In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with BigDL-LLM INT4 optimizations. +In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -102,7 +102,7 @@ Arguments info: - `--question QUESTION`: argument defining the question to ask. It is default to be `"晚上睡不着应该怎么办"`. - `--disable-stream`: argument defining whether to stream chat. If include `--disable-stream` when running the script, the stream chat is disabled and `chat()` API is used. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the ChatGLM2 model based on the capabilities of your machine. @@ -118,8 +118,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm3/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm3/README.md index 93e8c9c1..966f0894 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm3/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm3/README.md @@ -1,19 +1,19 @@ # ChatGLM3 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM3 models. For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on ChatGLM3 models. For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example 1: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install bigdl-llm with 'all' option +pip install --pre --upgrade ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the ChatGLM3 model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 @@ -77,14 +77,14 @@ AI stands for Artificial Intelligence. It refers to the development of computer ``` ## Example 2: Stream Chat using `stream_chat()` API -In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM3 model to stream chat, with BigDL-LLM INT4 optimizations. 
+In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM3 model to stream chat, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install bigdl-llm with 'all' option +pip install --pre --upgrade ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -103,7 +103,7 @@ Arguments info: - `--question QUESTION`: argument defining the question to ask. It is default to be `"晚上睡不着应该怎么办"`. - `--disable-stream`: argument defining whether to stream chat. If include `--disable-stream` when running the script, the stream chat is disabled and `chat()` API is used. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the ChatGLM3 model based on the capabilities of your machine. @@ -119,8 +119,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/codellama/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/codellama/README.md index e22b20ac..be3687cf 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/codellama/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/codellama/README.md @@ -1,18 +1,18 @@ # CodeLlama -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on CodeLlama models. For illustration purposes, we utilize the [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) as reference CodeLlama models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on CodeLlama models. For illustration purposes, we utilize the [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) as reference CodeLlama models. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a CodeLlama model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a CodeLlama model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. 
Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'def print_hello_world():'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the CodeLlama model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/codeshell/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/codeshell/README.md index 9b2d976d..59c935c5 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/codeshell/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/codeshell/README.md @@ -1,31 +1,31 @@ # CodeShell-7B -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on CodeShell models. For illustration purposes, we utilize the [WisdomShell/CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B) as a reference CodeShell model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on CodeShell models. For illustration purposes, we utilize the [WisdomShell/CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B) as a reference CodeShell model. > **Note**: If you want to download the Hugging Face *Transformers* model, please refer to [here](https://huggingface.co/docs/hub/models-downloading#using-git). > -> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. +> IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a CodeShell model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. 
+In the example [generate.py](./generate.py), we show a basic use case for a CodeShell model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the CodeShell model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md index fbf91242..420627c5 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md @@ -1,18 +1,18 @@ # DeciLM-7B -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on DeciLM-7B models. For illustration purposes, we utilize the [Deci/DeciLM-7B-instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) as a reference DeciLM-7B model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on DeciLM-7B models. For illustration purposes, we utilize the [Deci/DeciLM-7B-instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) as a reference DeciLM-7B model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. 
Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install transformers==4.35.2 # required by DeciLM-7B ``` @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the DeciLM-7B model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek-moe/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek-moe/README.md index 365bc6a8..ece21c6f 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek-moe/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek-moe/README.md @@ -1,32 +1,32 @@ # DeepSeek-MoE -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on DeepSeek-MoE models. For illustration purposes, we utilize the [deepseek-ai/deepseek-moe-16b-chat](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat) as a reference DeepSeek-MoE model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on DeepSeek-MoE models. For illustration purposes, we utilize the [deepseek-ai/deepseek-moe-16b-chat](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat) as a reference DeepSeek-MoE model. > **Note**: If you want to download the Hugging Face *Transformers* model, please refer to [here](https://huggingface.co/docs/hub/models-downloading#using-git). > -> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. +> IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
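The note above states that IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, with no explicit conversion step. As a rough illustration of what that looks like in code, here is a minimal sketch assuming the `ipex_llm.transformers.AutoModelForCausalLM` wrapper that these `generate.py` examples rely on; the repo id, prompt and generation length are placeholders rather than values taken from any particular example.

```python
# Minimal sketch of runtime INT4 loading (illustrative; verify against the shipped generate.py).
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # assumed import path after the rename

model_path = "deepseek-ai/deepseek-moe-16b-chat"  # placeholder: any supported HF repo id or local path

# Linear layers are converted to INT4 on the fly at load time; no offline conversion step is needed.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode("What is AI?", return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```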
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a DeepSeek-MoE model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a DeepSeek-MoE model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for DeepSeek-MoE to conduct generation ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the DeepSeek-MoE model based on the capabilities of your machine. @@ -45,8 +45,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek/README.md index 936ade9b..232ca8be 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/deepseek/README.md @@ -1,18 +1,18 @@ # Deepseek -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Deepseek models. For illustration purposes, we utilize the [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) as a reference Deepseek model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Deepseek models. For illustration purposes, we utilize the [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) as a reference Deepseek model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
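For the server-tuning advice repeated throughout these READMEs (`source ipex-llm-init`, `OMP_NUM_THREADS`), a hedged end-to-end invocation might look as follows; the run command and its flags are illustrative, follow the argument lists documented above, and should be checked against the `generate.py` of the example you are actually running.

```bash
# Illustrative only: combine the recommended server setup with a typical run.
source ipex-llm-init             # set IPEX-LLM performance-related env variables
export OMP_NUM_THREADS=48        # e.g. for a server with 48 cores per socket

# --prompt and --n-predict follow the argument descriptions above; other flags vary per example.
python ./generate.py --prompt 'What is AI?' --n-predict 32
```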
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Deepseek model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Deepseek model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -25,7 +25,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Deepseek model based on the capabilities of your machine. @@ -40,8 +40,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md index 35086459..882671c6 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md @@ -1,21 +1,21 @@ # Distil-Whisper -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Distil-Whisper models. For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Distil-Whisper models. For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Recognize Tokens using `generate()` API -In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `generate()` API, with BigDL-LLM INT4 optimizations. 
+In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install datasets soundfile librosa # required by audio processing ``` @@ -31,7 +31,7 @@ Arguments info: - `--chunk-length CHUNK_LENGTH`: argument defining the maximum number of chuncks of sampling_rate samples used to trim and pad longer or shorter audio sequences. For audio recordings less than 30 seconds, it can be set to 0 for better performance. It is default to be 15. - `--batch-size BATCH_SIZE`: argument defining the batch_size of pipeline inference, it usually equals of length of the audio divided by chunk-length. It is default to be 16. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Distil-Whisper model based on the capabilities of your machine. @@ -47,8 +47,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1/README.md index 684c3f81..d59677ba 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1/README.md @@ -1,18 +1,18 @@ # Dolly v1 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Dolly v1 models. For illustration purposes, we utilize the [databricks/dolly-v1-6b](https://huggingface.co/databricks/dolly-v1-6b) as a reference Dolly v1 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Dolly v1 models. For illustration purposes, we utilize the [databricks/dolly-v1-6b](https://huggingface.co/databricks/dolly-v1-6b) as a reference Dolly v1 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
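The Distil-Whisper example above performs transcription through the same 4-bit loading path. Below is a minimal sketch, assuming an `AutoModelForSpeechSeq2Seq` wrapper is exposed under `ipex_llm.transformers` alongside the causal-LM class; the processor, sample dataset and repo id are illustrative only (see the shipped recognize.py for the exact flow).

```python
# Illustrative sketch of INT4 transcription (assumed API; check recognize.py for the real flow).
from datasets import load_dataset
from transformers import WhisperProcessor
from ipex_llm.transformers import AutoModelForSpeechSeq2Seq  # assumed speech counterpart of the causal-LM wrapper

model_path = "distil-whisper/distil-large-v2"  # placeholder repo id

processor = WhisperProcessor.from_pretrained(model_path)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_path, load_in_4bit=True)

# A tiny public sample set, used here only to obtain a 16 kHz audio array.
sample = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")[0]["audio"]
inputs = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt")

predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```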
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -25,7 +25,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Dolly v1 model based on the capabilities of your machine. @@ -40,8 +40,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2/README.md index 26a66020..219e13ee 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2/README.md @@ -1,18 +1,18 @@ # Dolly v2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Dolly v2 models. For illustration purposes, we utilize the [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) as a reference Dolly v2 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Dolly v2 models. For illustration purposes, we utilize the [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) as a reference Dolly v2 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. 
+In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -25,7 +25,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Dolly v2 model based on the capabilities of your machine. @@ -40,8 +40,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/README.md index 1c3ba984..20a19a76 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/README.md @@ -1,24 +1,24 @@ # Falcon -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Falcon models. For illustration purposes, we utilize the [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) and [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) as reference Falcon models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Falcon models. For illustration purposes, we utilize the [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) and [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) as reference Falcon models. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Falcon model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Falcon model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. 
Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option pip install einops # additional package required for falcon-7b-instruct and falcon-40b-instruct to conduct generation ``` ### 2. (Optional) Download Model and Replace File -If you select the Falcon models ([tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) or [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct)), please note that their code (`modelling_RW.py`) does not support KV cache at the moment. To address issue, we have provided two updated files ([falcon-7b-instruct/modelling_RW.py](./falcon-7b-instruct/modelling_RW.py) and [falcon-40b-instruct/modelling_RW.py](./falcon-40b-instruct/modelling_RW.py)), which can be used to achieve the best performance using BigDL-LLM INT4 optimizations with KV cache support. +If you select the Falcon models ([tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) or [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct)), please note that their code (`modelling_RW.py`) does not support KV cache at the moment. To address this issue, we have provided two updated files ([falcon-7b-instruct/modelling_RW.py](./falcon-7b-instruct/modelling_RW.py) and [falcon-40b-instruct/modelling_RW.py](./falcon-40b-instruct/modelling_RW.py)), which can be used to achieve the best performance using IPEX-LLM INT4 optimizations with KV cache support. After transformers 4.36, only transformer models are supported since remote code diverges from transformer model code, make sure set `trust_remote_code=False`. ```python model = AutoModelForCausalLM.from_pretrained(model_path, @@ -60,7 +60,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will require approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Falcon model based on the capabilities of your machine. @@ -75,8 +75,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/flan-t5/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/flan-t5/README.md index 9af2109e..2d102180 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/flan-t5/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/flan-t5/README.md @@ -1,27 +1,27 @@ # Flan-t5 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Flan-t5 models.
For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Flan-t5 models. For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Flan-t5 model based on the capabilities of your machine. @@ -37,8 +37,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/fuyu/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/fuyu/README.md index 5121474e..e54a8546 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/fuyu/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/fuyu/README.md @@ -1,20 +1,20 @@ # Fuyu -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Fuyu models. For illustration purposes, we utilize the [adept/fuyu-8b](https://huggingface.co/adept/fuyu-8b) as a reference Fuyu model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Fuyu models. 
For illustration purposes, we utilize the [adept/fuyu-8b](https://huggingface.co/adept/fuyu-8b) as a reference Fuyu model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an Fuyu model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Fuyu model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install transformers==4.35 pillow # additional package required for Fuyu to conduct generation ``` @@ -22,7 +22,7 @@ pip install transformers==4.35 pillow # additional package required for Fuyu to ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will require approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Fuyu model based on the capabilities of your machine. @@ -38,8 +38,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/gemma/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/gemma/README.md index 672ac2a4..758a60b2 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/gemma/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/gemma/README.md @@ -1,24 +1,24 @@ # Gemma -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Google Gemma models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [google/gemma-7b-it ](https://huggingface.co/google/gemma-7b-it) and [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it) as reference Gemma models.
+In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Google Gemma models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [google/gemma-7b-it ](https://huggingface.co/google/gemma-7b-it) and [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it) as reference Gemma models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. **Important: According to Gemma's requirement, please make sure you have installed `transformers==4.38.1` to run the example.** ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Gemma model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Gemma model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # According to Gemma's requirement, please make sure you are using a stable version of Transformers, 4.38.1 or newer. pip install transformers==4.38.1 @@ -30,7 +30,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # According to Gemma's requirement, please make sure you are using a stable version of Transformers, 4.38.1 or newer. pip install transformers==4.38.1 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm-xcomposer/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm-xcomposer/README.md index 0b608b34..cb898b32 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm-xcomposer/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm-xcomposer/README.md @@ -1,20 +1,20 @@ # InternLM_XComposer -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM_XComposer models. For illustration purposes, we utilize the [internlm/internlm-xcomposer-vl-7b](https://huggingface.co/internlm/internlm-xcomposer-vl-7b) as a reference InternLM_XComposer model. 
+In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on InternLM_XComposer models. For illustration purposes, we utilize the [internlm/internlm-xcomposer-vl-7b](https://huggingface.co/internlm/internlm-xcomposer-vl-7b) as a reference InternLM_XComposer model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Multi-turn chat centered around an image using `chat()` API -In the example [chat.py](./chat.py), we show a basic use case for an InternLM_XComposer model to start a multi-turn chat centered around an image using `chat()` API, with BigDL-LLM INT4 optimizations. +In the example [chat.py](./chat.py), we show a basic use case for an InternLM_XComposer model to start a multi-turn chat centered around an image using `chat()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install accelerate timm==0.4.12 sentencepiece==0.1.99 gradio==3.44.4 markdown2==2.4.10 xlsxwriter==3.1.2 einops # additional package required for InternLM_XComposer to conduct generation @@ -43,7 +43,7 @@ For `internlm/internlm-xcomposer-vl-7b`, you should replace the `modeling_Intern ### 3. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the LLaVA model based on the capabilities of your machine. @@ -59,8 +59,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm/README.md index 3f73833b..b37e342c 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm/README.md @@ -1,19 +1,19 @@ # InternLM -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM models. For illustration purposes, we utilize the [internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) as a reference InternLM model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on InternLM models. For illustration purposes, we utilize the [internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) as a reference InternLM model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a InternLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a InternLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the InternLM model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm2/README.md index 139dc6f4..c7d8022a 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm2/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm2/README.md @@ -1,19 +1,19 @@ # InternLM2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM2 models. For illustration purposes, we utilize the [internlm/internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as a reference InternLM2 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on InternLM2 models. For illustration purposes, we utilize the [internlm/internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as a reference InternLM2 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a InternLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a InternLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install bigdl-llm with 'all' option +pip install --pre --upgrade ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the InternLM2 model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2/README.md index e4c7b899..191102eb 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2/README.md @@ -1,18 +1,18 @@ # Llama2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Llama2 models. For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Llama2 models. For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -25,7 +25,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Llama2 model based on the capabilities of your machine. @@ -40,8 +40,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mistral/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mistral/README.md index 13e80deb..40fbd43d 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mistral/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mistral/README.md @@ -1,22 +1,22 @@ # Mistral -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Mistral models. For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Mistral models. For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. **Important: According to [Mistral Troubleshooting](https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting), please make sure you have installed `transformers==4.34.0` to run the example.** ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option # Refer to https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting, please make sure you are using a stable version of Transformers, 4.34.0 or newer. pip install transformers==4.34.0 @@ -25,7 +25,7 @@ pip install transformers==4.34.0 ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. 
In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Mistral model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral/README.md index 0ed5c788..edd46b62 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral/README.md @@ -1,24 +1,24 @@ # Mixtral -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Mixtral models on [Intel CPUs](../README.md). For illustration purposes, we utilize the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as a reference Mixtral model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Mixtral models on [Intel CPUs](../README.md). For illustration purposes, we utilize the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as a reference Mixtral model. ## Requirements -To run these examples with BigDL-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. **Important: Please make sure you have installed `transformers==4.36.0` to run the example.** ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel CPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel CPUs. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install PyTorch CPU as default pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cpu -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] # Please make sure you are using a stable version of Transformers, 4.36.0 or newer. 
pip install transformers==4.36.0 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss/README.md index 298a5bfa..749f49e2 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss/README.md @@ -1,19 +1,19 @@ # MOSS -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on MOSS models. For illustration purposes, we utilize the [fnlp/moss-moon-003-sft](https://huggingface.co/fnlp/moss-moon-003-sft) as a reference MOSS model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MOSS models. For illustration purposes, we utilize the [fnlp/moss-moon-003-sft](https://huggingface.co/fnlp/moss-moon-003-sft) as a reference MOSS model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a MOSS model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a MOSS model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the MOSS model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt/README.md index aee9f9a6..e70aa2ac 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt/README.md @@ -1,18 +1,18 @@ # MPT -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on MPT models. For illustration purposes, we utilize the [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) and [mosaicml/mpt-30b-chat](https://huggingface.co/mosaicml/mpt-30b-chat) as reference MPT models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MPT models. For illustration purposes, we utilize the [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) and [mosaicml/mpt-30b-chat](https://huggingface.co/mosaicml/mpt-30b-chat) as reference MPT models. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an MPT model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for an MPT model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option pip install einops # additional package required for mpt-7b-chat and mpt-30b-chat to conduct generation ``` @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the MPT model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md index a6f6017b..7d9ece5b 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md @@ -1,32 +1,32 @@ # phi-1_5 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on phi-1_5 models. For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on phi-1_5 models. For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model. > **Note**: If you want to download the Hugging Face *Transformers* model, please refer to [here](https://huggingface.co/docs/hub/models-downloading#using-git). > -> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. +> IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for phi-1_5 to conduct generation ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the phi-1_5 model based on the capabilities of your machine. 
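To make the sizing note above concrete, the rule of thumb can be written out in a few lines of Python. This is a rough illustration only: the helper name and the example sizes are made up, and real memory usage also depends on context length and the KV cache.

```python
# Rule of thumb from the note above: an X-billion-parameter model stored in
# 16-bit needs roughly 2*X GB to load, and roughly 0.5*X GB once the linear
# layers have been converted to INT4. Treat these as rough lower bounds.

def estimate_memory_gb(params_in_billions: float) -> tuple[float, float]:
    fp16_load_gb = 2.0 * params_in_billions
    int4_inference_gb = 0.5 * params_in_billions
    return fp16_load_gb, int4_inference_gb

for size in (1.3, 7.0, 13.0):
    load_gb, infer_gb = estimate_memory_gb(size)
    print(f"{size:>4}B model: ~{load_gb:.1f} GB to load in FP16, "
          f"~{infer_gb:.1f} GB for INT4 inference")
```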
@@ -42,8 +42,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-2/README.md index 4400e0e5..caf033f3 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-2/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phi-2/README.md @@ -1,32 +1,32 @@ # phi-2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on phi-2 models. For illustration purposes, we utilize the [microsoft/phi-2](https://huggingface.co/microsoft/phi-2 as a reference phi-2 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on phi-2 models. For illustration purposes, we utilize the [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) as a reference phi-2 model. > **Note**: If you want to download the Hugging Face *Transformers* model, please refer to [here](https://huggingface.co/docs/hub/models-downloading#using-git). > -> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. +> IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a phi-2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a phi-2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for phi-2 to conduct generation ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format.
In theory, a *X*B model saved in 16-bit will require approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the phi-2 model based on the capabilities of your machine. @@ -42,8 +42,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phixtral/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phixtral/README.md index 4382aec0..918c081a 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phixtral/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phixtral/README.md @@ -1,32 +1,32 @@ # Phixtral-4x2_8 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on phi models. For illustration purposes, we utilize the [microsoft/phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8) as a reference phixtral model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on phixtral models. For illustration purposes, we utilize the [mlabonne/phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8) as a reference phixtral model. > **Note**: If you want to download the Hugging Face *Transformers* model, please refer to [here](https://huggingface.co/docs/hub/models-downloading#using-git). > -> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. +> IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a phixtral model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a phixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for phi to conduct generation ``` ### 2. Run After setting up the Python environment, you could run the example by following steps.
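The `generate.py` examples in these directories all follow the same basic flow, which a minimal sketch can summarize. It assumes the renamed package keeps the Transformers-style API formerly exposed as `bigdl.llm.transformers`, now under `ipex_llm.transformers`, with `load_in_4bit=True` triggering the INT4 conversion described in the note below; the model id and prompt are placeholders.

```python
# Minimal INT4 text-generation sketch, assuming ipex_llm.transformers mirrors
# the former bigdl.llm.transformers API (load_in_4bit=True converts linear
# layers to INT4 at load time). Model id and prompt are placeholders.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # assumed import path after the rename

model_path = "mlabonne/phixtral-4x2_8"  # or a local Hugging Face model folder
prompt = "What is AI?"

model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=32)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

In the shipped examples this flow is wrapped in a small CLI (`--repo-id-or-model-path`, `--prompt`, `--n-predict`), but the core calls are the same.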
-> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the phixtral model based on the capabilities of your machine. @@ -42,8 +42,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix/README.md index accd6016..601eb997 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix/README.md @@ -1,19 +1,19 @@ # Phoenix -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Phoenix models. For illustration purposes, we utilize the [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b) as a reference Phoenix model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Phoenix models. For illustration purposes, we utilize the [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b) as a reference Phoenix model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Phoenix model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Phoenix model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. 
+> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Phoenix model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md index fb46bd56..bd1b66d4 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md @@ -1,20 +1,20 @@ # Qwen-VL -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen-VL models. For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Qwen-VL models. For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Multimodal chat using `chat()` API -In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with BigDL-LLM INT4 optimizations. +In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation @@ -35,8 +35,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen/README.md index 150109b0..ce689b6f 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen/README.md @@ -1,14 +1,14 @@ # Qwen -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen models. For illustration purposes, we utilize the [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) as a reference Qwen model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Qwen models. For illustration purposes, we utilize the [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) as a reference Qwen model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install @@ -18,13 +18,13 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option pip install tiktoken einops transformers_stream_generator # additional package required for Qwen-7B-Chat to conduct generation ``` ### 2. Run -The minimum Qwen model version currently supported by BigDL-LLM is the version on November 30, 2023. +The minimum Qwen model version currently supported by IPEX-LLM is the version on November 30, 2023. ``` python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT @@ -36,7 +36,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Qwen model based on the capabilities of your machine. @@ -55,8 +55,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md index 4f115651..52037de5 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md @@ -1,19 +1,19 @@ # Qwen1.5 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen1.5 models. For illustration purposes, we utilize the [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) as a reference Qwen1.5 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Qwen1.5 models. For illustration purposes, we utilize the [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) as a reference Qwen1.5 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install bigdl-llm with 'all' option +pip install --pre --upgrade ipex-llm[all] # install ipex-llm with 'all' option pip install transformers==4.37.0 # install the transformers which support Qwen2 ``` @@ -27,7 +27,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Qwen model based on the capabilities of your machine. @@ -42,8 +42,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama/README.md index 003e76d5..0692286f 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama/README.md @@ -1,19 +1,19 @@ # RedPajama -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on RedPajama models. For illustration purposes, we utilize the [togethercomputer/RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat) as a reference RedPajama model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on RedPajama models. For illustration purposes, we utilize the [togethercomputer/RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat) as a reference RedPajama model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a RedPajama model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a RedPajama model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the RedPajama model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit/README.md index b25e602d..a35ceb73 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit/README.md @@ -1,20 +1,20 @@ # Replit -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Replit models. For illustration purposes, we utilize the [replit/replit-code-v1-3b](https://huggingface.co/replit/replit-code-v1-3b) as a reference Replit model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Replit models. For illustration purposes, we utilize the [replit/replit-code-v1-3b](https://huggingface.co/replit/replit-code-v1-3b) as a reference Replit model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an Replit model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for an Replit model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/skywork/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/skywork/README.md index a717b931..53b790f2 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/skywork/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/skywork/README.md @@ -1,18 +1,18 @@ # Skywork -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Skywork models. For illustration purposes, we utilize the [Skywork/Skywork-13B-base](https://huggingface.co/Skywork/Skywork-13B-base) as the reference Skywork model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Skywork models. 
For illustration purposes, we utilize the [Skywork/Skywork-13B-base](https://huggingface.co/Skywork/Skywork-13B-base) as the reference Skywork model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Skywork model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Skywork model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -25,7 +25,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Skywork model based on the capabilities of your machine. @@ -40,8 +40,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/solar/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/solar/README.md index 063ffe35..cdfe9b8f 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/solar/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/solar/README.md @@ -1,18 +1,18 @@ # SOLAR -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on SOLAR models. For illustration purposes, we utilize the [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) as a reference SOLAR model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on SOLAR models. For illustration purposes, we utilize the [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) as a reference SOLAR model. ## 0. 
Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a SOLAR model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a SOLAR model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install transformers==4.35.2 # required by SOLAR ``` @@ -26,7 +26,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the SOLAR model based on the capabilities of your machine. @@ -41,8 +41,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder/README.md index 1509f5cb..d81e438b 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder/README.md @@ -1,18 +1,18 @@ # StarCoder -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on StarCoder models. For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as a reference StarCoder model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on StarCoder models. For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as a reference StarCoder model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
+To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an StarCoder model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for an StarCoder model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -25,7 +25,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'def print_hello_world():'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the StarCoder model based on the capabilities of your machine. @@ -40,8 +40,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna/README.md index 3abd2c43..89604bc6 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna/README.md @@ -1,18 +1,18 @@ # Vicuna -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Vicuna models. For illustration purposes, we utilize the [lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3) and [eachadea/vicuna-7b-1.1](https://huggingface.co/eachadea/vicuna-7b-1.1) as reference Vicuna models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Vicuna models. For illustration purposes, we utilize the [lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3) and [eachadea/vicuna-7b-1.1](https://huggingface.co/eachadea/vicuna-7b-1.1) as reference Vicuna models. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
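Because the INT4 conversion happens at load time, repeated runs on the same machine can skip it by caching the converted weights before using the `generate()` example below. The following is a minimal sketch, assuming the `save_low_bit()`/`load_low_bit()` helpers from the former bigdl-llm API carry over under `ipex_llm.transformers`; the model id and output folder are placeholders.

```python
# Cache the INT4-converted weights so later runs load them directly instead of
# re-converting the FP16 checkpoint. Assumes save_low_bit()/load_low_bit() from
# the former bigdl-llm API are kept under ipex_llm.transformers.
from ipex_llm.transformers import AutoModelForCausalLM  # assumed import path after the rename

low_bit_dir = "./vicuna-13b-v1.3-int4"  # placeholder output folder

# First run: convert at load time, then persist the low-bit checkpoint.
model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-13b-v1.3",
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model.save_low_bit(low_bit_dir)

# Subsequent runs: load the already-converted weights.
model = AutoModelForCausalLM.load_low_bit(low_bit_dir, trust_remote_code=True)
```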
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Vicuna model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Vicuna model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run @@ -25,7 +25,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Vicuna model based on the capabilities of your machine. @@ -40,8 +40,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper/readme.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper/readme.md index 5dca740f..d2e957e6 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper/readme.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper/readme.md @@ -1,19 +1,19 @@ # Whisper -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper models. For illustration purposes, we utilize the [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) as a reference Whisper model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Whisper models. For illustration purposes, we utilize the [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) as a reference Whisper model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example 1: Recognize Tokens using `generate()` API -In the example [recognize.py](./recognize.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [recognize.py](./recognize.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. 
Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option pip install datasets soundfile librosa # required by audio processing ``` @@ -27,7 +27,7 @@ Arguments info: - `--repo-id-or-data-path REPO_ID_OR_DATA_PATH`: argument defining the huggingface repo id for the audio dataset to be downloaded, or the path to the huggingface dataset folder. It is default to be `'hf-internal-testing/librispeech_asr_dummy'`. - `--language LANGUAGE`: argument defining language to be transcribed. It is default to be `english`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Whisper model based on the capabilities of your machine. @@ -43,8 +43,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 @@ -62,14 +62,14 @@ Inference time: xxxx s ## Example 2: Recognize Long Segment using `generate()` API -In the example [long-segment-recognize.py](./long-segment-recognize.py), we show a basic use case for a Whisper model to conduct transcription using `pipeline()` API for long audio input, with BigDL-LLM INT4 optimizations. +In the example [long-segment-recognize.py](./long-segment-recognize.py), we show a basic use case for a Whisper model to conduct transcription using `pipeline()` API for long audio input, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option pip install datasets soundfile librosa # required by audio processing ``` @@ -86,7 +86,7 @@ Arguments info: - `--chunk-length CHUNK_LENGTH`: argument defining the maximum number of chuncks of sampling_rate samples used to trim and pad longer or shorter audio sequences. It is default to be 30, and chunk-length should not be larger than 30s for whisper model. - `--batch-size`: argument defining the batch_size of pipeline inference, it usually equals of length of the audio divided by chunk-length. It is default to be 2. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Whisper model based on the capabilities of your machine. 
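The two Whisper examples above can be condensed into a short sketch of the short-clip recognition flow. It assumes `ipex_llm.transformers` keeps the `AutoModelForSpeechSeq2Seq` wrapper that bigdl-llm provided; the model id and the dummy LibriSpeech dataset are the ones quoted in the README.

```python
# INT4 Whisper transcription of one short clip from the dummy LibriSpeech split.
# Assumes AutoModelForSpeechSeq2Seq from the former bigdl-llm API is available
# under ipex_llm.transformers after the rename.
from datasets import load_dataset
from transformers import WhisperProcessor
from ipex_llm.transformers import AutoModelForSpeechSeq2Seq  # assumed import path

model_id = "openai/whisper-tiny"
processor = WhisperProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, load_in_4bit=True)

# One short clip from the dummy LibriSpeech split used by the example.
sample = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean",
                      split="validation")[0]
inputs = processor(sample["audio"]["array"],
                   sampling_rate=sample["audio"]["sampling_rate"],
                   return_tensors="pt")

forced_ids = processor.get_decoder_prompt_ids(language="english", task="transcribe")
predicted_ids = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```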
@@ -102,8 +102,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. long segment recognize for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/wizardcoder-python/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/wizardcoder-python/README.md index 65d052a0..25d6f20e 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/wizardcoder-python/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/wizardcoder-python/README.md @@ -1,18 +1,18 @@ # WizardCoder-Python -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on WizardCoder-Python models. For illustration purposes, we utilize the [WizardLM/WizardCoder-Python-7B-V1.0](https://huggingface.co/WizardLM/WizardCoder-Python-7B-V1.0) as a reference WizardCoder-Python model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on WizardCoder-Python models. For illustration purposes, we utilize the [WizardLM/WizardCoder-Python-7B-V1.0](https://huggingface.co/WizardLM/WizardCoder-Python-7B-V1.0) as a reference WizardCoder-Python model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a WizardCoder-Python model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a WizardCoder-Python model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -25,7 +25,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'def print_hello_world():'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `64`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the WizardCoder-Python model based on the capabilities of your machine. @@ -40,8 +40,8 @@ For optimal performance on server, it is recommended to set several environment E.g. 
on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/yi/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/yi/README.md index 2ac2b82e..2205a4af 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/yi/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/yi/README.md @@ -1,20 +1,20 @@ # Yi -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Yi models. For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Yi models. For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an Yi model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for an Yi model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for Yi-6B to conduct generation ``` @@ -28,7 +28,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Yi model based on the capabilities of your machine. @@ -43,8 +43,8 @@ For optimal performance on server, it is recommended to set several environment E.g. 
on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/yuan2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/yuan2/README.md index 4b39d9ef..05f7a32f 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/yuan2/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/yuan2/README.md @@ -1,22 +1,22 @@ # Yuan2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Yuan2 models. For illustration purposes, we utilize the [IEITYuan/Yuan2-2B-hf](https://huggingface.co/IEITYuan/Yuan2-2B-hf) as a reference Yuan2 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Yuan2 models. For illustration purposes, we utilize the [IEITYuan/Yuan2-2B-hf](https://huggingface.co/IEITYuan/Yuan2-2B-hf) as a reference Yuan2 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. In addition, you need to modify some files in Yuan2-2B-hf folder, since Flash attention dependency is for CUDA usage and currently cannot be installed on Intel CPUs. To manually turn it off, please refer to [this issue](https://github.com/IEIT-Yuan/Yuan-2.0/issues/92). We also provide two modified files([config.json](yuan2-2B-instruct/config.json) and [yuan_hf_model.py](yuan2-2B-instruct/yuan_hf_model.py)), which can be used to replace the original content in config.json and yuan_hf_model.py. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an Yuan2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for an Yuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for Yuan2 to conduct generation pip install pandas # additional package required for Yuan2 to conduct generation ``` @@ -31,7 +31,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'IEITYuan/Yuan2-2B-hf'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `100`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. 
In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Yuan2 model based on the capabilities of your machine. @@ -46,8 +46,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/ziya/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/ziya/README.md index e1f79120..2dfb7adc 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Model/ziya/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/ziya/README.md @@ -1,32 +1,32 @@ # Ziya -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Ziya models. For illustration purposes, we utilize the [IDEA-CCNL/Ziya-Coding-34B-v1.0](https://huggingface.co/IDEA-CCNL/Ziya-Coding-34B-v1.0) as a reference Ziya model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Ziya models. For illustration purposes, we utilize the [IDEA-CCNL/Ziya-Coding-34B-v1.0](https://huggingface.co/IDEA-CCNL/Ziya-Coding-34B-v1.0) as a reference Ziya model. > **Note**: If you want to download the Hugging Face *Transformers* model, please refer to [here](https://huggingface.co/docs/hub/models-downloading#using-git). > -> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. +> IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Ziya model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Ziya model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for Ziya to conduct generation ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Ziya model based on the capabilities of your machine. @@ -42,8 +42,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types/README.md index 6a992c85..d5dc789c 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types/README.md @@ -1,6 +1,6 @@ -# BigDL-LLM Transformers Low-Bit Inference Pipeline for Large Language Model +# IPEX-LLM Transformers Low-Bit Inference Pipeline for Large Language Model -In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations (including INT8/INT5/INT4) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model. +In this example, we show a pipeline to apply IPEX-LLM low-bit optimizations (including INT8/INT5/INT4) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model. 
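In concrete terms, such a low-bit pipeline typically looks like the sketch below (assumptions: `ipex_llm.transformers.AutoModelForCausalLM` keeps the `load_in_low_bit`, `save_low_bit` and `load_low_bit` interface documented for `bigdl-llm`; the model id and save path are placeholders):

```python
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # assumed post-rename import path

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

# Quantize the linear layers to a chosen low-bit format at load time, e.g. symmetric INT5
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int5",  # other options include "sym_int4", "asym_int4", "sym_int8", ...
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Optionally persist the quantized weights so later runs can skip re-conversion
model.save_low_bit("./llama-2-sym-int5")
# model = AutoModelForCausalLM.load_low_bit("./llama-2-sym-int5", trust_remote_code=True)

inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```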
## Prepare Environment We suggest using conda to manage environment: @@ -8,7 +8,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] ``` ## Run Example diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/README.md index ebe9f774..b1bf7495 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/README.md @@ -1,6 +1,6 @@ -# Running Hugging Face Transformers model using BigDL-LLM on Intel CPU +# Running Hugging Face Transformers model using IPEX-LLM on Intel CPU -This folder contains examples of running any Hugging Face Transformers model on BigDL-LLM (using the standard AutoModel APIs): +This folder contains examples of running any Hugging Face Transformers model on IPEX-LLM (using the standard AutoModel APIs): - [Model](Model): examples of running Hugging Face Transformers models (e.g., LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using INT4 optimizations - [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.) diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load/README.md index 6a992c85..d5dc789c 100644 --- a/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load/README.md @@ -1,6 +1,6 @@ -# BigDL-LLM Transformers Low-Bit Inference Pipeline for Large Language Model +# IPEX-LLM Transformers Low-Bit Inference Pipeline for Large Language Model -In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations (including INT8/INT5/INT4) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model. +In this example, we show a pipeline to apply IPEX-LLM low-bit optimizations (including INT8/INT5/INT4) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model. ## Prepare Environment We suggest using conda to manage environment: @@ -8,7 +8,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] ``` ## Run Example diff --git a/python/llm/example/CPU/LangChain/README.md b/python/llm/example/CPU/LangChain/README.md index 2e64afdd..3382590a 100644 --- a/python/llm/example/CPU/LangChain/README.md +++ b/python/llm/example/CPU/LangChain/README.md @@ -1,10 +1,10 @@ ## Langchain Examples -This folder contains examples showcasing how to use `langchain` with `bigdl`. +This folder contains examples showcasing how to use `langchain` with `ipex-llm`. -### Install BigDL +### Install IPEX-LLM -Ensure `bigdl-llm` is installed by following the [BigDL-LLM Installation Guide](https://github.com/intel-analytics/BigDL/tree/main/python/llm#install). +Ensure `ipex-llm` is installed by following the [IPEX-LLM Installation Guide](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm#install). ### Install Dependences Required by the Examples @@ -36,7 +36,7 @@ To run the example, execute the following command in the current directory: ```bash python transformers_int4/rag.py -m [-q ] [-i ] ``` -> Note: If `-i` is not specified, it will use a short introduction to Big-DL as input by default.
if `-q` is not specified, `What is BigDL?` will be used by default. +> Note: If `-i` is not specified, it will use a short introduction to Big-DL as input by default. if `-q` is not specified, `What is IPEX?` will be used by default. ### Example: Math @@ -70,4 +70,4 @@ python transformers_int4/voiceassistant.py -m [-q [**LlamaIndex**](https://github.com/run-llama/llama_index) is a data framework designed to improve large language models by providing tools for easier data ingestion, management, and application integration. ## Prerequisites -Ensure `bigdl-llm` is installed by following the [BigDL-LLM Installation Guide](https://github.com/intel-analytics/BigDL/tree/main/python/llm#install) before proceeding with the examples provided here. +Ensure `ipex-llm` is installed by following the [IPEX-LLM Installation Guide](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm#install) before proceeding with the examples provided here. ## Retrieval-Augmented Generation (RAG) Example diff --git a/python/llm/example/CPU/ModelScope-Models/README.md b/python/llm/example/CPU/ModelScope-Models/README.md index ba7b7a02..8be1159d 100644 --- a/python/llm/example/CPU/ModelScope-Models/README.md +++ b/python/llm/example/CPU/ModelScope-Models/README.md @@ -1,19 +1,19 @@ # Run ModelScope Model -In this directory, you will find example on how you could apply BigDL-LLM INT4 optimizations on ModelScope models. For illustration purposes, we utilize the [ZhipuAI/chatglm3-6b](https://modelscope.cn/models/ZhipuAI/chatglm3-6b/summary) as a reference ModelScope model. +In this directory, you will find example on how you could apply IPEX-LLM INT4 optimizations on ModelScope models. For illustration purposes, we utilize the [ZhipuAI/chatglm3-6b](https://modelscope.cn/models/ZhipuAI/chatglm3-6b/summary) as a reference ModelScope model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install bigdl-llm with 'all' option +pip install --pre --upgrade ipex-llm[all] # install ipex-llm with 'all' option # Refer to https://github.com/modelscope/modelscope/issues/765, please make sure you are using 1.11.0 version pip install modelscope==1.11.0 ``` @@ -28,7 +28,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. 
In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the ChatGLM3 model based on the capabilities of your machine. @@ -43,8 +43,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/Native-Models/README.md b/python/llm/example/CPU/Native-Models/README.md index 873909b6..8a181ce6 100644 --- a/python/llm/example/CPU/Native-Models/README.md +++ b/python/llm/example/CPU/Native-Models/README.md @@ -1,8 +1,8 @@ -# BigDL-LLM Native INT4 Inference Pipeline for Large Language Model +# IPEX-LLM Native INT4 Inference Pipeline for Large Language Model -In this example, we show a pipeline to convert a large language model to BigDL-LLM native INT4 format, and then run inference on the converted INT4 model. +In this example, we show a pipeline to convert a large language model to IPEX-LLM native INT4 format, and then run inference on the converted INT4 model. -> **Note**: BigDL-LLM native INT4 format currently supports model family **LLaMA** (such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.), **LLaMA 2** (such as Llama-2-7B-chat, Llama-2-13B-chat), **GPT-NeoX** (such as RedPajama), **BLOOM** (such as Phoenix) and **StarCoder**. +> **Note**: IPEX-LLM native INT4 format currently supports model family **LLaMA** (such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.), **LLaMA 2** (such as Llama-2-7B-chat, Llama-2-13B-chat), **GPT-NeoX** (such as RedPajama), **BLOOM** (such as Phoenix) and **StarCoder**. ## Prepare Environment We suggest using conda to manage environment: @@ -10,7 +10,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] ``` ## Run Example @@ -30,7 +30,7 @@ arguments info: ### Model family LLaMA #### [lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3) ```log --------------------- bigdl-llm based tokenizer -------------------- +-------------------- ipex-llm based tokenizer -------------------- Inference time: xxxx s Output: ['\n She was always exploring new places and meeting new people. One day, she stumbled upon a mysterious door in the woods that led her to'] @@ -42,20 +42,20 @@ Output: ['\nShe had read so many stories about brave heroes and their magical journeys that she decided to set out on her own adventure. 
\n'] -------------------- fast forward -------------------- -bigdl-llm timings: load time = xxxx ms -bigdl-llm timings: sample time = xxxx ms / 32 runs ( xxxx ms per token) -bigdl-llm timings: prompt eval time = xxxx ms / 1 tokens ( xxxx ms per token) -bigdl-llm timings: eval time = xxxx ms / 32 runs ( xxxx ms per token) -bigdl-llm timings: total time = xxxx ms +ipex-llm timings: load time = xxxx ms +ipex-llm timings: sample time = xxxx ms / 32 runs ( xxxx ms per token) +ipex-llm timings: prompt eval time = xxxx ms / 1 tokens ( xxxx ms per token) +ipex-llm timings: eval time = xxxx ms / 32 runs ( xxxx ms per token) +ipex-llm timings: total time = xxxx ms Inference time (fast forward): xxxx s Output: -{'id': 'cmpl-e5811030-cc60-462b-9857-13d43e3a1896', 'object': 'text_completion', 'created': 1690450682, 'model': './bigdl_llm_llama_q4_0.bin', 'choices': [{'text': '\nShe was a curious and brave child, always eager to explore the world around her. She loved nothing more than setting off into the woods or down to the', 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 19, 'completion_tokens': 32, 'total_tokens': 51}} +{'id': 'cmpl-e5811030-cc60-462b-9857-13d43e3a1896', 'object': 'text_completion', 'created': 1690450682, 'model': './ipex_llm_llama_q4_0.bin', 'choices': [{'text': '\nShe was a curious and brave child, always eager to explore the world around her. She loved nothing more than setting off into the woods or down to the', 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 19, 'completion_tokens': 32, 'total_tokens': 51}} ``` ### Model family LLaMA 2 #### [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) ```log --------------------- bigdl-llm based tokenizer -------------------- +-------------------- ipex-llm based tokenizer -------------------- Inference time: xxxx s Output: [' She lived in a small village surrounded by vast fields of golden wheat and blue skies. One day, she decided to go on an adventure to'] @@ -69,20 +69,20 @@ Output: -------------------- fast forward -------------------- Llama.generate: prefix-match hit -bigdl-llm timings: load time = xxxx ms -bigdl-llm timings: sample time = xxxx ms / 32 runs ( xxxx ms per token) -bigdl-llm timings: prompt eval time = xxxx ms / 1 tokens ( xxxx ms per token) -bigdl-llm timings: eval time = xxxx ms / 32 runs ( xxxx ms per token) -bigdl-llm timings: total time = xxxx ms +ipex-llm timings: load time = xxxx ms +ipex-llm timings: sample time = xxxx ms / 32 runs ( xxxx ms per token) +ipex-llm timings: prompt eval time = xxxx ms / 1 tokens ( xxxx ms per token) +ipex-llm timings: eval time = xxxx ms / 32 runs ( xxxx ms per token) +ipex-llm timings: total time = xxxx ms Inference time (fast forward): xxxx s Output: -{'id': 'cmpl-556b831b-749f-4b06-801e-c920620cb8f5', 'object': 'text_completion', 'created': 1690449478, 'model': './bigdl_llm_llama_q4_0.bin', 'choices': [{'text': ' She lived in a small village at the edge of a big forest, surrounded by tall trees and sparkling streams. One day, while wandering around the', 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 19, 'completion_tokens': 32, 'total_tokens': 51}} +{'id': 'cmpl-556b831b-749f-4b06-801e-c920620cb8f5', 'object': 'text_completion', 'created': 1690449478, 'model': './ipex_llm_llama_q4_0.bin', 'choices': [{'text': ' She lived in a small village at the edge of a big forest, surrounded by tall trees and sparkling streams. 
One day, while wandering around the', 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 19, 'completion_tokens': 32, 'total_tokens': 51}} ``` ### Model family GPT-NeoX #### [togethercomputer/RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat) ```log --------------------- bigdl-llm based tokenizer -------------------- +-------------------- ipex-llm based tokenizer -------------------- Inference time: xxxx s Output: ['\nThis was no surprise since her mom and dad both loved adventure too. But what really stood out about this little girl is that she loved the stories! Her'] @@ -101,13 +101,13 @@ gptneox_print_timings: eval time = xxxx ms / 31 runs ( xxxx m gptneox_print_timings: total time = xxxx ms Inference time (fast forward): xxxx s Output: -{'id': 'cmpl-8b17585d-635a-43af-94a0-bd9c19ffc5a8', 'object': 'text_completion', 'created': 1690451587, 'model': './bigdl_llm_gptneox_q4_0.bin', 'choices': [{'text': '\nOn one fine day her mother brought home an old shoe box full of toys and gave it to her daughter as she was not able to make the toy house', 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 18, 'completion_tokens': 32, 'total_tokens': 50}} +{'id': 'cmpl-8b17585d-635a-43af-94a0-bd9c19ffc5a8', 'object': 'text_completion', 'created': 1690451587, 'model': './ipex_llm_gptneox_q4_0.bin', 'choices': [{'text': '\nOn one fine day her mother brought home an old shoe box full of toys and gave it to her daughter as she was not able to make the toy house', 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 18, 'completion_tokens': 32, 'total_tokens': 50}} ``` ### Model family BLOOM #### [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b) ```log --------------------- bigdl-llm based tokenizer -------------------- +-------------------- ipex-llm based tokenizer -------------------- Inference time: xxxx s Output: [' She was always eager to explore new places and meet new people. One day, she decided to embark on an epic journey across the land of the giants'] @@ -127,13 +127,13 @@ inference: predict time = xxxx ms / 31 tokens / xxxx ms per token inference: total time = xxxx ms Inference time (fast forward): xxxx s Output: -{'id': 'cmpl-e7039a29-dc80-4729-a446-301573a5315f', 'object': 'text_completion', 'created': 1690449783, 'model': './bigdl_llm_bloom_q4_0.bin', 'choices': [{'text': ' She had the spirit of exploration, and her adventurous nature drove her to seek out new things every day. Little did she know that her adventures would take an', 'index': 0, 'logprobs': None, 'finish_reason': None}], 'usage': {'prompt_tokens': 17, 'completion_tokens': 32, 'total_tokens': 49}} +{'id': 'cmpl-e7039a29-dc80-4729-a446-301573a5315f', 'object': 'text_completion', 'created': 1690449783, 'model': './ipex_llm_bloom_q4_0.bin', 'choices': [{'text': ' She had the spirit of exploration, and her adventurous nature drove her to seek out new things every day. 
Little did she know that her adventures would take an', 'index': 0, 'logprobs': None, 'finish_reason': None}], 'usage': {'prompt_tokens': 17, 'completion_tokens': 32, 'total_tokens': 49}} ``` ### Model family StarCoder #### [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) ```log --------------------- bigdl-llm based tokenizer -------------------- +-------------------- ipex-llm based tokenizer -------------------- Inference time: xxxx s Output: ['\nOne day, she went on an adventure with a dragon. \nThe dragon was very angry, and he wanted to eat her.'] @@ -146,12 +146,12 @@ Output: -------------------- fast forward -------------------- -bigdl-llm: mem per token = xxxx bytes -bigdl-llm: sample time = xxxx ms -bigdl-llm: evel prompt time = xxxx ms / 11 tokens / xxxx ms per token -bigdl-llm: predict time = xxxx ms / 31 tokens / xxxx ms per token -bigdl-llm: total time = xxxx ms +ipex-llm: mem per token = xxxx bytes +ipex-llm: sample time = xxxx ms +ipex-llm: evel prompt time = xxxx ms / 11 tokens / xxxx ms per token +ipex-llm: predict time = xxxx ms / 31 tokens / xxxx ms per token +ipex-llm: total time = xxxx ms Inference time (fast forward): xxxx s Output: -{'id': 'cmpl-d0266eb2-5e18-4fbc-bcc4-dec236f506f6', 'object': 'text_completion', 'created': 1690450075, 'model': './bigdl_llm_starcoder_q4_0.bin', 'choices': [{'text': ' She loved to play with dolls and other stuff, but she loved the most to play with cats and other dogs. She loved to', 'index': 0, 'logprobs': None, 'finish_reason': None}], 'usage': {'prompt_tokens': 21, 'completion_tokens': 32, 'total_tokens': 53}} +{'id': 'cmpl-d0266eb2-5e18-4fbc-bcc4-dec236f506f6', 'object': 'text_completion', 'created': 1690450075, 'model': './ipex_llm_starcoder_q4_0.bin', 'choices': [{'text': ' She loved to play with dolls and other stuff, but she loved the most to play with cats and other dogs. She loved to', 'index': 0, 'logprobs': None, 'finish_reason': None}], 'usage': {'prompt_tokens': 21, 'completion_tokens': 32, 'total_tokens': 53}} ``` diff --git a/python/llm/example/CPU/PyTorch-Models/Model/README.md b/python/llm/example/CPU/PyTorch-Models/Model/README.md index 0a09bf2e..189eefca 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/README.md @@ -1,14 +1,14 @@ -# BigDL-LLM INT4 Optimization for Large Language Model -You can use `optimize_model` API to accelerate general PyTorch models on Intel servers and PCs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. +# IPEX-LLM INT4 Optimization for Large Language Model +You can use `optimize_model` API to accelerate general PyTorch models on Intel servers and PCs. This directory contains example scripts to help you quickly get started using IPEX-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. ## Recommended Requirements To run the examples, we recommend using Intel® Xeon® processors (server), or >= 12th Gen Intel® Core™ processor (client). -For OS, BigDL-LLM supports Ubuntu 20.04 or later, CentOS 7 or later, and Windows 10/11. +For OS, IPEX-LLM supports Ubuntu 20.04 or later, CentOS 7 or later, and Windows 10/11. 
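The `optimize_model` flow mentioned above is, in rough outline, the following (assumptions: the post-rename import path is `ipex_llm.optimize_model`, mirroring `bigdl.llm.optimize_model`, and the model id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from ipex_llm import optimize_model  # assumed post-rename import path

model_path = "BAAI/AquilaChat2-7B"  # placeholder; any general PyTorch LLM follows the same pattern

# Load the model with vanilla Hugging Face transformers first ...
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True,
                                             torch_dtype="auto", low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# ... then apply the INT4 optimization to the loaded module
model = optimize_model(model)

with torch.inference_mode():
    inputs = tokenizer("What is AI?", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```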
## Best Known Configuration on Linux -For better performance, it is recommended to set environment variables on Linux with the help of BigDL-LLM: +For better performance, it is recommended to set environment variables on Linux with the help of IPEX-LLM: ```bash -pip install bigdl-llm -source bigdl-llm-init +pip install ipex-llm +source ipex-llm-init ``` diff --git a/python/llm/example/CPU/PyTorch-Models/Model/aquila2/README.md b/python/llm/example/CPU/PyTorch-Models/Model/aquila2/README.md index 82c8c15d..67189d60 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/aquila2/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/aquila2/README.md @@ -1,20 +1,20 @@ # Aquila2 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Aquila2 models. For illustration purposes, we utilize the [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B) as reference Aquila2 models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Aquila2 models. For illustration purposes, we utilize the [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B) as reference Aquila2 models. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/bark/README.md b/python/llm/example/CPU/PyTorch-Models/Model/bark/README.md index da660e7c..5800acae 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/bark/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/bark/README.md @@ -1,20 +1,20 @@ # Bark -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Bark models. For illustration purposes, we utilize the [suno/bark](https://huggingface.co/suno/bark) as reference Bark models. 
+In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Bark models. For illustration purposes, we utilize the [suno/bark](https://huggingface.co/suno/bark) as reference Bark models. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Synthesize speech with the given input text -In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for Bark model to synthesize speech based on the given text, with BigDL-LLM INT4 optimizations. +In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for Bark model to synthesize speech based on the given text, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install TTS scipy ``` @@ -45,8 +45,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/bert/README.md b/python/llm/example/CPU/PyTorch-Models/Model/bert/README.md index d5a093f0..2bbbe626 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/bert/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/bert/README.md @@ -1,20 +1,20 @@ # BERT -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate BERT models. For illustration purposes, we utilize the [bert-large-uncased](https://huggingface.co/bert-large-uncased) as reference BERT models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate BERT models. For illustration purposes, we utilize the [bert-large-uncased](https://huggingface.co/bert-large-uncased) as reference BERT models. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Extract the feature of given text -In the example [extract_feature.py](./extract_feature.py), we show a basic use case for a BERT model to extract the feature of given text, with BigDL-LLM INT4 optimizations. 
+In the example [extract_feature.py](./extract_feature.py), we show a basic use case for a BERT model to extract the feature of given text, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/bluelm/README.md b/python/llm/example/CPU/PyTorch-Models/Model/bluelm/README.md index f530397a..437d9834 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/bluelm/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/bluelm/README.md @@ -1,20 +1,20 @@ # BlueLM -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate BlueLM models. For illustration purposes, we utilize the [vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) as a reference BlueLM model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate BlueLM models. For illustration purposes, we utilize the [vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) as a reference BlueLM model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. 
on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/chatglm/README.md b/python/llm/example/CPU/PyTorch-Models/Model/chatglm/README.md index eeec312a..35a15620 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/chatglm/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/chatglm/README.md @@ -1,20 +1,20 @@ # ChatGLM -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate ChatGLM models. For illustration purposes, we utilize the [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b) as a reference ChatGLM model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate ChatGLM models. For illustration purposes, we utilize the [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b) as a reference ChatGLM model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/chatglm3/README.md b/python/llm/example/CPU/PyTorch-Models/Model/chatglm3/README.md index 56c589fa..195fb0ee 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/chatglm3/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/chatglm3/README.md @@ -1,20 +1,20 @@ # ChatGLM3 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate ChatGLM3 models. For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate ChatGLM3 models. 
For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/codellama/README.md b/python/llm/example/CPU/PyTorch-Models/Model/codellama/README.md index a776a09a..a97c5bb8 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/codellama/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/codellama/README.md @@ -1,20 +1,20 @@ # CodeLlama -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate CodeLlama models. For illustration purposes, we utilize the [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) as reference CodeLlama models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate CodeLlama models. For illustration purposes, we utilize the [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) as reference CodeLlama models. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a CodeLlama model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a CodeLlama model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. 
Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` @@ -33,8 +33,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/codeshell/README.md b/python/llm/example/CPU/PyTorch-Models/Model/codeshell/README.md index d00b2d17..1870c4de 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/codeshell/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/codeshell/README.md @@ -1,20 +1,20 @@ # CodeShell -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate CodeShell models. For illustration purposes, we utilize the [WisdomShell/CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B ) as a reference CodeShell model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate CodeShell models. For illustration purposes, we utilize the [WisdomShell/CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B ) as a reference CodeShell model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a CodeShell model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a CodeShell model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. 
on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/deciLM-7b/README.md b/python/llm/example/CPU/PyTorch-Models/Model/deciLM-7b/README.md index 0731a9b9..15dca0bd 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/deciLM-7b/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/deciLM-7b/README.md @@ -1,20 +1,20 @@ # DeciLM-7B -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate DeciLM-7B models. For illustration purposes, we utilize the [Deci/DeciLM-7B-instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) as a reference DeciLM-7B model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate DeciLM-7B models. For illustration purposes, we utilize the [Deci/DeciLM-7B-instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) as a reference DeciLM-7B model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install transformers==4.35.2 # required by DeciLM-7B ``` @@ -33,8 +33,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/deepseek-moe/README.md b/python/llm/example/CPU/PyTorch-Models/Model/deepseek-moe/README.md index adfc8006..feca7acf 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/deepseek-moe/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/deepseek-moe/README.md @@ -1,20 +1,20 @@ # DeepSeek-MoE -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate DeepSeek-MoE models. For illustration purposes, we utilize the [deepseek-ai/DeepSeek-MoE-16b-chat](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat) as a reference DeepSeek-MoE model. 
+In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate DeepSeek-MoE models. For illustration purposes, we utilize the [deepseek-ai/DeepSeek-MoE-16b-chat](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat) as a reference DeepSeek-MoE model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a deepseek-moe model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a deepseek-moe model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops ``` @@ -33,8 +33,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/deepseek/README.md b/python/llm/example/CPU/PyTorch-Models/Model/deepseek/README.md index ef3c428a..bbbb304e 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/deepseek/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/deepseek/README.md @@ -1,20 +1,20 @@ # Deepseek -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Deepseek models. For illustration purposes, we utilize the [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) as a reference Deepseek model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Deepseek models. For illustration purposes, we utilize the [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) as a reference Deepseek model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Deepseek model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Deepseek model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/distil-whisper/README.md b/python/llm/example/CPU/PyTorch-Models/Model/distil-whisper/README.md index 9aeed650..56efd231 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/distil-whisper/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/distil-whisper/README.md @@ -1,28 +1,28 @@ # Distil-Whisper -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Distil-Whisper models. For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Distil-Whisper models. For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Recognize Tokens using `generate()` API -In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
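For orientation before the environment steps below, here is a minimal sketch of the transcription flow that [recognize.py](./recognize.py) is described as implementing. It is an illustrative sketch only: the `ipex_llm` import path is assumed from the rename in this PR, `audio.wav` is a placeholder input file, and the shipped example may structure the call differently.

```python
import librosa
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
from ipex_llm import optimize_model  # assumed post-rename import path (was bigdl.llm)

model_id = "distil-whisper/distil-large-v2"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)

# Convert the model's linear layers to INT4 for low-memory CPU inference
model = optimize_model(model)

# "audio.wav" is a placeholder 16 kHz mono recording
speech, _ = librosa.load("audio.wav", sr=16000)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```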
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install datasets soundfile librosa # required by audio processing ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Distil-Whisper model based on the capabilities of your machine. @@ -38,8 +38,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/flan-t5/README.md b/python/llm/example/CPU/PyTorch-Models/Model/flan-t5/README.md index 9af2109e..2d102180 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/flan-t5/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/flan-t5/README.md @@ -1,27 +1,27 @@ # Flan-t5 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Flan-t5 models. For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Flan-t5 models. For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Flan-t5 model based on the capabilities of your machine. @@ -37,8 +37,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/fuyu/README.md b/python/llm/example/CPU/PyTorch-Models/Model/fuyu/README.md index 5121474e..e54a8546 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/fuyu/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/fuyu/README.md @@ -1,20 +1,20 @@ # Fuyu -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Fuyu models. For illustration purposes, we utilize the [adept/fuyu-8b](https://huggingface.co/adept/fuyu-8b) as a reference Fuyu model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Fuyu models. For illustration purposes, we utilize the [adept/fuyu-8b](https://huggingface.co/adept/fuyu-8b) as a reference Fuyu model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an Fuyu model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for an Fuyu model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
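For context, the Fuyu generation flow referenced above can be sketched roughly as follows. Treat it as a hedged illustration rather than the shipped example: the `ipex_llm` import path is assumed from this rename, and `demo.jpg` and the prompt are placeholders.

```python
from PIL import Image
from transformers import FuyuProcessor, FuyuForCausalLM  # Fuyu support requires transformers 4.35, as noted above
from ipex_llm import optimize_model  # assumed post-rename import path (was bigdl.llm)

model_id = "adept/fuyu-8b"
processor = FuyuProcessor.from_pretrained(model_id)
model = FuyuForCausalLM.from_pretrained(model_id)

# Convert the model's linear layers to INT4 for low-memory CPU inference
model = optimize_model(model)

image = Image.open("demo.jpg")  # placeholder image path
inputs = processor(text="Generate a coco-style caption.\n", images=image, return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=32)
# Decode only the newly generated tokens
print(processor.batch_decode(output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```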
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install transformers==4.35 pillow # additional package required for Fuyu to conduct generation ``` @@ -22,7 +22,7 @@ pip install transformers==4.35 pillow # additional package required for Fuyu to ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Fuyu model based on the capabilities of your machine. @@ -38,8 +38,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/internlm-xcomposer/README.md b/python/llm/example/CPU/PyTorch-Models/Model/internlm-xcomposer/README.md index 05ef602e..cedaab04 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/internlm-xcomposer/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/internlm-xcomposer/README.md @@ -1,20 +1,20 @@ # InternLM_XComposer -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate InternLM_XComposer models. For illustration purposes, we utilize the [internlm/internlm-xcomposer-vl-7b](https://huggingface.co/internlm/internlm-xcomposer-vl-7b) as a reference InternLM_XComposer model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate InternLM_XComposer models. For illustration purposes, we utilize the [internlm/internlm-xcomposer-vl-7b](https://huggingface.co/internlm/internlm-xcomposer-vl-7b) as a reference InternLM_XComposer model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Multi-turn chat centered around an image using `chat()` API -In the example [chat.py](./chat.py), we show a basic use case for an InternLM_XComposer model to start a multi-turn chat centered around an image using `chat()` API, with BigDL-LLM 'optimize_model' API. +In the example [chat.py](./chat.py), we show a basic use case for an InternLM_XComposer model to start a multi-turn chat centered around an image using `chat()` API, with IPEX-LLM 'optimize_model' API. ### 1. 
Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install accelerate timm==0.4.12 sentencepiece==0.1.99 gradio==3.44.4 markdown2==2.4.10 xlsxwriter==3.1.2 einops # additional package required for InternLM_XComposer to conduct generation @@ -43,7 +43,7 @@ For `internlm/internlm-xcomposer-vl-7b`, you should replace the `modeling_Intern ### 3. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the LLaVA model based on the capabilities of your machine. @@ -59,8 +59,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/internlm2/README.md b/python/llm/example/CPU/PyTorch-Models/Model/internlm2/README.md index 1f4b8731..7e55c5f3 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/internlm2/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/internlm2/README.md @@ -1,20 +1,20 @@ # InternLM2 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate InternLM2 models. For illustration purposes, we utilize the [internlm/internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as reference InternLM2 model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate InternLM2 models. For illustration purposes, we utilize the [internlm/internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as reference InternLM2 model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a InternLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. 
+In the example [generate.py](./generate.py), we show a basic use case for a InternLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/llama2/README.md b/python/llm/example/CPU/PyTorch-Models/Model/llama2/README.md index 65c7947a..a630cc0c 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/llama2/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/llama2/README.md @@ -1,20 +1,20 @@ # Llama2 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Llama2 models. For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Llama2 models. For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/llava/README.md b/python/llm/example/CPU/PyTorch-Models/Model/llava/README.md index 32724ffb..d9b2b853 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/llava/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/llava/README.md @@ -1,21 +1,21 @@ # LLaVA -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on LLaVA models. For illustration purposes, we utilize the [liuhaotian/llava-v1.5-13b](https://huggingface.co/liuhaotian/llava-v1.5-13b) as a reference LLaVA model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on LLaVA models. For illustration purposes, we utilize the [liuhaotian/llava-v1.5-13b](https://huggingface.co/liuhaotian/llava-v1.5-13b) as a reference LLaVA model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Multi-turn chat centered around an image using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a LLaVA model to start a multi-turn chat centered around an image using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a LLaVA model to start a multi-turn chat centered around an image using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option git clone -b v1.1.1 --depth=1 https://github.com/haotian-liu/LLaVA.git # clone the llava libary pip install einops # install dependencies required by llava @@ -26,7 +26,7 @@ cd LLaVA # change the working directory to the LLaVA folder ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. 
In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the LLaVA model based on the capabilities of your machine. @@ -44,8 +44,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/mamba/README.md b/python/llm/example/CPU/PyTorch-Models/Model/mamba/README.md index e1d40483..d649ffdb 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/mamba/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/mamba/README.md @@ -1,20 +1,20 @@ # Mamba -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Mamba models. For illustration purposes, we utilize the [state-spaces/mamba-1.4b](https://huggingface.co/state-spaces/mamba-1.4b) and [state-spaces/mamba-2.8b](https://huggingface.co/state-spaces/mamba-2.8b) as reference Mamba models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Mamba models. For illustration purposes, we utilize the [state-spaces/mamba-1.4b](https://huggingface.co/state-spaces/mamba-1.4b) and [state-spaces/mamba-2.8b](https://huggingface.co/state-spaces/mamba-2.8b) as reference Mamba models. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Mamba model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Mamba model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # package required by Mamba ``` @@ -34,8 +34,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket
export OMP_NUM_THREADS=48
diff --git a/python/llm/example/CPU/PyTorch-Models/Model/meta-llama/README.md b/python/llm/example/CPU/PyTorch-Models/Model/meta-llama/README.md
index 305f029f..e3c040fa 100644
--- a/python/llm/example/CPU/PyTorch-Models/Model/meta-llama/README.md
+++ b/python/llm/example/CPU/PyTorch-Models/Model/meta-llama/README.md
@@ -1,12 +1,12 @@
# LlaMA
-In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on general pytorch models, for example Meta Llama models. **Different from what [Huggingface LlaMA2](../llama2/) example demonstrated, This example directly brings the optimizations of BigDL-LLM to the official LLaMA implementation of which the code style is more flexible.** For illustration purposes, we utilize the [Llama2-7b-Chat](https://ai.meta.com/llama/) as a reference LlaMA model.
+In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on general PyTorch models, for example Meta Llama models. **Different from what the [Huggingface LlaMA2](../llama2/) example demonstrated, this example directly brings the optimizations of IPEX-LLM to the official LLaMA implementation, whose code style is more flexible.** For illustration purposes, we utilize the [Llama2-7b-Chat](https://ai.meta.com/llama/) as a reference LlaMA model.
## Requirements
-To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
## Example: Generating text using a pretrained Llama model
-In the example [example_chat_completion.py](./example_chat_completion.py), we show a basic use case for a Llama model to engage in a conversation with an AI assistant using `chat_completion` API, with BigDL-LLM INT4 optimizations. The process for [example_text_completion.py](./example_text_completion.py) is similar.
+In the example [example_chat_completion.py](./example_chat_completion.py), we show a basic use case for a Llama model to engage in a conversation with an AI assistant using `chat_completion` API, with IPEX-LLM INT4 optimizations. The process for [example_text_completion.py](./example_text_completion.py) is similar.
### 1. Install
We suggest using conda to manage environment:
```bash
@@ -20,7 +20,7 @@ git apply < ../cpu.patch # apply cpu version patch
pip install -e .
cd -
-pip install bigdl-llm[all] # install bigdl-llm with 'all' option
+pip install ipex-llm[all] # install ipex-llm with 'all' option
```
### 2. Run
@@ -55,8 +55,8 @@ For optimal performance on server, it is recommended to set several environment
E.g. on Linux,
```bash
-# set BigDL-Nano env variables
-source bigdl-nano-init
+# set IPEX-LLM env variables
+source ipex-llm-init
# e.g. for a server with 48 cores per socket
export OMP_NUM_THREADS=48
diff --git a/python/llm/example/CPU/PyTorch-Models/Model/mistral/README.md b/python/llm/example/CPU/PyTorch-Models/Model/mistral/README.md
index 39ad20bd..1f958267 100644
--- a/python/llm/example/CPU/PyTorch-Models/Model/mistral/README.md
+++ b/python/llm/example/CPU/PyTorch-Models/Model/mistral/README.md
@@ -1,22 +1,22 @@
# Mistral
-In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Mistral models.
For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Mistral models. For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. **Important: According to [Mistral Troubleshooting](https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting), please make sure you have installed `transformers==4.34.0` to run the example.** ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option # Refer to https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting, please make sure you are using a stable version of Transformers, 4.34.0 or newer. pip install transformers==4.34.0 @@ -37,8 +37,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/mixtral/README.md b/python/llm/example/CPU/PyTorch-Models/Model/mixtral/README.md index c0846a64..7baa9a4c 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/mixtral/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/mixtral/README.md @@ -1,24 +1,24 @@ # Mixtral -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Mixtral models. For illustration purposes, we utilize the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as a reference Mixtral model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Mixtral models. 
For illustration purposes, we utilize the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as a reference Mixtral model. ## Requirements -To run these examples with BigDL-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. **Important: Please make sure you have installed `transformers==4.36.0` to run the example.** ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel CPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel CPUs. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install PyTorch CPU as default pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cpu -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] # Please make sure you are using a stable version of Transformers, 4.36.0 or newer. pip install transformers==4.36.0 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/openai-whisper/readme.md b/python/llm/example/CPU/PyTorch-Models/Model/openai-whisper/readme.md index 8fa618fc..85f6594a 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/openai-whisper/readme.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/openai-whisper/readme.md @@ -1,19 +1,19 @@ # Whisper -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on general pytorch models, for example Openai Whisper models. For illustration purposes, we utilize the [whisper-tiny](https://github.com/openai/whisper/blob/main/model-card.md) as a reference Whisper model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on general pytorch models, for example Openai Whisper models. For illustration purposes, we utilize the [whisper-tiny](https://github.com/openai/whisper/blob/main/model-card.md) as a reference Whisper model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Recognize Tokens using `transcribe()` API -In the example [recognize.py](./recognize.py), we show a basic use case for a Whisper model to conduct transcription using `transcribe()` API, with BigDL-LLM INT4 optimizations. 
+In the example [recognize.py](./recognize.py), we show a basic use case for a Whisper model to conduct transcription using `transcribe()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install bigdl-llm[all] # install bigdl-llm with 'all' option +pip install ipex-llm[all] # install ipex-llm with 'all' option pip install -U openai-whisper pip install librosa # required by audio processing ``` @@ -28,7 +28,7 @@ Arguments info: - `--audio-file AUDIO_FILE`: argument defining the path of the audio file to be recognized. - `--language LANGUAGE`: argument defining language to be transcribed. It is default to be `english`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Whisper model based on the capabilities of your machine. @@ -44,8 +44,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/phi-1_5/README.md b/python/llm/example/CPU/PyTorch-Models/Model/phi-1_5/README.md index f870838c..236cee37 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/phi-1_5/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/phi-1_5/README.md @@ -1,20 +1,20 @@ # phi-1_5 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate phi-1_5 models. For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate phi-1_5 models. For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops ``` @@ -33,8 +33,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/phi-2/README.md b/python/llm/example/CPU/PyTorch-Models/Model/phi-2/README.md index 2d3dd257..c9e8daaf 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/phi-2/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/phi-2/README.md @@ -1,20 +1,20 @@ # phi-2 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate phi-2 models. For illustration purposes, we utilize the [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) as a reference phi-2 model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate phi-2 models. For illustration purposes, we utilize the [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) as a reference phi-2 model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a phi-2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a phi-2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops ``` @@ -33,8 +33,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket
export OMP_NUM_THREADS=48
diff --git a/python/llm/example/CPU/PyTorch-Models/Model/phixtral/README.md b/python/llm/example/CPU/PyTorch-Models/Model/phixtral/README.md
index f2bf2412..c3a19031 100644
--- a/python/llm/example/CPU/PyTorch-Models/Model/phixtral/README.md
+++ b/python/llm/example/CPU/PyTorch-Models/Model/phixtral/README.md
@@ -1,20 +1,20 @@
# Phixtral
-In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Qwen-VL models. For illustration purposes, we utilize the [mlabonne/phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8) as a reference Phixtral model.
+In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Phixtral models. For illustration purposes, we utilize the [mlabonne/phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8) as a reference Phixtral model.
## Requirements
-To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
## Example: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a phixtral model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations.
+In the example [generate.py](./generate.py), we show a basic use case for a phixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations.
### 1. Install
We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for BigDL-LLM:
+After installing conda, create a Python environment for IPEX-LLM:
```bash
conda create -n llm python=3.9 # recommend to use Python 3.9
conda activate llm
-pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option
+pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option
pip install einops
```
@@ -33,8 +33,8 @@ For optimal performance on server, it is recommended to set several environment
E.g. on Linux,
```bash
-# set BigDL-LLM env variables
-source bigdl-llm-init
+# set IPEX-LLM env variables
+source ipex-llm-init
# e.g. for a server with 48 cores per socket
export OMP_NUM_THREADS=48
diff --git a/python/llm/example/CPU/PyTorch-Models/Model/qwen-vl/README.md b/python/llm/example/CPU/PyTorch-Models/Model/qwen-vl/README.md
index 83acd7d3..57ccdf71 100644
--- a/python/llm/example/CPU/PyTorch-Models/Model/qwen-vl/README.md
+++ b/python/llm/example/CPU/PyTorch-Models/Model/qwen-vl/README.md
@@ -1,20 +1,20 @@
# Qwen-VL
-In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Qwen-VL models. For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model.
+In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Qwen-VL models. For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model.
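For orientation, the multimodal `chat()` flow that this Qwen-VL example revolves around can be sketched as follows. Treat it as an illustrative sketch: the `ipex_llm` import path is assumed from this rename, `demo.jpg` is a placeholder image, and `from_list_format`/`chat()` come from the remote code shipped with Qwen-VL-Chat rather than from IPEX-LLM itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from ipex_llm import optimize_model  # assumed post-rename import path (was bigdl.llm)

model_id = "Qwen/Qwen-VL-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).eval()

# Convert the model's linear layers to INT4 before chatting
model = optimize_model(model)

# Build a multimodal query; the image path is a placeholder
query = tokenizer.from_list_format([
    {"image": "demo.jpg"},
    {"text": "What is shown in this picture?"},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```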
## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Multimodal chat using `chat()` API -In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with BigDL-LLM 'optimize_model' API. +In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with IPEX-LLM 'optimize_model' API. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation @@ -35,8 +35,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/qwen1.5/README.md b/python/llm/example/CPU/PyTorch-Models/Model/qwen1.5/README.md index 2190be8d..a404cf03 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/qwen1.5/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/qwen1.5/README.md @@ -1,20 +1,20 @@ # Qwen1.5 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Qwen1.5 models. For illustration purposes, we utilize the [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) as reference Qwen1.5 model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Qwen1.5 models. For illustration purposes, we utilize the [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) as reference Qwen1.5 model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Qwen1.5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Qwen1.5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. 
Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install transformers==4.37.0 # install transformers which supports Qwen2 ``` @@ -33,8 +33,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/skywork/README.md b/python/llm/example/CPU/PyTorch-Models/Model/skywork/README.md index 9b1f7d8f..1221f2a3 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/skywork/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/skywork/README.md @@ -1,20 +1,20 @@ # Skywork -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Skywork models. For illustration purposes, we utilize the [Skywork/Skywork-13B-base](https://huggingface.co/Skywork/Skywork-13B-base) as the reference Skywork model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Skywork models. For illustration purposes, we utilize the [Skywork/Skywork-13B-base](https://huggingface.co/Skywork/Skywork-13B-base) as the reference Skywork model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Skywork model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Skywork model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. 
for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/solar/README.md b/python/llm/example/CPU/PyTorch-Models/Model/solar/README.md index c508828a..0625fb2f 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/solar/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/solar/README.md @@ -1,20 +1,20 @@ # SOLAR -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate SOLAR models. For illustration purposes, we utilize the [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) as a reference SOLAR model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate SOLAR models. For illustration purposes, we utilize the [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) as a reference SOLAR model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a SOLAR model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a SOLAR model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install transformers==4.35.2 # required by SOLAR ``` @@ -33,8 +33,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/wizardcoder-python/README.md b/python/llm/example/CPU/PyTorch-Models/Model/wizardcoder-python/README.md index 9ebabe22..7cfa8d11 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/wizardcoder-python/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/wizardcoder-python/README.md @@ -1,20 +1,20 @@ # WizardCoder-Python -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate WizardCoder-Python models. For illustration purposes, we utilize the [WizardLM/WizardCoder-Python-7B-V1.0](https://huggingface.co/WizardLM/WizardCoder-Python-7B-V1.0) as a reference WizardCoder-Python model. 
+In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate WizardCoder-Python models. For illustration purposes, we utilize the [WizardLM/WizardCoder-Python-7B-V1.0](https://huggingface.co/WizardLM/WizardCoder-Python-7B-V1.0) as a reference WizardCoder-Python model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a WizardCoder-Python model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a WizardCoder-Python model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option ``` ### 2. Run @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/yi/README.md b/python/llm/example/CPU/PyTorch-Models/Model/yi/README.md index 04a76046..cb4d06a9 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/yi/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/yi/README.md @@ -1,28 +1,28 @@ # Yi -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Yi models. For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Yi models. For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. 
+In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for Yi-6B to conduct generation ``` ### 2. Run After setting up the Python environment, you could run the example by following steps. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Yi model based on the capabilities of your machine. @@ -38,8 +38,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/yuan2/README.md b/python/llm/example/CPU/PyTorch-Models/Model/yuan2/README.md index 965173a0..c268f7a3 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/yuan2/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/yuan2/README.md @@ -1,22 +1,22 @@ # Yuan2 -In this directory, you will find examples on how you could apply BigDL-LLM `optimize_model` API to accelerate Yuan2 models. For illustration purposes, we utilize the [IEITYuan/Yuan2-2B-hf](https://huggingface.co/IEITYuan/Yuan2-2B-hf) as a reference Yuan2 model. +In this directory, you will find examples on how you could apply IPEX-LLM `optimize_model` API to accelerate Yuan2 models. For illustration purposes, we utilize the [IEITYuan/Yuan2-2B-hf](https://huggingface.co/IEITYuan/Yuan2-2B-hf) as a reference Yuan2 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. In addition, you need to modify some files in Yuan2-2B-hf folder, since Flash attention dependency is for CUDA usage and currently cannot be installed on Intel CPUs. To manually turn it off, please refer to [this issue](https://github.com/IEIT-Yuan/Yuan-2.0/issues/92).
We also provide two modified files([config.json](yuan2-2B-instruct/config.json) and [yuan_hf_model.py](yuan2-2B-instruct/yuan_hf_model.py)), which can be used to replace the original content in config.json and yuan_hf_model.py. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an Yuan2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for an Yuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for Yuan2 to conduct generation pip install pandas # additional package required for Yuan2 to conduct generation ``` @@ -42,8 +42,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/Model/ziya/README.md b/python/llm/example/CPU/PyTorch-Models/Model/ziya/README.md index 2b6dadcd..2a77221a 100644 --- a/python/llm/example/CPU/PyTorch-Models/Model/ziya/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Model/ziya/README.md @@ -1,20 +1,20 @@ # Ziya -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Ziya models. For illustration purposes, we utilize the [IDEA-CCNL/Ziya-Coding-34B-v1.0](https://huggingface.co/IDEA-CCNL/Ziya-Coding-34B-v1.0) as a reference Ziya model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Ziya models. For illustration purposes, we utilize the [IDEA-CCNL/Ziya-Coding-34B-v1.0](https://huggingface.co/IDEA-CCNL/Ziya-Coding-34B-v1.0) as a reference Ziya model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Ziya model to predict the next N tokens using `generate()` API, with BigDL-LLM 'optimize_model' API. +In the example [generate.py](./generate.py), we show a basic use case for a Ziya model to predict the next N tokens using `generate()` API, with IPEX-LLM 'optimize_model' API. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for Ziya to conduct generation ``` @@ -32,8 +32,8 @@ For optimal performance on server, it is recommended to set several environment E.g. on Linux, ```bash -# set BigDL-LLM env variables -source bigdl-llm-init +# set IPEX-LLM env variables +source ipex-llm-init # e.g. for a server with 48 cores per socket export OMP_NUM_THREADS=48 diff --git a/python/llm/example/CPU/PyTorch-Models/More-Data-Types/README.md b/python/llm/example/CPU/PyTorch-Models/More-Data-Types/README.md index f09a07c2..461cb983 100644 --- a/python/llm/example/CPU/PyTorch-Models/More-Data-Types/README.md +++ b/python/llm/example/CPU/PyTorch-Models/More-Data-Types/README.md @@ -1,9 +1,9 @@ -# BigDL-LLM Low Bit Optimization for Large Language Model +# IPEX-LLM Low Bit Optimization for Large Language Model -In this example, we show how to apply BigDL-LLM low-bit optimizations (including INT8/INT5/INT4) to Llama2 model, and then run inference on the optimized low-bit model. +In this example, we show how to apply IPEX-LLM low-bit optimizations (including INT8/INT5/INT4) to Llama2 model, and then run inference on the optimized low-bit model. ## 0. Requirements -To run this example with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. +To run this example with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. ## Example: Load Model in Low-Bit Optimization In the example [generate.py](./generate.py), we show a basic use case of low-bit optimizations (including INT8/INT5/INT4) on a Llama2 model to predict the next N tokens using `generate()` API. By specifying `--low-bit` argument, you could apply other low-bit optimization (e.g. INT8/INT5) on model. @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install bigdl-llm with 'all' option +pip install --pre --upgrade ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run diff --git a/python/llm/example/CPU/PyTorch-Models/README.md b/python/llm/example/CPU/PyTorch-Models/README.md index 06860d45..ae160ca8 100644 --- a/python/llm/example/CPU/PyTorch-Models/README.md +++ b/python/llm/example/CPU/PyTorch-Models/README.md @@ -1,6 +1,6 @@ -# Running PyTorch model using BigDL-LLM on Intel CPU +# Running PyTorch model using IPEX-LLM on Intel CPU -This folder contains examples of running any PyTorch model on BigDL-LLM (with "one-line code change"): +This folder contains examples of running any PyTorch model on IPEX-LLM (with "one-line code change"): - [Model](Model): examples of running PyTorch models (e.g., Openai Whisper, LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using INT4 optimizations - [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.) 
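The `optimize_model` examples and the "one-line code change" referenced above all follow the same pattern. A minimal sketch is shown below; the model ID and prompt are placeholders, and the exact set of accepted `low_bit` strings should be checked against the ipex-llm documentation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from ipex_llm import optimize_model  # replaces the former `from bigdl.llm import optimize_model`

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder: any Hugging Face PyTorch LLM

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# The "one-line code change": wrap the loaded PyTorch model with low-bit optimizations.
# `low_bit` is assumed to accept strings such as "sym_int4", "sym_int5" or "sym_int8",
# matching the INT4/INT5/INT8 options mentioned in these examples.
model = optimize_model(model, low_bit="sym_int4")

inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```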
diff --git a/python/llm/example/CPU/PyTorch-Models/Save-Load/README.md b/python/llm/example/CPU/PyTorch-Models/Save-Load/README.md index 16975c92..ae8c0302 100644 --- a/python/llm/example/CPU/PyTorch-Models/Save-Load/README.md +++ b/python/llm/example/CPU/PyTorch-Models/Save-Load/README.md @@ -1,9 +1,9 @@ -# Save/Load Low-Bit Models with BigDL-LLM Optimizations +# Save/Load Low-Bit Models with IPEX-LLM Optimizations -In this example, we show how to save/load model with BigDL-LLM low-bit optimizations (including INT8/INT5/INT4), and then run inference on the optimized low-bit model. +In this example, we show how to save/load model with IPEX-LLM low-bit optimizations (including INT8/INT5/INT4), and then run inference on the optimized low-bit model. ## 0. Requirements -To run this example with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. +To run this example with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. ## Example: Save/Load Model in Low-Bit Optimization In the example [generate.py](./generate.py), we show a basic use case of saving/loading model in low-bit optimizations to predict the next N tokens using `generate()` API. Also, saving and loading operations are platform-independent, so you could run it on different platforms. @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install bigdl-llm with 'all' option +pip install --pre --upgrade ipex-llm[all] # install ipex-llm with 'all' option ``` ### 2. Run diff --git a/python/llm/example/CPU/QLoRA-FineTuning/README.md b/python/llm/example/CPU/QLoRA-FineTuning/README.md index f8296e40..33543c12 100644 --- a/python/llm/example/CPU/QLoRA-FineTuning/README.md +++ b/python/llm/example/CPU/QLoRA-FineTuning/README.md @@ -4,10 +4,10 @@ This example demonstrates how to finetune a llama2-7b model using Big-LLM 4bit o ## Distributed Training Guide -1. Single node with single socket: [simple example](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning#example-finetune-llama2-7b-using-qlora) -or [alpaca example](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora) -2. [Single node with multiple sockets](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora#guide-to-finetuning-qlora-on-one-node-with-multiple-sockets) -3. [multiple nodes with multiple sockets](https://github.com/intel-analytics/BigDL/blob/main/docker/llm/finetune/qlora/cpu/kubernetes/README.md) +1. Single node with single socket: [simple example](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/QLoRA-FineTuning#example-finetune-llama2-7b-using-qlora) +or [alpaca example](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora) +2. [Single node with multiple sockets](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora#guide-to-finetuning-qlora-on-one-node-with-multiple-sockets) +3. 
[multiple nodes with multiple sockets](https://github.com/intel-analytics/ipex-llm/blob/main/docker/llm/finetune/qlora/cpu/kubernetes/README.md) ## Example: Finetune llama2-7b using QLoRA @@ -18,7 +18,7 @@ This example is ported from [bnb-4bit-training](https://colab.research.google.co ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] pip install transformers==4.34.0 pip install peft==0.5.0 pip install datasets @@ -27,12 +27,12 @@ pip install bitsandbytes scipy ``` ### 2. Finetune model -If the machine memory is not enough, you can try to set `use_gradient_checkpointing=True` in [here](https://github.com/intel-analytics/BigDL/blob/1747ffe60019567482b6976a24b05079274e7fc8/python/llm/example/CPU/QLoRA-FineTuning/qlora_finetuning_cpu.py#L53C6-L53C6). While gradient checkpointing may improve memory efficiency, it slows training by approximately 20%. +If the machine memory is not enough, you can try to set `use_gradient_checkpointing=True` in [here](https://github.com/intel-analytics/ipex-llm/blob/1747ffe60019567482b6976a24b05079274e7fc8/python/llm/example/CPU/QLoRA-FineTuning/qlora_finetuning_cpu.py#L53C6-L53C6). While gradient checkpointing may improve memory efficiency, it slows training by approximately 20%. We Recommend using micro_batch_size of 8 for better performance using 48cores in this example. You can refer to [this guide](https://huggingface.co/docs/transformers/perf_train_gpu_one) for more details. -And remember to use `bigdl-llm-init` before you start finetuning, which can accelerate the job. +And remember to use `ipex-llm-init` before you start finetuning, which can accelerate the job. ``` -source bigdl-llm-init -t +source ipex-llm-init -t python ./qlora_finetuning_cpu.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --dataset DATASET ``` @@ -54,7 +54,7 @@ TrainOutput(global_step=200, training_loss=1.0400420665740966, metrics={'train_r ``` ### 3. Merge the adapter into the original model -Using the [export_merged_model.py](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/LLM-Finetuning/QLoRA/export_merged_model.py) to merge. +Using the [export_merged_model.py](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/LLM-Finetuning/QLoRA/export_merged_model.py) to merge. ``` python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged ``` diff --git a/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora/README.md b/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora/README.md index 75ede748..9d2b2b40 100644 --- a/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora/README.md +++ b/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora/README.md @@ -1,13 +1,13 @@ # Alpaca QLoRA Finetuning (experimental support) -This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM QLoRA on [Intel CPUs](../../README.md). +This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to IPEX-LLM QLoRA on [Intel CPUs](../../README.md). ### 1. Install ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] pip install datasets transformers==4.35.0 pip install fire peft==0.5.0 pip install accelerate==0.23.0 @@ -17,7 +17,7 @@ pip install bitsandbytes scipy ### 2. 
Configures environment variables ```bash -source bigdl-llm-init -t +source ipex-llm-init -t ``` ### 3. Finetuning LLaMA-2-7B on a node: @@ -28,7 +28,7 @@ Example usage: python ./alpaca_qlora_finetuning_cpu.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" + --output_dir "./ipex-qlora-alpaca" ``` **Note**: You could also specify `--base_model` to the local path of the huggingface model checkpoint folder and `--data_path` to the local path of the dataset JSON file. @@ -68,7 +68,7 @@ bash finetune_one_node_two_sockets.sh Now the prompter is for the datasets with `instruction` `input`(optional) and `output`. If you want to use different datasets, you can add template file xxx.json in templates. And then update utils.prompter.py's `generate_prompt` method and update `generate_and_tokenize_prompt` method to fix the dataset. -For example, I want to train llama2-7b with [english_quotes](https://huggingface.co/datasets/Abirate/english_quotes) just like [this example](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/CPU/QLoRA-FineTuning/qlora_finetuning_cpu.py) +For example, I want to train llama2-7b with [english_quotes](https://huggingface.co/datasets/Abirate/english_quotes) just like [this example](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/CPU/QLoRA-FineTuning/qlora_finetuning_cpu.py) 1. add template english_quotes.json @@ -109,7 +109,7 @@ def generate_and_tokenize_prompt(data_point): python ./quotes_qlora_finetuning_cpu.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "./english_quotes" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --prompt_template_name "english_quotes" ``` @@ -143,7 +143,7 @@ lora_target_modules: List[str] = ["W_pack"] 5. (Only for baichuan) According to this [issue](https://github.com/baichuan-inc/Baichuan2/issues/204#issuecomment-1774372008), need to modify the [tokenization_baichuan.py](https://huggingface.co/baichuan-inc/Baichuan-7B/blob/main/tokenization_baichuan.py#L74) to fix issue. 6. finetune as normal -7. Using the [export_merged_model.py](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/LLM-Finetuning/QLoRA/export_merged_model.py) to merge. But also need to update tokenizer and model to ensure successful merge weight. +7. Using the [export_merged_model.py](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/LLM-Finetuning/QLoRA/export_merged_model.py) to merge. But also need to update tokenizer and model to ensure successful merge weight. ```bash from transformers import AutoTokenizer # noqa: F402 @@ -153,6 +153,6 @@ base_model = AutoModelForCausalLM.from_pretrained(base_model,trust_remote_code=T ### 4. Finetuning in docker and multiple nodes (k8s) -If you want to run multi-process fine-tuning, or do not want to manually install the above dependencies, we provide a docker solution to quickly start a one-container finetuning. Please refer to [here](https://github.com/intel-analytics/BigDL/tree/main/docker/llm/finetune/qlora/cpu/docker#fine-tune-llm-with-bigdl-llm-container). +If you want to run multi-process fine-tuning, or do not want to manually install the above dependencies, we provide a docker solution to quickly start a one-container finetuning. Please refer to [here](https://github.com/intel-analytics/ipex-llm/tree/main/docker/llm/finetune/qlora/cpu/docker#fine-tune-llm-with-ipex-llm-container). 
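For background on the adapter-merge step mentioned above (`export_merged_model.py`), the general PEFT flow it corresponds to can be sketched as follows. The paths and model name are placeholders, and the script shipped in the repository remains the authoritative reference.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel  # the QLoRA finetuning output is a PEFT LoRA adapter

base_model_path = "meta-llama/Llama-2-7b-hf"      # placeholder: the original base model
adapter_path = "./outputs/checkpoint-200"         # placeholder: the finetuned adapter checkpoint
output_path = "./outputs/checkpoint-200-merged"   # placeholder: where to write merged weights

# Load the base model so the low-rank updates can be folded back into its weights.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.float16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)

# Attach the adapter, merge it, and save a standalone checkpoint that no longer needs PEFT.
model = PeftModel.from_pretrained(base_model, adapter_path)
merged_model = model.merge_and_unload()
merged_model.save_pretrained(output_path)
tokenizer.save_pretrained(output_path)
```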
-Moreover, for users with multiple CPU server resources e.g. Xeon series like SPR and ICX, we give a k8s distributed solution, where machines and processor sockets are allowed to collaborate by one click easily. Please refer to [here](https://github.com/intel-analytics/BigDL/blob/main/docker/llm/finetune/qlora/cpu/kubernetes/README.md) for how to run QLoRA on k8s. +Moreover, for users with multiple CPU server resources e.g. Xeon series like SPR and ICX, we give a k8s distributed solution, where machines and processor sockets are allowed to collaborate by one click easily. Please refer to [here](https://github.com/intel-analytics/ipex-llm/blob/main/docker/llm/finetune/qlora/cpu/kubernetes/README.md) for how to run QLoRA on k8s. diff --git a/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora/finetune_one_node_two_sockets.sh b/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora/finetune_one_node_two_sockets.sh index b4b42ac1..af770e4b 100644 --- a/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora/finetune_one_node_two_sockets.sh +++ b/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora/finetune_one_node_two_sockets.sh @@ -1,7 +1,7 @@ export MASTER_ADDR=127.0.0.1 export SOCKET_CORES=48 -source bigdl-llm-init -t +source ipex-llm-init -t mpirun -n 2 \ --bind-to socket \ -genv OMP_NUM_THREADS=$SOCKET_CORES \ @@ -14,5 +14,5 @@ mpirun -n 2 \ --max_steps -1 \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" + --output_dir "./ipex-qlora-alpaca" diff --git a/python/llm/example/CPU/README.md b/python/llm/example/CPU/README.md index 2f86dc82..54b78896 100644 --- a/python/llm/example/CPU/README.md +++ b/python/llm/example/CPU/README.md @@ -1,17 +1,17 @@ -# BigDL-LLM Examples on Intel CPU +# IPEX-LLM Examples on Intel CPU -This folder contains examples of running BigDL-LLM on Intel CPU: +This folder contains examples of running IPEX-LLM on Intel CPU: -- [HF-Transformers-AutoModels](HF-Transformers-AutoModels): running any ***Hugging Face Transformers*** model on BigDL-LLM (using the standard AutoModel APIs) -- [QLoRA-FineTuning](QLoRA-FineTuning): running ***QLoRA finetuning*** using BigDL-LLM on intel CPUs -- [vLLM-Serving](vLLM-Serving): running ***vLLM*** serving framework on intel CPUs (with BigDL-LLM low-bit optimized models) -- [Deepspeed-AutoTP](Deepspeed-AutoTP): running distributed inference using ***DeepSpeed AutoTP*** (with BigDL-LLM low-bit optimized models) -- [LangChain](LangChain): running ***LangChain*** applications on BigDL-LLM +- [HF-Transformers-AutoModels](HF-Transformers-AutoModels): running any ***Hugging Face Transformers*** model on IPEX-LLM (using the standard AutoModel APIs) +- [QLoRA-FineTuning](QLoRA-FineTuning): running ***QLoRA finetuning*** using IPEX-LLM on intel CPUs +- [vLLM-Serving](vLLM-Serving): running ***vLLM*** serving framework on intel CPUs (with IPEX-LLM low-bit optimized models) +- [Deepspeed-AutoTP](Deepspeed-AutoTP): running distributed inference using ***DeepSpeed AutoTP*** (with IPEX-LLM low-bit optimized models) +- [LangChain](LangChain): running ***LangChain*** applications on IPEX-LLM - [Applications](Applications): running LLM applications (such as agent, streaming-llm) on BigDl-LLM -- [PyTorch-Models](PyTorch-Models): running any PyTorch model on BigDL-LLM (with "one-line code change") +- [PyTorch-Models](PyTorch-Models): running any PyTorch model on IPEX-LLM (with "one-line code change") - [Native-Models](Native-Models): converting & running LLM in 
`llama`/`chatglm`/`bloom`/`gptneox`/`starcoder` model family using native (cpp) implementation - [Speculative-Decoding](Speculative-Decoding): running any ***Hugging Face Transformers*** model with ***self-speculative decoding*** on Intel CPUs -- [ModelScope-Models](ModelScope-Models): running ***ModelScope*** model with BigDL-LLM on Intel CPUs +- [ModelScope-Models](ModelScope-Models): running ***ModelScope*** model with IPEX-LLM on Intel CPUs ## System Support @@ -25,8 +25,8 @@ This folder contains examples of running BigDL-LLM on Intel CPU: - Windows 10/11, with or without WSL ## Best Known Configuration on Linux -For better performance, it is recommended to set environment variables on Linux with the help of BigDL-LLM: +For better performance, it is recommended to set environment variables on Linux with the help of IPEX-LLM: ```bash -pip install bigdl-llm -source bigdl-llm-init +pip install ipex-llm +source ipex-llm-init ``` diff --git a/python/llm/example/CPU/Speculative-Decoding/README.md b/python/llm/example/CPU/Speculative-Decoding/README.md index 2570530e..de6dcebe 100644 --- a/python/llm/example/CPU/Speculative-Decoding/README.md +++ b/python/llm/example/CPU/Speculative-Decoding/README.md @@ -1,15 +1,15 @@ -# Self-Speculative Decoding for Large Language Model BF16 Inference using BigDL-LLM on Intel CPUs -You can use BigDL-LLM to run BF16 inference for any Huggingface Transformer model with ***self-speculative decoding*** on Intel CPUs. This directory contains example scripts to help you quickly get started to run some popular open-source models using self-speculative decoding. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. +# Self-Speculative Decoding for Large Language Model BF16 Inference using IPEX-LLM on Intel CPUs +You can use IPEX-LLM to run BF16 inference for any Huggingface Transformer model with ***self-speculative decoding*** on Intel CPUs. This directory contains example scripts to help you quickly get started to run some popular open-source models using self-speculative decoding. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. ## Verified Hardware Platforms - Intel Xeon SPR server ## Recommended Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#system-support) for more information. Make sure you have installed `bigdl-llm` before: +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#system-support) for more information. Make sure you have installed `ipex-llm` before: ```bash -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] ``` Moreover, install IPEX 2.1.0, which can be done through `pip install intel_extension_for_pytorch==2.1.0`. diff --git a/python/llm/example/CPU/Speculative-Decoding/baichuan2/README.md b/python/llm/example/CPU/Speculative-Decoding/baichuan2/README.md index e5895024..35c0fab6 100644 --- a/python/llm/example/CPU/Speculative-Decoding/baichuan2/README.md +++ b/python/llm/example/CPU/Speculative-Decoding/baichuan2/README.md @@ -1,23 +1,23 @@ # Baichuan2 -In this directory, you will find examples on how you could run Baichuan2 BF16 inference with self-speculative decoding using BigDL-LLM on [Intel CPUs](../README.md). 
For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat) and [baichuan-inc/Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) as reference Baichuan2 models. +In this directory, you will find examples on how you could run Baichuan2 BF16 inference with self-speculative decoding using IPEX-LLM on [Intel CPUs](../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat) and [baichuan-inc/Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) as reference Baichuan2 models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel CPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel CPUs. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] pip install intel_extension_for_pytorch==2.1.0 pip install transformers==4.31.0 ``` ### 2. Configures high-performing processor environment variables ```bash -source bigdl-llm-init -t +source ipex-llm-init -t export OMP_NUM_THREADS=48 # you can change 48 here to #cores of one processor socket ``` ### 3. Run @@ -91,7 +91,7 @@ And also replace `tokenization_baichuan.py` file under your model directory with Then, you can set `BIGDL_OPT_IPEX=true` to get target model acceleration: ```bash -source bigdl-llm-init -t +source ipex-llm-init -t export BIGDL_OPT_IPEX=true export OMP_NUM_THREADS=48 # you can change 48 here to #cores of one processor socket numactl -C 0-47 -m 0 python ./speculative.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT diff --git a/python/llm/example/CPU/Speculative-Decoding/chatglm3/README.md b/python/llm/example/CPU/Speculative-Decoding/chatglm3/README.md index d15bdc74..7d4a2e24 100644 --- a/python/llm/example/CPU/Speculative-Decoding/chatglm3/README.md +++ b/python/llm/example/CPU/Speculative-Decoding/chatglm3/README.md @@ -1,19 +1,19 @@ # ChatGLM3 -In this directory, you will find examples on how you could run ChatGLM3 BF16 infernece with self-speculative decoding using BigDL-LLM on Intel CPUs. For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model. +In this directory, you will find examples on how you could run ChatGLM3 BF16 inference with self-speculative decoding using IPEX-LLM on Intel CPUs. For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model.
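To make the Speculative-Decoding READMEs above and below easier to follow, the load-and-generate pattern used by the `speculative.py` scripts can be sketched as below. This is an illustration only: the model path and prompt are placeholders, and keyword arguments such as `speculative=True` and `load_in_low_bit="bf16"` are assumptions recalled from the existing examples that should be verified against the actual scripts.

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # IPEX-LLM drop-in AutoModel class

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder: any model verified in these examples

# Load the target model in BF16 with self-speculative decoding enabled (assumed keyword arguments).
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    optimize_model=True,
    torch_dtype=torch.bfloat16,
    load_in_low_bit="bf16",
    speculative=True,        # assumption: turns on the self-speculative decoding path
    trust_remote_code=True,
    use_cache=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids  # placeholder prompt

with torch.inference_mode():
    # The draft model is derived from the target model itself, so no second checkpoint is needed.
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```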
## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel CPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel CPUs. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] ``` ### 2. Configures OneAPI environment variables ```bash -source bigdl-llm-init -t +source ipex-llm-init -t export OMP_NUM_THREADS=one_socket_num ``` @@ -51,4 +51,4 @@ First token latency xx.xxxxs ### 4. Accelerate with BIGDL_OPT_IPEX -BIGDL_OPT_IPEX can help to accelerate speculative decoding on chatglm3-6b, and please refer to [here](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Speculative-Decoding/llama2#4-accelerate-with-bigdl_opt_ipex) for a try. \ No newline at end of file +BIGDL_OPT_IPEX can help to accelerate speculative decoding on chatglm3-6b, and please refer to [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Speculative-Decoding/llama2#4-accelerate-with-bigdl_opt_ipex) for a try. \ No newline at end of file diff --git a/python/llm/example/CPU/Speculative-Decoding/llama2/README.md b/python/llm/example/CPU/Speculative-Decoding/llama2/README.md index f05ac46b..4f76831d 100644 --- a/python/llm/example/CPU/Speculative-Decoding/llama2/README.md +++ b/python/llm/example/CPU/Speculative-Decoding/llama2/README.md @@ -1,22 +1,22 @@ # LLaMA2 -In this directory, you will find examples on how you could run LLaMA2 BF16 inference with self-speculative decoding using BigDL-LLM on [Intel CPUs](../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. +In this directory, you will find examples on how you could run LLaMA2 BF16 inference with self-speculative decoding using IPEX-LLM on [Intel CPUs](../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel CPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel CPUs. ### 1.
Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] pip install intel_extension_for_pytorch==2.1.0 ``` ### 2. Configures high-performing processor environment variables ```bash -source bigdl-llm-init -t +source ipex-llm-init -t export OMP_NUM_THREADS=48 # you can change 48 here to #cores of one processor socket ``` @@ -114,7 +114,7 @@ pip install transformers==4.36.2 After installed IPEX, you can set `BIGDL_OPT_IPEX=true` to get target model acceleration. Currently `Llama2 7b and 13b` are supported. ```bash -source bigdl-llm-init -t +source ipex-llm-init -t export BIGDL_OPT_IPEX=true export OMP_NUM_THREADS=48 # you can change 48 here to #cores of one processor socket numactl -C 0-47 -m 0 python ./speculative.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT diff --git a/python/llm/example/CPU/Speculative-Decoding/mistral/README.md b/python/llm/example/CPU/Speculative-Decoding/mistral/README.md index 32e9ec64..0f6c0762 100644 --- a/python/llm/example/CPU/Speculative-Decoding/mistral/README.md +++ b/python/llm/example/CPU/Speculative-Decoding/mistral/README.md @@ -1,23 +1,23 @@ # Mistral -In this directory, you will find examples on how you could run Mistral BF16 inference with self-speculative decoding using BigDL-LLM on [Intel CPUs](../README.md). For illustration purposes,we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. +In this directory, you will find examples on how you could run Mistral BF16 inference with self-speculative decoding using IPEX-LLM on [Intel CPUs](../README.md). For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel CPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel CPUs. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] pip install intel_extension_for_pytorch==2.1.0 pip install transformers==4.35.2 ``` ### 2. Configures high-performing processor environment variables ```bash -source bigdl-llm-init -t +source ipex-llm-init -t export OMP_NUM_THREADS=48 # you can change 48 here to #cores of one processor socket ``` ### 3.
Run @@ -100,7 +100,7 @@ pip install transformers==4.35.2 After installed IPEX, you can set `BIGDL_OPT_IPEX=true` to get target model acceleration. Currently `Mistral-7B-Instruct-v0.1 and Mistral-7B-v0.1` are supported. ```bash -source bigdl-llm-init -t +source ipex-llm-init -t export BIGDL_OPT_IPEX=true export OMP_NUM_THREADS=48 # you can change 48 here to #cores of one processor socket numactl -C 0-47 -m 0 python ./speculative.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT diff --git a/python/llm/example/CPU/Speculative-Decoding/qwen/README.md b/python/llm/example/CPU/Speculative-Decoding/qwen/README.md index c2b7dfc7..e00d73f7 100644 --- a/python/llm/example/CPU/Speculative-Decoding/qwen/README.md +++ b/python/llm/example/CPU/Speculative-Decoding/qwen/README.md @@ -1,21 +1,21 @@ # Qwen In this directory, you will find examples on how you could run Qwen BF16 infernece with -self-speculative decoding using BigDL-LLM on Intel CPUs. For illustration purposes, we utilize the [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) and [Qwen/Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat) and [Qwen/Qwen-72B-Chat](https://huggingface.co/Qwen/Qwen-72B-Chat) as reference Qwen models. +self-speculative decoding using IPEX-LLM on Intel CPUs. For illustration purposes, we utilize the [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) and [Qwen/Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat) and [Qwen/Qwen-72B-Chat](https://huggingface.co/Qwen/Qwen-72B-Chat) as reference Qwen models. ## Example: Predict Tokens using `generate()` API In the example [speculative.py](./speculative.py), we show a basic use case for a Qwen model to -predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel CPUs. +predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel CPUs. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] pip install tiktoken einops transformers_stream_generator # additional package required for Qwen to conduct generation ``` ### 2. Configures environment variables ```bash -source bigdl-llm-init -t +source ipex-llm-init -t export OMP_NUM_THREADS=one_socket_num ``` diff --git a/python/llm/example/CPU/Speculative-Decoding/starcoder/README.md b/python/llm/example/CPU/Speculative-Decoding/starcoder/README.md index 36636fab..dcb42d99 100644 --- a/python/llm/example/CPU/Speculative-Decoding/starcoder/README.md +++ b/python/llm/example/CPU/Speculative-Decoding/starcoder/README.md @@ -1,23 +1,23 @@ # Starcoder -In this directory, you will find examples on how you could run Starcoder BF16 inference with self-speculative decoding using BigDL-LLM on [Intel CPUs](../README.md). For illustration purposes,we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) and [bigcode/tiny_starcoder_py](https://huggingface.co/bigcode/tiny_starcoder_py) as reference Starcoder models. +In this directory, you will find examples on how you could run Starcoder BF16 inference with self-speculative decoding using IPEX-LLM on [Intel CPUs](../README.md). For illustration purposes,we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) and [bigcode/tiny_starcoder_py](https://huggingface.co/bigcode/tiny_starcoder_py) as reference Starcoder models. ## 0. 
Requirements -To run these examples with BigDL-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a Starcoder model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel CPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a Starcoder model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel CPUs. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] pip install intel_extension_for_pytorch==2.1.0 pip install transformers==4.31.0 ``` ### 2. Configures high-performing processor environment variables ```bash -source bigdl-llm-init -t +source ipex-llm-init -t export OMP_NUM_THREADS=48 # you can change 48 here to #cores of one processor socket ``` ### 3. Run diff --git a/python/llm/example/CPU/Speculative-Decoding/vicuna/README.md b/python/llm/example/CPU/Speculative-Decoding/vicuna/README.md index b648040f..faf31eb0 100644 --- a/python/llm/example/CPU/Speculative-Decoding/vicuna/README.md +++ b/python/llm/example/CPU/Speculative-Decoding/vicuna/README.md @@ -1,22 +1,22 @@ # Vicuna -In this directory, you will find examples on how you could run Vicuna BF16 inference with self-speculative decoding using BigDL-LLM on [Intel CPUs](../README.md). For illustration purposes, we utilize the [lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3) and [lmsys/vicuna-33b-v1.3](https://huggingface.co/lmsys/vicuna-33b-v1.3) as reference Vicuna models. +In this directory, you will find examples on how you could run Vicuna BF16 inference with self-speculative decoding using IPEX-LLM on [Intel CPUs](../README.md). For illustration purposes, we utilize the [lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3) and [lmsys/vicuna-33b-v1.3](https://huggingface.co/lmsys/vicuna-33b-v1.3) as reference Vicuna models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a Vicuna model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel CPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a Vicuna model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel CPUs. ### 1. 
Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] pip install intel_extension_for_pytorch==2.1.0 ``` ### 2. Configures high-performing processor environment variables ```bash -source bigdl-llm-init -t +source ipex-llm-init -t export OMP_NUM_THREADS=48 # you can change 48 here to #cores of one processor socket ``` @@ -94,4 +94,4 @@ First token latency xx.xxxxs ### 4. Accelerate with BIGDL_OPT_IPEX -BIGDL_OPT_IPEX can help to accelerate speculative decoding to some extend, and please refer to [here](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/CPU/Speculative-Decoding/llama2#4-accelerate-with-bigdl_opt_ipex) for a try. +BIGDL_OPT_IPEX can help to accelerate speculative decoding to some extent, and please refer to [here](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/CPU/Speculative-Decoding/llama2#4-accelerate-with-bigdl_opt_ipex) for a try. diff --git a/python/llm/example/CPU/Speculative-Decoding/ziya/README.md b/python/llm/example/CPU/Speculative-Decoding/ziya/README.md index e3f95fd1..769b5519 100644 --- a/python/llm/example/CPU/Speculative-Decoding/ziya/README.md +++ b/python/llm/example/CPU/Speculative-Decoding/ziya/README.md @@ -1,23 +1,23 @@ # Ziya -In this directory, you will find examples on how you could run Ziya BF16 inference with self-speculative decoding using BigDL-LLM on [Intel CPUs](../README.md). For illustration purposes,we utilize the [IDEA-CCNL/Ziya-Coding-34B-v1.0](https://huggingface.co/IDEA-CCNL/Ziya-Coding-34B-v1.0) as reference Ziya model. +In this directory, you will find examples on how you could run Ziya BF16 inference with self-speculative decoding using IPEX-LLM on [Intel CPUs](../README.md). For illustration purposes, we utilize the [IDEA-CCNL/Ziya-Coding-34B-v1.0](https://huggingface.co/IDEA-CCNL/Ziya-Coding-34B-v1.0) as the reference Ziya model. ## 0. Requirements -To run the example with BigDL-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run the example with IPEX-LLM on Intel CPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a Ziya model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel CPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a Ziya model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel CPUs. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] pip install intel_extension_for_pytorch==2.1.0 pip install transformers==4.35.2 ``` ### 2. Configures high-performing processor environment variables ```bash -source bigdl-llm-init -t +source ipex-llm-init -t export OMP_NUM_THREADS=48 # you can change 48 here to #cores of one processor socket ``` ### 3.
Run diff --git a/python/llm/example/CPU/vLLM-Serving/README.md b/python/llm/example/CPU/vLLM-Serving/README.md index a7e6ec1e..c4e4c2bc 100644 --- a/python/llm/example/CPU/vLLM-Serving/README.md +++ b/python/llm/example/CPU/vLLM-Serving/README.md @@ -1,6 +1,6 @@ # vLLM continuous batching on Intel CPUs (experimental support) -This example demonstrates how to serve a LLaMA2-7B model using vLLM continuous batching on Intel CPU (with BigDL-LLM 4 bits optimizations). +This example demonstrates how to serve a LLaMA2-7B model using vLLM continuous batching on Intel CPU (with IPEX-LLM 4 bits optimizations). The code shown in the following example is ported from [vLLM](https://github.com/vllm-project/vllm/tree/v0.2.1.post1). @@ -14,11 +14,11 @@ To run vLLM continuous batching on Intel CPUs, install the dependencies as follo ```bash # First create an conda environment -conda create -n bigdl-vllm python==3.9 -conda activate bigdl-vllm +conda create -n ipex-vllm python==3.9 +conda activate ipex-vllm # Install dependencies pip3 install numpy -pip3 install --pre --upgrade bigdl-llm[all] +pip3 install --pre --upgrade ipex-llm[all] pip3 install psutil pip3 install sentencepiece # Required for LLaMA tokenizer. pip3 install fastapi @@ -29,7 +29,7 @@ pip3 install "pydantic<2" # Required for OpenAI server. ### 2. Configure recommended environment variables ```bash -source bigdl-llm-init -t +source ipex-llm-init -t ``` ### 3. Offline inference/Service @@ -57,7 +57,7 @@ To fully utilize the continuous batching feature of the `vLLM`, you can send req # You may also want to adjust the `--max-num-batched-tokens` argument, it indicates the hard limit # of batched prompt length the server will accept numactl -C 48-95 -m 1 python -m ipex_llm.vllm.entrypoints.openai.api_server \ - --model /MODEL_PATH/Llama-2-7b-chat-hf-bigdl/ --port 8000 \ + --model /MODEL_PATH/Llama-2-7b-chat-hf-ipex/ --port 8000 \ --load-format 'auto' --device cpu --dtype bfloat16 \ --load-in-low-bit sym_int4 \ --max-num-batched-tokens 4096 @@ -70,7 +70,7 @@ Then you can access the api server as follows: curl http://localhost:8000/v1/completions \ -H "Content-Type: application/json" \ -d '{ - "model": "/MODEL_PATH/Llama-2-7b-chat-hf-bigdl/", + "model": "/MODEL_PATH/Llama-2-7b-chat-hf-ipex/", "prompt": "San Francisco is a", "max_tokens": 128, "temperature": 0 @@ -83,12 +83,12 @@ Currently we have only supported LLaMA family model (including `llama`, `vicuna` #### 4.1 Add model code -Create or clone the Pytorch model code to `BigDL/python/llm/src/bigdl/llm/vllm/model_executor/models`. +Create or clone the Pytorch model code to `IPEX/python/llm/src/ipex/llm/vllm/model_executor/models`. #### 4.2 Rewrite the forward methods -Refering to `BigDL/python/llm/src/bigdl/llm/vllm/model_executor/models/bigdl_llama.py`, it's necessary to maintain a `kv_cache`, which is a nested list of dictionary that maps `req_id` to a three-dimensional tensor **(the structure may vary from models)**. Before the model's actual `forward` method, you could prepare a `past_key_values` according to current `req_id`, and after that you need to update the `kv_cache` with `output.past_key_values`. The clearence will be executed when the request is finished. +Refering to `IPEX/python/llm/src/ipex/llm/vllm/model_executor/models/ipex_llama.py`, it's necessary to maintain a `kv_cache`, which is a nested list of dictionary that maps `req_id` to a three-dimensional tensor **(the structure may vary from models)**. 
Before the model's actual `forward` method, you could prepare a `past_key_values` according to the current `req_id`, and after that you need to update the `kv_cache` with `output.past_key_values`. The cache entry is cleared when the request is finished. #### 4.3 Register new model -Finally, register your `*ForCausalLM` class to the _MODEL_REGISTRY in `BigDL/python/llm/src/bigdl/llm/vllm/model_executor/model_loader.py`. +Finally, register your `*ForCausalLM` class in the `_MODEL_REGISTRY` in `ipex-llm/python/llm/src/ipex_llm/vllm/model_executor/model_loader.py`. diff --git a/python/llm/example/GPU/Applications/autogen/README.md b/python/llm/example/GPU/Applications/autogen/README.md index 7ac7c4eb..da7c66ce 100644 --- a/python/llm/example/GPU/Applications/autogen/README.md +++ b/python/llm/example/GPU/Applications/autogen/README.md @@ -1,10 +1,10 @@ -# Running AutoGen Agent Chat with BigDL-LLM on Local Models -This example is adapted from the [Official AutoGen Teachablility tutorial](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_teachability.ipynb). We use a version of FastChat modified for BigDL to create a teachable chat agent with [AutoGen](https://microsoft.github.io/autogen/) that works with locally deployed LLMs. This special agent can remember things you tell it over time, unlike regular chatbots that forget after each conversation. It does this by saving what it learns on disk, and then bring up the learnt information in future chats. This means you can teach it lots of new things—like facts, new skills, preferences, etc. +# Running AutoGen Agent Chat with IPEX-LLM on Local Models +This example is adapted from the [Official AutoGen Teachability tutorial](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_teachability.ipynb). We use a version of FastChat modified for IPEX-LLM to create a teachable chat agent with [AutoGen](https://microsoft.github.io/autogen/) that works with locally deployed LLMs. This special agent can remember things you tell it over time, unlike regular chatbots that forget after each conversation. It does this by saving what it learns on disk, and then bringing up the learned information in future chats. This means you can teach it lots of new things—like facts, new skills, preferences, etc. In this example, we illustrate teaching the agent something it doesn't initially know. When we ask, `What is the Vicuna model?`, it doesn't have the answer. We then inform it, `Vicuna is a 13B-parameter language model released by Meta.` We repeat the process for the Orca model, telling the agent, `Orca is a 13B-parameter language model developed by Microsoft. It outperforms Vicuna on most tasks.` Finally, we test if the agent has learned by asking, `How does the Vicuna model compare to the Orca model?` The agent's response confirms it has retained and can use the information we taught it. -### 1. Setup BigDL-LLM Environment +### 1.
Setup IPEX-LLM Environment ```bash # create autogen running directory mkdir autogen @@ -14,9 +14,9 @@ cd autogen conda create -n autogen python=3.9 conda activate autogen -# install xpu-supported and fastchat-adapted bigdl-llm -# we recommend using bigdl-llm version >= 2.5.0b20240110 -pip install --pre --upgrade bigdl-llm[xpu,serving] -f https://developer.intel.com/ipex-whl-stable-xpu +# install xpu-supported and fastchat-adapted ipex-llm +# we recommend using ipex-llm version >= 2.5.0b20240110 +pip install --pre --upgrade ipex-llm[xpu,serving] -f https://developer.intel.com/ipex-whl-stable-xpu # install recommend transformers version pip install transformers==4.36.2 @@ -75,7 +75,7 @@ python -m ipex_llm.serving.model_worker --model-path ... --device xpu ``` Model Name Note: -> Assume you use the model `Mistral-7B-Instruct-v0.2` and your model is downloaded to `autogen/model/Mistral-7B-Instruct-v0.2`. You should rename the model to `autogen/model/bigdl` and run `python -m ipex_llm.serving.model_worker --model-path ... --device xpu`. This ensures the proper usage of the BigDL-adapted FastChat. +> Assume you use the model `Mistral-7B-Instruct-v0.2` and your model is downloaded to `autogen/model/Mistral-7B-Instruct-v0.2`. You should rename the model to `autogen/model/ipex-llm` and run `python -m ipex_llm.serving.model_worker --model-path ... --device xpu`. This ensures the proper usage of the IPEX-LLM-adapted FastChat. Device Note: > Please set `--device` to `xpu` to enable the Intel GPU usage. @@ -194,4 +194,4 @@ The Vicuna model is a 13B-parameter language model released by Meta. It's design On the other hand, the Orca model is a large-scale language model developed by Microsoft. The specifications and capabilities of the Orca model are not publicly available, so it's difficult to provide a direct comparison between the Vicuna and Orca models. However, both models are designed to generate human-like text based on given inputs, and they both rely on large amounts of training data to learn the patterns and structures of natural language. -------------------------------------------------------------------------------- -``` \ No newline at end of file +``` diff --git a/python/llm/example/GPU/Applications/streaming-llm/README.md b/python/llm/example/GPU/Applications/streaming-llm/README.md index c783bc09..c4c3c6c5 100644 --- a/python/llm/example/GPU/Applications/streaming-llm/README.md +++ b/python/llm/example/GPU/Applications/streaming-llm/README.md @@ -1,7 +1,7 @@ -# Low-Bit Streaming LLM using BigDL-LLM +# Low-Bit Streaming LLM using IPEX-LLM -In this example, we apply low-bit optimizations to [Streaming-LLM](https://github.com/mit-han-lab/streaming-llm/tree/main#efficient-streaming-language-models-with-attention-sinks) using BigDL-LLM, which can deploy low-bit(including FP4/INT4/FP8/INT8) LLMs for infinite-length inputs. -Only one code change is needed to load the model using bigdl-llm as follows: +In this example, we apply low-bit optimizations to [Streaming-LLM](https://github.com/mit-han-lab/streaming-llm/tree/main#efficient-streaming-language-models-with-attention-sinks) using IPEX-LLM, which can deploy low-bit(including FP4/INT4/FP8/INT8) LLMs for infinite-length inputs. 
+Only one code change is needed to load the model using ipex-llm as follows: ```python from ipex_llm.transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained(model_name_or_path, load_in_4bit=True, trust_remote_code=True, optimize_model=False) @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm pip install -U transformers==4.34.0 -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ## Configures OneAPI environment variables diff --git a/python/llm/example/GPU/Deepspeed-AutoTP/README.md b/python/llm/example/GPU/Deepspeed-AutoTP/README.md index 346bc8fc..3f745d85 100644 --- a/python/llm/example/GPU/Deepspeed-AutoTP/README.md +++ b/python/llm/example/GPU/Deepspeed-AutoTP/README.md @@ -1,9 +1,9 @@ -# Run BigDL-LLM on Multiple Intel GPUs using DeepSpeed AutoTP +# Run IPEX-LLM on Multiple Intel GPUs using DeepSpeed AutoTP -This example demonstrates how to run BigDL-LLM optimized low-bit model on multiple [Intel GPUs](../README.md) by leveraging DeepSpeed AutoTP. +This example demonstrates how to run an IPEX-LLM optimized low-bit model on multiple [Intel GPUs](../README.md) by leveraging DeepSpeed AutoTP. ## Requirements -To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. For this particular example, you will need at least two GPUs on your machine. +To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information. For this particular example, you will need at least two GPUs on your machine. ## Example: @@ -13,7 +13,7 @@ To run this example with BigDL-LLM on Intel GPUs, we have some recommended requi conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install oneccl_bind_pt==2.1.100 -f https://developer.intel.com/ipex-whl-stable-xpu # configures OneAPI environment variables source /opt/intel/oneapi/setvars.sh @@ -25,7 +25,7 @@ conda install -c conda-forge -y gperftools=2.10 # to enable tcmalloc > **Important**: IPEX 2.1.10+xpu requires Intel® oneAPI Base Toolkit's version == 2024.0. Please make sure you have installed the correct version. ### 2. Run tensor parallel inference on multiple GPUs -Here, we separate inference process into two stages. First, convert to deepspeed model and apply bigdl-llm optimization on CPU. Then, utilize XPU as DeepSpeed accelerator to inference. In this way, a *X*B model saved in 16-bit will requires approximately 0.5*X* GB total GPU memory in the whole process. For example, if you select to use two GPUs, 0.25*X* GB memory is required per GPU. +Here, we separate the inference process into two stages. First, convert to a DeepSpeed model and apply the ipex-llm optimization on CPU. Then, utilize the XPU as the DeepSpeed accelerator for inference. In this way, a *X*B model saved in 16-bit will require approximately 0.5*X* GB of total GPU memory in the whole process. For example, if you select to use two GPUs, 0.25*X* GB of memory is required per GPU.
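As a rough illustration of the 0.5*X* GB rule of thumb above, the small calculation below estimates the footprint for a 33B model on two cards (matching the `run_vicuna_33b_arc_2_card.sh` script used later); the numbers are estimates only, not measurements.

```python
# Illustrative only: back-of-the-envelope GPU memory estimate for DeepSpeed AutoTP,
# based on the ~0.5 GB per billion parameters figure quoted above.
def autotp_memory_estimate(model_size_in_billions: float, num_gpus: int) -> tuple[float, float]:
    total_gb = 0.5 * model_size_in_billions   # total low-bit footprint across all GPUs
    per_gpu_gb = total_gb / num_gpus          # tensor parallelism splits it roughly evenly
    return total_gb, per_gpu_gb

total, per_gpu = autotp_memory_estimate(33, 2)  # e.g. a 33B model on two Intel GPUs
print(f"~{total:.1f} GB total, ~{per_gpu:.2f} GB per GPU")  # ~16.5 GB total, ~8.25 GB per GPU
```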
Please select the appropriate model size based on the capabilities of your machine. @@ -59,7 +59,7 @@ bash run_vicuna_33b_arc_2_card.sh [0] One day, she decided to go on a journey to find the legendary ``` -**Important**: The first token latency is much larger than rest token latency, you could use [our benchmark tool](https://github.com/intel-analytics/BigDL/blob/main/python/llm/dev/benchmark/README.md) to obtain more details about first and rest token latency. +**Important**: The first token latency is much larger than rest token latency, you could use [our benchmark tool](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/dev/benchmark/README.md) to obtain more details about first and rest token latency. ### Known Issue - In our example scripts, tcmalloc is enabled through `export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so:${LD_PRELOAD}` which speed up inference, but this may raise `munmap_chunk(): invalid pointer` error after finishing inference. diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ/README.md index d899a914..35c3c384 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ/README.md @@ -1,6 +1,6 @@ # AWQ -This example shows how to directly run 4-bit AWQ models using BigDL-LLM on Intel GPU. +This example shows how to directly run 4-bit AWQ models using IPEX-LLM on Intel GPU. ## Verified Models @@ -22,11 +22,11 @@ This example shows how to directly run 4-bit AWQ models using BigDL-LLM on Intel ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a AWQ model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a AWQ model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install @@ -36,7 +36,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.35.0 pip install autoawq==0.1.8 --no-deps pip install accelerate==0.25.0 @@ -72,7 +72,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. 
+> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Llama2 model based on the capabilities of your machine. diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF-IQ2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF-IQ2/README.md index 0f15945e..b580015d 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF-IQ2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF-IQ2/README.md @@ -1,6 +1,6 @@ # GGUF-IQ2 -This example shows how to run INT2 models using the IQ2 mechanism (first implemented by `llama.cpp`) in BigDL-LLM on Intel GPU. +This example shows how to run INT2 models using the IQ2 mechanism (first implemented by `llama.cpp`) in IPEX-LLM on Intel GPU. ## Verified Models @@ -12,11 +12,11 @@ This example shows how to run INT2 models using the IQ2 mechanism (first impleme ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a GGUF-IQ2 model to predict the next N tokens using `generate()` API, with BigDL-LLM optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a GGUF-IQ2 model to predict the next N tokens using `generate()` API, with IPEX-LLM optimizations. ### 1. Install @@ -26,7 +26,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.35.0 ``` **Note: For Mixtral model, please use transformers 4.36.0:** diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF/README.md index b75bb179..9f500eca 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF/README.md @@ -1,5 +1,5 @@ # Loading GGUF models -In this directory, you will find examples on how to load GGUF model into `bigdl-llm`. +In this directory, you will find examples on how to load GGUF model into `ipex-llm`. 
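To give a feel for how such a loading script is structured, here is a minimal sketch of loading a Q4_0 GGUF file; the file path is hypothetical and the `(model, tokenizer)` return shape of `from_gguf()` is an assumption here, so treat the `generate.py` script in this directory as the authoritative reference.

```python
# Minimal sketch (assumed API shape): load a Q4_0 GGUF Llama 2 file with ipex-llm.
# The path and the (model, tokenizer) return value are assumptions; see generate.py
# in this directory for the exact, supported usage.
from ipex_llm.transformers import AutoModelForCausalLM

gguf_path = "/path/to/llama-2-7b-chat.Q4_0.gguf"              # hypothetical local file
model, tokenizer = AutoModelForCausalLM.from_gguf(gguf_path)  # assumed to return (model, tokenizer)

inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```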
## Verified Models(Q4_0) - [Llama-2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main) @@ -11,23 +11,23 @@ In this directory, you will find examples on how to load GGUF model into `bigdl- - [mpt-7b-chat-gguf](https://huggingface.co/maddes8cht/mosaicml-mpt-7b-chat-gguf/tree/main) ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#system-support) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#system-support) for more information. **Important: Please make sure you have installed `transformers==4.36.0` to run the example.** ## Example: Load gguf model using `from_gguf()` API -In the example [generate.py](./generate.py), we show a basic use case to load a GGUF LLaMA2 model into `bigdl-llm` using `from_gguf()` API, with BigDL-LLM optimizations. +In the example [generate.py](./generate.py), we show a basic use case to load a GGUF LLaMA2 model into `ipex-llm` using `from_gguf()` API, with IPEX-LLM optimizations. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.36.0 # upgrade transformers ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ/README.md index 765aa9a4..575eff47 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ/README.md @@ -1,18 +1,18 @@ # GPTQ -This example shows how to directly run 4-bit GPTQ models using BigDL-LLM on Intel GPU. For illustration purposes, we utilize the ["TheBloke/Llama-2-7B-GPTQ"](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) as a reference. +This example shows how to directly run 4-bit GPTQ models using IPEX-LLM on Intel GPU. For illustration purposes, we utilize the ["TheBloke/Llama-2-7B-GPTQ"](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) as a reference. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. 
+In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.0 BUILD_CUDA_EXT=0 pip install git+https://github.com/PanQiWei/AutoGPTQ.git@1de9ab6 pip install optimum==0.14.0 @@ -41,7 +41,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Llama2 model based on the capabilities of your machine. diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/README.md index d02983b9..4e60a656 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/README.md @@ -1,5 +1,5 @@ -# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs -You can use BigDL-LLM to run almost every Huggingface Transformer models with INT4 optimizations on your laptops with Intel GPUs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. +# IPEX-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs +You can use IPEX-LLM to run almost every Huggingface Transformer models with INT4 optimizations on your laptops with Intel GPUs. This directory contains example scripts to help you quickly get started using IPEX-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md index 6e9cbee8..758dd6ac 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila/README.md @@ -1,16 +1,16 @@ # Aquila -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Aquila models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B) as a reference Aquila model. 
+In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Aquila models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B) as a reference Aquila model. > **Note**: If you want to download the Hugging Face *Transformers* model, please refer to [here](https://huggingface.co/docs/hub/models-downloading#using-git). > -> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. +> IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Aquila model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Aquila model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux @@ -19,7 +19,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows We suggest using conda to manage environment: @@ -27,7 +27,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md index 8cae2285..628e3e41 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila2/README.md @@ -1,16 +1,16 @@ # Aquila2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Aquila2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B) as a reference Aquila2 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Aquila2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B) as a reference Aquila2 model. > **Note**: If you want to download the Hugging Face *Transformers* model, please refer to [here](https://huggingface.co/docs/hub/models-downloading#using-git). 
> -> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. +> IPEX-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux @@ -19,7 +19,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -28,7 +28,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md index 4c45053a..bc98ef4f 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md @@ -1,11 +1,11 @@ # Baichuan -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as a reference Baichuan model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Baichuan models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as a reference Baichuan model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. 
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation ``` @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md index f49219c4..8dca1bf9 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md @@ -1,11 +1,11 @@ # Baichuan -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan-7B-Chat) as a reference Baichuan model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Baichuan2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan-7B-Chat) as a reference Baichuan model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation ``` @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md index 4c112ed7..85b37d69 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/bluelm/README.md @@ -1,11 +1,11 @@ # BlueLM -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on BlueLM models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) as a reference BlueLM model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on BlueLM models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) as a reference BlueLM model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -22,7 +22,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md index 879e6f0b..bbaa1e08 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md @@ -1,12 +1,12 @@ # ChatGLM2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as a reference ChatGLM2 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on ChatGLM2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as a reference ChatGLM2 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example 1: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux @@ -15,7 +15,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows We suggest using conda to manage environment: @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables @@ -132,7 +132,7 @@ Inference time: xxxx s ``` ## Example 2: Stream Chat using `stream_chat()` API -In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with BigDL-LLM INT4 optimizations. +In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -140,7 +140,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows We suggest using conda to manage environment: @@ -148,7 +148,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md index 663c5478..e7a1f77e 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm3/README.md @@ -1,12 +1,12 @@ # ChatGLM3 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM3 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on ChatGLM3 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. 
+To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example 1: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -14,7 +14,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables @@ -133,7 +133,7 @@ AI stands for Artificial Intelligence. It refers to the development of computer ``` ## Example 2: Stream Chat using `stream_chat()` API -In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM3 model to stream chat, with BigDL-LLM INT4 optimizations. +In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM3 model to stream chat, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -141,7 +141,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -150,7 +150,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. 
Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md index f91cf4d9..f37d6774 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md @@ -1,11 +1,11 @@ # Chinese Llama2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Chinese LLaMA models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [LinkSoul/Chinese-Llama-2-7b](https://huggingface.co/LinkSoul/Chinese-Llama-2-7b) as reference Chinese LLaMA models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Chinese LLaMA models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [LinkSoul/Chinese-Llama-2-7b](https://huggingface.co/LinkSoul/Chinese-Llama-2-7b) as reference Chinese LLaMA models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -22,7 +22,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md index 902678b4..ae83d88e 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/codellama/readme.md @@ -1,11 +1,11 @@ # CodeLlama -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on CodeLlama models on [Intel GPUs](../../../README.md). 
For illustration purposes, we utilize the [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) as a reference CodeLlama model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on CodeLlama models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) as a reference CodeLlama model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an CodeLlama model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for an CodeLlama model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md index 337398c0..9759c01d 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deciLM-7b/README.md @@ -1,23 +1,23 @@ # DeciLM-7B -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on DeciLM-7B models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [Deci/DeciLM-7B-instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) as a reference DeciLM-7B model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on DeciLM-7B models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [Deci/DeciLM-7B-instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) as a reference DeciLM-7B model. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
+To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.0.110+xpu as default # you can install specific ipex/torch version for your need -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.35.2 # required by DeciLM-7B ``` #### 1.2 Installation on Windows @@ -26,7 +26,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.35.2 # required by DeciLM-7B ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md index 969b6529..c18adc65 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/deepseek/README.md @@ -1,11 +1,11 @@ # Deepseek -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Deepseek models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) as a reference Deepseek model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Deepseek models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) as a reference Deepseek model. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
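The `generate()` examples referenced throughout these model folders all follow the same basic pattern; the sketch below is illustrative rather than a copy of any particular `generate.py` (the model path, prompt, and explicit move to `"xpu"` are assumptions), so always prefer the script in the corresponding folder.

```python
# Generic sketch of the low-bit generate() pattern used throughout these examples.
# Model path, prompt, and the explicit .to("xpu") move are illustrative assumptions;
# each folder's generate.py is the authoritative version for its model.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "deepseek-ai/deepseek-coder-6.7b-instruct"  # any verified model path works here
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,      # convert linear layers to INT4 at load time
                                             trust_remote_code=True)
model = model.to("xpu")                                  # run the converted model on the Intel GPU
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")
    output = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```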
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Deepseek model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Deepseek model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -14,7 +14,7 @@ conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.0.110+xpu as default # you can install specific ipex/torch version for your need -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md index 76c2bfb7..217b590f 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/distil-whisper/README.md @@ -1,22 +1,22 @@ # Distil-Whisper -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Distil-Whisper models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Distil-Whisper models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Recognize Tokens using `generate()` API -In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with BigDL-LLM INT4 optimizations. +In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install datasets soundfile librosa # required by audio processing ``` @@ -26,7 +26,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install datasets soundfile librosa # required by audio processing ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md index 1ff5ab28..e532ac7e 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v1/README.md @@ -1,11 +1,11 @@ # Dolly v1 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Dolly v1 models. For illustration purposes, we utilize the [databricks/dolly-v1-6b](https://huggingface.co/databricks/dolly-v1-6b) as a reference Dolly v1 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Dolly v1 models. For illustration purposes, we utilize the [databricks/dolly-v1-6b](https://huggingface.co/databricks/dolly-v1-6b) as a reference Dolly v1 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install @@ -15,7 +15,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -24,7 +24,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. 
Configures OneAPI environment variables @@ -107,7 +107,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Dolly v1 model based on the capabilities of your machine. diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md index 82b741cf..f0670cee 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly-v2/README.md @@ -1,11 +1,11 @@ # Dolly v2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Dolly v2 models. For illustration purposes, we utilize the [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) as a reference Dolly v2 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Dolly v2 models. For illustration purposes, we utilize the [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) as a reference Dolly v2 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -22,7 +22,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. 
Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md index e3c96d07..98f5b93f 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md @@ -1,12 +1,12 @@ # Falcon -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Falcon models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) as a reference Falcon model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Falcon models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) as a reference Falcon model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Falcon model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Falcon model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -14,7 +14,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for falcon-7b-instruct to conduct generation ``` @@ -24,12 +24,12 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for falcon-7b-instruct to conduct generation ``` ### 2. (Optional) Download Model and Replace File -If you select the Falcon model ([tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct)), please note that their code (`modelling_RW.py`) does not support KV cache at the moment. To address issue, we have provided updated file ([falcon-7b-instruct/modelling_RW.py](./falcon-7b-instruct/modelling_RW.py)), which can be used to achieve the best performance using BigDL-LLM INT4 optimizations with KV cache support. 
+If you select the Falcon model ([tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct)), please note that their code (`modelling_RW.py`) does not support KV cache at the moment. To address issue, we have provided updated file ([falcon-7b-instruct/modelling_RW.py](./falcon-7b-instruct/modelling_RW.py)), which can be used to achieve the best performance using IPEX-LLM INT4 optimizations with KV cache support. After transformers 4.36, only transformer models are supported since remote code diverges from transformer model code, make sure set `trust_remote_code=False`. ```python model = AutoModelForCausalLM.from_pretrained(model_path, diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md index c0168eb0..f99966cc 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/flan-t5/README.md @@ -1,22 +1,22 @@ # Flan-t5 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Flan-t5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Flan-t5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
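Because Flan-t5 is an encoder-decoder model, the low-bit wrapper class differs from the causal-LM examples elsewhere in this folder. As a rough sketch of what the example does once the installation steps below are completed (assuming the package exposes `ipex_llm.transformers.AutoModelForSeq2SeqLM`; older installs use the `bigdl.llm.transformers` namespace, and the prompt and checkpoint here are placeholders):

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device
from transformers import AutoTokenizer
# Assumption: adjust to `from bigdl.llm.transformers import AutoModelForSeq2SeqLM` on older installs.
from ipex_llm.transformers import AutoModelForSeq2SeqLM

model_path = "google/flan-t5-xxl"  # any Flan-T5 checkpoint; smaller sizes need less memory
model = AutoModelForSeq2SeqLM.from_pretrained(model_path, load_in_4bit=True)
model = model.to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer("Translate to German: How are you?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```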
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md index 74dda0cd..23b1cd79 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gemma/README.md @@ -1,24 +1,24 @@ # Gemma -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Google Gemma models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [google/gemma-7b-it ](https://huggingface.co/google/gemma-7b-it) and [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it) as reference Gemma models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Google Gemma models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [google/gemma-7b-it ](https://huggingface.co/google/gemma-7b-it) and [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it) as reference Gemma models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. **Important: According to Gemma's requirement, please make sure you have installed `transformers==4.38.1` to run the example.** ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Gemma model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Gemma model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
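Whichever platform you install on, it can help to confirm that PyTorch actually sees the Intel GPU before running the Gemma example, especially given the driver/oneAPI mismatch issue noted later in this README. A small optional check, assuming the XPU build of `intel_extension_for_pytorch` installed by the commands below:

```python
import torch
import intel_extension_for_pytorch as ipex  # the XPU build is assumed to add the torch.xpu namespace

# If this prints False, revisit the GPU driver and oneAPI setup described in this README.
print("XPU available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("Device name:", torch.xpu.get_device_name(0))
```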
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # According to Gemma's requirement, please make sure you are using a stable version of Transformers, 4.38.1 or newer. pip install transformers==4.38.1 @@ -30,7 +30,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # According to Gemma's requirement, please make sure you are using a stable version of Transformers, 4.38.1 or newer. pip install transformers==4.38.1 @@ -140,7 +140,7 @@ model ### 5. Know issue #### 1. Random and unreadable output of Gemma-7b-it on Arc770 ubuntu 22.04 due to driver and OneAPI missmatching. -If driver and OneAPI missmatching, it will lead to some error when BigDL use XMX(short prompts) for speeding up. +If driver and OneAPI missmatching, it will lead to some error when IPEX use XMX(short prompts) for speeding up. The output of `What's AI?` may like below: ``` wiedzy Artificial Intelligence meliti: Artificial Intelligence undenti beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng beng diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md index 100c5b15..152debf5 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md @@ -1,11 +1,11 @@ # GPT-J -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on GPT-J models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) as reference GPT-J models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on GPT-J models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) as reference GPT-J models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a GPT-J model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. 
+In the example [generate.py](./generate.py), we show a basic use case for a GPT-J model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -22,7 +22,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md index e81ea33a..7d36c184 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md @@ -1,11 +1,11 @@ # InternLM -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) as a reference InternLM model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on InternLM models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) as a reference InternLM model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a InternLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a InternLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -22,7 +22,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md index 78b02254..70a887bb 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm2/README.md @@ -1,11 +1,11 @@ # InternLM2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [internlm/internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as a reference InternLM model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on InternLM2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [internlm/internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as a reference InternLM model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a InternLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a InternLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -22,7 +22,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md index 26b1a638..6decfaa0 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md @@ -1,11 +1,11 @@ # Llama2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Llama2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Llama2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -22,7 +22,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md index 8cd40a15..330d2e1f 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral/README.md @@ -1,24 +1,24 @@ # Mistral -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Mistral models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Mistral models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. **Important: According to [Mistral Troubleshooting](https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting), please make sure you have installed `transformers==4.34.0` to run the example.** ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
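Since Mistral support hinges on a sufficiently new `transformers` (4.34.0 or newer, as noted above), a quick runtime check can catch an outdated environment before the model download starts. This is purely optional and only a convenience sketch:

```python
import transformers
from packaging import version  # packaging ships as a transformers dependency

required = version.parse("4.34.0")
installed = version.parse(transformers.__version__)
if installed < required:
    raise RuntimeError(
        f"transformers {transformers.__version__} is installed; "
        "Mistral models need 4.34.0 or newer (pip install transformers==4.34.0)"
    )
print(f"transformers {transformers.__version__} is new enough for Mistral.")
```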
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # Refer to https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting, please make sure you are using a stable version of Transformers, 4.34.0 or newer. pip install transformers==4.34.0 @@ -30,7 +30,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # Refer to https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting, please make sure you are using a stable version of Transformers, 4.34.0 or newer. pip install transformers==4.34.0 diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md index 309e4e25..47934abe 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral/README.md @@ -1,24 +1,24 @@ # Mixtral -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Mixtral models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as a reference Mixtral model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Mixtral models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as a reference Mixtral model. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. **Important: Please make sure you have installed `transformers==4.36.0` to run the example.** ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # Please make sure you are using a stable version of Transformers, 4.36.0 or newer. pip install transformers==4.36.0 @@ -30,7 +30,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # Please make sure you are using a stable version of Transformers, 4.36.0 or newer. pip install transformers==4.36.0 diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md index a073ce02..2419226c 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md @@ -1,11 +1,11 @@ # MPT -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Llama2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) as a reference MPT model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Llama2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) as a reference MPT model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an MPT model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for an MPT model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for mpt-7b-chat and mpt-30b-chat to conduct generation ``` @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md index 8e108b1f..252328ad 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-1_5/README.md @@ -1,11 +1,11 @@ # phi-1_5 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on phi-1_5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on phi-1_5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for phi-1_5 to conduct generation ``` @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for phi-1_5 to conduct generation ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md index a251f4da..b0eb3fbf 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phi-2/README.md @@ -1,11 +1,11 @@ # phi-2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on phi-2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) as a reference phi-2 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on phi-2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) as a reference phi-2 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a phi-2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a phi-2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for phi-2 to conduct generation ``` #### 1.2 Installation on Windows @@ -22,7 +22,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md index b271f5ba..27aad5a3 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/phixtral/README.md @@ -1,11 +1,11 @@ # Phixtral -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on phixtral models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mlabonne/phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8) as a reference phixtral model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on phixtral models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mlabonne/phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8) as a reference phixtral model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a InternLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a InternLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -22,7 +22,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md index d5c54ff6..f56dd404 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen-vl/README.md @@ -1,21 +1,21 @@ # Qwen-VL -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen-VL models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Qwen-VL models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Multimodal chat using `chat()` API -In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
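After working through the installation steps below, the multimodal flow in [chat.py](./chat.py) reduces to building a mixed image-and-text query and calling the model's `chat()` method. The following is an illustrative sketch only, assuming the upstream Qwen-VL remote code (loaded via `trust_remote_code=True`) provides `tokenizer.from_list_format` and `model.chat`, and using a placeholder image path:

```python
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device
from transformers import AutoTokenizer
# Assumption: the renamed package exposes `ipex_llm.transformers`; older installs use `bigdl.llm.transformers`.
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Qwen/Qwen-VL-Chat"
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Qwen-VL's remote code accepts a list of image/text segments (assumed API).
query = tokenizer.from_list_format([
    {"image": "demo.jpg"},             # placeholder: path or URL to an image
    {"text": "What is in this picture?"},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```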
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation ``` @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md index ee1d162e..8b361d3c 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md @@ -1,11 +1,11 @@ # Qwen -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) as a reference Qwen model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Qwen models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) as a reference Qwen model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install tiktoken einops transformers_stream_generator # additional package required for Qwen-7B-Chat to conduct generation ``` @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install tiktoken einops transformers_stream_generator # additional package required for Qwen-7B-Chat to conduct generation ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md index 979c511a..c2a411c8 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen1.5/README.md @@ -1,11 +1,11 @@ # Qwen1.5 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen1.5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) as a reference InternLM model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Qwen1.5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) as a reference InternLM model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Qwen1.5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Qwen1.5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.37.0 # install transformers which supports Qwen2 ``` @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.37.2 # install transformers which supports Qwen2 ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md index 0f412391..f3b4909f 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/redpajama/README.md @@ -1,21 +1,21 @@ # RedPajama -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on redpajama models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [redpajama/gptneox-7b-redpajama-bf16](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat) as a reference redpajama model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on redpajama models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [redpajama/gptneox-7b-redpajama-bf16](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat) as a reference redpajama model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an redpajama model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for an redpajama model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -24,7 +24,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md index f869f46f..546fc202 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit/README.md @@ -1,21 +1,21 @@ # Replit -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Replit models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [replit/replit-code-v1-3b](https://huggingface.co/replit/replit-code-v1-3b) as a reference Replit model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Replit models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [replit/replit-code-v1-3b](https://huggingface.co/replit/replit-code-v1-3b) as a reference Replit model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an Replit model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Replit model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -24,7 +24,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md index 19283e14..9a2993eb 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv4/README.md @@ -1,12 +1,12 @@ # RWKV4 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on RWKV4 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [RWKV/rwkv-4-world-7b](https://huggingface.co/RWKV/rwkv-4-world-7b) as a reference RWKV4 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on RWKV4 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [RWKV/rwkv-4-world-7b](https://huggingface.co/RWKV/rwkv-4-world-7b) as a reference RWKV4 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example 1: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a RWKV4 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a RWKV4 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux @@ -15,7 +15,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows We suggest using conda to manage environment: @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md index bd78ecc6..b058bca5 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/rwkv5/README.md @@ -1,12 +1,12 @@ # RWKV5 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on RWKV5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [RWKV/HF_v5-Eagle-7B](https://huggingface.co/RWKV/HF_v5-Eagle-7B) as a reference RWKV5 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on RWKV5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [RWKV/HF_v5-Eagle-7B](https://huggingface.co/RWKV/HF_v5-Eagle-7B) as a reference RWKV5 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example 1: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a RWKV5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a RWKV5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux @@ -15,7 +15,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows We suggest using conda to manage environment: @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md index 957b0736..a666f06d 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/solar/README.md @@ -1,11 +1,11 @@ # SOLAR -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on SOLAR models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) as a reference SOLAR model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on SOLAR models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) as a reference SOLAR model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a SOLAR model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a SOLAR model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. 
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.35.2 # required by SOLAR ``` @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.35.2 # required by SOLAR ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md index 3b8b46c3..82ca54bf 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md @@ -1,11 +1,11 @@ # StarCoder -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on StarCoder models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as a reference StarCoder model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on StarCoder models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as a reference StarCoder model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an StarCoder model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a StarCoder model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1.
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -22,7 +22,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md index de227e03..880224f6 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna/README.md @@ -1,11 +1,11 @@ # Vicuna -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Vicuna models. For illustration purposes, we utilize the [lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3) and [eachadea/vicuna-7b-1.1](https://huggingface.co/eachadea/vicuna-7b-1.1) as reference Vicuna models. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Vicuna models. For illustration purposes, we utilize the [lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3) and [eachadea/vicuna-7b-1.1](https://huggingface.co/eachadea/vicuna-7b-1.1) as reference Vicuna models. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Vicuna model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations. +In the example [generate.py](./generate.py), we show a basic use case for a Vicuna model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations. ### 1. 
Install @@ -15,7 +15,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -24,7 +24,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables @@ -107,7 +107,7 @@ Arguments info: - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. -> **Note**: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will requires approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. +> **Note**: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, a *X*B model saved in 16-bit will require approximately 2*X* GB of memory for loading, and ~0.5*X* GB memory for further inference. > > Please select the appropriate size of the Vicuna model based on the capabilities of your machine. diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md index e5965073..5a809c85 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md @@ -1,13 +1,13 @@ # Voice Assistant -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper and Llama2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the following models: +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Whisper and Llama2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the following models: - [openai/whisper-small](https://huggingface.co/openai/whisper-small) and [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) as reference whisper models. - [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, then use the recoginzed text as the input for Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, then use the recognized text as the input for the Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -15,7 +15,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install librosa soundfile datasets pip install accelerate pip install SpeechRecognition sentencepiece colorama @@ -29,7 +29,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install librosa soundfile datasets pip install accelerate pip install SpeechRecognition sentencepiece colorama @@ -162,8 +162,8 @@ frame_data = np.frombuffer(audio.frame_data, np.int16).flatten().astype(np.float #### Sample Output ```bash -(llm) bigdl@bigdl-llm:~/Documents/voiceassistant$ python generate.py --llama2-repo-id-or-model-path /mnt/windows/demo/models/Llama-2-7b-chat-hf --whisper-repo-id-or-model-path /mnt/windows/demo/models/whisper-medium -/home/bigdl/anaconda3/envs/llm/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? +(llm) ipex@ipex-llm:~/Documents/voiceassistant$ python generate.py --llama2-repo-id-or-model-path /mnt/windows/demo/models/Llama-2-7b-chat-hf --whisper-repo-id-or-model-path /mnt/windows/demo/models/whisper-medium +/home/ipex/anaconda3/envs/llm/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? warn( [?] Which microphone do you choose?: Default @@ -189,11 +189,11 @@ Extracting data files: 100%|█████████████████ Generating validation split: 73 examples [00:00, 5328.37 examples/s] Converting and loading models...
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:09<00:00, 3.04s/it] -/home/bigdl/anaconda3/envs/yina-llm/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed. +/home/ipex/anaconda3/envs/yina-llm/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed. warnings.warn( -/home/bigdl/anaconda3/envs/yina-llm/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed. +/home/ipex/anaconda3/envs/yina-llm/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed. warnings.warn( -/home/bigdl/anaconda3/envs/yina-llm/lib/python3.9/site-packages/transformers/generation/utils.py:1411: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation ) +/home/ipex/anaconda3/envs/yina-llm/lib/python3.9/site-packages/transformers/generation/utils.py:1411: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation ) warnings.warn( Calibrating... Listening now... @@ -202,7 +202,7 @@ Recognizing... Whisper : What is AI? -BigDL-LLM: +IPEX-LLM: Artificial intelligence (AI) is the broader field of research and development aimed at creating machines that can perform tasks that typically require human intelligence, Listening now... Recognizing... @@ -210,6 +210,6 @@ Recognizing... Whisper : Tell me something about Intel -BigDL-LLM: +IPEX-LLM: Intel is a well-known technology company that specializes in designing, manufacturing, and selling computer hardware components and semiconductor products. 
``` \ No newline at end of file diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md index 17046ef9..97ab5496 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md @@ -1,12 +1,12 @@ # Whisper -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) as a reference Whisper model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Whisper models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) as a reference Whisper model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Recognize Tokens using `generate()` API -In the example [recognize.py](./recognize.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [recognize.py](./recognize.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -14,7 +14,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install datasets soundfile librosa # required by audio processing ``` @@ -24,7 +24,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install datasets soundfile librosa # required by audio processing ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md index 1ef23e2b..2b69a6d8 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yi/README.md @@ -1,22 +1,22 @@ # Yi -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Yi models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model. 
+In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Yi models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for Yi-6B to conduct generation ``` @@ -26,7 +26,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for Yi-6B to conduct generation ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md index 8d6d3f26..5dcd470a 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/yuan2/README.md @@ -1,13 +1,13 @@ # Yuan2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Yuan2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [IEITYuan/Yuan2-2B-hf](https://huggingface.co/IEITYuan/Yuan2-2B-hf) as a reference Yuan2 model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Yuan2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [IEITYuan/Yuan2-2B-hf](https://huggingface.co/IEITYuan/Yuan2-2B-hf) as a reference Yuan2 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
+To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. In addition, you need to modify some files in Yuan2-2B-hf folder, since Flash attention dependency is for CUDA usage and currently cannot be installed on Intel CPUs. To manually turn it off, please refer to [this issue](https://github.com/IEIT-Yuan/Yuan-2.0/issues/92). We also provide two modified files([config.json](yuan2-2B-instruct/config.json) and [yuan_hf_model.py](yuan2-2B-instruct/yuan_hf_model.py)), which can be used to replace the original content in config.json and yuan_hf_model.py. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an Yuan2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Yuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -15,7 +15,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for Yuan2 to conduct generation pip install pandas # additional package required for Yuan2 to conduct generation ``` @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for Yuan2 to conduct generation ``` diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types/README.md index 3fd08df5..8c1d5193 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types/README.md @@ -1,6 +1,6 @@ -# BigDL-LLM Transformers Low-Bit Inference Pipeline (FP8, FP4, INT4 and more) +# IPEX-LLM Transformers Low-Bit Inference Pipeline (FP8, FP4, INT4 and more) -In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations (including **FP8/INT8/MixedFP8/FP4/INT4/MixedFP4**) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model. +In this example, we show a pipeline to apply IPEX-LLM low-bit optimizations (including **FP8/INT8/MixedFP8/FP4/INT4/MixedFP4**) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model.
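> Editor's note (not part of this patch): the low-bit pipeline described above differs from the plain INT4 case only in how the model is loaded. The sketch below is a hedged illustration; it assumes `ipex_llm` keeps the `load_in_low_bit` argument from the bigdl-llm API, and the model id and format string are placeholders (consult the example's README for the exact set of supported values).

```python
# Hedged sketch: load any HF Transformers model in a chosen low-bit format and run generate().
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
from ipex_llm.transformers import AutoModelForCausalLM  # assumed package name after the rename
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder: any Hugging Face Transformers model

# load_in_low_bit selects the data type, e.g. "fp8", "sym_int8", "fp4", "sym_int4" (assumed names)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit="fp8",
                                             trust_remote_code=True).to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```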
## Prepare Environment We suggest using conda to manage environment: @@ -9,7 +9,7 @@ conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ## Run Example diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/README.md index 18281167..a9ea3911 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/README.md @@ -1,6 +1,6 @@ -# Running Hugging Face Transformers model using BigDL-LLM on Intel GPU +# Running Hugging Face Transformers model using IPEX-LLM on Intel GPU -This folder contains examples of running any Hugging Face Transformers model on BigDL-LLM (using the standard AutoModel APIs): +This folder contains examples of running any Hugging Face Transformers model on IPEX-LLM (using the standard AutoModel APIs): - [Model](Model): examples of running Hugging Face Transformers models (e.g., LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using INT4 optimizations - [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.) diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load/README.md index 1fd08499..dd5286c1 100644 --- a/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load/README.md @@ -1,9 +1,9 @@ -# Save/Load Low-Bit Models with BigDL-LLM Optimizations +# Save/Load Low-Bit Models with IPEX-LLM Optimizations -In this directory, you will find example on how you could save/load models with BigDL-LLM INT4 optimizations on Llama2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. +In this directory, you will find an example on how you could save/load models with IPEX-LLM INT4 optimizations on Llama2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. ## 0. Requirements -To run this example with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. +To run this example with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. ## Example: Save/Load Model in Low-Bit Optimization In the example [generate.py](./generate.py), we show a basic use case of saving/loading model in low-bit optimizations to predict the next N tokens using `generate()` API. Also, saving and loading operations are platform-independent, so you could run it on different platforms.
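> Editor's note (not part of this patch): a hedged sketch of the save/load flow described above, assuming `ipex_llm` keeps bigdl-llm's `save_low_bit`/`load_low_bit` helpers; the model id and output folder are placeholders.

```python
# Hedged sketch: convert once to INT4, persist the low-bit weights, and reload them later.
from ipex_llm.transformers import AutoModelForCausalLM  # assumed package name after the rename
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder Hugging Face model id
save_dir = "./llama-2-7b-chat-int4"           # placeholder output folder

# One-time conversion: load in INT4, then save the already-quantized weights to disk.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
model.save_low_bit(save_dir)                  # assumed helper for persisting low-bit weights
AutoTokenizer.from_pretrained(model_path).save_pretrained(save_dir)

# Later, possibly on a different platform: reload the low-bit weights directly,
# skipping the FP16 checkpoint and the quantization step.
model = AutoModelForCausalLM.load_low_bit(save_dir)  # assumed counterpart of save_low_bit
tokenizer = AutoTokenizer.from_pretrained(save_dir)
```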
@@ -14,7 +14,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/LLM-Finetuning/DPO/README.md b/python/llm/example/GPU/LLM-Finetuning/DPO/README.md index 3992f16c..873dce6d 100644 --- a/python/llm/example/GPU/LLM-Finetuning/DPO/README.md +++ b/python/llm/example/GPU/LLM-Finetuning/DPO/README.md @@ -1,10 +1,10 @@ -# Simple Example of DPO Finetuning with BigDL-LLM +# Simple Example of DPO Finetuning with IPEX-LLM -This simple example demonstrates how to finetune a Mistral-7B model use BigDL-LLM 4bit optimizations using [Intel GPUs](../../README.md). +This simple example demonstrates how to finetune a Mistral-7B model using IPEX-LLM 4-bit optimizations on [Intel GPUs](../../README.md). Note, this example is just used for illustrating related usage. ## 0. Requirements -To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information. +To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information. ## Example: Finetune Mistral-7b using DPO @@ -16,7 +16,7 @@ This example is ported from [Fine_tune_a_Mistral_7b_model_with_DPO](https://gith conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.0 datasets pip install trl peft==0.5.0 pip install accelerate==0.23.0 diff --git a/python/llm/example/GPU/LLM-Finetuning/HF-PEFT/README.md b/python/llm/example/GPU/LLM-Finetuning/HF-PEFT/README.md index 96a6ebba..bd9aebfd 100644 --- a/python/llm/example/GPU/LLM-Finetuning/HF-PEFT/README.md +++ b/python/llm/example/GPU/LLM-Finetuning/HF-PEFT/README.md @@ -1,11 +1,11 @@ # Finetuning on Intel GPU using Hugging Face PEFT code -This example demonstrates how to easily run LLM finetuning application of PEFT use BigDL-LLM 4bit optimizations using [Intel GPUs](../../../README.md). By applying BigDL-LLM patch, you could run Hugging Face PEFT code on Intel GPUs using BigDL-LLM optimization without modification. +This example demonstrates how to easily run an LLM finetuning application built on Hugging Face PEFT with IPEX-LLM 4-bit optimizations on [Intel GPUs](../../../README.md). By applying the IPEX-LLM patch, you can run Hugging Face PEFT code on Intel GPUs with IPEX-LLM optimizations, without modification.
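> Editor's note (not part of this patch): the "patch" mentioned above is sketched below under assumptions. The `llm_patch(train=True)` helper and its location under `ipex_llm` are assumed to carry over unchanged from the bigdl-llm era; the base model and LoRA settings are placeholders, and the actual HF-PEFT example should be treated as authoritative.

```python
# Hedged sketch: apply the assumed IPEX-LLM patch, then run ordinary Hugging Face PEFT code.
from ipex_llm import llm_patch  # assumed helper name/location after the rename
llm_patch(train=True)           # patches transformers/peft so 4-bit finetuning runs on Intel GPU

# From here on, this is plain Hugging Face PEFT code, unchanged by IPEX-LLM.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(base_model, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(base_model)

lora_config = LoraConfig(r=8, lora_alpha=32,
                         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                         lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# A transformers.Trainer would then be built exactly as in the upstream PEFT example.
```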
Note, this example is just used for illustrating related usage and don't guarantee convergence of training. ### 0. Requirements -To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information. +To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information. ### 1. Install @@ -13,7 +13,7 @@ To run this example with BigDL-LLM on Intel GPUs, we have some recommended requi conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.0 datasets pip install fire peft==0.5.0 pip install oneccl_bind_pt==2.1.100 -f https://developer.intel.com/ipex-whl-stable-xpu # necessary to run distributed finetuning diff --git a/python/llm/example/GPU/LLM-Finetuning/LoRA/README.md b/python/llm/example/GPU/LLM-Finetuning/LoRA/README.md index 73740a11..b65f77c9 100644 --- a/python/llm/example/GPU/LLM-Finetuning/LoRA/README.md +++ b/python/llm/example/GPU/LLM-Finetuning/LoRA/README.md @@ -1,9 +1,9 @@ -# LoRA Finetuning with BigDL-LLM +# LoRA Finetuning with IPEX-LLM -This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM (using [LoRA](https://arxiv.org/abs/2106.09685) algorithm) on [Intel GPU](../../README.md). +This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to IPEX-LLM (using [LoRA](https://arxiv.org/abs/2106.09685) algorithm) on [Intel GPU](../../README.md). ### 0. Requirements -To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information. +To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information. ### 1. Install @@ -11,7 +11,7 @@ To run this example with BigDL-LLM on Intel GPUs, we have some recommended requi conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.0 datasets pip install fire peft==0.5.0 pip install oneccl_bind_pt==2.1.100 -f https://developer.intel.com/ipex-whl-stable-xpu # necessary to run distributed finetuning @@ -58,8 +58,8 @@ bash lora_finetune_llama2_7b_pvc_1550_4_card.sh python ./alpaca_lora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ - --resume_from_checkpoint "./bigdl-qlora-alpaca/checkpoint-1100" + --output_dir "./ipex-qlora-alpaca" \ + --resume_from_checkpoint "./ipex-qlora-alpaca/checkpoint-1100" ``` ### 5. 
Sample Output diff --git a/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_arc_1_card.sh b/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_arc_1_card.sh index 44824868..42055fc2 100644 --- a/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_arc_1_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_arc_1_card.sh @@ -20,6 +20,6 @@ python ./alpaca_lora_finetuning.py \ --batch_size 128 \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-lora-alpaca" \ + --output_dir "./ipex-lora-alpaca" \ --gradient_checkpointing True \ --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj']" diff --git a/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1110_4_card.sh b/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1110_4_card.sh index cdeabfc1..e0c689a7 100644 --- a/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1110_4_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1110_4_card.sh @@ -25,6 +25,6 @@ mpirun -n 4 \ --batch_size 128 \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-lora-alpaca" \ + --output_dir "./ipex-lora-alpaca" \ --gradient_checkpointing True \ --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj', 'up_proj', 'down_proj', 'gate_proj']" diff --git a/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1550_1_tile.sh b/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1550_1_tile.sh index a9d1ca70..cdde017b 100644 --- a/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1550_1_tile.sh +++ b/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1550_1_tile.sh @@ -20,6 +20,6 @@ python ./alpaca_lora_finetuning.py \ --batch_size 128 \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-lora-alpaca" \ + --output_dir "./ipex-lora-alpaca" \ --gradient_checkpointing True \ --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj', 'up_proj', 'down_proj', 'gate_proj']" diff --git a/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1550_4_card.sh b/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1550_4_card.sh index 09272b84..6073f0e9 100644 --- a/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1550_4_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/LoRA/lora_finetune_llama2_7b_pvc_1550_4_card.sh @@ -25,6 +25,6 @@ mpirun -n 8 \ --batch_size 128 \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-lora-alpaca" \ + --output_dir "./ipex-lora-alpaca" \ --gradient_checkpointing False \ --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj', 'up_proj', 'down_proj', 'gate_proj']" diff --git a/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/README.md b/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/README.md index 9b237298..3f046123 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/README.md +++ b/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/README.md @@ -1,9 +1,9 @@ -# QA-LoRA Finetuning with BigDL-LLM +# QA-LoRA Finetuning with IPEX-LLM -This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM (using [QA-LoRA](https://arxiv.org/abs/2309.14717) algorithm) on [Intel GPU](../../README.md). 
+This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to IPEX-LLM (using [QA-LoRA](https://arxiv.org/abs/2309.14717) algorithm) on [Intel GPU](../../README.md). ### 0. Requirements -To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information. +To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information. ### 1. Install @@ -11,7 +11,7 @@ To run this example with BigDL-LLM on Intel GPUs, we have some recommended requi conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.0 datasets pip install fire peft==0.5.0 pip install oneccl_bind_pt==2.1.100 -f https://developer.intel.com/ipex-whl-stable-xpu # necessary to run distributed finetuning @@ -52,8 +52,8 @@ bash qalora_finetune_llama2_7b_pvc_1550_1_tile.sh python ./alpaca_qalora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ - --resume_from_checkpoint "./bigdl-qlora-alpaca/checkpoint-1100" + --output_dir "./ipex-qlora-alpaca" \ + --resume_from_checkpoint "./ipex-qlora-alpaca/checkpoint-1100" ``` ### 5. Sample Output diff --git a/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_arc_1_card.sh b/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_arc_1_card.sh index 842487e7..efa56139 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_arc_1_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_arc_1_card.sh @@ -18,7 +18,7 @@ python ./alpaca_qalora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --learning_rate 9e-5 \ --micro_batch_size 2 \ --batch_size 128 \ diff --git a/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_arc_2_card.sh b/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_arc_2_card.sh index f6a0d493..d30fb7ae 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_arc_2_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_arc_2_card.sh @@ -23,7 +23,7 @@ mpirun -n 2 \ python -u ./alpaca_qalora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --learning_rate 9e-5 \ --micro_batch_size 2 \ --batch_size 128 \ diff --git a/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_pvc_1550_1_card.sh b/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_pvc_1550_1_card.sh index 34df1a42..70d47833 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_pvc_1550_1_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_pvc_1550_1_card.sh @@ -23,7 +23,7 @@ mpirun -n 2 \ python -u ./alpaca_qalora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ 
--data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --learning_rate 9e-5 \ --micro_batch_size 8 \ --batch_size 128 \ diff --git a/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_pvc_1550_1_tile.sh b/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_pvc_1550_1_tile.sh index 0cdd196e..ed244b83 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_pvc_1550_1_tile.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QA-LoRA/qalora_finetune_llama2_7b_pvc_1550_1_tile.sh @@ -19,7 +19,7 @@ python ./alpaca_qalora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --learning_rate 9e-5 \ --micro_batch_size 8 \ --batch_size 128 \ diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/README.md b/python/llm/example/GPU/LLM-Finetuning/QLoRA/README.md index 2afebf3d..214f8e6f 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/README.md +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/README.md @@ -1,5 +1,5 @@ -# QLoRA Finetuning with BigDL-LLM +# QLoRA Finetuning with IPEX-LLM -We provide [Alpaca-QLoRA example](./alpaca-qlora/), which ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM (using [QLoRA](https://arxiv.org/abs/2305.14314) algorithm) on [Intel GPU](../../README.md). +We provide [Alpaca-QLoRA example](./alpaca-qlora/), which ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to IPEX-LLM (using [QLoRA](https://arxiv.org/abs/2305.14314) algorithm) on [Intel GPU](../../README.md). -Meanwhile, we also provide a [simple example](./simple-example/) to help you get started with QLoRA Finetuning using BigDL-LLM, and [TRL example](./trl-example/) to help you get started with QLoRA Finetuning using BigDL-LLM and TRL library. +Meanwhile, we also provide a [simple example](./simple-example/) to help you get started with QLoRA Finetuning using IPEX-LLM, and [TRL example](./trl-example/) to help you get started with QLoRA Finetuning using IPEX-LLM and TRL library. diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/README.md b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/README.md index afa0dc37..f43c2a8e 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/README.md +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/README.md @@ -1,11 +1,11 @@ -# QLoRA Finetuning with BigDL-LLM +# QLoRA Finetuning with IPEX-LLM -This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM (using [QLoRA](https://arxiv.org/abs/2305.14314) algorithm) on [Intel GPU](../../../README.md). +This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to IPEX-LLM (using [QLoRA](https://arxiv.org/abs/2305.14314) algorithm) on [Intel GPU](../../../README.md). > Note: You could also refer to [simple QLoRA example](../simple-example/) to try related usage. ### 0. Requirements -To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ### 1. 
Install @@ -13,7 +13,7 @@ To run this example with BigDL-LLM on Intel GPUs, we have some recommended requi conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.0 datasets pip install fire peft==0.5.0 pip install oneccl_bind_pt==2.1.100 -f https://developer.intel.com/ipex-whl-stable-xpu # necessary to run distributed finetuning @@ -113,7 +113,7 @@ bash qlora_finetune_llama2_13b_pvc_1550_4_card.sh
Show LLaMA2-70B examples -Different from `LLaMA2-7B` and `LLaMA2-13B`, it is recommonded to save the model with bigdl-llm low-bit optimization first to avoid large amount of CPU memory usage. And DeepSpeed ZeRO2 technology is used during finetuning. +Different from `LLaMA2-7B` and `LLaMA2-13B`, it is recommended to save the model with ipex-llm low-bit optimization first to avoid a large amount of CPU memory usage. And DeepSpeed ZeRO2 technology is used during finetuning. ##### Finetuning LLaMA2-70B on one Intel Data Center GPU Max 1550 @@ -135,8 +135,8 @@ If you fail to complete the whole finetuning process, it is suggested to resume python ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ - --resume_from_checkpoint "./bigdl-qlora-alpaca/checkpoint-1100" + --output_dir "./ipex-qlora-alpaca" \ + --resume_from_checkpoint "./ipex-qlora-alpaca/checkpoint-1100" ``` ### 5. Sample Output diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_1_card.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_1_card.sh index c7c9e934..bafbc62e 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_1_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_1_card.sh @@ -23,6 +23,6 @@ mpirun -n 2 \ python -u ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-13b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --micro_batch_size 8 \ --batch_size 128 > training.log diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_1_tile.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_1_tile.sh index ef656602..5a5f8c1d 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_1_tile.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_1_tile.sh @@ -18,6 +18,6 @@ python ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-13b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --micro_batch_size 8 \ --batch_size 128 diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_4_card.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_4_card.sh index 18a3f242..f03716d2 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_4_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_13b_pvc_1550_4_card.sh @@ -23,6 +23,6 @@ mpirun -n 8 \ python -u ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-13b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --micro_batch_size 8 \ --batch_size 128 > training.log diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_70b_pvc_1550_1_card.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_70b_pvc_1550_1_card.sh index a5326359..9f7bf380 100644 ---
a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_70b_pvc_1550_1_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_70b_pvc_1550_1_card.sh @@ -14,7 +14,7 @@ # limitations under the License. # -# save Llama-2-70b-hf model with bigdl-llm low-bit optimization first +# save Llama-2-70b-hf model with ipex-llm low-bit optimization first python save_low_bit_70b_model.py --output_path "./llama-2-70b-hf-nf4" export MASTER_ADDR=127.0.0.1 @@ -27,7 +27,7 @@ mpirun -n 2 \ python -u ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-70b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --gradient_checkpointing True \ --micro_batch_size 8 \ --batch_size 128 \ diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_70b_pvc_1550_4_card.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_70b_pvc_1550_4_card.sh index e647b0a0..9dead743 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_70b_pvc_1550_4_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_70b_pvc_1550_4_card.sh @@ -14,7 +14,7 @@ # limitations under the License. # -# save Llama-2-70b-hf model with bigdl-llm low-bit optimization first +# save Llama-2-70b-hf model with ipex-llm low-bit optimization first python save_low_bit_70b_model.py --output_path "./llama-2-70b-hf-nf4" export MASTER_ADDR=127.0.0.1 @@ -27,7 +27,7 @@ mpirun -n 8 \ python -u ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-70b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --gradient_checkpointing True \ --micro_batch_size 8 \ --batch_size 128 \ diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_arc_1_card.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_arc_1_card.sh index b6040456..12056c20 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_arc_1_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_arc_1_card.sh @@ -18,4 +18,4 @@ python ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" + --output_dir "./ipex-qlora-alpaca" diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_arc_2_card.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_arc_2_card.sh index ef1c6ab0..cb10a142 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_arc_2_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_arc_2_card.sh @@ -23,4 +23,4 @@ mpirun -n 2 \ python -u ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" > training.log + --output_dir "./ipex-qlora-alpaca" > training.log diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_flex_170_1_card.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_flex_170_1_card.sh index 542aecae..316d4cc5 100644 --- 
a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_flex_170_1_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_flex_170_1_card.sh @@ -20,4 +20,4 @@ python ./alpaca_qlora_finetuning.py \ --batch_size 128 \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" + --output_dir "./ipex-qlora-alpaca" diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_flex_170_3_card.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_flex_170_3_card.sh index 4b28d255..bc9b4dcf 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_flex_170_3_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_flex_170_3_card.sh @@ -23,7 +23,7 @@ mpirun -n 3 \ python -u ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --gradient_checkpointing False \ --micro_batch_size 2 \ --batch_size 128 > training.log diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1100_1_card.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1100_1_card.sh index ba3409c7..52e1a304 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1100_1_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1100_1_card.sh @@ -20,4 +20,4 @@ python ./alpaca_qlora_finetuning.py \ --batch_size 128 \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" + --output_dir "./ipex-qlora-alpaca" diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1100_4_card.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1100_4_card.sh index 213c29d6..8ee4dbbd 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1100_4_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1100_4_card.sh @@ -23,6 +23,6 @@ mpirun -n 4 \ python -u ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --micro_batch_size 8 \ --batch_size 128 > training.log diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1550_1_card.sh b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1550_1_card.sh index 9480cc72..272870e9 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1550_1_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1550_1_card.sh @@ -23,6 +23,6 @@ mpirun -n 2 \ python -u ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --micro_batch_size 8 \ --batch_size 128 > training.log diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1550_4_card.sh 
b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1550_4_card.sh index 8a82c8c6..801d88fc 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1550_4_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora/qlora_finetune_llama2_7b_pvc_1550_4_card.sh @@ -23,6 +23,6 @@ mpirun -n 8 \ python -u ./alpaca_qlora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ + --output_dir "./ipex-qlora-alpaca" \ --micro_batch_size 8 \ --batch_size 128 > training.log diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/simple-example/README.md b/python/llm/example/GPU/LLM-Finetuning/QLoRA/simple-example/README.md index 43aa8146..2a94b584 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/simple-example/README.md +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/simple-example/README.md @@ -1,10 +1,10 @@ -# Simple Example of QLoRA Finetuning with BigDL-LLM +# Simple Example of QLoRA Finetuning with IPEX-LLM -This simple example demonstrates how to finetune a llama2-7b model use BigDL-LLM 4bit optimizations using [Intel GPUs](../../../README.md). +This simple example demonstrates how to finetune a llama2-7b model using IPEX-LLM 4-bit optimizations on [Intel GPUs](../../../README.md). Note, this example is just used for illustrating related usage and don't guarantee convergence of training. ## 0. Requirements -To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Finetune llama2-7b using qlora @@ -16,7 +16,7 @@ This example is referred to [bnb-4bit-training](https://colab.research.google.co conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.0 datasets pip install peft==0.5.0 pip install accelerate==0.23.0 diff --git a/python/llm/example/GPU/LLM-Finetuning/QLoRA/trl-example/README.md b/python/llm/example/GPU/LLM-Finetuning/QLoRA/trl-example/README.md index 353a8b10..1bce470c 100644 --- a/python/llm/example/GPU/LLM-Finetuning/QLoRA/trl-example/README.md +++ b/python/llm/example/GPU/LLM-Finetuning/QLoRA/trl-example/README.md @@ -1,10 +1,10 @@ -# Example of QLoRA Finetuning with BigDL-LLM +# Example of QLoRA Finetuning with IPEX-LLM -This simple example demonstrates how to finetune a llama2-7b model use BigDL-LLM 4bit optimizations with TRL library on [Intel GPU](../../../README.md). +This simple example demonstrates how to finetune a llama2-7b model using IPEX-LLM 4-bit optimizations with the TRL library on [Intel GPU](../../../README.md). Note, this example is just used for illustrating related usage and don't guarantee convergence of training. ## 0. Requirements -To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
+To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Finetune llama2-7b using qlora @@ -16,7 +16,7 @@ This example utilizes a subset of [yahma/alpaca-cleaned](https://huggingface.co/ conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.0 datasets pip install peft==0.5.0 pip install accelerate==0.23.0 diff --git a/python/llm/example/GPU/LLM-Finetuning/README.md b/python/llm/example/GPU/LLM-Finetuning/README.md index a3cdccc7..e2ab7acb 100644 --- a/python/llm/example/GPU/LLM-Finetuning/README.md +++ b/python/llm/example/GPU/LLM-Finetuning/README.md @@ -1,6 +1,6 @@ -# Running LLM Finetuning using BigDL-LLM on Intel GPU +# Running LLM Finetuning using IPEX-LLM on Intel GPU -This folder contains examples of running different training mode with BigDL-LLM on Intel GPU: +This folder contains examples of running different training modes with IPEX-LLM on Intel GPU: - [LoRA](LoRA): examples of running LoRA finetuning - [QLoRA](QLoRA): examples of running QLoRA finetuning diff --git a/python/llm/example/GPU/LLM-Finetuning/ReLora/README.md b/python/llm/example/GPU/LLM-Finetuning/ReLora/README.md index 084a6ef7..36045269 100644 --- a/python/llm/example/GPU/LLM-Finetuning/ReLora/README.md +++ b/python/llm/example/GPU/LLM-Finetuning/ReLora/README.md @@ -1,9 +1,9 @@ -# ReLoRA Finetuning with BigDL-LLM +# ReLoRA Finetuning with IPEX-LLM -This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM (using [ReLoRA](https://arxiv.org/abs/2307.05695) algorithm) on [Intel GPU](../../README.md). +This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to IPEX-LLM (using [ReLoRA](https://arxiv.org/abs/2307.05695) algorithm) on [Intel GPU](../../README.md). ### 0. Requirements -To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information. +To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information. ### 1.
Install @@ -11,7 +11,7 @@ To run this example with BigDL-LLM on Intel GPUs, we have some recommended requi conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.0 datasets pip install fire peft==0.5.0 pip install oneccl_bind_pt==2.1.100 -f https://developer.intel.com/ipex-whl-stable-xpu # necessary to run distributed finetuning @@ -58,8 +58,8 @@ bash relora_finetune_llama2_7b_pvc_1550_4_card.sh python ./alpaca_relora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-qlora-alpaca" \ - --resume_from_checkpoint "./bigdl-qlora-alpaca/checkpoint-1100" + --output_dir "./ipex-qlora-alpaca" \ + --resume_from_checkpoint "./ipex-qlora-alpaca/checkpoint-1100" ``` ### 5. Sample Output diff --git a/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_arc_1_card.sh b/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_arc_1_card.sh index 6285469d..4bb00965 100644 --- a/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_arc_1_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_arc_1_card.sh @@ -18,6 +18,6 @@ python ./alpaca_relora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-relora-alpaca" \ + --output_dir "./ipex-relora-alpaca" \ --relora_steps 300 \ --relora_warmup_steps 10 diff --git a/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_arc_2_card.sh b/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_arc_2_card.sh index e14beced..3d66fd41 100644 --- a/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_arc_2_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_arc_2_card.sh @@ -23,6 +23,6 @@ mpirun -n 2 \ python -u ./alpaca_relora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-relora-alpaca" \ + --output_dir "./ipex-relora-alpaca" \ --relora_steps 300 \ --relora_warmup_steps 10 > training.log diff --git a/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_pvc_1550_1_card.sh b/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_pvc_1550_1_card.sh index 2d1333db..042c25d5 100644 --- a/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_pvc_1550_1_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_pvc_1550_1_card.sh @@ -23,7 +23,7 @@ mpirun -n 2 \ python -u ./alpaca_relora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-relora-alpaca" \ + --output_dir "./ipex-relora-alpaca" \ --micro_batch_size 8 \ --relora_steps 300 \ --relora_warmup_steps 10 \ diff --git a/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_pvc_1550_4_card.sh b/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_pvc_1550_4_card.sh index c0ae8982..c2f12c90 100644 --- a/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_pvc_1550_4_card.sh +++ b/python/llm/example/GPU/LLM-Finetuning/ReLora/relora_finetune_llama2_7b_pvc_1550_4_card.sh @@ -23,7 
+23,7 @@ mpirun -n 8 \ python -u ./alpaca_relora_finetuning.py \ --base_model "meta-llama/Llama-2-7b-hf" \ --data_path "yahma/alpaca-cleaned" \ - --output_dir "./bigdl-relora-alpaca" \ + --output_dir "./ipex-relora-alpaca" \ --micro_batch_size 8 \ --relora_steps 300 \ --relora_warmup_steps 10 \ diff --git a/python/llm/example/GPU/LangChain/transformer_int4_gpu/README.md b/python/llm/example/GPU/LangChain/transformer_int4_gpu/README.md index 630cfa10..9822aae8 100644 --- a/python/llm/example/GPU/LangChain/transformer_int4_gpu/README.md +++ b/python/llm/example/GPU/LangChain/transformer_int4_gpu/README.md @@ -1,9 +1,9 @@ # Langchain examples -The examples in this folder shows how to use [LangChain](https://www.langchain.com/) with `bigdl-llm` on Intel GPU. +The examples in this folder shows how to use [LangChain](https://www.langchain.com/) with `ipex-llm` on Intel GPU. -### 1. Install bigdl-llm -Follow the instructions in [GPU Install Guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) to install bigdl-llm +### 1. Install ipex-llm +Follow the instructions in [GPU Install Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) to install ipex-llm ### 2. Install Required Dependencies for langchain examples. @@ -99,5 +99,5 @@ python rag.py -m [-q QUESTION] [-i INPUT_PATH] ``` arguments info: - `-m MODEL_PATH`: **required**, path to the model. -- `-q QUESTION`: question to ask. Default is `What is BigDL?`. +- `-q QUESTION`: question to ask. Default is `What is IPEX?`. - `-i INPUT_PATH`: path to the input doc. \ No newline at end of file diff --git a/python/llm/example/GPU/LlamaIndex/README.md b/python/llm/example/GPU/LlamaIndex/README.md index 5eb836e4..74ac0fd0 100644 --- a/python/llm/example/GPU/LlamaIndex/README.md +++ b/python/llm/example/GPU/LlamaIndex/README.md @@ -1,7 +1,7 @@ # LlamaIndex Examples -This folder contains examples showcasing how to use [**LlamaIndex**](https://github.com/run-llama/llama_index) with `bigdl-llm`. +This folder contains examples showcasing how to use [**LlamaIndex**](https://github.com/run-llama/llama_index) with `ipex-llm`. > [**LlamaIndex**](https://github.com/run-llama/llama_index) is a data framework designed to improve large language models by providing tools for easier data ingestion, management, and application integration. @@ -18,7 +18,7 @@ The RAG example ([rag.py](./rag.py)) is adapted from the [Official llama index R ``` * **Install Bigdl LLM** - Follow the instructions in [GPU Install Guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) to install bigdl-llm. + Follow the instructions in [GPU Install Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) to install ipex-llm. * **Database Setup (using PostgreSQL)**: * Linux diff --git a/python/llm/example/GPU/ModelScope-Models/README.md b/python/llm/example/GPU/ModelScope-Models/README.md index c32e3d37..cfde3a4b 100644 --- a/python/llm/example/GPU/ModelScope-Models/README.md +++ b/python/llm/example/GPU/ModelScope-Models/README.md @@ -1,12 +1,12 @@ # Run ModelScope Model -In this directory, you will find example on how you could apply BigDL-LLM INT4 optimizations on ModelScope models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [ZhipuAI/chatglm3-6b](https://modelscope.cn/models/ZhipuAI/chatglm3-6b/summary) as a reference ModelScope model. 
+In this directory, you will find an example of how you can apply IPEX-LLM INT4 optimizations to ModelScope models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [ZhipuAI/chatglm3-6b](https://modelscope.cn/models/ZhipuAI/chatglm3-6b/summary) as a reference ModelScope model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -14,7 +14,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # Refer to https://github.com/modelscope/modelscope/issues/765, please make sure you are using 1.11.0 version pip install modelscope==1.11.0 ``` @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install modelscope ``` diff --git a/python/llm/example/GPU/ModelScope-Models/Save-Load/README.md b/python/llm/example/GPU/ModelScope-Models/Save-Load/README.md index 1be85568..365ecbf2 100644 --- a/python/llm/example/GPU/ModelScope-Models/Save-Load/README.md +++ b/python/llm/example/GPU/ModelScope-Models/Save-Load/README.md @@ -1,9 +1,9 @@ -# Save/Load Low-Bit Models with BigDL-LLM Optimizations +# Save/Load Low-Bit Models with IPEX-LLM Optimizations -In this directory, you will find example on how you could save/load ModelScope models with BigDL-LLM INT4 optimizations on ModelScope models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary) as a reference ModelScope model. +In this directory, you will find an example of how you can save/load ModelScope models with IPEX-LLM INT4 optimizations on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary) as a reference ModelScope model. ## 0. Requirements -To run this example with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information.
+To run this example with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. ## Example: Save/Load Model in Low-Bit Optimization In the example [generate.py](./generate.py), we show a basic use case of saving/loading model in low-bit optimizations to predict the next N tokens using `generate()` API. Also, saving and loading operations are platform-independent, so you could run it on different platforms. @@ -14,7 +14,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install modelscope==1.11.0 ``` @@ -24,7 +24,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install modelscope==1.11.0 ``` diff --git a/python/llm/example/GPU/Pipeline-Parallel-Inference/README.md b/python/llm/example/GPU/Pipeline-Parallel-Inference/README.md index b19275cd..00104653 100644 --- a/python/llm/example/GPU/Pipeline-Parallel-Inference/README.md +++ b/python/llm/example/GPU/Pipeline-Parallel-Inference/README.md @@ -1,20 +1,20 @@ -# Run BigDL-LLM on Multiple Intel GPUs in pipeline parallel fashion +# Run IPEX-LLM on Multiple Intel GPUs in pipeline parallel fashion -This example demonstrates how to run BigDL-LLM optimized low-bit model vertically partitioned on two [Intel GPUs](../README.md). +This example demonstrates how to run IPEX-LLM optimized low-bit model vertically partitioned on two [Intel GPUs](../README.md). ## Requirements -To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. For this particular example, you will need at least two GPUs on your machine. +To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. For this particular example, you will need at least two GPUs on your machine. 
## Example: -### 1.1 Install BigDL-LLM +### 1.1 Install IPEX-LLM ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default # you can install specific ipex/torch version for your need -pip install --pre --upgrade bigdl-llm[xpu_2.1] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu_2.1] -f https://developer.intel.com/ipex-whl-stable-xpu # configures OneAPI environment variables source /opt/intel/oneapi/setvars.sh diff --git a/python/llm/example/GPU/PyTorch-Models/Model/README.md b/python/llm/example/GPU/PyTorch-Models/Model/README.md index a6c9fb0d..600d615c 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/README.md @@ -1,5 +1,5 @@ -# BigDL-LLM INT4 Optimization for Large Language Model on Intel GPUs -You can use `optimize_model` API to accelerate general PyTorch models on Intel GPUs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. +# IPEX-LLM INT4 Optimization for Large Language Model on Intel GPUs +You can use `optimize_model` API to accelerate general PyTorch models on Intel GPUs. This directory contains example scripts to help you quickly get started using IPEX-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. diff --git a/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md index 66b702ed..7a7477f5 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/aquila2/README.md @@ -1,22 +1,22 @@ # Aquila2 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Aquila2 models. For illustration purposes, we utilize the [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B) as reference Aquila2 models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Aquila2 models. For illustration purposes, we utilize the [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B) as reference Aquila2 models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Aquila2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. 
For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md b/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md index c0a098af..d09ebc45 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/baichuan/README.md @@ -1,22 +1,22 @@ # Baichuan -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Baichuan models. For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as reference Baichuan models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Baichuan models. For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as reference Baichuan models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation ``` @@ -26,7 +26,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers_stream_generator # additional package required for Baichuan-13B-Chat to conduct generation ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md index c3e2b193..fa4c3618 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/baichuan2/README.md @@ -1,22 +1,22 @@ # Baichuan2 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Baichuan2 models. For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan-7B-Chat) as reference Baichuan2 models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Baichuan2 models. For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan-7B-Chat) as reference Baichuan2 models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers_stream_generator # additional package required for Baichuan2-7B-Chat to conduct generation ``` @@ -26,7 +26,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers_stream_generator # additional package required for Baichuan2-7B-Chat to conduct generation ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md b/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md index 0ec27177..f054e256 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/bark/README.md @@ -1,22 +1,22 @@ # Bark -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Bark models. For illustration purposes, we utilize the [suno/bark-small](https://huggingface.co/suno/bark-small) as reference Bark models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Bark models. For illustration purposes, we utilize the [suno/bark-small](https://huggingface.co/suno/bark-small) as reference Bark models. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Synthesize speech with the given input text -In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for Bark model to synthesize speech based on the given text, with BigDL-LLM INT4 optimizations. +In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for Bark model to synthesize speech based on the given text, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install scipy ``` @@ -27,7 +27,7 @@ conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install scipy ``` @@ -102,19 +102,19 @@ There is no need to set further environment variables. ### 4. Running examples ```bash -python ./synthesize_speech.py --text 'BigDL-LLM is a library for running large language model on Intel XPU with very low latency.' +python ./synthesize_speech.py --text 'IPEX-LLM is a library for running large language model on Intel XPU with very low latency.' ``` In the example, several arguments can be passed to satisfy your requirements: - `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the Bark model (e.g. `suno/bark-small` and `suno/bark`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'suno/bark-small'`. - `--voice-preset`: argument defining the voice preset of model. It is default to be `'v2/en_speaker_6'`. -- `--text TEXT`: argument defining the text to synthesize speech. It is default to be `"BigDL-LLM is a library for running large language model on Intel XPU with very low latency."`. +- `--text TEXT`: argument defining the text to synthesize speech. It is default to be `"IPEX-LLM is a library for running large language model on Intel XPU with very low latency."`. #### 4.1 Sample Output #### [suno/bark-small](https://huggingface.co/suno/bark-small) -Text: BigDL-LLM is a library for running large language model on Intel XPU with very low latency. +Text: IPEX-LLM is a library for running large language model on Intel XPU with very low latency. [Click here to hear sample output.](https://llm-assets.readthedocs.io/en/latest/_downloads/e92874986553193acbd321d1cfe29739/bark-example-output.wav) diff --git a/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md b/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md index a99a99ca..83e10e10 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/bluelm/README.md @@ -1,22 +1,22 @@ # BlueLM -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate BlueLM models. For illustration purposes, we utilize the [vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) as reference BlueLM models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate BlueLM models. For illustration purposes, we utilize the [vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) as reference BlueLM models. 
## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md index 887d88bc..eec786ac 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/chatglm2/README.md @@ -1,22 +1,22 @@ # ChatGLM2 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate ChatGLM2 models. For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as reference ChatGLM2 models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate ChatGLM2 models. For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as reference ChatGLM2 models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example 1: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. 
+In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables @@ -129,18 +129,18 @@ Inference time: xxxx s ``` ## Example 2: Stream Chat using `stream_chat()` API -In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with BigDL-LLM INT4 optimizations. +In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -149,7 +149,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md b/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md index 84d148cc..e9360a79 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/chatglm3/README.md @@ -1,22 +1,22 @@ # ChatGLM3 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate ChatGLM3 models. 
For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as reference ChatGLM3 models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate ChatGLM3 models. For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as reference ChatGLM3 models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example 1: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). -After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables @@ -128,18 +128,18 @@ AI stands for Artificial Intelligence. It refers to the development of computer ``` ## Example 2: Stream Chat using `stream_chat()` API -In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM3 model to stream chat, with BigDL-LLM INT4 optimizations. +In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM3 model to stream chat, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -148,7 +148,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md b/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md index e1c9f7bc..9dfbab63 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/codellama/README.md @@ -1,22 +1,22 @@ # CodeLlama -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate CodeLlama models. For illustration purposes, we utilize the [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) as reference CodeLlama models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate CodeLlama models. For illustration purposes, we utilize the [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) as reference CodeLlama models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a CodeLlama model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a CodeLlama model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` @@ -26,7 +26,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.1 # CodeLlamaTokenizer is supported in higher version of transformers ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md b/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md index 4c3ff88e..68af9aaa 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/deciLM-7b/README.md @@ -1,16 +1,16 @@ # DeciLM-7B -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate DeciLM-7B models. For illustration purposes, we utilize the [Deci/DeciLM-7B-instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) as a reference DeciLM-7B model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate DeciLM-7B models. For illustration purposes, we utilize the [Deci/DeciLM-7B-instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) as a reference DeciLM-7B model. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a DeciLM-7B model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 @@ -18,7 +18,7 @@ conda activate llm # below command will install intel_extension_for_pytorch==2.0.110+xpu as default # you can install specific ipex/torch version for your need -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.35.2 # required by DeciLM-7B ``` @@ -28,7 +28,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md b/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md index c9339543..efc0842b 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/deepseek/README.md @@ -1,23 +1,23 @@ # Deepseek -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Deepseek models. For illustration purposes, we utilize the [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) as a reference Deepseek model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Deepseek models. For illustration purposes, we utilize the [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) as a reference Deepseek model. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Deepseek model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Deepseek model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.0.110+xpu as default # you can install specific ipex/torch version for your need -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -26,7 +26,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md b/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md index a3201f70..729adf18 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/distil-whisper/README.md @@ -1,22 +1,22 @@ # Distil-Whisper -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API on Distil-Whisper models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API on Distil-Whisper models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Recognize Tokens using `generate()` API -In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with BigDL-LLM INT4 optimizations. +In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
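As a rough illustration of the Distil-Whisper transcription example described above: the repo's `recognize.py` uses the `pipeline()` API with chunking for long audio, while this sketch shows only the underlying short-clip flow and is not part of the diff. The `ipex_llm` import path and the `audio.wav` file name are assumptions.

```python
# Illustrative sketch only, not part of this patch.
import librosa
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
from ipex_llm import optimize_model  # assumed post-rename import path

model_id = "distil-whisper/distil-large-v2"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, low_cpu_mem_usage=True)
model = optimize_model(model)  # INT4 by default
model = model.to("xpu")

# Decode a short clip at Whisper's expected 16 kHz sample rate
audio, _ = librosa.load("audio.wav", sr=16000)  # placeholder file name
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.inference_mode():
    predicted_ids = model.generate(inputs.input_features.to("xpu"))
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```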
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install datasets soundfile librosa # required by audio processing ``` @@ -26,7 +26,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install datasets soundfile librosa # required by audio processing ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md index 28c04d8d..18954452 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1/README.md @@ -1,22 +1,22 @@ # Dolly v1 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Dolly v1 models. For illustration purposes, we utilize the [databricks/dolly-v1-6b](https://huggingface.co/databricks/dolly-v1-6b) as reference Dolly v1 models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Dolly v1 models. For illustration purposes, we utilize the [databricks/dolly-v1-6b](https://huggingface.co/databricks/dolly-v1-6b) as reference Dolly v1 models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md index 485212a0..0becfdac 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2/README.md @@ -1,22 +1,22 @@ # Dolly v2 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Dolly v2 models. For illustration purposes, we utilize the [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) and [databricks/dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b) as reference Dolly v2 models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Dolly v2 models. For illustration purposes, we utilize the [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) and [databricks/dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b) as reference Dolly v2 models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md index ae6ce8cb..37999f79 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/flan-t5/README.md @@ -1,22 +1,22 @@ # Flan-t5 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API on Flan-t5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API on Flan-t5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md index 78b02254..70a887bb 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/internlm2/README.md @@ -1,11 +1,11 @@ # InternLM2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [internlm/internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as a reference InternLM model. +In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on InternLM2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [internlm/internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) as a reference InternLM2 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a InternLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for an InternLM2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1.
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -22,7 +22,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md index 844db7cf..270c6ceb 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/llama2/README.md @@ -1,22 +1,22 @@ # Llama2 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Llama2 models. For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) and [meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) as reference Llama2 models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Llama2 models. For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) and [meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) as reference Llama2 models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example 1 - Basic Version: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md b/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md index 3909a8eb..de941327 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/llava/README.md @@ -1,22 +1,22 @@ # LLaVA -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API on LLaVA models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) as a reference LLaVA model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API on LLaVA models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) as a reference LLaVA model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Multi-turn chat centered around an image using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a LLaVA model to start a multi-turn chat centered around an image using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a LLaVA model to start a multi-turn chat centered around an image using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu git clone -b v1.1.1 --depth=1 https://github.com/haotian-liu/LLaVA.git # clone the llava library pip install einops # install dependencies required by llava @@ -30,7 +30,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu git clone -b v1.1.1 --depth=1 https://github.com/haotian-liu/LLaVA.git # clone the llava library pip install einops # install dependencies required by llava diff --git a/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md b/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md index 12b35869..d80d6c9a 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/mamba/README.md @@ -1,22 +1,22 @@ # Mamba -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Mamba models. For illustration purposes, we utilize the [state-spaces/mamba-1.4b](https://huggingface.co/state-spaces/mamba-1.4b) and [state-spaces/mamba-2.8b](https://huggingface.co/state-spaces/mamba-2.8b) as reference Mamba models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Mamba models. For illustration purposes, we utilize the [state-spaces/mamba-1.4b](https://huggingface.co/state-spaces/mamba-1.4b) and [state-spaces/mamba-2.8b](https://huggingface.co/state-spaces/mamba-2.8b) as reference Mamba models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Mamba model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Mamba model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.0.110+xpu as default # you can install specific ipex/torch version for your need -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # package required by Mamba ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md b/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md index bbbefbb1..1662d8e6 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/mistral/README.md @@ -1,24 +1,24 @@ # Mistral -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Mistral models. For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Mistral models. For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. **Important: According to [Mistral Troubleshooting](https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting), please make sure you have installed `transformers==4.34.0` to run the example.** ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # Refer to https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting, please make sure you are using a stable version of Transformers, 4.34.0 or newer. pip install transformers==4.34.0 @@ -30,7 +30,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.34.0 ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md b/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md index d214ad6c..54a21f7b 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/mixtral/README.md @@ -1,24 +1,24 @@ # Mixtral -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Mixtral models. For illustration purposes, we utilize the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as a reference Mixtral model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Mixtral models. For illustration purposes, we utilize the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as a reference Mixtral model. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. **Important: Please make sure you have installed `transformers==4.36.0` to run the example.** ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Mixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # Please make sure you are using a stable version of Transformers, 4.36.0 or newer. pip install transformers==4.36.0 @@ -30,7 +30,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # Please make sure you are using a stable version of Transformers, 4.36.0 or newer. pip install transformers==4.36.0 diff --git a/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md index 1f90c48b..e7feb1cb 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/phi-1_5/README.md @@ -1,21 +1,21 @@ # phi-1_5 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate phi-1_5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate phi-1_5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for phi-1_5 to conduct generation ``` @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for phi-1_5 to conduct generation ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md index c9614ab4..1e01293f 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/phi-2/README.md @@ -1,21 +1,21 @@ # phi-2 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate phi-2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) as a reference phi-2 model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate phi-2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) as a reference phi-2 model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a phi-2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a phi-2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for phi-2 to conduct generation ``` #### 1.2 Installation on Windows @@ -24,7 +24,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md b/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md index 29d0c886..f3b5e7b9 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/phixtral/README.md @@ -1,21 +1,21 @@ # phixtral -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate phi-1_5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mlabonne/phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8) as a reference phixtral model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate phixtral models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mlabonne/phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8) as a reference phixtral model. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a phixtral model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a phixtral model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for phixtral to conduct generation ``` @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for phixtral to conduct generation ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md b/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md index a88ca3a4..b98c8f66 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/qwen-vl/README.md @@ -1,21 +1,21 @@ # Qwen-VL -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Qwen-VL models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Qwen-VL models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Multimodal chat using `chat()` API -In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with BigDL-LLM 'optimize_model' API on Intel GPUs. +In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with IPEX-LLM 'optimize_model' API on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
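For context on the Qwen-VL multimodal `chat()` example described above, here is a hedged sketch that is not part of the diff. `from_list_format()` and `chat()` are custom methods provided by the Qwen-VL-Chat checkpoint through `trust_remote_code`; the `ipex_llm` import path, the image path, and the prompts are assumptions or placeholders.

```python
# Illustrative sketch only, not part of this patch.
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device
from transformers import AutoModelForCausalLM, AutoTokenizer
from ipex_llm import optimize_model  # assumed post-rename import path

model_path = "Qwen/Qwen-VL-Chat"  # placeholder; a local checkpoint path also works
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True,
                                             torch_dtype="auto", low_cpu_mem_usage=True)
model = optimize_model(model)
model = model.to("xpu")

# `from_list_format` and `chat` come from the Qwen-VL-Chat remote code, not from this library
query = tokenizer.from_list_format([
    {"image": "demo.jpg"},                # placeholder image path or URL
    {"text": "What is in this picture?"},
])
with torch.inference_mode():
    response, history = model.chat(tokenizer, query=query, history=None)
print(response)

# A follow-up turn can reuse the accumulated history
with torch.inference_mode():
    response, history = model.chat(tokenizer, "Describe the main object in more detail.", history=history)
print(response)
```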
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation ``` @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install accelerate tiktoken einops transformers_stream_generator==0.0.4 scipy torchvision pillow tensorboard matplotlib # additional package required for Qwen-VL-Chat to conduct generation ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md index d5f3bc93..c56676c2 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/qwen1.5/README.md @@ -1,11 +1,11 @@ # Qwen1.5 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Qwen1.5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) as a reference InternLM model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Qwen1.5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) as a reference Qwen1.5 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Qwen1.5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Qwen1.5 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1.
Install #### 1.1 Installation on Linux We suggest using conda to manage environment: @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.37.0 # install transformers which supports Qwen2 ``` @@ -23,7 +23,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.37.2 # install transformers which supports Qwen2 ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md b/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md index 55786ed6..5f1c24ae 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/replit/README.md @@ -1,22 +1,22 @@ # Replit -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Replit models. For illustration purposes, we utilize the [replit/replit-code-v1-3b](https://huggingface.co/replit/replit-code-v1-3b) as reference Replit models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate Replit models. For illustration purposes, we utilize the [replit/replit-code-v1-3b](https://huggingface.co/replit/replit-code-v1-3b) as reference Replit models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Replit model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Replit model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
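To illustrate the Qwen1.5 `generate()` example described in the hunk above (this sketch is not part of the diff): the chat prompt is built with `tokenizer.apply_chat_template`, which is why those READMEs pin `transformers` 4.37.x; the `ipex_llm` import path and the model path are assumptions.

```python
# Illustrative sketch only, not part of this patch.
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device
from transformers import AutoModelForCausalLM, AutoTokenizer
from ipex_llm import optimize_model  # assumed post-rename import path

model_path = "Qwen/Qwen1.5-7B-Chat"  # placeholder; a local checkpoint path also works
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", low_cpu_mem_usage=True)
model = optimize_model(model)
model = model.to("xpu")

# `apply_chat_template` builds the Qwen chat prompt (transformers>=4.37, as pinned above)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is AI?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
# Drop the prompt tokens before decoding the newly generated continuation
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```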
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md b/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md index e18edbbe..b21ba262 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/solar/README.md @@ -1,22 +1,22 @@ # SOLAR -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate SOLAR models. For illustration purposes, we utilize the [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) as a reference SOLAR model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate SOLAR models. For illustration purposes, we utilize the [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) as a reference SOLAR model. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a SOLAR model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a SOLAR model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.35.2 # required by SOLAR ``` @@ -26,7 +26,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.35.2 # required by SOLAR ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md b/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md index 87e5936e..08d994fd 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/speech-t5/README.md @@ -1,22 +1,22 @@ # SpeechT5 -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate SpeechT5 models. For illustration purposes, we utilize the [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) as reference SpeechT5 models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate SpeechT5 models. For illustration purposes, we utilize the [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) as reference SpeechT5 models. ## Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Synthesize speech with the given input text -In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for SpeechT5 model to synthesize speech based on the given text, with BigDL-LLM INT4 optimizations. +In the example [synthesize_speech.py](./synthesize_speech.py), we show a basic use case for SpeechT5 model to synthesize speech based on the given text, with IPEX-LLM INT4 optimizations. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
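The SpeechT5 example differs from the text-generation ones in that it produces audio rather than tokens. Below is a hedged sketch of what `synthesize_speech.py` typically does; the HiFi-GAN vocoder, the `Matthijs/cmu-arctic-xvectors` speaker embeddings, and the exact call sequence are assumptions borrowed from the upstream SpeechT5 documentation, not necessarily the script's literal contents.

```python
# Hedged sketch of SpeechT5 text-to-speech with optimize_model (not the exact synthesize_speech.py).
# Assumed components: the microsoft/speecht5_hifigan vocoder and CMU ARCTIC x-vector speaker embeddings.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the 'xpu' device)
import soundfile as sf
from datasets import load_dataset
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from ipex_llm import optimize_model

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = optimize_model(SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")).to("xpu")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan").to("xpu")

# One speaker embedding from the CMU ARCTIC x-vector dataset (index picked arbitrarily)
xvectors = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker = torch.tensor(xvectors[7306]["xvector"]).unsqueeze(0).to("xpu")

inputs = processor(text="IPEX-LLM makes low-bit inference easy on Intel GPUs.", return_tensors="pt")
with torch.inference_mode():
    speech = model.generate_speech(inputs["input_ids"].to("xpu"), speaker, vocoder=vocoder)
sf.write("output.wav", speech.cpu().numpy(), samplerate=16000)
```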
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install "datasets<2.18" soundfile # additional package required for SpeechT5 to conduct generation ``` @@ -27,7 +27,7 @@ conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install "datasets<2.18" soundfile # additional package required for SpeechT5 to conduct generation ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md b/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md index a1fb0a66..80bb8182 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/starcoder/README.md @@ -1,22 +1,22 @@ # StarCoder -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate StarCoder models. For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as reference StarCoder models. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API to accelerate StarCoder models. For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as reference StarCoder models. ## Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a StarCoder model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a StarCoder model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` #### 1.2 Installation on Windows @@ -25,7 +25,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md b/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md index 32f85134..0aaa8575 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/yi/README.md @@ -1,22 +1,22 @@ # Yi -In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API on Yi models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model. +In this directory, you will find examples on how you could use IPEX-LLM `optimize_model` API on Yi models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 # recommend to use Python 3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for Yi-6B to conduct generation ``` @@ -26,7 +26,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for Yi-6B to conduct generation ``` diff --git a/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md b/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md index 498552e8..8cc99379 100644 --- a/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Model/yuan2/README.md @@ -1,23 +1,23 @@ # Yuan2 -In this directory, you will find examples on how you could apply BigDL-LLM `optimize_model` API to accelerate Yuan2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [IEITYuan/Yuan2-2B-hf](https://huggingface.co/IEITYuan/Yuan2-2B-hf) as a reference Yuan2 model. +In this directory, you will find examples on how you could apply IPEX-LLM `optimize_model` API to accelerate Yuan2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [IEITYuan/Yuan2-2B-hf](https://huggingface.co/IEITYuan/Yuan2-2B-hf) as a reference Yuan2 model. ## 0. Requirements -To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. In addition, you need to modify some files in Yuan2-2B-hf folder, since Flash attention dependency is for CUDA usage and currently cannot be installed on Intel CPUs. To manually turn it off, please refer to [this issue](https://github.com/IEIT-Yuan/Yuan-2.0/issues/92). We also provide two modified files([config.json](yuan2-2B-instruct/config.json) and [yuan_hf_model.py](yuan2-2B-instruct/yuan_hf_model.py)), which can be used to replace the original content in config.json and yuan_hf_model.py. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an Yuan2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. +In the example [generate.py](./generate.py), we show a basic use case for an Yuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. ### 1. Install #### 1.1 Installation on Linux We suggest using conda to manage the Python environment. For more information about conda installation, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#). 
-After installing conda, create a Python environment for BigDL-LLM: +After installing conda, create a Python environment for IPEX-LLM: ```bash conda create -n llm python=3.9 conda activate llm -pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option +pip install --pre --upgrade ipex-llm[all] # install the latest ipex-llm nightly build with 'all' option pip install einops # additional package required for Yuan2 to conduct generation pip install pandas # additional package required for Yuan2 to conduct generation ``` @@ -27,7 +27,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 libuv conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install einops # additional package required for Yuan2 to conduct generation ``` diff --git a/python/llm/example/GPU/PyTorch-Models/More-Data-Types/README.md b/python/llm/example/GPU/PyTorch-Models/More-Data-Types/README.md index 864e2291..b03a6af4 100644 --- a/python/llm/example/GPU/PyTorch-Models/More-Data-Types/README.md +++ b/python/llm/example/GPU/PyTorch-Models/More-Data-Types/README.md @@ -1,9 +1,9 @@ -# BigDL-LLM Low Bit Optimization for Large Language Model +# IPEX-LLM Low Bit Optimization for Large Language Model -In this example, we show how to apply BigDL-LLM low-bit optimizations (including INT8/INT5/INT4) to Llama2 model, and then run inference on the optimized low-bit model with Intel GPUs. +In this example, we show how to apply IPEX-LLM low-bit optimizations (including INT8/INT5/INT4) to Llama2 model, and then run inference on the optimized low-bit model with Intel GPUs. ## 0. Requirements -To run this example with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. +To run this example with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. ## Example: Load Model in Low-Bit Optimization In the example [generate.py](./generate.py), we show a basic use case of low-bit optimizations (including INT8/INT5/INT4) on a Llama2 model to predict the next N tokens using `generate()` API. By specifying `--low-bit` argument, you could apply other low-bit optimization (e.g. INT8/INT5) on model. @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. 
Configures OneAPI environment variables diff --git a/python/llm/example/GPU/PyTorch-Models/README.md b/python/llm/example/GPU/PyTorch-Models/README.md index ce5cd50e..931ef3a3 100644 --- a/python/llm/example/GPU/PyTorch-Models/README.md +++ b/python/llm/example/GPU/PyTorch-Models/README.md @@ -1,6 +1,6 @@ -# Running PyTorch model using BigDL-LLM on Intel GPU +# Running PyTorch model using IPEX-LLM on Intel GPU -This folder contains examples of running any PyTorch model on BigDL-LLM (with "one-line code change"): +This folder contains examples of running any PyTorch model on IPEX-LLM (with "one-line code change"): - [Model](Model): examples of running PyTorch models (e.g., Openai Whisper, LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using INT4 optimizations - [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.) diff --git a/python/llm/example/GPU/PyTorch-Models/Save-Load/README.md b/python/llm/example/GPU/PyTorch-Models/Save-Load/README.md index 504167e2..29341e82 100644 --- a/python/llm/example/GPU/PyTorch-Models/Save-Load/README.md +++ b/python/llm/example/GPU/PyTorch-Models/Save-Load/README.md @@ -1,9 +1,9 @@ -# Save/Load Low-Bit Models with BigDL-LLM Optimizations +# Save/Load Low-Bit Models with IPEX-LLM Optimizations -In this example, we show how to save/load model with BigDL-LLM low-bit optimizations (including INT8/INT5/INT4), and then run inference on the optimized low-bit model. +In this example, we show how to save/load model with IPEX-LLM low-bit optimizations (including INT8/INT5/INT4), and then run inference on the optimized low-bit model. ## 0. Requirements -To run this example with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. +To run this example with IPEX-LLM, we have some recommended requirements for your machine, please refer to [here](../../README.md#system-support) for more information. ## Example: Save/Load Model in Low-Bit Optimization In the example [generate.py](./generate.py), we show a basic use case of saving/loading model in low-bit optimizations to predict the next N tokens using `generate()` API. Also, saving and loading operations are platform-independent, so you could run it on different platforms. @@ -13,7 +13,7 @@ We suggest using conda to manage environment: conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. 
Configures OneAPI environment variables diff --git a/python/llm/example/GPU/README.md b/python/llm/example/GPU/README.md index ae41c717..cee5b2fd 100644 --- a/python/llm/example/GPU/README.md +++ b/python/llm/example/GPU/README.md @@ -1,16 +1,16 @@ -# BigDL-LLM Examples on Intel GPU +# IPEX-LLM Examples on Intel GPU -This folder contains examples of running BigDL-LLM on Intel GPU: +This folder contains examples of running IPEX-LLM on Intel GPU: -- [Applications](Applications): running LLM applications (such as autogen) on BigDL-LLM -- [HF-Transformers-AutoModels](HF-Transformers-AutoModels): running any ***Hugging Face Transformers*** model on BigDL-LLM (using the standard AutoModel APIs) -- [LLM-Finetuning](LLM-Finetuning): running ***finetuning*** (such as LoRA, QLoRA, QA-LoRA, etc) using BigDL-LLM on Intel GPUs -- [vLLM-Serving](vLLM-Serving): running ***vLLM*** serving framework on intel GPUs (with BigDL-LLM low-bit optimized models) -- [Deepspeed-AutoTP](Deepspeed-AutoTP): running distributed inference using ***DeepSpeed AutoTP*** (with BigDL-LLM low-bit optimized models) on Intel GPUs -- [LangChain](LangChain): running ***LangChain*** applications on BigDL-LLM -- [PyTorch-Models](PyTorch-Models): running any PyTorch model on BigDL-LLM (with "one-line code change") +- [Applications](Applications): running LLM applications (such as autogen) on IPEX-LLM +- [HF-Transformers-AutoModels](HF-Transformers-AutoModels): running any ***Hugging Face Transformers*** model on IPEX-LLM (using the standard AutoModel APIs) +- [LLM-Finetuning](LLM-Finetuning): running ***finetuning*** (such as LoRA, QLoRA, QA-LoRA, etc) using IPEX-LLM on Intel GPUs +- [vLLM-Serving](vLLM-Serving): running ***vLLM*** serving framework on Intel GPUs (with IPEX-LLM low-bit optimized models) +- [Deepspeed-AutoTP](Deepspeed-AutoTP): running distributed inference using ***DeepSpeed AutoTP*** (with IPEX-LLM low-bit optimized models) on Intel GPUs +- [LangChain](LangChain): running ***LangChain*** applications on IPEX-LLM +- [PyTorch-Models](PyTorch-Models): running any PyTorch model on IPEX-LLM (with "one-line code change") - [Speculative-Decoding](Speculative-Decoding): running any ***Hugging Face Transformers*** model with ***self-speculative decoding*** on Intel GPUs -- [ModelScope-Models](ModelScope-Models): running ***ModelScope*** model with BigDL-LLM on Intel GPUs +- [ModelScope-Models](ModelScope-Models): running ***ModelScope*** models with IPEX-LLM on Intel GPUs ## System Support @@ -31,4 +31,4 @@ This folder contains examples of running BigDL-LLM on Intel GPU: - Windows 10/11, with or without WSL ## Requirements -To apply Intel GPU acceleration, there’re several steps for tools installation and environment preparation. See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for mode details. \ No newline at end of file +To apply Intel GPU acceleration, there are several steps for tools installation and environment preparation. See the [GPU installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.
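The Save-Load example described a little earlier reduces to three calls once a model has been loaded in a low-bit format. The snippet below is a hedged sketch of that round trip: the `load_in_low_bit`, `save_low_bit`, and `load_low_bit` names follow the AutoModel wrapper these examples describe, and the model id and paths are placeholders.

```python
# Hedged sketch of the low-bit save/load round trip (assumes the ipex_llm.transformers
# AutoModel wrapper keeps the load_in_low_bit / save_low_bit / load_low_bit helpers
# these examples describe; model id and paths are placeholders).
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"   # placeholder model id
save_dir = "./llama-2-7b-sym_int5"

# 1) Load the original checkpoint with a chosen low-bit format (e.g. sym_int4 / sym_int5 / sym_int8)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit="sym_int5",
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# 2) Persist the already-quantized weights; the folder is much smaller and platform-independent
model.save_low_bit(save_dir)
tokenizer.save_pretrained(save_dir)

# 3) Later (or on another machine), reload directly from the low-bit folder and move it to the GPU
model = AutoModelForCausalLM.load_low_bit(save_dir, trust_remote_code=True).to("xpu")
```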
\ No newline at end of file diff --git a/python/llm/example/GPU/Speculative-Decoding/README.md b/python/llm/example/GPU/Speculative-Decoding/README.md index 3b8b4e7e..bb003532 100644 --- a/python/llm/example/GPU/Speculative-Decoding/README.md +++ b/python/llm/example/GPU/Speculative-Decoding/README.md @@ -1,12 +1,12 @@ -# Self-Speculative Decoding for Large Language Model FP16 Inference using BigDL-LLM on Intel GPUs -You can use BigDL-LLM to run FP16 inference for any Huggingface Transformer model with ***self-speculative decoding*** on Intel GPUs. This directory contains example scripts to help you quickly get started to run some popular open-source models using self-speculative decoding. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. +# Self-Speculative Decoding for Large Language Model FP16 Inference using IPEX-LLM on Intel GPUs +You can use IPEX-LLM to run FP16 inference for any Hugging Face Transformers model with ***self-speculative decoding*** on Intel GPUs. This directory contains example scripts to help you quickly get started to run some popular open-source models using self-speculative decoding. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. ## Verified Hardware Platforms - Intel Data Center GPU Max Series ## Recommended Requirements -To apply Intel GPU acceleration, there’re several steps for tools installation and environment preparation. See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for mode details. +To apply Intel GPU acceleration, there are several steps for tools installation and environment preparation. See the [GPU installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details. Step 1, only Linux system is supported now, Ubuntu 22.04 is prefered. diff --git a/python/llm/example/GPU/Speculative-Decoding/baichuan2/README.md b/python/llm/example/GPU/Speculative-Decoding/baichuan2/README.md index 7fcce80e..e6bbe6ed 100644 --- a/python/llm/example/GPU/Speculative-Decoding/baichuan2/README.md +++ b/python/llm/example/GPU/Speculative-Decoding/baichuan2/README.md @@ -1,18 +1,18 @@ # Baichuan2 -In this directory, you will find examples on how you could run Baichuan2 FP16 infernece with self-speculative decoding using BigDL-LLM on [Intel GPUs](../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat) and [baichuan-inc/Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) as reference Baichuan2 models. +In this directory, you will find examples on how you could run Baichuan2 FP16 inference with self-speculative decoding using IPEX-LLM on [Intel GPUs](../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat) and [baichuan-inc/Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) as reference Baichuan2 models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
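The per-model folders that follow all ship a `speculative.py` built on the same idea: load the FP16 target model with self-speculation enabled and then call `generate()` as usual. The sketch below is only an approximation of that pattern; in particular the `speculative=True` switch is an assumption about the loader's keyword arguments, so treat each folder's `speculative.py` as the authoritative reference.

```python
# Hedged sketch of FP16 self-speculative decoding (not any folder's exact speculative.py).
# The `speculative=True` keyword is an assumption about the loader's arguments; the
# per-model speculative.py scripts are the authoritative reference.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the 'xpu' device)
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "baichuan-inc/Baichuan2-7B-Chat"   # placeholder model id
prompt = "What is speculative decoding?"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             torch_dtype=torch.float16,  # FP16 target model
                                             optimize_model=True,
                                             speculative=True,           # assumed switch for self-speculation
                                             trust_remote_code=True,
                                             use_cache=True).to("xpu")

with torch.inference_mode():
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0].cpu(), skip_special_tokens=True))
```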
## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel GPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/Speculative-Decoding/chatglm3/README.md b/python/llm/example/GPU/Speculative-Decoding/chatglm3/README.md index 6cd1a762..6c3e4558 100644 --- a/python/llm/example/GPU/Speculative-Decoding/chatglm3/README.md +++ b/python/llm/example/GPU/Speculative-Decoding/chatglm3/README.md @@ -1,18 +1,18 @@ # ChatGLM3 -In this directory, you will find examples on how you could run ChatGLM3 FP16 infernece with self-speculative decoding using BigDL-LLM on [Intel GPUs](../README.md). For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model. +In this directory, you will find examples on how you could run ChatGLM3 FP16 inference with self-speculative decoding using IPEX-LLM on [Intel GPUs](../README.md). For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel GPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2.
Configures OneAPI environment variables ```bash diff --git a/python/llm/example/GPU/Speculative-Decoding/gpt-j/README.md b/python/llm/example/GPU/Speculative-Decoding/gpt-j/README.md index bd2d4290..0b0bd1e9 100644 --- a/python/llm/example/GPU/Speculative-Decoding/gpt-j/README.md +++ b/python/llm/example/GPU/Speculative-Decoding/gpt-j/README.md @@ -1,18 +1,18 @@ # GPT-J -In this directory, you will find examples on how you could run GPT-J FP16 infernece with self-speculative decoding using BigDL-LLM on [Intel GPUs](../README.md). For illustration purposes,we utilize the [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) as reference GPT-J models. +In this directory, you will find examples on how you could run GPT-J FP16 inference with self-speculative decoding using IPEX-LLM on [Intel GPUs](../README.md). For illustration purposes, we utilize the [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) as reference GPT-J models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a GPT-J model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel GPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a GPT-J model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables ```bash diff --git a/python/llm/example/GPU/Speculative-Decoding/llama2/README.md b/python/llm/example/GPU/Speculative-Decoding/llama2/README.md index 7121360b..df82e613 100644 --- a/python/llm/example/GPU/Speculative-Decoding/llama2/README.md +++ b/python/llm/example/GPU/Speculative-Decoding/llama2/README.md @@ -1,18 +1,18 @@ # LLaMA2 -In this directory, you will find examples on how you could run LLaMA2 FP16 infernece with self-speculative decoding using BigDL-LLM on [Intel GPUs](../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. +In this directory, you will find examples on how you could run LLaMA2 FP16 inference with self-speculative decoding using IPEX-LLM on [Intel GPUs](../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. ## 0.
Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel GPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu ``` ### 2. Configures OneAPI environment variables ```bash diff --git a/python/llm/example/GPU/Speculative-Decoding/mistral/README.md b/python/llm/example/GPU/Speculative-Decoding/mistral/README.md index eab28848..8044dc65 100644 --- a/python/llm/example/GPU/Speculative-Decoding/mistral/README.md +++ b/python/llm/example/GPU/Speculative-Decoding/mistral/README.md @@ -1,18 +1,18 @@ # Mistral -In this directory, you will find examples on how you could run Mistral FP16 infernece with self-speculative decoding using BigDL-LLM on [Intel GPUs](../README.md). For illustration purposes,we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. +In this directory, you will find examples on how you could run Mistral FP16 inference with self-speculative decoding using IPEX-LLM on [Intel GPUs](../README.md). For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel GPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a Mistral model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel GPUs. ### 1.
Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install transformers==4.36.0 ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/Speculative-Decoding/qwen/README.md b/python/llm/example/GPU/Speculative-Decoding/qwen/README.md index 1288117c..ccde3c71 100644 --- a/python/llm/example/GPU/Speculative-Decoding/qwen/README.md +++ b/python/llm/example/GPU/Speculative-Decoding/qwen/README.md @@ -1,18 +1,18 @@ # Qwen -In this directory, you will find examples on how you could run Qwen FP16 infernece with self-speculative decoding using BigDL-LLM on [Intel GPUs](../README.md). For illustration purposes, we utilize the [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) and [Qwen/Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat) as reference Qwen models. +In this directory, you will find examples on how you could run Qwen FP16 inference with self-speculative decoding using IPEX-LLM on [Intel GPUs](../README.md). For illustration purposes, we utilize the [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) and [Qwen/Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat) as reference Qwen models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [speculative.py](./speculative.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with BigDL-LLM speculative decoding optimizations on Intel GPUs. +In the example [speculative.py](./speculative.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with IPEX-LLM speculative decoding optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash conda create -n llm python=3.9 conda activate llm # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu pip install tiktoken einops transformers_stream_generator # additional package required for Qwen-7B-Chat to conduct generation ``` ### 2. Configures OneAPI environment variables diff --git a/python/llm/example/GPU/vLLM-Serving/README.md b/python/llm/example/GPU/vLLM-Serving/README.md index 595b0d84..301d884d 100644 --- a/python/llm/example/GPU/vLLM-Serving/README.md +++ b/python/llm/example/GPU/vLLM-Serving/README.md @@ -1,6 +1,6 @@ # vLLM continuous batching on Intel GPUs (experimental support) -This example demonstrates how to serve a LLaMA2-7B model using vLLM continuous batching on Intel GPU (with BigDL-LLM low-bits optimizations). +This example demonstrates how to serve a LLaMA2-7B model using vLLM continuous batching on Intel GPU (with IPEX-LLM low-bit optimizations).
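The serving guide below installs the dependencies, starts an OpenAI-compatible server, and queries it with `curl`. The same `/v1/completions` request can be issued from Python; in the hedged sketch below, the host, port, and model path simply mirror the placeholders used later in this guide.

```python
# Hedged sketch of a Python client for the OpenAI-compatible server started in this guide.
# Assumes the server is listening on localhost:8000 and was launched with the model path below.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "/MODEL_PATH/Llama-2-7b-chat-hf-ipex/",  # same placeholder path as the curl example
        "prompt": "San Francisco is a",
        "max_tokens": 128,
        "temperature": 0,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```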
The code shown in the following example is ported from [vLLM](https://github.com/vllm-project/vllm/tree/v0.2.1.post1). @@ -10,7 +10,7 @@ In this example, we will run Llama2-7b model using Arc A770 and provide `OpenAI- ### 0. Environment -To use Intel GPUs for deep-learning tasks, you should install the XPU driver and the oneAPI Base Toolkit. Please check the requirements at [here](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU#requirements). +To use Intel GPUs for deep-learning tasks, you should install the XPU driver and the oneAPI Base Toolkit. Please check the requirements [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU#requirements). After install the toolkit, run the following commands in your environment before starting vLLM GPU: ```bash @@ -31,14 +31,14 @@ To run vLLM continuous batching on Intel GPUs, install the dependencies as follo ```bash # First create an conda environment -conda create -n bigdl-vllm python==3.9 -conda activate bigdl-vllm +conda create -n ipex-vllm python==3.9 +conda activate ipex-vllm # Install dependencies pip3 install psutil pip3 install sentencepiece # Required for LLaMA tokenizer. pip3 install numpy # below command will install intel_extension_for_pytorch==2.1.10+xpu as default -pip install --pre --upgrade "bigdl-llm[xpu]" -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade "ipex-llm[xpu]" -f https://developer.intel.com/ipex-whl-stable-xpu pip3 install fastapi pip3 install "uvicorn[standard]" pip3 install "pydantic<2" # Required for OpenAI server. @@ -87,7 +87,7 @@ Then you can access the api server as follows: curl http://localhost:8000/v1/completions \ -H "Content-Type: application/json" \ -d '{ - "model": "/MODEL_PATH/Llama-2-7b-chat-hf-bigdl/", + "model": "/MODEL_PATH/Llama-2-7b-chat-hf-ipex/", "prompt": "San Francisco is a", "max_tokens": 128, "temperature": 0 @@ -100,12 +100,12 @@ Currently we have only supported LLaMA family model (including `llama`, `vicuna` #### 4.1 Add model code -Create or clone the Pytorch model code to `BigDL/python/llm/src/bigdl/llm/vllm/model_executor/models`. +Create or clone the PyTorch model code to `ipex-llm/python/llm/src/ipex_llm/vllm/model_executor/models`. #### 4.2 Rewrite the forward methods -Refering to `BigDL/python/llm/src/bigdl/llm/vllm/model_executor/models/bigdl_llama.py`, it's necessary to maintain a `kv_cache`, which is a nested list of dictionary that maps `req_id` to a three-dimensional tensor **(the structure may vary from models)**. Before the model's actual `forward` method, you could prepare a `past_key_values` according to current `req_id`, and after that you need to update the `kv_cache` with `output.past_key_values`. The clearence will be executed when the request is finished. +Referring to `ipex-llm/python/llm/src/ipex_llm/vllm/model_executor/models/ipex_llama.py`, it's necessary to maintain a `kv_cache`, which is a nested list of dictionaries that maps `req_id` to a three-dimensional tensor **(the structure may vary across models)**. Before the model's actual `forward` method, you could prepare a `past_key_values` according to the current `req_id`, and after that you need to update the `kv_cache` with `output.past_key_values`. The clearance is executed when the request is finished. #### 4.3 Register new model -Finally, register your `*ForCausalLM` class to the _MODEL_REGISTRY in `BigDL/python/llm/src/bigdl/llm/vllm/model_executor/model_loader.py`.
+Finally, register your `*ForCausalLM` class to the `_MODEL_REGISTRY` in `ipex-llm/python/llm/src/ipex_llm/vllm/model_executor/model_loader.py`. diff --git a/python/llm/portable-zip/README-ui.md b/python/llm/portable-zip/README-ui.md index a644ad32..a1907275 100644 --- a/python/llm/portable-zip/README-ui.md +++ b/python/llm/portable-zip/README-ui.md @@ -1,8 +1,8 @@ -# BigDL-LLM Portable Zip with Web-UI For Windows: User Guide +# IPEX-LLM Portable Zip with Web-UI For Windows: User Guide ## Introduction -This portable zip includes everything you need to run an LLM with BigDL-LLM optimizations and chat with it in Web-UI. Please refer to [How to use](#how-to-use) section to get started. +This portable zip includes everything you need to run an LLM with IPEX-LLM optimizations and chat with it in Web-UI. Please refer to the [How to use](#how-to-use) section to get started. ### 6B model running on an Intel 11-Gen Core PC (real-time screen capture) @@ -15,7 +15,7 @@ This portable zip includes everything you need to run an LLM with BigDL-LLM opti 1. Download the zip from link [here](). 2. (Optional) You could also build the zip on your own. Run `setup.bat --ui` and it will generate the zip file. -3. Unzip `bigdl-llm.zip`. +3. Unzip `ipex-llm.zip`. 4. Download the model to your computer. 5. Go into the unzipped folder and double click `chat-ui.bat`. Input the path of the model (e.g. `path\to\model`, note that there's no slash at the end of the path). Press Enter and wait until it shows `All service started. Visit 127.0.0.1:7860 in browser to chat.`. Do NOT close the terminal window! 6. Visit `127.0.0.1:7860` in your browser and enjoy chatting! diff --git a/python/llm/portable-zip/README.md b/python/llm/portable-zip/README.md index 182e4766..f6018d3e 100644 --- a/python/llm/portable-zip/README.md +++ b/python/llm/portable-zip/README.md @@ -1,8 +1,8 @@ -# BigDL-LLM Portable Zip For Windows: User Guide +# IPEX-LLM Portable Zip For Windows: User Guide ## Introduction -This portable zip includes everything you need to run an LLM with BigDL-LLM optimizations (except models) . Please refer to [How to use](#how-to-use) section to get started. +This portable zip includes everything you need to run an LLM with IPEX-LLM optimizations (except models). Please refer to the [How to use](#how-to-use) section to get started. ### 13B model running on an Intel 11-Gen Core PC (real-time screen capture) @@ -22,7 +22,7 @@ This portable zip includes everything you need to run an LLM with BigDL-LLM opti 1. Download the zip from link [here](). 2. (Optional) You could also build the zip on your own. Run `setup.bat` and it will generate the zip file. -3. Unzip `bigdl-llm.zip`. +3. Unzip `ipex-llm.zip`. 4. Download the model to your computer. Please ensure there is a file named `config.json` in the model folder, otherwise the script won't work.

diff --git a/python/llm/portable-zip/setup.md b/python/llm/portable-zip/setup.md index 3cbed8bf..023ebcd1 100644 --- a/python/llm/portable-zip/setup.md +++ b/python/llm/portable-zip/setup.md @@ -1,11 +1,11 @@ -# BigDL-LLM Portable Zip Setup Script For Windows +# IPEX-LLM Portable Zip Setup Script For Windows # How to use ## Build Portable Zip without Web-UI -Run `setup.bat` to generate portable zip without Web-UI. It will download and install all dependency and generate `bigdl-llm.zip` for user to use. +Run `setup.bat` to generate portable zip without Web-UI. It will download and install all dependency and generate `ipex-llm.zip` for user to use. ## Build Portable Zip with Web-UI -Run `setup.bat --ui` to generate portable zip with Web-UI. It will download and install all dependency and generate `bigdl-llm.zip` for user to use. +Run `setup.bat --ui` to generate portable zip with Web-UI. It will download and install all dependency and generate `ipex-llm.zip` for user to use. diff --git a/python/llm/scripts/README.md b/python/llm/scripts/README.md index 6509169d..f1ff0293 100644 --- a/python/llm/scripts/README.md +++ b/python/llm/scripts/README.md @@ -3,7 +3,7 @@ ## Env-Check -The **Env-Check** scripts ([env-check.sh](./env-chec.sh), [env-check.bat](./env-chec.bat)) are designed to verify your `bigdl-llm` installation and runtime environment. These scripts can help you ensure your environment is correctly set up for optimal performance. You can include the script's output when reporting issues on [BigDL Github Issues](https://github.com/intel-analytics/BigDL/issues) for easier troubleshooting. +The **Env-Check** scripts ([env-check.sh](./env-chec.sh), [env-check.bat](./env-chec.bat)) are designed to verify your `ipex-llm` installation and runtime environment. These scripts can help you ensure your environment is correctly set up for optimal performance. You can include the script's output when reporting issues on [IPEX Github Issues](https://github.com/intel-analytics/ipex-llm/issues) for easier troubleshooting. > Note: These scripts verify python installation, check for necessary packages and environmental variables, assess hardware or operating system compatibility, and identify any XPU-related issues. @@ -17,11 +17,11 @@ sudo apt install xpu-smi ### Usage -* After installing `bigdl-llm`, open a terminal (on Linux) or **Anaconda Prompt** (on Windows), and activate the conda environment you have created for running `bigdl-llm`: +* After installing `ipex-llm`, open a terminal (on Linux) or **Anaconda Prompt** (on Windows), and activate the conda environment you have created for running `ipex-llm`: ``` conda activate llm ``` - > If you do not know how to install `bigdl-llm`, refer to [BigDL-LLM installation](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install.html) for more details. + > If you do not know how to install `ipex-llm`, refer to [IPEX-LLM installation](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html) for more details. 
* Within the activated python environment, run below command: * On Linux ```bash @@ -44,7 +44,7 @@ transformers=4.37.0 ----------------------------------------------------------------- torch=2.1.0a0+cxx11.abi ----------------------------------------------------------------- -BigDL Version: 2.5.0b20240219 +IPEX Version: 2.5.0b20240219 ----------------------------------------------------------------- ipex=2.1.10+xpu ----------------------------------------------------------------- diff --git a/python/llm/src/ipex_llm/serving/fastchat/README.md b/python/llm/src/ipex_llm/serving/fastchat/README.md index 20c4893a..b40f3956 100644 --- a/python/llm/src/ipex_llm/serving/fastchat/README.md +++ b/python/llm/src/ipex_llm/serving/fastchat/README.md @@ -1,8 +1,8 @@ -# Serving using BigDL-LLM and FastChat +# Serving using IPEX-LLM and FastChat FastChat is an open platform for training, serving, and evaluating large language model based chatbots. You can find the detailed information at their [homepage](https://github.com/lm-sys/FastChat). -BigDL-LLM can be easily integrated into FastChat so that user can use `BigDL-LLM` as a serving backend in the deployment. +IPEX-LLM can be easily integrated into FastChat so that user can use `IPEX-LLM` as a serving backend in the deployment.
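Once the controller, an IPEX-LLM model worker, and the OpenAI-compatible RESTful API server described at the end of this guide are all running, the deployment can be exercised with the standard `openai` client. The sketch below is an assumption-laden example (openai>=1.0 client, port 8000 as launched below, and a worker serving `lmsys/vicuna-7b-v1.5`):

```python
# Hedged sketch of querying the FastChat OpenAI-compatible server backed by an IPEX-LLM worker.
# Assumes the openai>=1.0 client, the localhost:8000 server launched at the end of this guide,
# and a worker serving lmsys/vicuna-7b-v1.5 (registered under the name below).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # FastChat ignores the key
reply = client.chat.completions.create(
    model="vicuna-7b-v1.5",
    messages=[{"role": "user", "content": "Briefly introduce IPEX-LLM."}],
    temperature=0,
)
print(reply.choices[0].message.content)
```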

Table of contents @@ -11,9 +11,9 @@ BigDL-LLM can be easily integrated into FastChat so that user can use `BigDL-LLM - [Start the service](#start-the-service) - [Launch controller](#launch-controller) - [Launch model worker(s) and load models](#launch-model-workers-and-load-models) - - [BigDL model worker (deprecated)](#bigdl-model-worker-deprecated) - - [BigDL worker](#bigdl-llm-worker) - - [BigDL vLLM model worker](#vllm-model-worker) + - [IPEX model worker (deprecated)](#ipex-model-worker-deprecated) + - [IPEX worker](#ipex-llm-worker) + - [IPEX vLLM model worker](#vllm-model-worker) - [Launch Gradio web server](#launch-gradio-web-server) - [Launch RESTful API server](#launch-restful-api-server) @@ -21,19 +21,19 @@ BigDL-LLM can be easily integrated into FastChat so that user can use `BigDL-LLM ## Install -You may install **`bigdl-llm`** with `FastChat` as follows: +You may install **`ipex-llm`** with `FastChat` as follows: ```bash -pip install --pre --upgrade bigdl-llm[serving] +pip install --pre --upgrade ipex-llm[serving] # Or -pip install --pre --upgrade bigdl-llm[all] +pip install --pre --upgrade ipex-llm[all] ``` -To add GPU support for FastChat, you may install **`bigdl-llm`** as follows: +To add GPU support for FastChat, you may install **`ipex-llm`** as follows: ```bash -pip install --pre --upgrade bigdl-llm[xpu, serving] -f https://developer.intel.com/ipex-whl-stable-xpu +pip install --pre --upgrade ipex-llm[xpu, serving] -f https://developer.intel.com/ipex-whl-stable-xpu ``` @@ -49,35 +49,35 @@ python3 -m fastchat.serve.controller ### Launch model worker(s) and load models -Using BigDL-LLM in FastChat does not impose any new limitations on model usage. Therefore, all Hugging Face Transformer models can be utilized in FastChat. +Using IPEX-LLM in FastChat does not impose any new limitations on model usage. Therefore, all Hugging Face Transformer models can be utilized in FastChat. -#### BigDL model worker (deprecated) +#### IPEX model worker (deprecated)
details -> Warning: This method has been deprecated, please change to use `BigDL-LLM` [worker](#bigdl-llm-worker) instead. +> Warning: This method has been deprecated, please change to use `IPEX-LLM` [worker](#ipex-llm-worker) instead. -FastChat determines the Model adapter to use through path matching. Therefore, in order to load models using BigDL-LLM, you need to make some modifications to the model's name. +FastChat determines the Model adapter to use through path matching. Therefore, in order to load models using IPEX-LLM, you need to make some modifications to the model's name. -For instance, assuming you have downloaded the `llama-7b-hf` from [HuggingFace](https://huggingface.co/decapoda-research/llama-7b-hf). Then, to use the `BigDL-LLM` as backend, you need to change name from `llama-7b-hf` to `bigdl-7b`.The key point here is that the model's path should include "bigdl" and **should not include paths matched by other model adapters**. +For instance, assuming you have downloaded the `llama-7b-hf` from [HuggingFace](https://huggingface.co/decapoda-research/llama-7b-hf). Then, to use `IPEX-LLM` as the backend, you need to change the name from `llama-7b-hf` to `ipex-7b`. The key point here is that the model's path should include "ipex" and **should not include paths matched by other model adapters**. -Then we will use `bigdl-7b` as model-path. +Then we will use `ipex-7b` as model-path. -> note: This is caused by the priority of name matching list. The new added `BigDL-LLM` adapter is at the tail of the name-matching list so that it has the lowest priority. If model path contains other keywords like `vicuna` which matches to another adapter with higher priority, then the `BigDL-LLM` adapter will not work. +> note: This is caused by the priority of the name-matching list. The newly added `IPEX-LLM` adapter is at the tail of the name-matching list so that it has the lowest priority. If the model path contains other keywords like `vicuna` which match another adapter with higher priority, then the `IPEX-LLM` adapter will not work. -A special case is `ChatGLM` models. For these models, you do not need to do any changes after downloading the model and the `BigDL-LLM` backend will be used automatically. +A special case is `ChatGLM` models. For these models, you do not need to make any changes after downloading the model, and the `IPEX-LLM` backend will be used automatically. Then we can run model workers ```bash # On CPU -python3 -m ipex_llm.serving.fastchat.model_worker --model-path PATH/TO/bigdl-7b --device cpu +python3 -m ipex_llm.serving.fastchat.model_worker --model-path PATH/TO/ipex-7b --device cpu # On GPU -python3 -m ipex_llm.serving.fastchat.model_worker --model-path PATH/TO/bigdl-7b --device xpu +python3 -m ipex_llm.serving.fastchat.model_worker --model-path PATH/TO/ipex-7b --device xpu ``` -If you run successfully using `BigDL` backend, you can see the output in log like this: +If you run successfully using the `IPEX-LLM` backend, you can see output in the log like this: ```bash INFO - Converting the current model to sym_int4 format...... @@ -86,29 +86,29 @@ INFO - Converting the current model to sym_int4 format...... > note: We currently only support int4 quantization for this method.
-#### BigDL-LLM worker -To integrate BigDL-LLM with `FastChat` efficiently, we have provided a new model_worker implementation named `bigdl_worker.py`. +#### IPEX-LLM worker +To integrate IPEX-LLM with `FastChat` efficiently, we have provided a new model_worker implementation named `ipex_worker.py`. -To run the `bigdl_worker` on CPU, using the following code: +To run the `ipex_worker` on CPU, use the following code: ```bash -source bigdl-llm-init -t +source ipex-llm-init -t # Available low_bit format including sym_int4, sym_int8, bf16 etc. -python3 -m ipex_llm.serving.fastchat.bigdl_worker --model-path lmsys/vicuna-7b-v1.5 --low-bit "sym_int4" --trust-remote-code --device "cpu" +python3 -m ipex_llm.serving.fastchat.ipex_worker --model-path lmsys/vicuna-7b-v1.5 --low-bit "sym_int4" --trust-remote-code --device "cpu" ``` For GPU example: ```bash # Available low_bit format including sym_int4, sym_int8, fp16 etc. -python3 -m ipex_llm.serving.fastcaht.bigdl_worker --model-path lmsys/vicuna-7b-v1.5 --low-bit "sym_int4" --trust-remote-code --device "xpu" +python3 -m ipex_llm.serving.fastchat.ipex_worker --model-path lmsys/vicuna-7b-v1.5 --low-bit "sym_int4" --trust-remote-code --device "xpu" ``` -For a full list of accepted arguments, you can refer to the main method of the `bigdl_worker.py` +For a full list of accepted arguments, you can refer to the main method of `ipex_worker.py`. -#### BigDL vLLM model worker +#### IPEX vLLM model worker -We also provide the `vllm_worker` which uses the [vLLM](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/vLLM-Serving) engine for better hardware utilization. +We also provide the `vllm_worker` which uses the [vLLM](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/vLLM-Serving) engine for better hardware utilization. To run using the `vLLM_worker`, we don't need to change model name, just simply uses the following command: @@ -128,11 +128,11 @@ python3 -m fastchat.serve.gradio_web_server This is the user interface that users will interact with. -By following these steps, you will be able to serve your models using the web UI with BigDL-LLM as the backend. You can open your browser and chat with a model now. +By following these steps, you will be able to serve your models using the web UI with IPEX-LLM as the backend. You can open your browser and chat with a model now. ### Launch RESTful API server -To start an OpenAI API server that provides compatible APIs using BigDL-LLM backend, you can launch the `openai_api_server` and follow this [doc](https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md) to use it. +To start an OpenAI API server that provides compatible APIs using the IPEX-LLM backend, you can launch the `openai_api_server` and follow this [doc](https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md) to use it. ```bash python3 -m fastchat.serve.openai_api_server --host localhost --port 8000