[Doc] IPEX-LLM Doc Layout Update (#10532)
* Fix navigation bar to 1
* Remove unnecessary Python API
* Fix failing LangChain native API doc
* Change index page layout
* Update quicklink for IPEX-LLM
* Simplify toc and add bigdl-llm migration guide
* Update readthedocs README
* Add missing index link for bigdl-llm migration guide
* Update logo image and repo link
* Update copyright
* Small fix
* Update copyright
* Update top nav bar
* Small fix
@@ -1,15 +1,16 @@
-# BigDL Documentation
+# IPEX-LLM Documentation
 
-This is the repository for BigDL documentation, which is hosted at https://bigdl.readthedocs.io/en/latest/
+This is the repository for IPEX-LLM documentation, which is hosted at https://ipex-llm.readthedocs.io/en/latest/
 
 ## Local build
 
 ### 1. Set up environment
 
-To build BigDL documentation locally for testing purposes, it is recommended to create a conda environment with specified Python version:
+To build IPEX-LLM documentation locally for testing purposes, it is recommended to create a conda environment with specified Python version:
 
 ```bash
 conda create -n docs python=3.7
 conda activate docs
 ```
 
-Then inside [`BigDL/docs/readthedocs`](.) folder, install required packages:
+Then inside [`ipex-llm/docs/readthedocs`](.) folder, install required packages:
 
 ```bash
 cd docs/readthedocs
Binary image changes (filenames of deleted images elided in this view):
- Deleted images: 1.7 KiB, 150 KiB, 195 KiB, 14 KiB, 68 KiB, 1.6 KiB, 214 KiB, 142 KiB, 29 KiB, 795 B, 22 KiB
- BIN docs/readthedocs/image/ipex-llm_logo_temp.png (normal file, new; 27 KiB)
@@ -3,34 +3,44 @@
     <div class="navbar-nav">
         <ul class="nav">
             <li>
-                <strong class="bigdl-quicklinks-section-title">BigDL-LLM Quickstart</strong>
+                <a href="doc/LLM/index.html">
+                    <strong class="bigdl-quicklinks-section-title">IPEX-LLM Document</strong>
+                </a>
+            </li>
+            <li>
+                <a href="doc/LLM/Quickstart/bigdl_llm_migration.html">
+                    <strong class="bigdl-quicklinks-section-title"><code>bigdl-llm</code> Migration Guide</strong>
+                </a>
+            </li>
+            <li>
+                <strong class="bigdl-quicklinks-section-title">IPEX-LLM Quickstart</strong>
                 <input id="quicklink-cluster-llm-quickstart" type="checkbox" class="toctree-checkbox" />
                 <label for="quicklink-cluster-llm-quickstart" class="toctree-toggle">
                     <i class="fa-solid fa-chevron-down"></i>
                 </label>
                 <ul class="nav bigdl-quicklinks-section-nav">
                     <li>
-                        <a href="doc/LLM/Quickstart/install_linux_gpu.html">Install BigDL-LLM on Linux with Intel GPU</a>
+                        <a href="doc/LLM/Quickstart/install_linux_gpu.html">Install IPEX-LLM on Linux with Intel GPU</a>
                     </li>
                     <li>
-                        <a href="doc/LLM/Quickstart/install_windows_gpu.html">Install BigDL-LLM on Windows with Intel GPU</a>
+                        <a href="doc/LLM/Quickstart/install_windows_gpu.html">Install IPEX-LLM on Windows with Intel GPU</a>
                     </li>
                     <li>
-                        <a href="doc/LLM/Quickstart/docker_windows_gpu.html">Install BigDL-LLM in Docker on Windows with Intel GPU</a>
+                        <a href="doc/LLM/Quickstart/docker_windows_gpu.html">Install IPEX-LLM in Docker on Windows with Intel GPU</a>
                     </li>
                     <li>
                         <a href="doc/LLM/Quickstart/webui_quickstart.html">Use Text Generation WebUI on Windows with Intel GPU</a>
                     </li>
                     <li>
-                        <a href="doc/LLM/Quickstart/benchmark_quickstart.html">BigDL-LLM Benchmarking</a>
+                        <a href="doc/LLM/Quickstart/benchmark_quickstart.html">IPEX-LLM Benchmarking</a>
                     </li>
                     <li>
-                        <a href="doc/LLM/Quickstart/llama_cpp_quickstart.html">Use llama.cpp with BigDL-LLM on Intel GPU</a>
+                        <a href="doc/LLM/Quickstart/llama_cpp_quickstart.html">Use llama.cpp with IPEX-LLM on Intel GPU</a>
                     </li>
                 </ul>
             </li>
             <li>
-                <strong class="bigdl-quicklinks-section-title">BigDL-LLM Installation</strong>
+                <strong class="bigdl-quicklinks-section-title">IPEX-LLM Installation</strong>
                 <input id="quicklink-cluster-llm-installation" type="checkbox" class="toctree-checkbox" />
                 <label for="quicklink-cluster-llm-installation" class="toctree-toggle">
                     <i class="fa-solid fa-chevron-down"></i>
@@ -47,7 +57,7 @@
             </li>
             <li>
                 <a href="doc/LLM/Overview/FAQ/faq.html">
-                    <strong class="bigdl-quicklinks-section-title">BigDL-LLM FAQ</strong>
+                    <strong class="bigdl-quicklinks-section-title">IPEX-LLM FAQ</strong>
                 </a>
             </li>
         </ul>
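The quicklinks markup above expands and collapses each section with a CSS-only toggle: a hidden `<input type="checkbox">` is paired with a `<label>` whose `for` attribute matches the input's `id`, so clicking the chevron label flips the checkbox, and a sibling selector shows or hides the nested `<ul>`. A minimal standalone sketch of that pattern (the CSS rules and show/hide direction here are illustrative assumptions, not the theme's actual stylesheet):

```html
<style>
  /* Hide the checkbox control itself */
  .toctree-checkbox { display: none; }
  /* Collapse the sibling list by default; reveal it when checked */
  .toctree-checkbox ~ ul { display: none; }
  .toctree-checkbox:checked ~ ul { display: block; }
</style>
<li>
  <strong>IPEX-LLM Quickstart</strong>
  <input id="demo-toggle" type="checkbox" class="toctree-checkbox" />
  <label for="demo-toggle" class="toctree-toggle">&#9662;</label>
  <ul>
    <li><a href="doc/LLM/Quickstart/install_linux_gpu.html">Install IPEX-LLM on Linux with Intel GPU</a></li>
  </ul>
</li>
```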
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -1,27 +1,8 @@
 | 
				
			||||||
root: index
 | 
					root: index
 | 
				
			||||||
subtrees:
 | 
					subtrees:
 | 
				
			||||||
  - entries:
 | 
					 | 
				
			||||||
    - file: doc/UserGuide/index
 | 
					 | 
				
			||||||
      title: 'User guide'
 | 
					 | 
				
			||||||
      subtrees:
 | 
					 | 
				
			||||||
        - entries:
 | 
					 | 
				
			||||||
          - file: doc/UserGuide/python
 | 
					 | 
				
			||||||
          - file: doc/UserGuide/scala
 | 
					 | 
				
			||||||
          - file: doc/UserGuide/win
 | 
					 | 
				
			||||||
          - file: doc/UserGuide/docker
 | 
					 | 
				
			||||||
          - file: doc/UserGuide/colab
 | 
					 | 
				
			||||||
          - file: doc/UserGuide/hadoop
 | 
					 | 
				
			||||||
          - file: doc/UserGuide/k8s
 | 
					 | 
				
			||||||
          - file: doc/UserGuide/databricks
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
  - entries:
 | 
					 | 
				
			||||||
    - file: doc/Application/powered-by
 | 
					 | 
				
			||||||
      title: "Powered by"
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
  - entries:
 | 
					  - entries:
 | 
				
			||||||
    - file: doc/LLM/index
 | 
					    - file: doc/LLM/index
 | 
				
			||||||
      title: "LLM"
 | 
					      title: "IPEX-LLM Document"
 | 
				
			||||||
      subtrees:
 | 
					      subtrees:
 | 
				
			||||||
        - entries:
 | 
					        - entries:
 | 
				
			||||||
          - file: doc/LLM/Overview/llm
 | 
					          - file: doc/LLM/Overview/llm
 | 
				
			||||||
| 
						 | 
					@ -38,6 +19,7 @@ subtrees:
 | 
				
			||||||
            title: "Quickstart"
 | 
					            title: "Quickstart"
 | 
				
			||||||
            subtrees:
 | 
					            subtrees:
 | 
				
			||||||
              - entries:
 | 
					              - entries:
 | 
				
			||||||
 | 
					                - file: doc/LLM/Quickstart/bigdl_llm_migration
 | 
				
			||||||
                - file: doc/LLM/Quickstart/install_linux_gpu
 | 
					                - file: doc/LLM/Quickstart/install_linux_gpu
 | 
				
			||||||
                - file: doc/LLM/Quickstart/install_windows_gpu
 | 
					                - file: doc/LLM/Quickstart/install_windows_gpu
 | 
				
			||||||
                - file: doc/LLM/Quickstart/docker_windows_gpu
 | 
					                - file: doc/LLM/Quickstart/docker_windows_gpu
 | 
				
			||||||
| 
						 | 
					@ -77,347 +59,5 @@ subtrees:
 | 
				
			||||||
          - file: doc/LLM/Overview/FAQ/faq
 | 
					          - file: doc/LLM/Overview/FAQ/faq
 | 
				
			||||||
            title: "FAQ"
 | 
					            title: "FAQ"
 | 
				
			||||||
 | 
					
 | 
				
			||||||
  - entries:
 | 
					 | 
				
			||||||
    - file: doc/Orca/index
 | 
					 | 
				
			||||||
      title: "Orca"
 | 
					 | 
				
			||||||
      subtrees:
 | 
					 | 
				
			||||||
        - entries:
 | 
					 | 
				
			||||||
          - file: doc/Orca/Overview/orca
 | 
					 | 
				
			||||||
            title: "Orca in 5 minutes"
 | 
					 | 
				
			||||||
          - file: doc/Orca/Overview/install
 | 
					 | 
				
			||||||
            title: "Installation"
 | 
					 | 
				
			||||||
          - file: doc/Orca/Overview/index
 | 
					 | 
				
			||||||
            title: "Key Features"
 | 
					 | 
				
			||||||
            subtrees:
 | 
					 | 
				
			||||||
              - entries:
 | 
					 | 
				
			||||||
                - file: doc/Orca/Overview/orca-context
 | 
					 | 
				
			||||||
                - file: doc/Orca/Overview/data-parallel-processing
 | 
					 | 
				
			||||||
                - file: doc/Orca/Overview/distributed-training-inference
 | 
					 | 
				
			||||||
                - file: doc/Orca/Overview/distributed-tuning
 | 
					 | 
				
			||||||
                - file: doc/Orca/Overview/ray
 | 
					 | 
				
			||||||
          - file: doc/Orca/Howto/index
 | 
					 | 
				
			||||||
            title: "How-to Guides"
 | 
					 | 
				
			||||||
            subtrees:
 | 
					 | 
				
			||||||
              - entries:
 | 
					 | 
				
			||||||
                - file: doc/Orca/Howto/tf2keras-quickstart
 | 
					 | 
				
			||||||
                - file: doc/Orca/Howto/pytorch-quickstart
 | 
					 | 
				
			||||||
                - file: doc/Orca/Howto/ray-quickstart
 | 
					 | 
				
			||||||
                - file: doc/Orca/Howto/spark-dataframe
 | 
					 | 
				
			||||||
                - file: doc/Orca/Howto/xshards-pandas
 | 
					 | 
				
			||||||
                - file: doc/Orca/Howto/autoestimator-pytorch-quickstart
 | 
					 | 
				
			||||||
                - file: doc/Orca/Howto/autoxgboost-quickstart
 | 
					 | 
				
			||||||
                - file: doc/Orca/Howto/tf1-quickstart
 | 
					 | 
				
			||||||
                - file: doc/Orca/Howto/tf1keras-quickstart
 | 
					 | 
				
			||||||
          - file: doc/Orca/Tutorial/index
 | 
					 | 
				
			||||||
            title: "Tutorials"
 | 
					 | 
				
			||||||
            subtrees:
 | 
					 | 
				
			||||||
              - entries:
 | 
					 | 
				
			||||||
                - file: doc/Orca/Tutorial/yarn
 | 
					 | 
				
			||||||
                - file: doc/Orca/Tutorial/k8s
 | 
					 | 
				
			||||||
          - file: doc/Orca/Overview/known_issues
 | 
					 | 
				
			||||||
            title: "Tips and Known Issues"
 | 
					 | 
				
			||||||
          - file: doc/PythonAPI/Orca/index
 | 
					 | 
				
			||||||
            title: "API Reference"
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
  - entries:
 | 
					 | 
				
			||||||
      - file: doc/Nano/index
 | 
					 | 
				
			||||||
        title: "Nano"
 | 
					 | 
				
			||||||
        subtrees:
 | 
					 | 
				
			||||||
          - entries:
 | 
					 | 
				
			||||||
            - file: doc/Nano/Overview/nano
 | 
					 | 
				
			||||||
              title: "Nano in 5 minutes"
 | 
					 | 
				
			||||||
            - file: doc/Nano/Overview/install
 | 
					 | 
				
			||||||
              title: "Installation"
 | 
					 | 
				
			||||||
            - file: doc/Nano/Overview/index
 | 
					 | 
				
			||||||
              title: "Key Features"
 | 
					 | 
				
			||||||
              subtrees:
 | 
					 | 
				
			||||||
                - entries:
 | 
					 | 
				
			||||||
                  - file: doc/Nano/Overview/pytorch_train
 | 
					 | 
				
			||||||
                  - file: doc/Nano/Overview/pytorch_inference
 | 
					 | 
				
			||||||
                  - file: doc/Nano/Overview/pytorch_cuda_patch
 | 
					 | 
				
			||||||
                  - file: doc/Nano/Overview/tensorflow_train
 | 
					 | 
				
			||||||
                  - file: doc/Nano/Overview/tensorflow_inference
 | 
					 | 
				
			||||||
                  - file: doc/Nano/Overview/hpo
 | 
					 | 
				
			||||||
            - file: doc/Nano/QuickStart/index
 | 
					 | 
				
			||||||
              title: "Tutorials"
 | 
					 | 
				
			||||||
              subtrees:
 | 
					 | 
				
			||||||
                - entries:
 | 
					 | 
				
			||||||
                  - file: doc/Nano/QuickStart/pytorch_train_quickstart
 | 
					 | 
				
			||||||
                  - file: doc/Nano/QuickStart/pytorch_nano
 | 
					 | 
				
			||||||
                  - file: doc/Nano/QuickStart/pytorch_onnxruntime
 | 
					 | 
				
			||||||
                  - file: doc/Nano/QuickStart/pytorch_openvino
 | 
					 | 
				
			||||||
                  - file: doc/Nano/QuickStart/pytorch_quantization_inc_onnx
 | 
					 | 
				
			||||||
                  - file: doc/Nano/QuickStart/pytorch_quantization_inc
 | 
					 | 
				
			||||||
                  - file: doc/Nano/QuickStart/pytorch_quantization_openvino
 | 
					 | 
				
			||||||
                  - file: doc/Nano/QuickStart/tensorflow_train_quickstart
 | 
					 | 
				
			||||||
                  - file: doc/Nano/QuickStart/tensorflow_embedding
 | 
					 | 
				
			||||||
                  - file: doc/Nano/QuickStart/tensorflow_quantization_quickstart
 | 
					 | 
				
			||||||
            - file: doc/Nano/Howto/index
 | 
					 | 
				
			||||||
              title: "How-to Guides"
 | 
					 | 
				
			||||||
              subtrees:
 | 
					 | 
				
			||||||
                - entries:
 | 
					 | 
				
			||||||
                  - file: doc/Nano/Howto/Preprocessing/index
 | 
					 | 
				
			||||||
                    subtrees:
 | 
					 | 
				
			||||||
                      - entries:
 | 
					 | 
				
			||||||
                        - file: doc/Nano/Howto/Preprocessing/PyTorch/index
 | 
					 | 
				
			||||||
                          title: "PyTorch"
 | 
					 | 
				
			||||||
                          subtrees:
 | 
					 | 
				
			||||||
                            - entries:      
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Preprocessing/PyTorch/accelerate_pytorch_cv_data_pipeline
 | 
					 | 
				
			||||||
                  - file: doc/Nano/Howto/Training/index
 | 
					 | 
				
			||||||
                    subtrees:
 | 
					 | 
				
			||||||
                      - entries:
 | 
					 | 
				
			||||||
                        - file: doc/Nano/Howto/Training/PyTorchLightning/index
 | 
					 | 
				
			||||||
                          title: "PyTorch Lightning"
 | 
					 | 
				
			||||||
                          subtrees:
 | 
					 | 
				
			||||||
                            - entries:
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/PyTorchLightning/accelerate_pytorch_lightning_training_ipex
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/PyTorchLightning/accelerate_pytorch_lightning_training_multi_instance
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/PyTorchLightning/pytorch_lightning_training_channels_last
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/PyTorchLightning/pytorch_lightning_training_bf16
 | 
					 | 
				
			||||||
                        - file: doc/Nano/Howto/Training/PyTorch/index
 | 
					 | 
				
			||||||
                          title: "PyTorch"
 | 
					 | 
				
			||||||
                          subtrees:
 | 
					 | 
				
			||||||
                            - entries:
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/PyTorch/convert_pytorch_training_torchnano
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/PyTorch/use_nano_decorator_pytorch_training
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_ipex
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_multi_instance
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/PyTorch/pytorch_training_channels_last
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_bf16
 | 
					 | 
				
			||||||
                        - file: doc/Nano/Howto/Training/TensorFlow/index
 | 
					 | 
				
			||||||
                          title: "TensorFlow"
 | 
					 | 
				
			||||||
                          subtrees:
 | 
					 | 
				
			||||||
                            - entries:
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/TensorFlow/accelerate_tensorflow_training_multi_instance
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/TensorFlow/tensorflow_training_embedding_sparseadam
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/TensorFlow/tensorflow_training_bf16
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/TensorFlow/tensorflow_custom_training_multi_instance
 | 
					 | 
				
			||||||
                        - file: doc/Nano/Howto/Training/General/index
 | 
					 | 
				
			||||||
                          title: "General"
 | 
					 | 
				
			||||||
                          subtrees:
 | 
					 | 
				
			||||||
                            - entries:
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Training/General/choose_num_processes_training
 | 
					 | 
				
			||||||
                  - file: doc/Nano/Howto/Inference/index
 | 
					 | 
				
			||||||
                    subtrees:
 | 
					 | 
				
			||||||
                      - entries:
 | 
					 | 
				
			||||||
                        - file: doc/Nano/Howto/Inference/OpenVINO/index
 | 
					 | 
				
			||||||
                          title: "OpenVINO"
 | 
					 | 
				
			||||||
                          subtrees:
 | 
					 | 
				
			||||||
                            - entries:    
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/OpenVINO/openvino_inference
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/OpenVINO/openvino_inference_async
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/OpenVINO/accelerate_inference_openvino_gpu
 | 
					 | 
				
			||||||
                        - file: doc/Nano/Howto/Inference/PyTorch/index
 | 
					 | 
				
			||||||
                          title: "PyTorch"
 | 
					 | 
				
			||||||
                          subtrees:
 | 
					 | 
				
			||||||
                            - entries: 
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/inference_optimizer_optimize
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_onnx
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_openvino
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_jit_ipex
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_jit
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_onnx
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_openvino
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_gpu
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_async_pipeline
 | 
					 | 
				
			||||||
                        - file: doc/Nano/Howto/Inference/TensorFlow/index
 | 
					 | 
				
			||||||
                          title: "TensorFlow"
 | 
					 | 
				
			||||||
                          subtrees:
 | 
					 | 
				
			||||||
                            - entries: 
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/TensorFlow/accelerate_tensorflow_inference_onnx
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/TensorFlow/accelerate_tensorflow_inference_openvino
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/TensorFlow/tensorflow_inference_bf16
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/TensorFlow/tensorflow_save_and_load_onnx
 | 
					 | 
				
			||||||
                              - file: doc/Nano/Howto/Inference/TensorFlow/tensorflow_save_and_load_openvino
 | 
					 | 
				
			||||||
                  - file: doc/Nano/Howto/Install/index
 | 
					 | 
				
			||||||
                    subtrees:
 | 
					 | 
				
			||||||
                      - entries:
 | 
					 | 
				
			||||||
                        - file: doc/Nano/Howto/Install/install_in_colab
 | 
					 | 
				
			||||||
                        - file: doc/Nano/Howto/Install/windows_guide
 | 
					 | 
				
			||||||
            - file: doc/Nano/Overview/known_issues
 | 
					 | 
				
			||||||
              title: "Tips and Known Issues"
 | 
					 | 
				
			||||||
            - file: doc/Nano/Overview/troubshooting
 | 
					 | 
				
			||||||
              title: "Troubleshooting Guide"
 | 
					 | 
				
			||||||
            - file: doc/Nano/Overview/support
 | 
					 | 
				
			||||||
              title: "OS Support"
 | 
					 | 
				
			||||||
            - file: doc/PythonAPI/Nano/index
 | 
					 | 
				
			||||||
              title: "API Reference"
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
  - entries:
 | 
					 | 
				
			||||||
    - file: doc/DLlib/index
 | 
					 | 
				
			||||||
      title: "DLlib"
 | 
					 | 
				
			||||||
      subtrees:
 | 
					 | 
				
			||||||
        - entries:
 | 
					 | 
				
			||||||
          - file: doc/DLlib/Overview/dllib
 | 
					 | 
				
			||||||
            title: "DLLib in 5 minutes"
 | 
					 | 
				
			||||||
          - file: doc/DLlib/Overview/install
 | 
					 | 
				
			||||||
            title: "Installation"
 | 
					 | 
				
			||||||
          - file: doc/DLlib/Overview/index
 | 
					 | 
				
			||||||
            title: "Key Features"
 | 
					 | 
				
			||||||
            subtrees:
 | 
					 | 
				
			||||||
              - entries:
 | 
					 | 
				
			||||||
                - file: doc/DLlib/Overview/keras-api
 | 
					 | 
				
			||||||
                - file: doc/DLlib/Overview/nnframes
 | 
					 | 
				
			||||||
                - file: doc/DLlib/Overview/visualization
 | 
					 | 
				
			||||||
                  title: "Visualization"
 | 
					 | 
				
			||||||
          - file: doc/DLlib/QuickStart/index
 | 
					 | 
				
			||||||
            title: "Tutorials"
 | 
					 | 
				
			||||||
            subtrees:
 | 
					 | 
				
			||||||
              - entries:
 | 
					 | 
				
			||||||
                - file: doc/DLlib/QuickStart/python-getting-started
 | 
					 | 
				
			||||||
                  title: "Python Quick Start"
 | 
					 | 
				
			||||||
                - file: doc/DLlib/QuickStart/scala-getting-started
 | 
					 | 
				
			||||||
                  title: "Scala Quick Start"
 | 
					 | 
				
			||||||
          - file: doc/PythonAPI/DLlib/index
 | 
					 | 
				
			||||||
            title: "API Reference"
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
  - entries:
 | 
					 | 
				
			||||||
    - file: doc/Chronos/index
 | 
					 | 
				
			||||||
      title: "Chronos"
 | 
					 | 
				
			||||||
      subtrees:
 | 
					 | 
				
			||||||
        - entries:
 | 
					 | 
				
			||||||
          - file: doc/Chronos/Overview/quick-tour
 | 
					 | 
				
			||||||
            title: "Chronos in 5 minutes"
 | 
					 | 
				
			||||||
          - file: doc/Chronos/Overview/install
 | 
					 | 
				
			||||||
            title: "Installation"
 | 
					 | 
				
			||||||
          - file: doc/Chronos/Overview/deep_dive
 | 
					 | 
				
			||||||
            title: "Key Features"
 | 
					 | 
				
			||||||
            subtrees:
 | 
					 | 
				
			||||||
              - entries:
 | 
					 | 
				
			||||||
                - file: doc/Chronos/Overview/data_processing_feature_engineering
 | 
					 | 
				
			||||||
                - file: doc/Chronos/Overview/forecasting
 | 
					 | 
				
			||||||
                - file: doc/Chronos/Overview/anomaly_detection
 | 
					 | 
				
			||||||
                - file: doc/Chronos/Overview/simulation
 | 
					 | 
				
			||||||
                - file: doc/Chronos/Overview/aiops
 | 
					 | 
				
			||||||
                - file: doc/Chronos/Overview/speed_up
 | 
					 | 
				
                - file: doc/Chronos/Overview/useful_functionalities
          - file: doc/Chronos/Howto/index
            title: "How-to Guides"
            subtrees:
              - entries:
                - file: doc/Chronos/Howto/windows_guide
                - file: doc/Chronos/Howto/docker_guide_single_node
                - file: doc/Chronos/Howto/how_to_use_benchmark_tool
                - file: doc/Chronos/Howto/how_to_create_forecaster
                - file: doc/Chronos/Howto/how_to_train_forecaster_on_one_node
                - file: doc/Chronos/Howto/how_to_save_and_load_forecaster
                - file: doc/Chronos/Howto/how_to_tune_forecaster_model
                - file: doc/Chronos/Howto/how_to_speedup_inference_of_forecaster_through_ONNXRuntime
                - file: doc/Chronos/Howto/how_to_speedup_inference_of_forecaster_through_OpenVINO
                - file: doc/Chronos/Howto/how_to_evaluate_a_forecaster
                - file: doc/Chronos/Howto/how_to_use_forecaster_to_predict_future_data
                - file: doc/Chronos/Howto/how_to_optimize_a_forecaster
                - file: doc/Chronos/Howto/how_to_generate_confidence_interval_for_prediction
                - file: doc/Chronos/Howto/how_to_export_onnx_files
                - file: doc/Chronos/Howto/how_to_export_openvino_files
                - file: doc/Chronos/Howto/how_to_export_torchscript_files
                - file: doc/Chronos/Howto/how_to_preprocess_my_data
                - file: doc/Chronos/Howto/how_to_process_data_in_production_environment
                - file: doc/Chronos/Howto/how_to_choose_forecasting_alg
                - file: doc/Chronos/Howto/how_to_export_data_processing_pipeline_to_torchscript
          - file: doc/Chronos/QuickStart/index
            title: "Tutorials"
            subtrees:
              - entries:
                - file: doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart
                - file: doc/Chronos/QuickStart/chronos-autotsest-quickstart
                - file: doc/Chronos/QuickStart/chronos-anomaly-detector
          - file: doc/Chronos/Overview/chronos_known_issue
            title: "Tips and Known Issues"
          - file: doc/PythonAPI/Chronos/index
            title: "API Reference"
  - entries:
    - file: doc/Friesian/index
      title: "Friesian"
      subtrees:
        - entries:
          - file: doc/Friesian/intro
            title: "Introduction"
          - file: doc/Friesian/serving
            title: "Serving"
          - file: doc/Friesian/examples
            title: "Use Cases"
          - file: doc/PythonAPI/Friesian/index
            title: "API Reference"
  - entries:
    - file: doc/PPML/index
      title: "PPML"
      subtrees:
        - entries:
            - file: doc/PPML/Overview/intro
              title: "PPML Introduction"
            - file: doc/PPML/Overview/install
              title: "Installation"
            - file: doc/PPML/Overview/examples
              title: "Tutorials"
              subtrees:
                - entries:
                  - file: doc/PPML/Overview/quicktour
                  - file: doc/PPML/QuickStart/end-to-end
                  - file: doc/PPML/Overview/devguide
                  - file: doc/PPML/Overview/azure_ppml
                  - file: doc/PPML/Overview/ali_ecs_occlum_cn
            - file: doc/PPML/Overview/misc
              title: "Advanced Topics"
              subtrees:
                - entries:
                  - file: doc/PPML/Overview/ppml
                  - file: doc/PPML/Overview/attestation_basic
                  - file: doc/PPML/Overview/trusted_big_data_analytics_and_ml
                  - file: doc/PPML/Overview/trusted_fl
                  - file: doc/PPML/QuickStart/secure_your_services
                  - file: doc/PPML/QuickStart/deploy_ppml_in_production
                  - file: doc/PPML/QuickStart/install_sgx_driver
                  - file: doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes
                  - file: doc/PPML/QuickStart/trusted-serving-on-k8s-guide
                  - file: doc/PPML/QuickStart/tpc-h_with_sparksql_on_k8s
                  - file: doc/PPML/QuickStart/tpc-ds_with_sparksql_on_k8s
                  - file: doc/PPML/Overview/azure_ppml_occlum
                  - file: doc/PPML/Overview/secure_lightgbm_on_spark
  - entries:
    - file: doc/UserGuide/contributor
      title: "Contributor Guide"
      subtrees:
        - entries:
          - file: doc/UserGuide/develop
          - file: doc/UserGuide/documentation
  - entries:
    - file: doc/Serving/index
      title: "Cluster Serving"
      subtrees:
        - entries:
          - file: doc/Serving/Overview/serving.md
            title: "User Guide"
          - file: doc/Serving/QuickStart/serving-quickstart
            title: "Serving in 5 minutes"
          - file: doc/Serving/ProgrammingGuide/serving-installation
          - file: doc/Serving/ProgrammingGuide/serving-start
          - file: doc/Serving/ProgrammingGuide/serving-inference
          - file: doc/Serving/Example/example
            title: "Examples"
          - file: doc/Serving/FAQ/faq
          - file: doc/Serving/FAQ/contribute-guide
  - entries:
    - file: doc/Application/presentations
      title: "Presentations"

  - entries:
    - file: doc/Application/blogs
@@ -37,11 +37,11 @@ sys.path.insert(0, os.path.abspath("../../../python/llm/src/"))
 # -- Project information -----------------------------------------------------
 html_theme = "pydata_sphinx_theme"
 html_theme_options = {
-  "header_links_before_dropdown": 3,
+  "header_links_before_dropdown": 1,
   "icon_links": [
         {
-            "name": "GitHub Repository for BigDL",
-            "url": "https://github.com/intel-analytics/BigDL",
+            "name": "GitHub Repository for IPEX-LLM",
+            "url": "https://github.com/intel-analytics/ipex-llm",
             "icon": "fa-brands fa-square-github",
             "type": "fontawesome",
         }
@@ -63,7 +63,7 @@ html_context = {
     "default_mode": "light"
 }
 
-html_logo = "../image/bigdl_logo.png"
+html_logo = "../image/ipex-llm_logo_temp.png"
 
 # hard code it for now, may change it to read from installed bigdl
 release = "latest"
@@ -76,9 +76,9 @@ source_suffix = {'.rst': 'restructuredtext',
 
 master_doc = 'index'
 
-project = 'BigDL'
-copyright = '2020, BigDL Authors'
-author = 'BigDL Authors'
+project = 'IPEX-LLM'
+copyright = '2024, IPEX-LLM Authors'
+author = 'IPEX-LLM Authors'
 
 # The short X.Y version
 #version = ''
@@ -7,9 +7,14 @@ IPEX-LLM Quickstart
 
 This section includes efficient guides to show you how to:
 
+* |bigdl_llm_migration_guide|_
 * `Install IPEX-LLM on Linux with Intel GPU <./install_linux_gpu.html>`_
 * `Install IPEX-LLM on Windows with Intel GPU <./install_windows_gpu.html>`_
 * `Install IPEX-LLM in Docker on Windows with Intel GPU <./docker_windows_gpu.html>`_
 * `Use Text Generation WebUI on Windows with Intel GPU <./webui_quickstart.html>`_
 * `Conduct Performance Benchmarking with IPEX-LLM <./benchmark_quickstart.html>`_
 * `Use llama.cpp with IPEX-LLM on Intel GPU <./llama_cpp_quickstart.html>`_
 
+.. |bigdl_llm_migration_guide| replace:: ``bigdl-llm`` Migration Guide
+.. _bigdl_llm_migration_guide: bigdl_llm_migration.html
@@ -1,21 +0,0 @@
-AIOps
-=====================
-
-ConfigGenerator
-----------------------------------------
-
-An AIOps application typically relies on a decision system with one or multiple AI models.
-
-The `ConfigGenerator` provides an easy-to-use builder for an AIOps decision system with the usage of `Trigger`.
-
-.. automodule:: bigdl.chronos.aiops.config_generator.config_generator
-    :members:
-    :undoc-members:
-    :show-inheritance:
-    :inherited-members:
-
-.. automodule:: bigdl.chronos.aiops.config_generator.trigger
-    :members:
-    :undoc-members:
-    :show-inheritance:
-    :inherited-members:
@@ -1,31 +0,0 @@
-Anomaly Detectors
-=====================
-
-AEDetector
-----------------------------------------
-
-AEDetector is an unsupervised anomaly detector. It builds an autoencoder network, fits the model to the input data, and calculates the reconstruction error. Samples with larger reconstruction errors are more likely to be anomalies.
-
-.. automodule:: bigdl.chronos.detector.anomaly.ae_detector
-    :members:
-    :show-inheritance:
-
-
-DBScanDetector
-----------------------------------------
-
-DBScanDetector uses DBSCAN clustering for anomaly detection. The DBSCAN algorithm clusters the points and labels the points that do not belong to any cluster as -1, thereby detecting outliers in the input time series.
-
-.. automodule:: bigdl.chronos.detector.anomaly.dbscan_detector
-    :members:
-    :show-inheritance:
-
-
-ThresholdDetector
-----------------------------------------
-
-ThresholdDetector is a simple anomaly detector that detects anomalies based on a threshold. The target value for anomaly testing can be either 1) the sample value itself or 2) the difference between the forecasted value and the actual value, if forecasted values are provided. The threshold can be set by the user or estimated from the training data according to the anomaly ratio and statistical distributions.
-
-.. automodule:: bigdl.chronos.detector.anomaly.th_detector
-    :members: ThresholdDetector
-    :show-inheritance:
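The threshold idea described for `ThresholdDetector` can be sketched in a few lines of plain Python. This is an illustration of the concept only, not the Chronos implementation; the function name `detect_anomalies` is hypothetical:

```python
def detect_anomalies(actual, forecast=None, threshold=2.0):
    """Return the indexes whose test value exceeds the threshold.

    If forecasts are given, the test value is the absolute forecast
    error; otherwise it is the sample value itself.
    """
    if forecast is None:
        values = [abs(v) for v in actual]
    else:
        values = [abs(a - f) for a, f in zip(actual, forecast)]
    return [i for i, v in enumerate(values) if v > threshold]

# sample values themselves: only the spike at index 3 exceeds 2.0
print(detect_anomalies([0.1, 0.3, 0.2, 5.0, 0.4]))  # [3]
# forecast errors: index 1 deviates from its forecast by 3.0
print(detect_anomalies([1.0, 4.0, 1.0], forecast=[1.1, 1.0, 0.9]))  # [1]
```

Estimating the threshold from training data, as the real detector can, would replace the fixed `threshold` argument with, e.g., a high quantile of the training errors.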
@@ -1,76 +0,0 @@
-Auto Models
-=====================
-
-AutoTCN
--------------------------------------------
-
-AutoTCN is a TCN forecasting model with auto tuning.
-
-.. tabs::
-
-    .. tab:: PyTorch/Tensorflow
-
-        .. automodule:: bigdl.chronos.autots.model.auto_tcn
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-AutoLSTM
-----------------------------------------
-
-AutoLSTM is an LSTM forecasting model with auto tuning.
-
-.. tabs::
-
-    .. tab:: PyTorch/Tensorflow
-
-        .. automodule:: bigdl.chronos.autots.model.auto_lstm
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-AutoSeq2Seq
-----------------------------------------
-
-AutoSeq2Seq is a Seq2Seq forecasting model with auto tuning.
-
-.. tabs::
-
-    .. tab:: PyTorch/Tensorflow
-
-        .. automodule:: bigdl.chronos.autots.model.auto_seq2seq
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-AutoARIMA
-----------------------------------------
-
-AutoARIMA is an ARIMA forecasting model with auto tuning.
-
-.. automodule:: bigdl.chronos.autots.model.auto_arima
-    :members:
-    :undoc-members:
-    :show-inheritance:
-    :inherited-members:
-
-
-AutoProphet
-----------------------------------------
-
-AutoProphet is a Prophet forecasting model with auto tuning.
-
-.. automodule:: bigdl.chronos.autots.model.auto_prophet
-    :members:
-    :undoc-members:
-    :show-inheritance:
-    :inherited-members:
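The auto tuning these models share amounts to searching a hyperparameter space and keeping the best-scoring configuration. A minimal, self-contained sketch of that loop (illustrative only; the Chronos implementation builds on a full AutoML search with early stopping and smarter samplers, and all names here are hypothetical):

```python
from itertools import product

def grid_search(search_space, score_fn):
    """Try every configuration in the space; keep the lowest-scoring one."""
    names = list(search_space)
    best_config, best_score = None, float("inf")
    for values in product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        score = score_fn(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score

# toy objective standing in for a validation loss
space = {"hidden_dim": [16, 32, 64], "lr": [0.1, 0.01, 0.001]}
loss = lambda c: abs(c["hidden_dim"] - 64) + abs(c["lr"] - 0.01) * 100
best, score = grid_search(space, loss)
# best == {"hidden_dim": 64, "lr": 0.01}
```

In practice the score function trains a forecaster with the sampled config and evaluates it on held-out data, which is what makes the search expensive and worth parallelizing.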
@@ -1,86 +0,0 @@
-AutoTS (deprecated)
-=====================
-
-.. warning::
-    The APIs on this page will be deprecated soon. Please refer to our new AutoTS API.
-
-AutoTSTrainer
-----------------------------------------
-
-AutoTSTrainer trains a time series pipeline (including data processing, feature engineering, and model) with AutoML.
-
-.. autoclass:: bigdl.chronos.autots.deprecated.forecast.AutoTSTrainer
-    :members:
-    :show-inheritance:
-
-
-TSPipeline
-----------------------------------------
-
-A pipeline for time series forecasting.
-
-.. autoclass:: bigdl.chronos.autots.deprecated.forecast.TSPipeline
-    :members:
-    :show-inheritance:
-
-
-Recipe
-----------------------------------------
-
-Recipe is used to configure the search for AutoTSTrainer.
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.SmokeRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.MTNetSmokeRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.TCNSmokeRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.PastSeqParamHandler
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.GridRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.LSTMSeq2SeqRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.LSTMGridRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.Seq2SeqRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.MTNetGridRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.TCNGridRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.RandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.BayesRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.XgbRegressorGridRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.XgbRegressorSkOptRecipe
-    :members:
-    :show-inheritance:
@@ -1,34 +0,0 @@
-AutoTS
-=====================
-
-AutoTSEstimator
--------------------------------------------
-
-Automated TimeSeries Estimator for the time series forecasting task.
-AutoTSEstimator will replace AutoTSTrainer in a later version.
-
-.. tabs::
-
-    .. tab:: PyTorch/Tensorflow
-
-        .. automodule:: bigdl.chronos.autots.autotsestimator
-            :members:
-            :undoc-members:
-            :show-inheritance:
-
-
-TSPipeline
--------------------------------------------
-
-TSPipeline is an end-to-end solution for the time series forecasting task.
-This TSPipeline will replace the original TSPipeline returned by AutoTSTrainer in a later version.
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.autots.tspipeline
-            :members:
-            :undoc-members:
-            :show-inheritance:
@@ -1,10 +0,0 @@
-Evaluator
-====================================
-
-Evaluator
-------------------------------------
-
-.. automodule:: bigdl.chronos.metric.forecast_metrics
-    :members:
-    :undoc-members:
-    :show-inheritance:
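The metrics behind an evaluator like `bigdl.chronos.metric.forecast_metrics` boil down to standard forecast error measures. Minimal, self-contained versions of two of them, for illustration only (not the Chronos code):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true, y_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
print(mae(y_true, y_pred))   # 0.666...
print(rmse(y_true, y_pred))  # 1.1547...
```

RMSE penalizes large errors more heavily than MAE, which is why evaluators usually report both.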
@@ -1,182 +0,0 @@
-Forecasters
-=====================
-
-LSTMForecaster
-----------------------------------------
-
-Long short-term memory (LSTM) is a special type of recurrent neural network (RNN). We implement the basic version of LSTM, VanillaLSTM, in this forecaster for the time-series forecasting task. It has two LSTM layers, two dropout layers and a dense layer.
-
-For the detailed algorithm description, please refer to `here <https://github.com/intel-analytics/BigDL/blob/main/docs/docs/Chronos/Algorithm/LSTMAlgorithm.md>`__.
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.forecaster.lstm_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-    .. tab:: Tensorflow
-
-        .. automodule:: bigdl.chronos.forecaster.tf.lstm_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
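Forecasters like this one consume a series framed into (lookback, horizon) windows. A minimal sketch of that rolling-window framing, independent of Chronos internals (in Chronos this preprocessing is normally handled by `TSDataset`; the function name `roll` here is just illustrative):

```python
def roll(series, lookback, horizon):
    """Frame a 1-D series into (input window, target window) pairs."""
    x, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        x.append(series[i:i + lookback])
        y.append(series[i + lookback:i + lookback + horizon])
    return x, y

x, y = roll([1, 2, 3, 4, 5], lookback=3, horizon=1)
# x == [[1, 2, 3], [2, 3, 4]], y == [[4], [5]]
```

Each `x[i]` feeds the recurrent layers and each `y[i]` is the supervision target, so lookback controls how much history the model sees and horizon how many steps it predicts.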
-Seq2SeqForecaster
--------------------------------------------
-
-Seq2SeqForecaster wraps a sequence-to-sequence model based on LSTM, and is suitable for multivariate & multistep time series forecasting.
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.forecaster.seq2seq_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-    .. tab:: Tensorflow
-
-        .. automodule:: bigdl.chronos.forecaster.tf.seq2seq_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-TCNForecaster
-----------------------------------------
-
-Temporal Convolutional Network (TCN) is a neural network that uses a convolutional architecture rather than recurrent networks. It supports multi-step and multi-variate cases. Causal convolutions enable large-scale parallel computing, which gives TCN a shorter inference time than RNN-based models such as LSTM.
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.forecaster.tcn_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-    .. tab:: Tensorflow
-
-        .. automodule:: bigdl.chronos.forecaster.tf.tcn_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
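The parallelism claim rests on causal, dilated convolutions: stacking layers with growing dilation covers a long history without any recurrence, so every output position can be computed at once. A small sketch of the receptive-field arithmetic, assuming one causal convolution per layer (real TCN residual blocks often stack two, doubling the growth):

```python
def tcn_receptive_field(kernel_size, dilations):
    """Receptive field of a stack of causal dilated conv layers.

    Each layer with dilation d extends the field by (kernel_size - 1) * d
    past steps beyond the current one.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# 4 layers, kernel 3, dilations doubling: the current step plus 30 past steps
print(tcn_receptive_field(3, [1, 2, 4, 8]))  # 31
```

Doubling the dilation per layer makes the receptive field grow exponentially with depth, which is how a shallow TCN matches the long memory of an RNN.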
-AutoformerForecaster
-----------------------------------------
-
-Autoformer is a neural network that uses a transformer architecture with autocorrelation. It supports multi-step and multi-variate cases. It shows a significant accuracy improvement, at the cost of longer training/inference time than TCN.
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.forecaster.autoformer_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-NBeatsForecaster
-----------------------------------------
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        Neural basis expansion analysis for interpretable time series forecasting (`N-BEATS <https://arxiv.org/abs/1905.10437>`__) is a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. N-BEATS can solve univariate time series point forecasting problems, is interpretable, and is fast to train.
-
-        .. automodule:: bigdl.chronos.forecaster.nbeats_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
TCMFForecaster
 | 
					 | 
				
			||||||
----------------------------------------
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
Chronos TCMFForecaster provides an efficient way to forecast high dimensional time series.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
TCMFForecaster is based on DeepGLO algorithm, which is a deep forecasting model which thinks globally and acts locally.
 | 
					 | 
				
			||||||
You can refer to `the deepglo paper <https://arxiv.org/abs/1905.03806>`__ for more details.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
TCMFForecaster supports distributed training and inference. It is based on Orca PyTorch Estimator, which is an estimator to do PyTorch training/evaluation/prediction on Spark in a distributed fashion. Also you can choose to enable distributed training and inference or not.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
**Remarks**:
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* You can refer to `TCMFForecaster installation <https://github.com/intel-analytics/BigDL/blob/main/docs/docs/Chronos/tutorials/TCMFForecaster.md#step-0-prepare-environment>`__ to install required packages.
 | 
					 | 
				
			||||||
* Your operating system (OS) is required to be one of the following 64-bit systems: **Ubuntu 16.04 or later** and **macOS 10.12.6 or later**.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
.. tabs::
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
    .. tab:: PyTorch
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
        .. automodule:: bigdl.chronos.forecaster.tcmf_forecaster
 | 
					 | 
				
			||||||
            :members:
 | 
					 | 
				
			||||||
            :undoc-members:
 | 
					 | 
				
			||||||
            :show-inheritance:
 | 
					 | 
				
			||||||
            :inherited-members:
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
MTNetForecaster
 | 
					 | 
				
			||||||
----------------------------------------
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
MTNet is a memory-network based solution for multivariate time-series forecasting. In a specific task of multivariate time-series forecasting, we have several variables observed in time series and we want to forecast some or all of the variables' value in a future time stamp.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
MTNet is proposed by paper `A Memory-Network Based Solution for Multivariate Time-Series Forecasting <https://arxiv.org/abs/1809.02105>`__.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
For the detailed algorithm description, please refer to `here <https://github.com/intel-analytics/BigDL/blob/main/docs/docs/Chronos/Algorithm/MTNetAlgorithm.md>`__.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
.. tabs::
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
    .. tab:: Tensorflow
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
        .. automodule:: bigdl.chronos.forecaster.tf.mtnet_forecaster
 | 
					 | 
				
			||||||
            :members:
 | 
					 | 
				
			||||||
            :undoc-members:
 | 
					 | 
				
			||||||
            :show-inheritance:
 | 
					 | 
				
			||||||
            :inherited-members:
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
ARIMAForecaster
 | 
					 | 
				
			||||||
----------------------------------------
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
AutoRegressive Integrated Moving Average (ARIMA) is a class of statistical models for analyzing and forecasting time series data. It consists of 3 components: AR (AutoRegressive), I (Integrated) and MA (Moving Average). In ARIMAForecaster we use the SARIMA model (Seasonal ARIMA), which is an extension of ARIMA that additionally supports the direct modeling of the seasonal component of the time series.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
.. automodule:: bigdl.chronos.forecaster.arima_forecaster
 | 
					 | 
				
			||||||
    :members:
 | 
					 | 
				
			||||||
    :undoc-members:
 | 
					 | 
				
			||||||
    :show-inheritance:
 | 
					 | 
				
			||||||
    :inherited-members:
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
ProphetForecaster
 | 
					 | 
				
			||||||
----------------------------------------
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
For the detailed algorithm description, please refer to `here <https://github.com/facebook/prophet>`__.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
.. automodule:: bigdl.chronos.forecaster.prophet_forecaster
 | 
					 | 
				
			||||||
    :members:
 | 
					 | 
				
			||||||
    :undoc-members:
 | 
					 | 
				
			||||||
    :show-inheritance:
 | 
					 | 
				
			||||||
    :inherited-members:
 | 
					 | 
				
			||||||
| 
						 | 
					@ -1,19 +0,0 @@
 | 
				
			||||||
Chronos API
==================

.. toctree::
    :maxdepth: 2

    autotsestimator.rst
    automodels.rst
    forecasters.rst
    anomaly_detectors.rst
    tsdataset.rst
    simulator.rst
    evaluator.rst
    aiops.rst

.. toctree::
    :maxdepth: 1

    autots.rst
Simulator
====================================

DPGANSimulator
------------------------------------

.. tabs::

    .. tab:: PyTorch

        .. automodule:: bigdl.chronos.simulator.doppelganger_simulator
            :members:
            :undoc-members:
            :show-inheritance:
TSDataset
===========

TSDataset
----------------------------------------

Time series data is a special data formulation with specific operations. TSDataset is an abstraction of a time series dataset, which provides various data processing operations (e.g. impute, deduplicate, resample, scale/unscale, roll) and feature engineering methods (e.g. datetime features, aggregation features). Cascaded calls are supported for most of the methods.
TSDataset can be initialized from a pandas dataframe and converted to a pandas dataframe or numpy ndarray.

.. automodule:: bigdl.chronos.data.tsdataset
    :members:
    :undoc-members:
    :show-inheritance:

XShardsTSDataset
----------------------------------------

Time series data is a special data formulation with specific operations. XShardsTSDataset is an abstraction of a time series dataset, which provides various data processing operations (e.g. impute, deduplicate, resample, scale/unscale, roll) and feature engineering methods (e.g. datetime features, aggregation features). Cascaded calls are supported for most of the methods.
XShardsTSDataset can be initialized from XShards of pandas dataframes and converted to XShards of numpy arrays in a distributed and parallelized fashion.

.. automodule:: bigdl.chronos.data.experimental.xshards_tsdataset
    :members:
    :undoc-members:
    :show-inheritance:


Built-in Dataset
--------------------------------------------

Built-in datasets can be downloaded and preprocessed by this function. Train, validation and test splits are also supported.

.. automodule:: bigdl.chronos.data.repo_dataset
    :members:
# Clipping

--------

## ConstantGradientClipping ##

Set constant gradient clipping during the training process.

```scala
model.setConstantGradientClipping(min, max)
```
Parameters:
   * min: The minimum value to clip by.
   * max: The maximum value to clip by.

## GradientClippingByL2Norm ##

Clip gradients to a maximum L2-Norm during the training process.

```scala
model.setGradientClippingByL2Norm(clipNorm)
```
Parameters:
   * clipNorm: Gradient L2-Norm threshold
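The two clipping strategies above can be sketched in plain Python. This is a minimal illustration of the underlying math only, not BigDL's implementation:

```python
def constant_clip(grads, min_val, max_val):
    # Clamp each gradient component into [min_val, max_val].
    return [max(min_val, min(max_val, g)) for g in grads]

def l2_norm_clip(grads, clip_norm):
    # If the gradient's L2 norm exceeds clip_norm, rescale the whole
    # vector so its norm equals clip_norm; otherwise leave it unchanged.
    norm = sum(g * g for g in grads) ** 0.5
    if norm <= clip_norm:
        return grads
    scale = clip_norm / norm
    return [g * scale for g in grads]

print(constant_clip([-2.0, 0.5, 3.0], -1.0, 1.0))  # [-1.0, 0.5, 1.0]
print(l2_norm_clip([3.0, 4.0], 2.5))               # [1.5, 2.0] (norm 5 -> 2.5)
```

Note that L2-norm clipping preserves the gradient's direction while constant clipping clamps each component independently.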
# Model Freeze

To "freeze" a model means to exclude some layers of the model from training.

```scala
model.freeze("layer1", "layer2")
model.unFreeze("layer1", "layer2")
```
* A model can be frozen by calling ```freeze()```. If a model is frozen,
its parameters (weight/bias, if they exist) are not changed during the training process.
If layer names are passed, then only the layers that match the given names will be frozen.
* The whole model can be unfrozen by calling ```unFreeze()```.
If layer names are provided, then only the layers that match the given names will be unfrozen.
* Stopping gradients: the input gradients of layers that match the given names are not computed,
and they will not contribute to the input gradient computation of the layers that depend on them.
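Conceptually, freezing just masks parameter updates for the selected layers. A minimal plain-Python sketch (the `TinyModel` class and scalar parameters are hypothetical, for illustration only):

```python
class TinyModel:
    def __init__(self):
        # name -> scalar "weight"; scalars keep the sketch simple
        self.params = {"fc1": 1.0, "fc2": 2.0}
        self.frozen = set()

    def freeze(self, *names):
        # With no names, freeze everything, mirroring freeze()/unFreeze() semantics.
        self.frozen |= set(names) if names else set(self.params)

    def un_freeze(self, *names):
        self.frozen -= set(names) if names else set(self.frozen)

    def update(self, grads, lr=1.0):
        # Skip updates for frozen layers; everything else takes an SGD step.
        for name, g in grads.items():
            if name not in self.frozen:
                self.params[name] -= lr * g

model = TinyModel()
model.freeze("fc2")
model.update({"fc1": 0.1, "fc2": 0.1})
print(model.params)  # {'fc1': 0.9, 'fc2': 2.0} <- fc2 unchanged while frozen
model.un_freeze()
model.update({"fc1": 0.1, "fc2": 0.1})
print(model.params)  # both parameters now move
```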
Original model without "freeze":
```scala
val reshape = Reshape(Array(4)).inputs()
val fc1 = Linear(4, 2).setName("fc1").inputs()
val fc2 = Linear(4, 2).setName("fc2").inputs(reshape)
val cadd_1 = CAddTable().setName("cadd").inputs(fc1, fc2)
val output1_1 = ReLU().inputs(cadd_1)
val output2_1 = Threshold(10.0).inputs(cadd_1)

val model = Graph(Array(reshape, fc1), Array(output1_1, output2_1))

val input = T(Tensor(T(0.1f, 0.2f, -0.3f, -0.4f)),
  Tensor(T(0.5f, 0.4f, -0.2f, -0.1f)))
val gradOutput = T(Tensor(T(1.0f, 2.0f)), Tensor(T(3.0f, 4.0f)))

fc1.element.getParameters()._1.apply1(_ => 1.0f)
fc2.element.getParameters()._1.apply1(_ => 2.0f)
model.zeroGradParameters()
println("output1: \n", model.forward(input))
model.backward(input, gradOutput)
model.updateParameters(1)
println("fc2 weight \n", fc2.element.parameters()._1(0))
```
```
(output1:
, {
	2: 0.0
	   0.0
	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
	1: 2.8
	   2.8
	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
 })
(fc2 weight
,1.9	1.8	2.3	2.4
1.8	1.6	2.6	2.8
[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x4])
```

Freeze ```fc2```; the parameters of ```fc2``` are not changed.
```scala
fc1.element.getParameters()._1.apply1(_ => 1.0f)
fc2.element.getParameters()._1.apply1(_ => 2.0f)
model.zeroGradParameters()
model.freeze("fc2")
println("output2: \n", model.forward(input))
model.backward(input, gradOutput)
model.updateParameters(1)
println("fc2 weight \n", fc2.element.parameters()._1(0))
```

```
(output2:
, {
	2: 0.0
	   0.0
	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
	1: 2.8
	   2.8
	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
 })
(fc2 weight
,2.0	2.0	2.0	2.0
2.0	2.0	2.0	2.0
[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x4])
```
Unfreeze ```fc2```; the parameters of ```fc2``` will be updated again.
```scala
fc1.element.getParameters()._1.apply1(_ => 1.0f)
fc2.element.getParameters()._1.apply1(_ => 2.0f)
model.zeroGradParameters()
model.unFreeze()
println("output3: \n", model.forward(input))
model.backward(input, gradOutput)
model.updateParameters(1)
println("fc2 weight \n", fc2.element.parameters()._1(0))
```
```
(output3:
, {
	2: 0.0
	   0.0
	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
	1: 2.8
	   2.8
	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
 })
(fc2 weight
,1.9	1.8	2.3	2.4
1.8	1.6	2.6	2.8
[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x4])
```
DLlib API
==================

.. toctree::
    :maxdepth: 1

    model.rst
    core_layers.md
    optim-Methods.md
    regularizers.md
    learningrate-Scheduler.md
    freeze.md
    clipping.md
# Learning Rate Scheduler

--------


## Poly ##

**Scala:**
```scala
val lrScheduler = Poly(power=0.5, maxIteration=1000)
```
**Python:**
```python
lr_scheduler = Poly(power=0.5, max_iteration=1000, bigdl_type="float")
```

A learning rate decay policy where the effective learning rate follows a polynomial decay, reaching zero at max_iteration. Calculation: base_lr * (1 - iter/maxIteration) `^` (power)

 `power` coefficient of decay, refer to the calculation formula

 `maxIteration` the iteration at which the lr becomes zero

**Scala example:**
```scala
import com.intel.analytics.bigdl.dllib.optim.SGD._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.dllib.tensor.{Storage, Tensor}
import com.intel.analytics.bigdl.dllib.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import com.intel.analytics.bigdl.dllib.utils.T

val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = Poly(3, 100)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.0970299
```
**Python example:**
```python
optim_method = SGD(leaningrate_schedule=Poly(3, 100))
```
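The Poly decay formula can be checked in plain Python. This is a sketch of the math only (note that BigDL tracks `currentRate` as a negative value internally):

```python
def poly_rate(base_lr, power, max_iteration, it):
    # base_lr * (1 - iter / maxIteration) ^ power
    return base_lr * (1.0 - it / max_iteration) ** power

# Matches the Scala example above: Poly(3, 100) with base_lr = 0.1
print(round(poly_rate(0.1, 3, 100, 0), 7))  # 0.1
print(round(poly_rate(0.1, 3, 100, 1), 7))  # 0.0970299
```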
## Default ##

The default learning rate schedule. For each iteration, the learning rate is updated with the following formula:
 l_{n + 1} = l / (1 + n * learning_rate_decay) where `l` is the initial learning rate

**Scala:**
```scala
val lrScheduler = Default()
```
**Python:**
```python
lr_scheduler = Default()
```

**Scala example:**
```scala
val optimMethod = new SGD[Double](0.1, 0.1)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.09090909090909091
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.08333333333333334
```

**Python example:**
```python
optimMethod = SGD(leaningrate_schedule=Default())
```
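The same formula in plain Python, reproducing the rates from the Scala example (sign aside):

```python
def default_rate(base_lr, decay, n):
    # l / (1 + n * learning_rate_decay)
    return base_lr / (1.0 + n * decay)

# Matches the Scala example above: SGD(0.1, 0.1)
print(default_rate(0.1, 0.1, 0))  # 0.1
print(default_rate(0.1, 0.1, 1))  # 0.09090909090909091
print(default_rate(0.1, 0.1, 2))  # 0.08333333333333334
```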
## NaturalExp ##

A learning rate schedule that rescales the learning rate by exp(-decay_rate * iter / decay_step), referring to TensorFlow's natural_exp_decay.

 `decay_step` how often to apply decay

 `gamma` the decay rate, e.g. 0.96

**Scala:**
```scala
val learningRateScheduler = NaturalExp(1, 1)
```

**Scala example:**
```scala
val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = NaturalExp(1, 1)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0, "evalCounter" -> 0)
optimMethod.state = state
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.036787944117144235

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.013533528323661271
```
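The decay factor in plain Python, reproducing the rates from the Scala example (sign aside):

```python
import math

def natural_exp_rate(base_lr, gamma, decay_step, it):
    # base_lr * exp(-gamma * iter / decay_step)
    return base_lr * math.exp(-gamma * it / decay_step)

# Matches the Scala example above: NaturalExp(1, 1) with base_lr = 0.1
print(natural_exp_rate(0.1, 1, 1, 0))  # 0.1
print(natural_exp_rate(0.1, 1, 1, 1))  # ~0.1 * e^-1 = 0.0368...
```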
## Exponential ##

A learning rate schedule that rescales the learning rate by lr_{n + 1} = lr * decayRate `^` (iter / decayStep)

 `decayStep` the interval for lr decay

 `decayRate` decay rate

 `stairCase` if true, iter / decayStep is an integer division and the decayed learning rate follows a staircase function.

**Scala:**
```scala
val learningRateSchedule = Exponential(10, 0.96)
```

**Python:**
```python
exponential = Exponential(100, 0.1)
```

**Scala example:**
```scala
val optimMethod = new SGD[Double](0.05)
optimMethod.learningRateSchedule = Exponential(10, 0.96)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0, "evalCounter" -> 0)
optimMethod.state = state
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.05

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.049796306069892535
```

**Python example:**
```python
optimMethod = SGD(leaningrate_schedule=Exponential(100, 0.1))
```
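The formula, including the `stairCase` variant, in plain Python (a sketch of the math only):

```python
def exponential_rate(base_lr, decay_step, decay_rate, it, stair_case=False):
    # lr * decayRate ^ (iter / decayStep); with stairCase, the exponent
    # uses integer division so the rate drops in discrete steps.
    exponent = it // decay_step if stair_case else it / decay_step
    return base_lr * decay_rate ** exponent

# Matches the Scala example above: Exponential(10, 0.96) with base_lr = 0.05
print(exponential_rate(0.05, 10, 0.96, 0))  # 0.05
print(exponential_rate(0.05, 10, 0.96, 1))  # ~0.0497963...
print(exponential_rate(0.05, 10, 0.96, 5, stair_case=True))  # 0.05 (5 // 10 == 0)
```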
## Plateau ##

Plateau is a learning rate schedule that reduces the learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. The schedule monitors a quantity and, if no improvement is seen for a 'patience' number of epochs, the learning rate is reduced.

 `monitor` quantity to be monitored, can be Loss or score

 `factor` factor by which the learning rate will be reduced. new_lr = lr * factor

 `patience` number of epochs with no improvement after which the learning rate will be reduced.

 `mode` one of {min, max}. In min mode, the lr will be reduced when the quantity monitored has stopped decreasing;
 in max mode it will be reduced when the quantity monitored has stopped increasing

 `epsilon` threshold for measuring the new optimum, to only focus on significant changes.

 `cooldown` number of epochs to wait before resuming normal operation after the lr has been reduced.

 `minLr` lower bound on the learning rate.

**Scala:**
```scala
val learningRateSchedule = Plateau(monitor="score", factor=0.1, patience=10, mode="min", epsilon=1e-4f, cooldown=0, minLr=0)
```

**Python:**
```python
plateau = Plateau("score", factor=0.1, patience=10, mode="min", epsilon=1e-4, cooldown=0, minLr=0)
```

**Scala example:**
```scala
val optimMethod = new SGD[Double](0.05)
optimMethod.learningRateSchedule = Plateau(monitor="score", factor=0.1, patience=10, mode="min", epsilon=1e-4f, cooldown=0, minLr=0)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0, "evalCounter" -> 0)
optimMethod.state = state
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)

```

**Python example:**
```python
optimMethod = SGD(leaningrate_schedule=Plateau("score"))
```
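The patience logic can be sketched in plain Python. This is a simplified 'min'-mode illustration without cooldown, not BigDL's implementation:

```python
def plateau_schedule(losses, lr, factor=0.1, patience=2, epsilon=1e-4, min_lr=0.0):
    # Reduce lr by `factor` when the monitored loss has not improved
    # by more than `epsilon` for more than `patience` consecutive epochs.
    best, wait, rates = float("inf"), 0, []
    for loss in losses:
        if loss < best - epsilon:
            best, wait = loss, 0          # improvement: reset the counter
        else:
            wait += 1
            if wait > patience:
                lr, wait = max(lr * factor, min_lr), 0
        rates.append(lr)
    return rates

# Loss stalls after epoch 2, so the rate drops once patience is exhausted.
print([round(r, 6) for r in plateau_schedule([1.0, 0.5, 0.5, 0.5, 0.5], lr=0.05)])
# [0.05, 0.05, 0.05, 0.05, 0.005]
```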
## Warmup ##

A gradual learning rate increase policy: the effective learning rate increases by `delta` after each iteration. Calculation: `base_lr + delta * iteration`

 `delta` increase amount after each iteration

**Scala:**
```scala
val learningRateSchedule = Warmup(delta = 0.05)
```

**Python:**
```python
warmup = Warmup(delta=0.05)
```

**Scala example:**
```scala
val lrSchedules = new SequentialSchedule(100)
lrSchedules.add(Warmup(0.3), 3).add(Poly(3, 100), 100)
val optimMethod = new SGD[Double](learningRate = 0.1, learningRateSchedule = lrSchedules)

def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.4
```

**Python example:**
```python
optimMethod = SGD(leaningrate_schedule=Warmup(0.05))
```

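The warmup calculation above can be sketched in plain Python. This is a hypothetical stand-alone model of the schedule (the function name `warmup_rate` is an illustration, not a BigDL API):

```python
def warmup_rate(base_lr, delta, iteration):
    # Effective learning rate under a Warmup policy:
    # the rate grows by `delta` on every iteration.
    return base_lr + delta * iteration

# Starting from base_lr=0.1 with delta=0.05, the rate ramps up linearly.
rates = [warmup_rate(0.1, 0.05, i) for i in range(4)]
```

Note that `currentRate` printed by `SGD` in the Scala example is the negated effective rate, which is why the values appear as -0.1, -0.4, and so on.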
## SequentialSchedule ##

A learning rate scheduler that can stack several learning rate schedulers.

 `iterationPerEpoch` iteration numbers per epoch

**Scala:**
```scala
val learningRateSchedule = SequentialSchedule(iterationPerEpoch=100)
```

**Python:**
```python
sequentialSchedule = SequentialSchedule(iteration_per_epoch=5)
```

**Scala example:**
```scala
val lrSchedules = new SequentialSchedule(100)
lrSchedules.add(Warmup(0.3), 3).add(Poly(3, 100), 100)
val optimMethod = new SGD[Double](learningRate = 0.1, learningRateSchedule = lrSchedules)

def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.4

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.7

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-1.0

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.9702989999999999
```

**Python example:**
```python
sequentialSchedule = SequentialSchedule(5)
poly = Poly(0.5, 2)
sequentialSchedule.add(poly, 5)
```

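The stacking behaviour can be illustrated with a small plain-Python sketch: each schedule owns a contiguous range of iterations, and the iteration counter is re-based when the next schedule takes over. The function `sequential_rate` is a simplified illustration, not the BigDL class:

```python
def sequential_rate(schedules, iteration):
    # schedules: list of (rate_fn, max_iterations) pairs.
    for rate_fn, max_iter in schedules:
        if iteration < max_iter:
            return rate_fn(iteration)
        iteration -= max_iter
    # past the end: stay on the last schedule
    rate_fn, max_iter = schedules[-1]
    return rate_fn(max_iter - 1 + iteration)

# Warmup(delta=0.3) for 3 iterations, then a constant rate afterwards,
# loosely mirroring the Warmup -> Poly hand-over in the Scala example.
schedules = [
    (lambda i: 0.1 + 0.3 * i, 3),  # warmup: 0.1, 0.4, 0.7
    (lambda i: 1.0, 100),          # constant 1.0 afterwards
]
out = [sequential_rate(schedules, i) for i in range(5)]
```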
## EpochDecay ##

**Scala:**
```scala
def decay(epoch: Int): Double =
  if (epoch == 1) 2.0 else if (epoch == 2) 1.0 else 0.0

val learningRateSchedule = EpochDecay(decay)
```

It is an epoch-decay learning rate schedule. The learning rate decays through a function of the number of run epochs: `l_{n+1} = l_n * 0.1 ^ decayType(epoch)`

 `decayType` is a function with number of run epochs as the argument

**Scala example:**
```scala
def decay(epoch: Int): Double =
  if (epoch == 1) 2.0 else if (epoch == 2) 1.0 else 0.0

val optimMethod = new SGD[Double](1000)
optimMethod.learningRateSchedule = EpochDecay(decay)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0)
for(e <- 1 to 3) {
  state("epoch") = e
  optimMethod.state = state
  optimMethod.optimize(feval, x)
  if(e <= 1) {
    assert(optimMethod.learningRateSchedule.currentRate==10)
  } else if (e <= 2) {
    assert(optimMethod.learningRateSchedule.currentRate==100)
  } else {
    assert(optimMethod.learningRateSchedule.currentRate==1000)
  }
}
```

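The decay formula can be checked with a short plain-Python sketch (a stand-alone model of the schedule, assuming the formula `base_lr * 0.1 ** decay(epoch)` as stated above; `epoch_decay_rate` is not a BigDL function):

```python
def epoch_decay_rate(base_lr, decay, epoch):
    # l = base_lr * 0.1 ** decay(epoch)
    return base_lr * 0.1 ** decay(epoch)

def decay(epoch):
    if epoch == 1:
        return 2.0
    elif epoch == 2:
        return 1.0
    return 0.0

# With base learning rate 1000, as in the Scala example, the effective
# rates for epochs 1, 2, 3 are 10, 100 and 1000.
rates = [epoch_decay_rate(1000, decay, e) for e in (1, 2, 3)]
```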
## Regime ##

A structure to specify hyper parameters by start epoch and end epoch. Usually works with [[EpochSchedule]].

 `startEpoch` start epoch

 `endEpoch` end epoch

 `config` config table containing hyper parameters

## EpochSchedule ##

A learning rate schedule that configures the learning rate according to some pre-defined [[Regime]]. If the running epoch is within the interval [r.startEpoch, r.endEpoch] of a regime `r`, then the learning rate will take the "learningRate" in r.config.

 `regimes` an array of pre-defined [[Regime]].

**Scala:**
```scala
val regimes: Array[Regime] = Array(
  Regime(1, 3, T("learningRate" -> 1e-2, "weightDecay" -> 2e-4)),
  Regime(4, 7, T("learningRate" -> 5e-3, "weightDecay" -> 2e-4)),
  Regime(8, 10, T("learningRate" -> 1e-3, "weightDecay" -> 0.0))
)
val learningRateScheduler = EpochSchedule(regimes)
```

**Scala example:**
```scala
val regimes: Array[Regime] = Array(
  Regime(1, 3, T("learningRate" -> 1e-2, "weightDecay" -> 2e-4)),
  Regime(4, 7, T("learningRate" -> 5e-3, "weightDecay" -> 2e-4)),
  Regime(8, 10, T("learningRate" -> 1e-3, "weightDecay" -> 0.0))
)

val state = T("epoch" -> 0)
val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = EpochSchedule(regimes)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
for(e <- 1 to 10) {
  state("epoch") = e
  optimMethod.state = state
  optimMethod.optimize(feval, x)
  if(e <= 3) {
    assert(optimMethod.learningRateSchedule.currentRate == -1e-2)
    assert(optimMethod.weightDecay == 2e-4)
  } else if (e <= 7) {
    assert(optimMethod.learningRateSchedule.currentRate == -5e-3)
    assert(optimMethod.weightDecay == 2e-4)
  } else if (e <= 10) {
    assert(optimMethod.learningRateSchedule.currentRate == -1e-3)
    assert(optimMethod.weightDecay == 0.0)
  }
}
```

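The regime lookup amounts to an interval search over epochs. A plain-Python sketch of that lookup (the function `schedule_for_epoch` and the dict-based config are illustrative assumptions, not BigDL APIs):

```python
def schedule_for_epoch(regimes, epoch):
    # regimes: list of (start_epoch, end_epoch, config_dict).
    # Returns the config of the regime whose [start, end] interval
    # contains `epoch`, or None if no regime matches.
    for start, end, config in regimes:
        if start <= epoch <= end:
            return config
    return None

regimes = [
    (1, 3, {"learningRate": 1e-2, "weightDecay": 2e-4}),
    (4, 7, {"learningRate": 5e-3, "weightDecay": 2e-4}),
    (8, 10, {"learningRate": 1e-3, "weightDecay": 0.0}),
]
```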
## EpochStep ##

A learning rate schedule which rescales the learning rate by `gamma` every `stepSize` epochs.

 `stepSize` for how many epochs to update the learning rate once

 `gamma` the rescale factor

**Scala:**
```scala
val learningRateScheduler = EpochStep(1, 0.5)
```

**Scala example:**
```scala
val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = EpochStep(1, 0.5)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0)
for(e <- 1 to 10) {
  state("epoch") = e
  optimMethod.state = state
  optimMethod.optimize(feval, x)
  assert(optimMethod.learningRateSchedule.currentRate==(-0.1 * Math.pow(0.5, e)))
}
```
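The rescaling can be sketched in plain Python (a hypothetical stand-alone model of the schedule; `epoch_step_rate` is not a BigDL function):

```python
def epoch_step_rate(base_lr, gamma, step_size, epoch):
    # Rescale base_lr by gamma once per step_size epochs.
    return base_lr * gamma ** (epoch // step_size)

# stepSize=1, gamma=0.5, matching the Scala example: the rate halves
# every epoch, so after epoch e it equals 0.1 * 0.5 ** e.
rates = [epoch_step_rate(0.1, 0.5, 1, e) for e in range(1, 4)]
```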
@ -1,17 +0,0 @@
Model/Sequential
==================

dllib.keras.models.Model
---------------------------

.. autoclass:: bigdl.dllib.keras.models.Model
    :members:
    :undoc-members:


dllib.keras.models.Sequential
-----------------------------

.. autoclass:: bigdl.dllib.keras.models.Sequential
    :members:
    :undoc-members:
@ -1,380 +0,0 @@
# Optimizer

--------

## Adam ##

**Scala:**
```scala
val optim = new Adam(learningRate=1e-3, learningRateDecay=0.0, beta1=0.9, beta2=0.999, Epsilon=1e-8)
```
**Python:**
```python
optim = Adam(learningrate=1e-3, learningrate_decay=0.0, beta1=0.9, beta2=0.999, epsilon=1e-8, bigdl_type="float")
```

An implementation of Adam optimization, first-order gradient-based optimization of stochastic objective functions. http://arxiv.org/pdf/1412.6980.pdf

 `learningRate` learning rate. Default value is 1e-3.

 `learningRateDecay` learning rate decay. Default value is 0.0.

 `beta1` first moment coefficient. Default value is 0.9.

 `beta2` second moment coefficient. Default value is 0.999.

 `Epsilon` for numerical stability. Default value is 1e-8.

**Scala example:**
```scala
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.dllib.tensor.Tensor
import com.intel.analytics.bigdl.dllib.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import com.intel.analytics.bigdl.dllib.utils.T

val optm = new Adam(learningRate=0.002)
def rosenBrock(x: Tensor[Float]): (Float, Tensor[Float]) = {
    // (1) compute f(x)
    val d = x.size(1)

    // x1 = x(i)
    val x1 = Tensor[Float](d - 1).copy(x.narrow(1, 1, d - 1))
    // x(i + 1) - x(i)^2
    x1.cmul(x1).mul(-1).add(x.narrow(1, 2, d - 1))
    // 100 * (x(i + 1) - x(i)^2)^2
    x1.cmul(x1).mul(100)

    // x0 = x(i)
    val x0 = Tensor[Float](d - 1).copy(x.narrow(1, 1, d - 1))
    // 1-x(i)
    x0.mul(-1).add(1)
    x0.cmul(x0)
    // 100*(x(i+1) - x(i)^2)^2 + (1-x(i))^2
    x1.add(x0)

    val fout = x1.sum()

    // (2) compute f(x)/dx
    val dxout = Tensor[Float]().resizeAs(x).zero()
    // df(1:D-1) = - 400*x(1:D-1).*(x(2:D)-x(1:D-1).^2) - 2*(1-x(1:D-1));
    x1.copy(x.narrow(1, 1, d - 1))
    x1.cmul(x1).mul(-1).add(x.narrow(1, 2, d - 1)).cmul(x.narrow(1, 1, d - 1)).mul(-400)
    x0.copy(x.narrow(1, 1, d - 1)).mul(-1).add(1).mul(-2)
    x1.add(x0)
    dxout.narrow(1, 1, d - 1).copy(x1)

    // df(2:D) = df(2:D) + 200*(x(2:D)-x(1:D-1).^2);
    x0.copy(x.narrow(1, 1, d - 1))
    x0.cmul(x0).mul(-1).add(x.narrow(1, 2, d - 1)).mul(200)
    dxout.narrow(1, 2, d - 1).add(x0)

    (fout, dxout)
}
val x = Tensor(2).fill(0)
> print(optm.optimize(rosenBrock, x))
(0.0019999996
0.0
[com.intel.analytics.bigdl.tensor.DenseTensor$mcD$sp of size 2],[D@302d88d8)
```
**Python example:**
```python
optim_method = Adam(learningrate=0.002)

optimizer = Optimizer(
    model=mlp_model,
    training_rdd=train_data,
    criterion=ClassNLLCriterion(),
    optim_method=optim_method,
    end_trigger=MaxEpoch(20),
    batch_size=32)
```
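The Adam update that these parameters control can be sketched for a single parameter vector in plain Python. This is a minimal stand-alone model of the algorithm from the cited paper, not the BigDL implementation, and the helper names (`adam_step`, `quad`) are illustrative:

```python
import math

def adam_step(x, grad_fn, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update on a list of floats `x`.
    # `state` holds the first/second moment estimates and the timestep.
    m, v, t = state["m"], state["v"], state["t"] + 1
    _, g = grad_fn(x)
    for i in range(len(x)):
        m[i] = beta1 * m[i] + (1 - beta1) * g[i]
        v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
        m_hat = m[i] / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = v[i] / (1 - beta2 ** t)   # bias-corrected second moment
        x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    state["t"] = t
    return x

# Minimize f(x) = x0^2 + x1^2 from (1, 1); the gradient is (2*x0, 2*x1).
def quad(x):
    return x[0] ** 2 + x[1] ** 2, [2 * x[0], 2 * x[1]]

x = [1.0, 1.0]
state = {"m": [0.0, 0.0], "v": [0.0, 0.0], "t": 0}
for _ in range(2000):
    adam_step(x, quad, state, lr=0.01)
```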
## SGD ##

**Scala:**
```scala
val optimMethod = new SGD(learningRate= 1e-3,learningRateDecay=0.0,
                      weightDecay=0.0,momentum=0.0,dampening=Double.MaxValue,
                      nesterov=false,learningRateSchedule=Default(),
                      learningRates=null,weightDecays=null)
```

**Python:**
```python
optim_method = SGD(learningrate=1e-3,learningrate_decay=0.0,weightdecay=0.0,
                   momentum=0.0,dampening=DOUBLEMAX,nesterov=False,
                   leaningrate_schedule=None,learningrates=None,
                   weightdecays=None,bigdl_type="float")
```

A plain implementation of SGD which provides an optimize method. After the optimization method is set when creating the Optimizer, the Optimizer calls it at the end of each iteration.

**Scala example:**
```scala
val optimMethod = new SGD[Float](learningRate= 1e-3,learningRateDecay=0.0,
                               weightDecay=0.0,momentum=0.0,dampening=Double.MaxValue,
                               nesterov=false,learningRateSchedule=Default(),
                               learningRates=null,weightDecays=null)
optimizer.setOptimMethod(optimMethod)
```

**Python example:**
```python
optim_method = SGD(learningrate=1e-3,learningrate_decay=0.0,weightdecay=0.0,
                  momentum=0.0,dampening=DOUBLEMAX,nesterov=False,
                  leaningrate_schedule=None,learningrates=None,
                  weightdecays=None,bigdl_type="float")

optimizer = Optimizer(
    model=mlp_model,
    training_rdd=train_data,
    criterion=ClassNLLCriterion(),
    optim_method=optim_method,
    end_trigger=MaxEpoch(20),
    batch_size=32)
```

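The plain SGD update (with the optional momentum and weight-decay terms from the signatures above) can be sketched in a few lines of stand-alone Python; `sgd_step` is an illustrative model, not the BigDL method:

```python
def sgd_step(x, grad_fn, velocity, lr=1e-3, momentum=0.0, weight_decay=0.0):
    # One plain SGD update, optionally with momentum and L2 weight decay.
    _, g = grad_fn(x)
    for i in range(len(x)):
        d = g[i] + weight_decay * x[i]       # L2 penalty folded into the gradient
        velocity[i] = momentum * velocity[i] + d
        x[i] -= lr * velocity[i]
    return x

# Minimize f(x) = x^2 from x = 1 with momentum.
def quad(x):
    return x[0] ** 2, [2 * x[0]]

x, v = [1.0], [0.0]
for _ in range(200):
    sgd_step(x, quad, v, lr=0.1, momentum=0.9)
```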
## Adadelta ##

*AdaDelta* implementation for *SGD*.
It has been proposed in `ADADELTA: An Adaptive Learning Rate Method`.
http://arxiv.org/abs/1212.5701

**Scala:**
```scala
val optimMethod = new Adadelta(decayRate = 0.9, Epsilon = 1e-10)
```
**Python:**
```python
optim_method = AdaDelta(decayrate = 0.9, epsilon = 1e-10)
```

**Scala example:**
```scala
optimizer.setOptimMethod(new Adadelta(0.9, 1e-10))
```

**Python example:**
```python
optimizer = Optimizer(
    model=mlp_model,
    training_rdd=train_data,
    criterion=ClassNLLCriterion(),
    optim_method=Adadelta(0.9, 0.00001),
    end_trigger=MaxEpoch(20),
    batch_size=32)
```

## RMSprop ##

An implementation of RMSprop (Reference: http://arxiv.org/pdf/1308.0850v5.pdf, Sec 4.2)

* learningRate : learning rate
* learningRateDecay : learning rate decay
* decayRate : decay rate, also called rho
* Epsilon : for numerical stability

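The RMSprop rule these parameters feed into can be sketched in stand-alone Python: keep a running average of squared gradients (weighted by the decay rate, i.e. rho) and divide each step by its square root. `rmsprop_step` is illustrative, not a BigDL API:

```python
import math

def rmsprop_step(x, grad_fn, mean_sq, lr=1e-2, decay_rate=0.9, eps=1e-8):
    # One RMSprop update: normalize the step by a running RMS of gradients.
    _, g = grad_fn(x)
    for i in range(len(x)):
        mean_sq[i] = decay_rate * mean_sq[i] + (1 - decay_rate) * g[i] ** 2
        x[i] -= lr * g[i] / (math.sqrt(mean_sq[i]) + eps)
    return x

# Minimize f(x) = x^2 from x = 1.
def quad(x):
    return x[0] ** 2, [2 * x[0]]

x, ms = [1.0], [0.0]
for _ in range(500):
    rmsprop_step(x, quad, ms)
```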
## Adamax ##

An implementation of Adamax http://arxiv.org/pdf/1412.6980.pdf

Arguments:

* learningRate : learning rate
* beta1 : first moment coefficient
* beta2 : second moment coefficient
* Epsilon : for numerical stability

Returns:

the new x vector and the function list {fx}, evaluated before the update

## Adagrad ##

**Scala:**
```scala
val adagrad = new Adagrad(learningRate = 1e-3,
                          learningRateDecay = 0.0,
                          weightDecay = 0.0)
```

An implementation of Adagrad. See the original paper:
<http://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf>

**Scala example:**
```scala
import com.intel.analytics.bigdl.dllib.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.dllib.tensor._
import com.intel.analytics.bigdl.dllib.utils.T

val adagrad = new Adagrad(0.01, 0.0, 0.0)
def feval(x: Tensor[Float]): (Float, Tensor[Float]) = {
  // (1) compute f(x)
  val d = x.size(1)
  // x1 = x(i)
  val x1 = Tensor[Float](d - 1).copy(x.narrow(1, 1, d - 1))
  // x(i + 1) - x(i)^2
  x1.cmul(x1).mul(-1).add(x.narrow(1, 2, d - 1))
  // 100 * (x(i + 1) - x(i)^2)^2
  x1.cmul(x1).mul(100)
  // x0 = x(i)
  val x0 = Tensor[Float](d - 1).copy(x.narrow(1, 1, d - 1))
  // 1-x(i)
  x0.mul(-1).add(1)
  x0.cmul(x0)
  // 100*(x(i+1) - x(i)^2)^2 + (1-x(i))^2
  x1.add(x0)
  val fout = x1.sum()
  // (2) compute f(x)/dx
  val dxout = Tensor[Float]().resizeAs(x).zero()
  // df(1:D-1) = - 400*x(1:D-1).*(x(2:D)-x(1:D-1).^2) - 2*(1-x(1:D-1));
  x1.copy(x.narrow(1, 1, d - 1))
  x1.cmul(x1).mul(-1).add(x.narrow(1, 2, d - 1)).cmul(x.narrow(1, 1, d - 1)).mul(-400)
  x0.copy(x.narrow(1, 1, d - 1)).mul(-1).add(1).mul(-2)
  x1.add(x0)
  dxout.narrow(1, 1, d - 1).copy(x1)
  // df(2:D) = df(2:D) + 200*(x(2:D)-x(1:D-1).^2);
  x0.copy(x.narrow(1, 1, d - 1))
  x0.cmul(x0).mul(-1).add(x.narrow(1, 2, d - 1)).mul(200)
  dxout.narrow(1, 2, d - 1).add(x0)
  (fout, dxout)
}
val x = Tensor(2).fill(0)
val config = T("learningRate" -> 1e-1)
for (i <- 1 to 10) {
  adagrad.optimize(feval, x, config, config)
}
x after optimize: 0.27779138
0.07226955
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 2]
```

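The core of Adagrad, per-coordinate accumulation of squared gradients that shrinks the effective learning rate over time, can be sketched in stand-alone Python. `adagrad_step` is an illustrative model, not the BigDL class:

```python
import math

def adagrad_step(x, grad_fn, accum, lr=0.1, eps=1e-10):
    # One Adagrad update: accumulate squared gradients and scale the
    # learning rate down for coordinates that have seen large gradients.
    _, g = grad_fn(x)
    for i in range(len(x)):
        accum[i] += g[i] ** 2
        x[i] -= lr * g[i] / (math.sqrt(accum[i]) + eps)
    return x

# Minimize f(x) = x^2 from x = 1; the first step is exactly lr,
# later steps get progressively smaller as the accumulator grows.
def quad(x):
    return x[0] ** 2, [2 * x[0]]

x, acc = [1.0], [0.0]
for _ in range(10):
    adagrad_step(x, quad, acc)
```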
## LBFGS ##

**Scala:**
```scala
val optimMethod = new LBFGS(maxIter=20, maxEval=Double.MaxValue,
                            tolFun=1e-5, tolX=1e-9, nCorrection=100,
                            learningRate=1.0, lineSearch=None, lineSearchOptions=None)
```

**Python:**
```python
optim_method = LBFGS(max_iter=20, max_eval=DOUBLEMAX, \
                 tol_fun=1e-5, tol_x=1e-9, n_correction=100, \
                 learning_rate=1.0, line_search=None, line_search_options=None)
```

This implementation of L-BFGS relies on a user-provided line search function
(state.lineSearch). If this function is not provided, then a simple learning rate
is used to produce fixed-size steps. Fixed-size steps are much less costly than line
searches, and can be useful for stochastic problems.

The learning rate is used even when a line search is provided. This is also useful for
large-scale stochastic problems, where opfunc is a noisy approximation of f(x). In that
case, the learning rate allows a reduction of confidence in the step size.

**Parameters:**

* maxIter - Maximum number of iterations allowed. Default: 20
* maxEval - Maximum number of function evaluations. Default: Double.MaxValue
* tolFun - Termination tolerance on the first-order optimality. Default: 1e-5
* tolX - Termination tolerance on progress in terms of function/parameter changes. Default: 1e-9
* learningRate - the learning rate. Default: 1.0
* lineSearch - A line search function. Default: None
* lineSearchOptions - If no line search is provided, then a fixed step size is used. Default: None

**Scala example:**
```scala
val optimMethod = new LBFGS(maxIter=20, maxEval=Double.MaxValue,
                            tolFun=1e-5, tolX=1e-9, nCorrection=100,
                            learningRate=1.0, lineSearch=None, lineSearchOptions=None)
optimizer.setOptimMethod(optimMethod)
```

**Python example:**
```python
optim_method = LBFGS(max_iter=20, max_eval=DOUBLEMAX, \
                 tol_fun=1e-5, tol_x=1e-9, n_correction=100, \
                 learning_rate=1.0, line_search=None, line_search_options=None)

optimizer = Optimizer(
    model=mlp_model,
    training_rdd=train_data,
    criterion=ClassNLLCriterion(),
    optim_method=optim_method,
    end_trigger=MaxEpoch(20),
    batch_size=32)
```

## Ftrl ##
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
**Scala:**
 | 
					 | 
				
			||||||
```scala
 | 
					 | 
				
			||||||
val optimMethod = new Ftrl(
 | 
					 | 
				
			||||||
  learningRate = 1e-3, learningRatePower = -0.5,
 | 
					 | 
				
			||||||
  initialAccumulatorValue = 0.1, l1RegularizationStrength = 0.0,
 | 
					 | 
				
			||||||
  l2RegularizationStrength = 0.0, l2ShrinkageRegularizationStrength = 0.0)
 | 
					 | 
				
			||||||
```
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
**Python:**
 | 
					 | 
				
			||||||
```python
 | 
					 | 
				
			||||||
optim_method = Ftrl(learningrate = 1e-3, learningrate_power = -0.5, \
 | 
					 | 
				
			||||||
                 initial_accumulator_value = 0.1, l1_regularization_strength = 0.0, \
 | 
					 | 
				
			||||||
                 l2_regularization_strength = 0.0, l2_shrinkage_regularization_strength = 0.0)
 | 
					 | 
				
			||||||
```
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
An implementation of (Ftrl)[https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf.]
 | 
					 | 
				
			||||||
Support L1 penalty, L2 penalty and shrinkage-type L2 penalty.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
**Parameters:**
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
* learningRate: learning rate
 | 
					 | 
				
			||||||
* learningRatePower: double, must be less or equal to zero. Default is -0.5.
 | 
					 | 
				
			||||||
* initialAccumulatorValue: double, the starting value for accumulators, require zero or positive values. Default is 0.1.
 | 
					 | 
				
			||||||
* l1RegularizationStrength: double, must be greater or equal to zero. Default is zero.
 | 
					 | 
				
			||||||
* l2RegularizationStrength: double, must be greater or equal to zero. Default is zero.
 | 
					 | 
				
			||||||
* l2ShrinkageRegularizationStrength: double, must be greater or equal to zero. Default is zero. This differs from l2RegularizationStrength above. L2 above is a stabilization penalty, whereas this one is a magnitude penalty.
**Scala example:**
```scala
val optimMethod = new Ftrl(learningRate = 5e-3, learningRatePower = -0.5,
  initialAccumulatorValue = 0.01)
optimizer.setOptimMethod(optimMethod)
```

**Python example:**
```python
optim_method = Ftrl(learningrate = 5e-3, \
    learningrate_power = -0.5, \
    initial_accumulator_value = 0.01)

optimizer = Optimizer(
    model=mlp_model,
    training_rdd=train_data,
    criterion=ClassNLLCriterion(),
    optim_method=optim_method,
    end_trigger=MaxEpoch(20),
    batch_size=32)
```

## ParallelAdam ##

A multi-threaded version of [Adam](#adam).

**Scala:**
```scala
val optim = new ParallelAdam(learningRate=1e-3, learningRateDecay=0.0, beta1=0.9, beta2=0.999, Epsilon=1e-8, parallelNum=Engine.coreNumber())
```

**Python:**
```python
optim = ParallelAdam(learningrate=1e-3, learningrate_decay=0.0, beta1=0.9, beta2=0.999, epsilon=1e-8, parallel_num=get_node_and_core_number()[1], bigdl_type="float")
```
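
The idea can be sketched as partitioning the parameters into `parallelNum` disjoint chunks and running an ordinary Adam update on each chunk in its own thread. Below is an illustrative NumPy/thread-pool sketch under that assumption, not the BigDL implementation:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_adam_step(w, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999,
                       eps=1e-8, parallel_num=4):
    """One Adam step where disjoint parameter chunks are updated in parallel."""
    chunks = np.array_split(np.arange(w.size), parallel_num)

    def update(idx):
        # standard Adam moment updates, restricted to this thread's chunk
        g = grad[idx]
        m[idx] = beta1 * m[idx] + (1 - beta1) * g
        v[idx] = beta2 * v[idx] + (1 - beta2) * g * g
        m_hat = m[idx] / (1 - beta1 ** t)
        v_hat = v[idx] / (1 - beta2 ** t)
        w[idx] -= lr * m_hat / (np.sqrt(v_hat) + eps)

    with ThreadPoolExecutor(max_workers=parallel_num) as pool:
        list(pool.map(update, chunks))
    return w, m, v
```

Because each thread owns a disjoint index range, no locking is needed and the result matches a single-threaded Adam step.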
@ -1,270 +0,0 @@
# Regularizer

--------

## L1 Regularizer ##

**Scala:**
```scala
val l1Regularizer = L1Regularizer(rate)
```
**Python:**
```python
regularizerl1 = L1Regularizer(rate)
```

The L1 regularizer adds a penalty to gradWeight to help avoid overfitting.

In our code implementation, gradWeight = gradWeight + alpha * sign(weight).

For more details, please refer to [wiki](https://en.wikipedia.org/wiki/Regularization_(mathematics)).
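
As a minimal NumPy illustration of the penalty term (the standard L1 sub-gradient, not the BigDL code itself):

```python
import numpy as np

def l1_penalty_grad(weight, alpha):
    # sub-gradient of alpha * sum(|w|), added to gradWeight in the backward pass
    return alpha * np.sign(weight)

w = np.array([0.5, -0.3, 0.0])
# zero weights receive no penalty; nonzero weights are pushed toward zero
grad_w = np.array([0.1, 0.1, 0.1]) + l1_penalty_grad(w, alpha=0.2)
```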
**Scala example:**
```scala
import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
import com.intel.analytics.bigdl.dllib.tensor._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.dllib.nn._

RNG.setSeed(100)

val input = Tensor(3, 5).rand
val gradOutput = Tensor(3, 5).rand
val linear = Linear(5, 5, wRegularizer = L1Regularizer(0.2), bRegularizer = L1Regularizer(0.2))

val output = linear.forward(input)
val gradInput = linear.backward(input, gradOutput)

scala> input
input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.54340494      0.67115563      0.2783694       0.4120464       0.4245176
0.52638245      0.84477615      0.14860484      0.004718862     0.15671109
0.12156912      0.18646719      0.67074907      0.21010774      0.82585275
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> gradOutput
gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.4527399       0.13670659      0.87014264      0.5750933       0.063681036
0.89132196      0.62431186      0.20920213      0.52334774      0.18532822
0.5622963       0.10837689      0.0058171963    0.21969749      0.3074232
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> linear.gradWeight
res2: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.9835552       1.3616763       0.83564335      0.108898684     0.59625006
0.21608911      0.8393639       0.0035243928    -0.11795368     0.4453743
0.38366735      0.9618148       0.47721142      0.5607486       0.6069793
0.81469804      0.6690552       0.18522228      0.08559488      0.7075894
-0.030468717    0.056625083     0.051471338     0.2917061       0.109963015
[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]
```

**Python example:**
```python
from bigdl.dllib.nn.layer import *
from bigdl.dllib.nn.criterion import *
from bigdl.dllib.optim.optimizer import *
from bigdl.dllib.util.common import *

input = np.random.uniform(0, 1, (3, 5)).astype("float32")
gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
linear = Linear(5, 5, wRegularizer = L1Regularizer(0.2), bRegularizer = L1Regularizer(0.2))
output = linear.forward(input)
gradInput = linear.backward(input, gradOutput)

> linear.parameters()
{u'Linear@596d857b': {u'bias': array([ 0.3185505 , -0.02004393,  0.34620118, -0.09206461,  0.40776938], dtype=float32),
  u'gradBias': array([ 2.14087653,  1.82181644,  1.90674937,  1.37307787,  0.81534696], dtype=float32),
  u'gradWeight': array([[ 0.34909648,  0.85083449,  1.44904375,  0.90150446,  0.57136625],
         [ 0.3745544 ,  0.42218602,  1.53656614,  1.1836741 ,  1.00702667],
         [ 0.30529332,  0.26813674,  0.85559171,  0.61224306,  0.34721529],
         [ 0.22859855,  0.8535381 ,  1.19809723,  1.37248564,  0.50041491],
         [ 0.36197871,  0.03069445,  0.64837945,  0.12765063,  0.12872688]], dtype=float32),
  u'weight': array([[-0.12423037,  0.35694697,  0.39038274, -0.34970999, -0.08283543],
         [-0.4186025 , -0.33235055,  0.34948507,  0.39953214,  0.16294235],
         [-0.25171402, -0.28955361, -0.32243955, -0.19771226, -0.29320192],
         [-0.39263198,  0.37766701,  0.14673658,  0.24882999, -0.0779015 ],
         [ 0.0323218 , -0.31266898,  0.31543773, -0.0898933 , -0.33485892]], dtype=float32)}}
```

## L2 Regularizer ##

**Scala:**
```scala
val l2Regularizer = L2Regularizer(rate)
```
**Python:**
```python
regularizerl2 = L2Regularizer(rate)
```

The L2 regularizer adds a penalty to gradWeight to help avoid overfitting.

In our code implementation, gradWeight = gradWeight + alpha * weight.

For more details, please refer to [wiki](https://en.wikipedia.org/wiki/Regularization_(mathematics)).
**Scala example:**
```scala
import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
import com.intel.analytics.bigdl.dllib.tensor._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.dllib.nn._

RNG.setSeed(100)

val input = Tensor(3, 5).rand
val gradOutput = Tensor(3, 5).rand
val linear = Linear(5, 5, wRegularizer = L2Regularizer(0.2), bRegularizer = L2Regularizer(0.2))

val output = linear.forward(input)
val gradInput = linear.backward(input, gradOutput)

scala> input
input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.54340494      0.67115563      0.2783694       0.4120464       0.4245176
0.52638245      0.84477615      0.14860484      0.004718862     0.15671109
0.12156912      0.18646719      0.67074907      0.21010774      0.82585275
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> gradOutput
gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.4527399       0.13670659      0.87014264      0.5750933       0.063681036
0.89132196      0.62431186      0.20920213      0.52334774      0.18532822
0.5622963       0.10837689      0.0058171963    0.21969749      0.3074232
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> linear.gradWeight
res0: com.intel.analytics.bigdl.tensor.Tensor[Float] =
1.0329735       0.047239657     0.8979603       0.53614384      1.2781229
0.5621818       0.29772854      0.69706535      0.30559152      0.8352279
1.3044653       0.43065858      0.9896795       0.7435816       1.6003494
0.94218314      0.6793372       0.97101355      0.62892824      1.3458569
0.73134506      0.5975239       0.9109101       0.59374434      1.1656629
[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]
```

**Python example:**
```python
from bigdl.dllib.nn.layer import *
from bigdl.dllib.nn.criterion import *
from bigdl.dllib.optim.optimizer import *
from bigdl.dllib.util.common import *

input = np.random.uniform(0, 1, (3, 5)).astype("float32")
gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
linear = Linear(5, 5, wRegularizer = L2Regularizer(0.2), bRegularizer = L2Regularizer(0.2))
output = linear.forward(input)
gradInput = linear.backward(input, gradOutput)

> linear.parameters()
{u'Linear@787aab5e': {u'bias': array([-0.43960261, -0.12444571,  0.22857292, -0.43216187,  0.27770036], dtype=float32),
  u'gradBias': array([ 0.51726723,  1.32883406,  0.57567948,  1.7791357 ,  1.2887038 ], dtype=float32),
  u'gradWeight': array([[ 0.45477036,  0.22262168,  0.21923628,  0.26152173,  0.19836383],
         [ 1.12261093,  0.72921795,  0.08405925,  0.78192139,  0.48798928],
         [ 0.34581488,  0.21195598,  0.26357424,  0.18987852,  0.2465664 ],
         [ 1.18659711,  1.11271608,  0.72589797,  1.19098675,  0.33769298],
         [ 0.82314551,  0.71177536,  0.4428404 ,  0.764337  ,  0.3500182 ]], dtype=float32),
  u'weight': array([[ 0.03727285, -0.39697152,  0.42733836, -0.34291714, -0.13833708],
         [ 0.09232076, -0.09720675, -0.33625153,  0.06477787, -0.34739712],
         [ 0.17145753,  0.10128133,  0.16679128, -0.33541158,  0.40437087],
         [-0.03005157, -0.36412898,  0.0629965 ,  0.13443278, -0.38414535],
         [-0.16630849,  0.06934392,  0.40328237,  0.22299488, -0.1178569 ]], dtype=float32)}}
```

## L1L2 Regularizer ##

**Scala:**
```scala
val l1l2Regularizer = L1L2Regularizer(l1rate, l2rate)
```
**Python:**
```python
regularizerl1l2 = L1L2Regularizer(l1rate, l2rate)
```

The L1L2 regularizer adds both penalties to gradWeight to help avoid overfitting.

In our code implementation, the L1 regularizer and the L2 regularizer are applied sequentially.

For more details, please refer to [wiki](https://en.wikipedia.org/wiki/Regularization_(mathematics)).
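
Sequential application means the two penalty terms simply accumulate on the same gradient buffer, as in this illustrative NumPy sketch (not the BigDL code itself):

```python
import numpy as np

def l1l2_penalty_grad(weight, l1_alpha, l2_alpha):
    # L1 sub-gradient applied first, then the L2 gradient, on the same buffer
    grad = l1_alpha * np.sign(weight)
    grad += l2_alpha * weight
    return grad

w = np.array([0.5, -0.3, 0.0])
grad_w = np.array([0.1, 0.1, 0.1]) + l1l2_penalty_grad(w, 0.2, 0.2)
```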
**Scala example:**
```scala
import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
import com.intel.analytics.bigdl.dllib.tensor._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.dllib.nn._

RNG.setSeed(100)

val input = Tensor(3, 5).rand
val gradOutput = Tensor(3, 5).rand
val linear = Linear(5, 5, wRegularizer = L1L2Regularizer(0.2, 0.2), bRegularizer = L1L2Regularizer(0.2, 0.2))

val output = linear.forward(input)
val gradInput = linear.backward(input, gradOutput)

scala> input
input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.54340494      0.67115563      0.2783694       0.4120464       0.4245176
0.52638245      0.84477615      0.14860484      0.004718862     0.15671109
0.12156912      0.18646719      0.67074907      0.21010774      0.82585275
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> gradOutput
gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.4527399       0.13670659      0.87014264      0.5750933       0.063681036
0.89132196      0.62431186      0.20920213      0.52334774      0.18532822
0.5622963       0.10837689      0.0058171963    0.21969749      0.3074232
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> linear.gradWeight
res1: com.intel.analytics.bigdl.tensor.Tensor[Float] =
1.069174        1.4422078       0.8913989       0.042112567     0.53756505
0.14077617      0.8959319       -0.030221784    -0.1583686      0.4690558
0.37145022      0.99747723      0.5559263       0.58614403      0.66380215
0.88983417      0.639738        0.14924419      0.027530536     0.71988696
-0.053217214    -8.643427E-4    -0.036953792    0.29753304      0.06567569
[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]
```

**Python example:**
```python
from bigdl.dllib.nn.layer import *
from bigdl.dllib.nn.criterion import *
from bigdl.dllib.optim.optimizer import *
from bigdl.dllib.util.common import *

input = np.random.uniform(0, 1, (3, 5)).astype("float32")
gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
linear = Linear(5, 5, wRegularizer = L1L2Regularizer(0.2, 0.2), bRegularizer = L1L2Regularizer(0.2, 0.2))
output = linear.forward(input)
gradInput = linear.backward(input, gradOutput)

> linear.parameters()
{u'Linear@1356aa91': {u'bias': array([-0.05799473, -0.0548001 ,  0.00408955, -0.22004321, -0.07143869], dtype=float32),
  u'gradBias': array([ 0.89119786,  1.09953558,  1.03394508,  1.19511735,  2.02241182], dtype=float32),
  u'gradWeight': array([[ 0.89061081,  0.58810186, -0.10087357,  0.19108151,  0.60029608],
         [ 0.95275503,  0.2333075 ,  0.46897018,  0.74429053,  1.16038764],
         [ 0.22894514,  0.60031962,  0.3836292 ,  0.15895618,  0.83136207],
         [ 0.49079862,  0.80913013,  0.55491877,  0.69608945,  0.80458677],
         [ 0.98890561,  0.49226439,  0.14861123,  1.37666655,  1.47615671]], dtype=float32),
  u'weight': array([[ 0.44654208,  0.16320795, -0.36029238, -0.25365737, -0.41974261],
         [ 0.18809238, -0.28065765,  0.27677274, -0.29904234,  0.41338971],
         [-0.03731538,  0.22493915,  0.10021331, -0.19495697,  0.25470355],
         [-0.30836752,  0.12083009,  0.3773002 ,  0.24059358, -0.40325543],
         [-0.13601269, -0.39310011, -0.05292636,  0.20001481, -0.08444868]], dtype=float32)}}
```

@ -1,11 +0,0 @@
Friesian Feature API
=====================

friesian.feature.table
---------------------------

.. automodule:: bigdl.friesian.feature.table
    :members:
    :undoc-members:
    :show-inheritance:

@ -1,7 +0,0 @@
Friesian API
==================

.. toctree::
    :maxdepth: 2

    feature.rst

@ -37,7 +37,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, you

    .. tab:: Llama

-        .. autoclass:: ipex_llm.langchain.llms.ipexllm.LlamaLLM
+        .. autoclass:: ipex_llm.langchain.llms.LlamaLLM
            :members:
            :undoc-members:
            :show-inheritance:
@ -49,7 +49,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, you

    .. tab:: ChatGLM

-        .. autoclass:: ipex_llm.langchain.llms.ipexllm.ChatGLMLLM
+        .. autoclass:: ipex_llm.langchain.llms.ChatGLMLLM
            :members:
            :undoc-members:
            :show-inheritance:
@ -61,7 +61,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, you

    .. tab:: Bloom

-        .. autoclass:: ipex_llm.langchain.llms.ipexllm.BloomLLM
+        .. autoclass:: ipex_llm.langchain.llms.BloomLLM
            :members:
            :undoc-members:
            :show-inheritance:
@ -73,7 +73,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, you

    .. tab:: Gptneox

-        .. autoclass:: ipex_llm.langchain.llms.ipexllm.GptneoxLLM
+        .. autoclass:: ipex_llm.langchain.llms.GptneoxLLM
            :members:
            :undoc-members:
            :show-inheritance:
@ -85,7 +85,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, you

    .. tab:: Starcoder

-        .. autoclass:: ipex_llm.langchain.llms.ipexllm.StarcoderLLM
+        .. autoclass:: ipex_llm.langchain.llms.StarcoderLLM
            :members:
            :undoc-members:
            :show-inheritance:
@ -117,7 +117,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also

    .. tab:: Llama

-        .. autoclass:: ipex_llm.langchain.embeddings.ipexllm.LlamaEmbeddings
+        .. autoclass:: ipex_llm.langchain.embeddings.LlamaEmbeddings
            :members:
            :undoc-members:
            :show-inheritance:
@ -129,7 +129,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also

    .. tab:: Bloom

-        .. autoclass:: ipex_llm.langchain.embeddings.ipexllm.BloomEmbeddings
+        .. autoclass:: ipex_llm.langchain.embeddings.BloomEmbeddings
            :members:
            :undoc-members:
            :show-inheritance:
@ -141,7 +141,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also

    .. tab:: Gptneox

-        .. autoclass:: ipex_llm.langchain.embeddings.ipexllm.GptneoxEmbeddings
+        .. autoclass:: ipex_llm.langchain.embeddings.GptneoxEmbeddings
            :members:
            :undoc-members:
            :show-inheritance:
@ -153,7 +153,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also

    .. tab:: Starcoder

-        .. autoclass:: ipex_llm.langchain.embeddings.ipexllm.StarcoderEmbeddings
+        .. autoclass:: ipex_llm.langchain.embeddings.StarcoderEmbeddings
            :members:
            :undoc-members:
            :show-inheritance:
@ -1,42 +0,0 @@
Nano HPO API
==================

Search Space
---------------------------

.. autoclass:: bigdl.nano.automl.hpo.space.Categorical

.. autoclass:: bigdl.nano.automl.hpo.space.Real

.. autoclass:: bigdl.nano.automl.hpo.space.Int

.. autoclass:: bigdl.nano.automl.hpo.space.Bool


HPO for Tensorflow
---------------------------

bigdl.nano.automl.tf.keras.Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: bigdl.nano.automl.tf.keras.Model
    :members: search, search_summary


bigdl.nano.automl.tf.keras.Sequential
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: bigdl.nano.automl.tf.keras.Sequential
    :members: search, search_summary


HPO for PyTorch
---------------------------

bigdl.nano.pytorch.Trainer
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: bigdl.nano.pytorch.Trainer
    :members: search, search_summary
    :undoc-members:

@ -1,19 +0,0 @@
Nano API
==================

.. toctree::
    :maxdepth: 2

    pytorch.rst


.. toctree::
    :maxdepth: 2

    tensorflow.rst


.. toctree::
    :maxdepth: 3

    hpo_api.rst

@ -1,44 +0,0 @@
Nano PyTorch API
==================

bigdl.nano.pytorch.Trainer
---------------------------

.. autoclass:: bigdl.nano.pytorch.Trainer
    :members:
    :undoc-members:
    :exclude-members: accelerator_connector, checkpoint_connector, reload_dataloaders_every_n_epochs, limit_val_batches, logger, logger_connector, state

bigdl.nano.pytorch.InferenceOptimizer
---------------------------------------

.. autoclass:: bigdl.nano.pytorch.InferenceOptimizer
    :members:
    :undoc-members:
    :exclude-members: ALL_INFERENCE_ACCELERATION_METHOD, DEFAULT_INFERENCE_ACCELERATION_METHOD, method
    :inherited-members:

TorchNano API
---------------------------

.. autoclass:: bigdl.nano.pytorch.TorchNano
    :members:
    :undoc-members:
    :exclude-members: run

.. autofunction:: bigdl.nano.pytorch.nano

Patch API
---------------------------

.. autofunction:: bigdl.nano.pytorch.patch_torch

.. autofunction:: bigdl.nano.pytorch.unpatch_torch

.. autofunction:: bigdl.nano.pytorch.patching.patch_cuda

.. autofunction:: bigdl.nano.pytorch.patching.unpatch_cuda

.. autofunction:: bigdl.nano.pytorch.patching.patch_dtype

.. autofunction:: bigdl.nano.pytorch.patching.patch_encryption

						 | 
@@ -1,41 +0,0 @@
Nano Tensorflow API
===================

bigdl.nano.tf.keras
---------------------------

.. autoclass:: bigdl.nano.tf.keras.Model
    :members: fit, quantize, trace
    :undoc-members:

.. autoclass:: bigdl.nano.tf.keras.Sequential
    :members:
    :undoc-members:
    :inherited-members: Sequential

.. autoclass:: bigdl.nano.tf.keras.layers.Embedding
    :members:
    :undoc-members:

bigdl.nano.tf.optimizers
---------------------------

.. autoclass:: bigdl.nano.tf.optimizers.SparseAdam
    :members:
    :undoc-members:

bigdl.nano.tf.keras.InferenceOptimizer
---------------------------------------

.. autoclass:: bigdl.nano.tf.keras.InferenceOptimizer
    :members:
    :undoc-members:
    :exclude-members: ALL_INFERENCE_ACCELERATION_METHOD
    :inherited-members:

Patch API
---------------------------

.. autofunction:: bigdl.nano.tf.patch_tensorflow

.. autofunction:: bigdl.nano.tf.unpatch_tensorflow

.. autofunction:: bigdl.nano.tf.keras.nano_bf16
@@ -1,41 +0,0 @@
Orca AutoML
============================

orca.automl.auto_estimator
---------------------------

A general estimator that supports automatic model tuning. It allows users to fit a model and search for the best hyperparameters.

.. automodule:: bigdl.orca.automl.auto_estimator
    :members:
    :show-inheritance:
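
The fit-and-search loop that an auto-estimator automates can be sketched in plain Python. This is an illustrative grid search only; the names below are hypothetical and are not part of the ``bigdl.orca`` API:

```python
# Minimal grid-search sketch of what an auto-estimator does:
# try each hyperparameter configuration, score it, keep the best.
from itertools import product

def search_best(train_fn, search_space):
    """Fit one model per config in the grid and return the best.

    train_fn(config) -> validation loss (lower is better).
    search_space: dict mapping hyperparameter name -> list of candidates.
    """
    names = list(search_space)
    best_config, best_loss = None, float("inf")
    for values in product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        loss = train_fn(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

# Toy objective: pretend the "model" is best at lr=0.1, hidden=64.
toy_loss = lambda c: abs(c["lr"] - 0.1) + abs(c["hidden"] - 64) / 64
best, loss = search_best(toy_loss, {"lr": [0.01, 0.1, 1.0], "hidden": [32, 64]})
print(best)  # -> {'lr': 0.1, 'hidden': 64}
```

The real estimator additionally distributes the trials and supports smarter samplers than an exhaustive grid.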

orca.automl.hp
----------------------------------------

Sampling specs to be used in search space configuration.

.. automodule:: bigdl.orca.automl.hp
    :members:
    :show-inheritance:
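
Conceptually, a sampling spec is just a rule for drawing one candidate value for a hyperparameter. A pure-Python sketch of the idea (the helper names are hypothetical, not the ``bigdl.orca.automl.hp`` API):

```python
import random

# Each spec is a zero-argument callable that draws one candidate value.
def uniform(lower, upper):
    return lambda: random.uniform(lower, upper)

def choice(options):
    return lambda: random.choice(options)

def sample(search_space):
    """Materialize one concrete config from a space of sampling specs."""
    return {name: spec() for name, spec in search_space.items()}

space = {"lr": uniform(0.001, 0.1), "batch_size": choice([32, 64, 128])}
config = sample(space)
# config["lr"] lies in [0.001, 0.1]; config["batch_size"] is 32, 64, or 128.
```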

orca.automl.metrics
----------------------------

Evaluate unscaled metrics between the true and predicted values of y.

.. automodule:: bigdl.orca.automl.metrics
    :members:
    :show-inheritance:
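
"Unscaled" means the metrics are computed on values in their original units, with no normalization. Two typical examples, mean squared error and mean absolute error, written as a generic sketch (not the ``bigdl.orca`` implementation):

```python
def mse(y_true, y_pred):
    # Mean squared error, in the square of the target's original units.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean absolute error, in the target's original units.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # -> 1.3333333333333333
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # -> 0.6666666666666666
```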

orca.automl.auto_xgb
---------------------------

Automatic hyperparameter optimization for XGBoost models.

AutoXGBoost inherits from AutoEstimator; refer to the `AutoEstimator API Guide <https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/AutoML/automl.html#orca-automl-auto-estimator>`__ for more APIs.

.. automodule:: bigdl.orca.automl.xgboost.auto_xgb
    :members:
    :show-inheritance:
@@ -1,15 +0,0 @@
Orca Context
============

orca.init_orca_context
-------------------------

.. automodule:: bigdl.orca.common
    :members: init_orca_context
    :undoc-members:
    :show-inheritance:
@@ -1,20 +0,0 @@
Orca Data
=========

orca.data.XShards
---------------------------

.. autoclass:: bigdl.orca.data.XShards
    :members:
    :undoc-members:
    :show-inheritance:
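
An ``XShards`` is, in essence, a collection of data partitions that can be transformed in parallel. The idea can be sketched in plain Python (the class below is a toy stand-in, not the actual API):

```python
class Shards:
    """Toy sharded collection: hold partitions and map a function over them."""

    def __init__(self, partitions):
        self.partitions = partitions

    @classmethod
    def partition(cls, data, num_shards):
        # Split `data` round-robin into `num_shards` partitions.
        return cls([data[i::num_shards] for i in range(num_shards)])

    def transform_shard(self, fn):
        # Apply `fn` to every partition (the real API distributes this work).
        return Shards([fn(p) for p in self.partitions])

    def collect(self):
        # Gather all partitions back into one flat list.
        return [x for part in self.partitions for x in part]

shards = Shards.partition([1, 2, 3, 4, 5, 6], num_shards=2)
doubled = shards.transform_shard(lambda part: [x * 2 for x in part])
print(sorted(doubled.collect()))  # -> [2, 4, 6, 8, 10, 12]
```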

orca.data.pandas
---------------------------

.. automodule:: bigdl.orca.data.pandas.preprocessing
    :members:
    :undoc-members:
    :show-inheritance:
@@ -1,10 +0,0 @@
Orca API
==================

.. toctree::
    :maxdepth: 2

    context.rst
    data.rst
    orca.rst
    automl.rst
@@ -1,87 +0,0 @@
Orca Learn
==========

orca.learn.bigdl.estimator
---------------------------

.. automodule:: bigdl.orca.learn.bigdl.estimator
    :members:
    :undoc-members:
    :show-inheritance:

orca.learn.tf.estimator
------------------------

.. automodule:: bigdl.orca.learn.tf.estimator
    :members:
    :undoc-members:
    :show-inheritance:

orca.learn.tf2.estimator
-------------------------

.. automodule:: bigdl.orca.learn.tf2.estimator
    :members:
    :undoc-members:
    :show-inheritance:

orca.learn.tf2.tf2_ray_estimator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Orca TF2Estimator with a "horovod" or "ray" backend.

.. autoclass:: bigdl.orca.learn.tf2.ray_estimator.TensorFlow2Estimator
    :members:
    :undoc-members:
    :show-inheritance:

orca.learn.tf2.tf2_spark_estimator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Orca TF2Estimator with a "spark" backend.

.. autoclass:: bigdl.orca.learn.tf2.pyspark_estimator.SparkTFEstimator
    :members:
    :undoc-members:
    :show-inheritance:
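
The estimator classes above are typically selected by a factory that dispatches on the requested backend string. The pattern can be sketched in plain Python (the names below are hypothetical, not the ``bigdl.orca`` API):

```python
# Toy backend dispatch: map a backend string to the estimator class
# that implements it, as an estimator factory might do internally.
class RayEstimator:
    backend = "ray"

class SparkEstimator:
    backend = "spark"

# "horovod" and "ray" share the Ray-based implementation in this sketch.
_BACKENDS = {"ray": RayEstimator, "horovod": RayEstimator, "spark": SparkEstimator}

def make_estimator(backend="ray"):
    try:
        return _BACKENDS[backend]()
    except KeyError:
        raise ValueError(f"unsupported backend: {backend!r}") from None

est = make_estimator("spark")
print(est.backend)  # -> spark
```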

orca.learn.pytorch.estimator
-----------------------------

.. automodule:: bigdl.orca.learn.pytorch.estimator
    :members:
    :undoc-members:
    :show-inheritance:

orca.learn.pytorch.pytorch_ray_estimator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Orca PyTorch Estimator with a "horovod" or "ray" backend.

.. autoclass:: bigdl.orca.learn.pytorch.pytorch_ray_estimator.PyTorchRayEstimator
    :members:
    :undoc-members:
    :show-inheritance:

orca.learn.openvino.estimator
------------------------------

.. automodule:: bigdl.orca.learn.openvino.estimator
    :members:
    :undoc-members:
    :show-inheritance:

orca.learn.mpi.mpi_estimator
------------------------------

.. autoclass:: bigdl.orca.learn.mpi.MPIEstimator
    :members:
    :undoc-members:
    :show-inheritance:
@@ -2,14 +2,8 @@
   :google-site-verification: S66K6GAclKw1RroxU0Rka_2d1LZFVe27M0gRneEsIVI

################################################
IPEX-LLM
################################################

.. raw:: html

@@ -21,9 +15,9 @@ IPEX-LLM
   It is built on top of the excellent work of `llama.cpp <https://github.com/ggerganov/llama.cpp>`_, `gptq <https://github.com/IST-DASLab/gptq>`_, `bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`_, `qlora <https://github.com/artidoro/qlora>`_, etc.

************************************************
Latest update 🔥
************************************************
- [2024/03] **LangChain** added support for ``ipex-llm``; see the details `here <https://python.langchain.com/docs/integrations/llms/ipex>`_.
- [2024/02] ``ipex-llm`` now supports directly loading models from `ModelScope <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/ModelScope-Models>`_ (`魔搭 <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/ModelScope-Models>`_).
- [2024/02] ``ipex-llm`` added initial **INT2** support (based on the llama.cpp `IQ2 <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF-IQ2>`_ mechanism), which makes it possible to run large LLMs (e.g., Mixtral-8x7B) on an Intel GPU with 16GB VRAM.

@@ -44,9 +38,9 @@ Latest update 🔥
- [2023/09] ``ipex-llm`` `tutorial <https://github.com/intel-analytics/ipex-llm-tutorial>`_ is released.
- Over 30 models have been verified on ``ipex-llm``, including *LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, WizardCoder, Dolly, Whisper, Baichuan/Baichuan2, InternLM, Skywork, QWen/Qwen-VL, Aquila, MOSS* and more; see the complete list `here <https://github.com/intel-analytics/ipex#verified-models>`_.

************************************************
``ipex-llm`` demos
************************************************

See the **optimized performance** of ``chatglm2-6b`` and ``llama-2-13b-chat`` models on 12th Gen Intel Core CPU and Intel Arc GPU below.

@@ -79,9 +73,9 @@ See the **optimized performance** of ``chatglm2-6b`` and ``llama-2-13b-chat`` mo
      </tr>
   </table>

************************************************
``ipex-llm`` quickstart
************************************************

- `Windows GPU installation <doc/LLM/Quickstart/install_windows_gpu.html>`_
- `Run IPEX-LLM in Text-Generation-WebUI <doc/LLM/Quickstart/webui_quickstart.html>`_

@@ -89,9 +83,9 @@ See the **optimized performance** of ``chatglm2-6b`` and ``llama-2-13b-chat`` mo
- `CPU quickstart <#cpu-quickstart>`_
- `GPU quickstart <#gpu-quickstart>`_

============================================
CPU Quickstart
============================================

You may install ``ipex-llm`` on an Intel CPU as follows:

@@ -122,9 +116,9 @@ You can then apply INT4 optimizations to any Hugging Face *Transformers* models
   output_ids = model.generate(input_ids, ...)
   output = tokenizer.batch_decode(output_ids)

============================================
GPU Quickstart
============================================

You may install ``ipex-llm`` on an Intel GPU as follows: