diff --git a/docs/readthedocs/README.md b/docs/readthedocs/README.md
index 92af58a4..c1aa31d6 100644
--- a/docs/readthedocs/README.md
+++ b/docs/readthedocs/README.md
@@ -1,15 +1,16 @@
-# BigDL Documentation
-This is the repository for BigDL documentation, which is hosted at https://bigdl.readthedocs.io/en/latest/
+# IPEX-LLM Documentation
+This is the repository for IPEX-LLM documentation, which is hosted at https://ipex-llm.readthedocs.io/en/latest/
+
 ## Local build
 ### 1. Set up environment
-To build BigDL documentation locally for testing purposes, it is recommended to create a conda environment with specified Python version:
+To build the IPEX-LLM documentation locally for testing purposes, it is recommended to create a conda environment with a specified Python version:
 
 ```bash
 conda create -n docs python=3.7
 conda activate docs
 ```
 
-Then inside [`BigDL/docs/readthedocs`](.) folder, install required packages:
+Then, inside the [`ipex-llm/docs/readthedocs`](.) folder, install the required packages:
 
 ```bash
 cd docs/readthedocs
diff --git a/docs/readthedocs/image/GitHub-Mark-32px.png b/docs/readthedocs/image/GitHub-Mark-32px.png
deleted file mode 100644
index 8b25551a..00000000
Binary files a/docs/readthedocs/image/GitHub-Mark-32px.png and /dev/null differ
diff --git a/docs/readthedocs/image/KMS-Client.png b/docs/readthedocs/image/KMS-Client.png
deleted file mode 100644
index 6603dd2a..00000000
Binary files a/docs/readthedocs/image/KMS-Client.png and /dev/null differ
diff --git a/docs/readthedocs/image/KMS_End-to-end_Example.png b/docs/readthedocs/image/KMS_End-to-end_Example.png
deleted file mode 100644
index d3913298..00000000
Binary files a/docs/readthedocs/image/KMS_End-to-end_Example.png and /dev/null differ
diff --git a/docs/readthedocs/image/bigdl_logo.jpg b/docs/readthedocs/image/bigdl_logo.jpg
deleted file mode 100644
index 090c6ce7..00000000
Binary files a/docs/readthedocs/image/bigdl_logo.jpg and /dev/null differ
diff --git a/docs/readthedocs/image/bigdl_logo.png b/docs/readthedocs/image/bigdl_logo.png
deleted file mode 100644
index c7274c3c..00000000
Binary files a/docs/readthedocs/image/bigdl_logo.png and /dev/null differ
diff --git a/docs/readthedocs/image/colab_logo_32px.png b/docs/readthedocs/image/colab_logo_32px.png
deleted file mode 100644
index 4888368f..00000000
Binary files a/docs/readthedocs/image/colab_logo_32px.png and /dev/null differ
diff --git a/docs/readthedocs/image/friesian_architecture.png b/docs/readthedocs/image/friesian_architecture.png
deleted file mode 100644
index b0b3f14e..00000000
Binary files a/docs/readthedocs/image/friesian_architecture.png and /dev/null differ
diff --git a/docs/readthedocs/image/ipex-llm_logo_temp.png b/docs/readthedocs/image/ipex-llm_logo_temp.png
new file mode 100644
index 00000000..bd3d2e62
Binary files /dev/null and b/docs/readthedocs/image/ipex-llm_logo_temp.png differ
diff --git a/docs/readthedocs/image/orca-workflow.png b/docs/readthedocs/image/orca-workflow.png
deleted file mode 100644
index e469470c..00000000
Binary files a/docs/readthedocs/image/orca-workflow.png and /dev/null differ
diff --git a/docs/readthedocs/image/ppml_memory_config.png b/docs/readthedocs/image/ppml_memory_config.png
deleted file mode 100644
index a384ba96..00000000
Binary files a/docs/readthedocs/image/ppml_memory_config.png and /dev/null differ
diff --git a/docs/readthedocs/image/scala_logo.png b/docs/readthedocs/image/scala_logo.png
deleted file mode 100644
index 193034df..00000000
Binary files a/docs/readthedocs/image/scala_logo.png and /dev/null differ
diff --git a/docs/readthedocs/image/trial_dataframe.png b/docs/readthedocs/image/trial_dataframe.png
deleted file mode 100644
index 9df97a04..00000000
Binary files a/docs/readthedocs/image/trial_dataframe.png and /dev/null differ
diff --git a/docs/readthedocs/source/_templates/sidebar_quicklinks.html b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
index 3b5a36b0..150b5ef7 100644
--- a/docs/readthedocs/source/_templates/sidebar_quicklinks.html
+++ b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
@@ -3,34 +3,44 @@
     <div class="navbar-nav">
         <ul class="nav">
             <li>
-                <strong class="bigdl-quicklinks-section-title">BigDL-LLM Quickstart</strong>
+                <a href="doc/LLM/index.html">
+                    <strong class="bigdl-quicklinks-section-title">IPEX-LLM Document</strong>
+                </a>
+            </li>
+            <li>
+                <a href="doc/LLM/Quickstart/bigdl_llm_migration.html">
+                    <strong class="bigdl-quicklinks-section-title"><code>bigdl-llm</code> Migration Guide</strong>
+                </a>
+            </li>
+            <li>
+                <strong class="bigdl-quicklinks-section-title">IPEX-LLM Quickstart</strong>
                 <input id="quicklink-cluster-llm-quickstart" type="checkbox" class="toctree-checkbox" />
                 <label for="quicklink-cluster-llm-quickstart" class="toctree-toggle">
                     <i class="fa-solid fa-chevron-down"></i>
                 </label>
                 <ul class="nav bigdl-quicklinks-section-nav">
                     <li>
-                        <a href="doc/LLM/Quickstart/install_linux_gpu.html">Install BigDL-LLM on Linux with Intel GPU</a>
+                        <a href="doc/LLM/Quickstart/install_linux_gpu.html">Install IPEX-LLM on Linux with Intel GPU</a>
                     </li>
                     <li>
-                        <a href="doc/LLM/Quickstart/install_windows_gpu.html">Install BigDL-LLM on Windows with Intel GPU</a>
+                        <a href="doc/LLM/Quickstart/install_windows_gpu.html">Install IPEX-LLM on Windows with Intel GPU</a>
                     </li>
                     <li>
-                        <a href="doc/LLM/Quickstart/docker_windows_gpu.html">Install BigDL-LLM in Docker on Windows with Intel GPU</a>
+                        <a href="doc/LLM/Quickstart/docker_windows_gpu.html">Install IPEX-LLM in Docker on Windows with Intel GPU</a>
                     </li>
                     <li>
                         <a href="doc/LLM/Quickstart/webui_quickstart.html">Use Text Generation WebUI on Windows with Intel GPU</a>
                     </li>
                     <li>
-                        <a href="doc/LLM/Quickstart/benchmark_quickstart.html">BigDL-LLM Benchmarking</a>
+                        <a href="doc/LLM/Quickstart/benchmark_quickstart.html">IPEX-LLM Benchmarking</a>
                     </li>
                     <li>
-                        <a href="doc/LLM/Quickstart/llama_cpp_quickstart.html">Use llama.cpp with BigDL-LLM on Intel GPU</a>
+                        <a href="doc/LLM/Quickstart/llama_cpp_quickstart.html">Use llama.cpp with IPEX-LLM on Intel GPU</a>
                     </li>
                 </ul>
             </li>
             <li>
-                <strong class="bigdl-quicklinks-section-title">BigDL-LLM Installation</strong>
+                <strong class="bigdl-quicklinks-section-title">IPEX-LLM Installation</strong>
                 <input id="quicklink-cluster-llm-installation" type="checkbox" class="toctree-checkbox" />
                 <label for="quicklink-cluster-llm-installation" class="toctree-toggle">
                     <i class="fa-solid fa-chevron-down"></i>
@@ -47,7 +57,7 @@
             </li>
             <li>
                 <a href="doc/LLM/Overview/FAQ/faq.html">
-                    <strong class="bigdl-quicklinks-section-title">BigDL-LLM FAQ</strong>
+                    <strong class="bigdl-quicklinks-section-title">IPEX-LLM FAQ</strong>
                 </a>
             </li>
         </ul>
diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml
index a02e0a42..4e25302f 100644
--- a/docs/readthedocs/source/_toc.yml
+++ b/docs/readthedocs/source/_toc.yml
@@ -1,27 +1,8 @@
 root: index
 subtrees:
-  - entries:
-    - file: doc/UserGuide/index
-      title: 'User guide'
-      subtrees:
-        - entries:
-          - file: doc/UserGuide/python
-          - file: doc/UserGuide/scala
-          - file: doc/UserGuide/win
-          - file: doc/UserGuide/docker
-          - file: doc/UserGuide/colab
-          - file: doc/UserGuide/hadoop
-          - file: doc/UserGuide/k8s
-          - file: doc/UserGuide/databricks
-
-
-  - entries:
-    - file: doc/Application/powered-by
-      title: "Powered by"
-
   - entries:
     - file: doc/LLM/index
-      title: "LLM"
+      title: "IPEX-LLM Document"
       subtrees:
         - entries:
           - file: doc/LLM/Overview/llm
@@ -38,6 +19,7 @@ subtrees:
             title: "Quickstart"
             subtrees:
               - entries:
+                - file: doc/LLM/Quickstart/bigdl_llm_migration
                 - file: doc/LLM/Quickstart/install_linux_gpu
                 - file: doc/LLM/Quickstart/install_windows_gpu
                 - file: doc/LLM/Quickstart/docker_windows_gpu
@@ -77,347 +59,5 @@ subtrees:
           - file: doc/LLM/Overview/FAQ/faq
             title: "FAQ"
 
-  - entries:
-    - file: doc/Orca/index
-      title: "Orca"
-      subtrees:
-        - entries:
-          - file: doc/Orca/Overview/orca
-            title: "Orca in 5 minutes"
-          - file: doc/Orca/Overview/install
-            title: "Installation"
-          - file: doc/Orca/Overview/index
-            title: "Key Features"
-            subtrees:
-              - entries:
-                - file: doc/Orca/Overview/orca-context
-                - file: doc/Orca/Overview/data-parallel-processing
-                - file: doc/Orca/Overview/distributed-training-inference
-                - file: doc/Orca/Overview/distributed-tuning
-                - file: doc/Orca/Overview/ray
-          - file: doc/Orca/Howto/index
-            title: "How-to Guides"
-            subtrees:
-              - entries:
-                - file: doc/Orca/Howto/tf2keras-quickstart
-                - file: doc/Orca/Howto/pytorch-quickstart
-                - file: doc/Orca/Howto/ray-quickstart
-                - file: doc/Orca/Howto/spark-dataframe
-                - file: doc/Orca/Howto/xshards-pandas
-                - file: doc/Orca/Howto/autoestimator-pytorch-quickstart
-                - file: doc/Orca/Howto/autoxgboost-quickstart
-                - file: doc/Orca/Howto/tf1-quickstart
-                - file: doc/Orca/Howto/tf1keras-quickstart
-          - file: doc/Orca/Tutorial/index
-            title: "Tutorials"
-            subtrees:
-              - entries:
-                - file: doc/Orca/Tutorial/yarn
-                - file: doc/Orca/Tutorial/k8s
-          - file: doc/Orca/Overview/known_issues
-            title: "Tips and Known Issues"
-          - file: doc/PythonAPI/Orca/index
-            title: "API Reference"
-
-
-
-  - entries:
-      - file: doc/Nano/index
-        title: "Nano"
-        subtrees:
-          - entries:
-            - file: doc/Nano/Overview/nano
-              title: "Nano in 5 minutes"
-            - file: doc/Nano/Overview/install
-              title: "Installation"
-            - file: doc/Nano/Overview/index
-              title: "Key Features"
-              subtrees:
-                - entries:
-                  - file: doc/Nano/Overview/pytorch_train
-                  - file: doc/Nano/Overview/pytorch_inference
-                  - file: doc/Nano/Overview/pytorch_cuda_patch
-                  - file: doc/Nano/Overview/tensorflow_train
-                  - file: doc/Nano/Overview/tensorflow_inference
-                  - file: doc/Nano/Overview/hpo
-            - file: doc/Nano/QuickStart/index
-              title: "Tutorials"
-              subtrees:
-                - entries:
-                  - file: doc/Nano/QuickStart/pytorch_train_quickstart
-                  - file: doc/Nano/QuickStart/pytorch_nano
-                  - file: doc/Nano/QuickStart/pytorch_onnxruntime
-                  - file: doc/Nano/QuickStart/pytorch_openvino
-                  - file: doc/Nano/QuickStart/pytorch_quantization_inc_onnx
-                  - file: doc/Nano/QuickStart/pytorch_quantization_inc
-                  - file: doc/Nano/QuickStart/pytorch_quantization_openvino
-                  - file: doc/Nano/QuickStart/tensorflow_train_quickstart
-                  - file: doc/Nano/QuickStart/tensorflow_embedding
-                  - file: doc/Nano/QuickStart/tensorflow_quantization_quickstart
-            - file: doc/Nano/Howto/index
-              title: "How-to Guides"
-              subtrees:
-                - entries:
-                  - file: doc/Nano/Howto/Preprocessing/index
-                    subtrees:
-                      - entries:
-                        - file: doc/Nano/Howto/Preprocessing/PyTorch/index
-                          title: "PyTorch"
-                          subtrees:
-                            - entries:      
-                              - file: doc/Nano/Howto/Preprocessing/PyTorch/accelerate_pytorch_cv_data_pipeline
-                  - file: doc/Nano/Howto/Training/index
-                    subtrees:
-                      - entries:
-                        - file: doc/Nano/Howto/Training/PyTorchLightning/index
-                          title: "PyTorch Lightning"
-                          subtrees:
-                            - entries:
-                              - file: doc/Nano/Howto/Training/PyTorchLightning/accelerate_pytorch_lightning_training_ipex
-                              - file: doc/Nano/Howto/Training/PyTorchLightning/accelerate_pytorch_lightning_training_multi_instance
-                              - file: doc/Nano/Howto/Training/PyTorchLightning/pytorch_lightning_training_channels_last
-                              - file: doc/Nano/Howto/Training/PyTorchLightning/pytorch_lightning_training_bf16
-                        - file: doc/Nano/Howto/Training/PyTorch/index
-                          title: "PyTorch"
-                          subtrees:
-                            - entries:
-                              - file: doc/Nano/Howto/Training/PyTorch/convert_pytorch_training_torchnano
-                              - file: doc/Nano/Howto/Training/PyTorch/use_nano_decorator_pytorch_training
-                              - file: doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_ipex
-                              - file: doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_multi_instance
-                              - file: doc/Nano/Howto/Training/PyTorch/pytorch_training_channels_last
-                              - file: doc/Nano/Howto/Training/PyTorch/accelerate_pytorch_training_bf16
-                        - file: doc/Nano/Howto/Training/TensorFlow/index
-                          title: "TensorFlow"
-                          subtrees:
-                            - entries:
-                              - file: doc/Nano/Howto/Training/TensorFlow/accelerate_tensorflow_training_multi_instance
-                              - file: doc/Nano/Howto/Training/TensorFlow/tensorflow_training_embedding_sparseadam
-                              - file: doc/Nano/Howto/Training/TensorFlow/tensorflow_training_bf16
-                              - file: doc/Nano/Howto/Training/TensorFlow/tensorflow_custom_training_multi_instance
-                        - file: doc/Nano/Howto/Training/General/index
-                          title: "General"
-                          subtrees:
-                            - entries:
-                              - file: doc/Nano/Howto/Training/General/choose_num_processes_training
-                  - file: doc/Nano/Howto/Inference/index
-                    subtrees:
-                      - entries:
-                        - file: doc/Nano/Howto/Inference/OpenVINO/index
-                          title: "OpenVINO"
-                          subtrees:
-                            - entries:    
-                              - file: doc/Nano/Howto/Inference/OpenVINO/openvino_inference
-                              - file: doc/Nano/Howto/Inference/OpenVINO/openvino_inference_async
-                              - file: doc/Nano/Howto/Inference/OpenVINO/accelerate_inference_openvino_gpu
-                        - file: doc/Nano/Howto/Inference/PyTorch/index
-                          title: "PyTorch"
-                          subtrees:
-                            - entries: 
-                              - file: doc/Nano/Howto/Inference/PyTorch/inference_optimizer_optimize
-                              - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_onnx
-                              - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_openvino
-                              - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_jit_ipex
-                              - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
-                              - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
-                              - file: doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager
-                              - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
-                              - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_jit
-                              - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_onnx
-                              - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_openvino
-                              - file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
-                              - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_gpu
-                              - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_async_pipeline
-                        - file: doc/Nano/Howto/Inference/TensorFlow/index
-                          title: "TensorFlow"
-                          subtrees:
-                            - entries: 
-                              - file: doc/Nano/Howto/Inference/TensorFlow/accelerate_tensorflow_inference_onnx
-                              - file: doc/Nano/Howto/Inference/TensorFlow/accelerate_tensorflow_inference_openvino
-                              - file: doc/Nano/Howto/Inference/TensorFlow/tensorflow_inference_bf16
-                              - file: doc/Nano/Howto/Inference/TensorFlow/tensorflow_save_and_load_onnx
-                              - file: doc/Nano/Howto/Inference/TensorFlow/tensorflow_save_and_load_openvino
-                  - file: doc/Nano/Howto/Install/index
-                    subtrees:
-                      - entries:
-                        - file: doc/Nano/Howto/Install/install_in_colab
-                        - file: doc/Nano/Howto/Install/windows_guide
-            - file: doc/Nano/Overview/known_issues
-              title: "Tips and Known Issues"
-            - file: doc/Nano/Overview/troubshooting
-              title: "Troubleshooting Guide"
-            - file: doc/Nano/Overview/support
-              title: "OS Support"
-            - file: doc/PythonAPI/Nano/index
-              title: "API Reference"
-
-
-
-  - entries:
-    - file: doc/DLlib/index
-      title: "DLlib"
-      subtrees:
-        - entries:
-          - file: doc/DLlib/Overview/dllib
-            title: "DLLib in 5 minutes"
-          - file: doc/DLlib/Overview/install
-            title: "Installation"
-          - file: doc/DLlib/Overview/index
-            title: "Key Features"
-            subtrees:
-              - entries:
-                - file: doc/DLlib/Overview/keras-api
-                - file: doc/DLlib/Overview/nnframes
-                - file: doc/DLlib/Overview/visualization
-                  title: "Visualization"
-          - file: doc/DLlib/QuickStart/index
-            title: "Tutorials"
-            subtrees:
-              - entries:
-                - file: doc/DLlib/QuickStart/python-getting-started
-                  title: "Python Quick Start"
-                - file: doc/DLlib/QuickStart/scala-getting-started
-                  title: "Scala Quick Start"
-          - file: doc/PythonAPI/DLlib/index
-            title: "API Reference"
-
-  - entries:
-    - file: doc/Chronos/index
-      title: "Chronos"
-      subtrees:
-        - entries:
-          - file: doc/Chronos/Overview/quick-tour
-            title: "Chronos in 5 minutes"
-          - file: doc/Chronos/Overview/install
-            title: "Installation"
-          - file: doc/Chronos/Overview/deep_dive
-            title: "Key Features"
-            subtrees:
-              - entries:
-                - file: doc/Chronos/Overview/data_processing_feature_engineering
-                - file: doc/Chronos/Overview/forecasting
-                - file: doc/Chronos/Overview/anomaly_detection
-                - file: doc/Chronos/Overview/simulation
-                - file: doc/Chronos/Overview/aiops
-                - file: doc/Chronos/Overview/speed_up
-                - file: doc/Chronos/Overview/useful_functionalities
-          - file: doc/Chronos/Howto/index
-            title: "How-to Guides"
-            subtrees:
-              - entries:
-                - file: doc/Chronos/Howto/windows_guide
-                - file: doc/Chronos/Howto/docker_guide_single_node
-                - file: doc/Chronos/Howto/how_to_use_benchmark_tool
-                - file: doc/Chronos/Howto/how_to_create_forecaster
-                - file: doc/Chronos/Howto/how_to_train_forecaster_on_one_node
-                - file: doc/Chronos/Howto/how_to_save_and_load_forecaster
-                - file: doc/Chronos/Howto/how_to_tune_forecaster_model
-                - file: doc/Chronos/Howto/how_to_speedup_inference_of_forecaster_through_ONNXRuntime
-                - file: doc/Chronos/Howto/how_to_speedup_inference_of_forecaster_through_OpenVINO
-                - file: doc/Chronos/Howto/how_to_evaluate_a_forecaster
-                - file: doc/Chronos/Howto/how_to_use_forecaster_to_predict_future_data
-                - file: doc/Chronos/Howto/how_to_optimize_a_forecaster
-                - file: doc/Chronos/Howto/how_to_generate_confidence_interval_for_prediction
-                - file: doc/Chronos/Howto/how_to_export_onnx_files
-                - file: doc/Chronos/Howto/how_to_export_openvino_files
-                - file: doc/Chronos/Howto/how_to_export_torchscript_files
-                - file: doc/Chronos/Howto/how_to_preprocess_my_data
-                - file: doc/Chronos/Howto/how_to_process_data_in_production_environment
-                - file: doc/Chronos/Howto/how_to_choose_forecasting_alg
-                - file: doc/Chronos/Howto/how_to_export_data_processing_pipeline_to_torchscript
-          - file: doc/Chronos/QuickStart/index
-            title: "Tutorials"
-            subtrees:
-              - entries:
-                - file: doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart
-                - file: doc/Chronos/QuickStart/chronos-autotsest-quickstart
-                - file: doc/Chronos/QuickStart/chronos-anomaly-detector
-          - file: doc/Chronos/Overview/chronos_known_issue
-            title: "Tips and Known Issues"
-          - file: doc/PythonAPI/Chronos/index
-            title: "API Reference"
-
-  - entries:
-    - file: doc/Friesian/index
-      title: "Friesian"
-      subtrees:
-        - entries:
-          - file: doc/Friesian/intro
-            title: "Introduction"
-          - file: doc/Friesian/serving
-            title: "Serving"
-          - file: doc/Friesian/examples
-            title: "Use Cases"
-          - file: doc/PythonAPI/Friesian/index
-            title: "API Reference"
-
-  - entries:
-    - file: doc/PPML/index
-      title: "PPML"
-      subtrees:
-        - entries:
-            - file: doc/PPML/Overview/intro
-              title: "PPML Introduction"
-            - file: doc/PPML/Overview/install
-              title: 'Installation'
-            - file: doc/PPML/Overview/examples
-              title: "Tutorials"
-              subtrees:
-                - entries:
-                  - file: doc/PPML/Overview/quicktour
-                  - file: doc/PPML/QuickStart/end-to-end
-                  - file: doc/PPML/Overview/devguide
-                  - file: doc/PPML/Overview/azure_ppml
-                  - file: doc/PPML/Overview/ali_ecs_occlum_cn
-            - file: doc/PPML/Overview/misc
-              title: "Advanced Topics"
-              subtrees:
-                - entries:
-                  - file: doc/PPML/Overview/ppml
-                  - file: doc/PPML/Overview/attestation_basic
-                  - file: doc/PPML/Overview/trusted_big_data_analytics_and_ml
-                  - file: doc/PPML/Overview/trusted_fl
-                  - file: doc/PPML/QuickStart/secure_your_services
-                  - file: doc/PPML/QuickStart/deploy_ppml_in_production
-                  - file: doc/PPML/QuickStart/install_sgx_driver
-                  - file: doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes
-                  - file: doc/PPML/QuickStart/trusted-serving-on-k8s-guide
-                  - file: doc/PPML/QuickStart/tpc-h_with_sparksql_on_k8s
-                  - file: doc/PPML/QuickStart/tpc-ds_with_sparksql_on_k8s
-                  - file: doc/PPML/Overview/azure_ppml_occlum
-                  - file: doc/PPML/Overview/secure_lightgbm_on_spark
-
-  - entries:
-    - file: doc/UserGuide/contributor
-      title: "Contributor guide"
-      subtrees:
-        - entries:
-          - file: doc/UserGuide/develop
-          - file: doc/UserGuide/documentation
-
-
-
-  - entries:
-    - file: doc/Serving/index
-      title: "Cluster serving"
-      subtrees:
-        - entries:
-          - file: doc/Serving/Overview/serving.md
-            title: "User Guide"
-          - file: doc/Serving/QuickStart/serving-quickstart
-            title: "Serving in 5 miniutes"
-          - file: doc/Serving/ProgrammingGuide/serving-installation
-          - file: doc/Serving/ProgrammingGuide/serving-start
-          - file: doc/Serving/ProgrammingGuide/serving-inference
-          - file: doc/Serving/Example/example
-            title: "Examples"
-          - file: doc/Serving/FAQ/faq
-          - file: doc/Serving/FAQ/contribute-guide
-
-
-  - entries:
-    - file: doc/Application/presentations
-      title: "Presentations"
-
   - entries:
     - file: doc/Application/blogs
diff --git a/docs/readthedocs/source/conf.py b/docs/readthedocs/source/conf.py
index 76bac4d1..f0d9a76c 100644
--- a/docs/readthedocs/source/conf.py
+++ b/docs/readthedocs/source/conf.py
@@ -37,11 +37,11 @@ sys.path.insert(0, os.path.abspath("../../../python/llm/src/"))
 # -- Project information -----------------------------------------------------
 html_theme = "pydata_sphinx_theme"
 html_theme_options = {
-  "header_links_before_dropdown": 3,
+  "header_links_before_dropdown": 1,
   "icon_links": [
         {
-            "name": "GitHub Repository for BigDL",
-            "url": "https://github.com/intel-analytics/BigDL",
+            "name": "GitHub Repository for IPEX-LLM",
+            "url": "https://github.com/intel-analytics/ipex-llm",
             "icon": "fa-brands fa-square-github",
             "type": "fontawesome",
         }
@@ -63,7 +63,7 @@ html_context = {
     "default_mode": "light" 
 }
 
-html_logo = "../image/bigdl_logo.png"
+html_logo = "../image/ipex-llm_logo_temp.png"
 
 # hard code it for now, may change it to read from installed bigdl
 release = "latest"
@@ -76,9 +76,9 @@ source_suffix = {'.rst': 'restructuredtext',
 
 master_doc = 'index'
 
-project = 'BigDL'
-copyright = '2020, BigDL Authors'
-author = 'BigDL Authors'
+project = 'IPEX-LLM'
+copyright = '2024, IPEX-LLM Authors'
+author = 'IPEX-LLM Authors'
 
 # The short X.Y version
 #version = ''
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/index.rst b/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
index 675526b6..6edb209d 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
@@ -7,9 +7,14 @@ IPEX-LLM Quickstart
 
 This section includes efficient guide to show you how to:
 
+
+* |bigdl_llm_migration_guide|_
 * `Install IPEX-LLM on Linux with Intel GPU <./install_linux_gpu.html>`_
 * `Install IPEX-LLM on Windows with Intel GPU <./install_windows_gpu.html>`_
 * `Install IPEX-LLM in Docker on Windows with Intel GPU <./docker_windows_gpu.html>`_
 * `Use Text Generation WebUI on Windows with Intel GPU <./webui_quickstart.html>`_
 * `Conduct Performance Benchmarking with IPEX-LLM <./benchmark_quickstart.html>`_
 * `Use llama.cpp with IPEX-LLM on Intel GPU <./llama_cpp_quickstart.html>`_
+
+.. |bigdl_llm_migration_guide| replace:: ``bigdl-llm`` Migration Guide
+.. _bigdl_llm_migration_guide: bigdl_llm_migration.html
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/PythonAPI/Chronos/aiops.rst b/docs/readthedocs/source/doc/PythonAPI/Chronos/aiops.rst
deleted file mode 100644
index 92b91f00..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Chronos/aiops.rst
+++ /dev/null
@@ -1,21 +0,0 @@
-AIOps
-=====================
-
-ConfigGenerator
-----------------------------------------
-
-AIOps application typically relies on a decision system with one or multiple AI models.
-
-The `ConfigGenerator` provides a easy-to-use builder for an AIOps decision system with the usage of `Trigger`.
-
-.. automodule:: bigdl.chronos.aiops.config_generator.config_generator
-    :members:
-    :undoc-members:
-    :show-inheritance:
-    :inherited-members:
-
-.. automodule:: bigdl.chronos.aiops.config_generator.trigger
-    :members:
-    :undoc-members:
-    :show-inheritance:
-    :inherited-members:
diff --git a/docs/readthedocs/source/doc/PythonAPI/Chronos/anomaly_detectors.rst b/docs/readthedocs/source/doc/PythonAPI/Chronos/anomaly_detectors.rst
deleted file mode 100644
index bb2a39b9..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Chronos/anomaly_detectors.rst
+++ /dev/null
@@ -1,31 +0,0 @@
-Anomaly Detectors
-=====================
-
-AEDetector
-----------------------------------------
-
-AEDetector is unsupervised anomaly detector. It builds an autoencoder network, tries to fit the model to the input data, and calcuates the reconstruction error. The samples with larger reconstruction errors are more likely the anomalies.
-
-.. automodule:: bigdl.chronos.detector.anomaly.ae_detector
-    :members:
-    :show-inheritance:
-
-
-DBScanDetector
-----------------------------------------
-
-DBScanDetector uses DBSCAN clustering for anomaly detection. The DBSCAN algorithm tries to cluster the points and label the points that do not belong to any clusters as -1. It thus detects outliers in the input time series.
-
-.. automodule:: bigdl.chronos.detector.anomaly.dbscan_detector
-    :members:
-    :show-inheritance:
-
-
-ThresholdDetector
-----------------------------------------
-
-ThresholdDetector is a simple anomaly detector that detectes anomalies based on threshold. The target value for anomaly testing can be either 1) the sample value itself or 2) the difference between the forecasted value and the actual value, if the forecasted values are provied. The thresold can be set by user or esitmated from the train data accoring to anomaly ratio and statistical distributions.
-
-.. automodule:: bigdl.chronos.detector.anomaly.th_detector
-    :members: ThresholdDetector
-    :show-inheritance:
diff --git a/docs/readthedocs/source/doc/PythonAPI/Chronos/automodels.rst b/docs/readthedocs/source/doc/PythonAPI/Chronos/automodels.rst
deleted file mode 100644
index 7908420a..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Chronos/automodels.rst
+++ /dev/null
@@ -1,76 +0,0 @@
-Auto Models
-=====================
-
-AutoTCN
--------------------------------------------
-
-AutoTCN is a TCN forecasting model with Auto tuning.
-
-
-.. tabs::
-
-    .. tab:: PyTorch/Tensorflow
-
-        .. automodule:: bigdl.chronos.autots.model.auto_tcn
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-AutoLSTM
-----------------------------------------
-
-AutoLSTM is an LSTM forecasting model with Auto tuning.
-
-
-.. tabs::
-
-    .. tab:: PyTorch/Tensorflow
-
-        .. automodule:: bigdl.chronos.autots.model.auto_lstm
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-AutoSeq2Seq
-----------------------------------------
-
-AutoSeq2Seq is a Seq2Seq forecasting model with Auto tuning.
-
-
-.. tabs::
-
-    .. tab:: PyTorch/Tensorflow
-
-        .. automodule:: bigdl.chronos.autots.model.auto_seq2seq
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-AutoARIMA
-----------------------------------------
-
-AutoARIMA is an ARIMA forecasting model with Auto tuning.
-
-.. automodule:: bigdl.chronos.autots.model.auto_arima
-    :members:
-    :undoc-members:
-    :show-inheritance:
-    :inherited-members:
-
-
-AutoProphet
-----------------------------------------
-
-AutoProphet is a Prophet forecasting model with Auto tuning.
-
-.. automodule:: bigdl.chronos.autots.model.auto_prophet
-    :members:
-    :undoc-members:
-    :show-inheritance:
-    :inherited-members:
diff --git a/docs/readthedocs/source/doc/PythonAPI/Chronos/autots.rst b/docs/readthedocs/source/doc/PythonAPI/Chronos/autots.rst
deleted file mode 100644
index cb6f4f43..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Chronos/autots.rst
+++ /dev/null
@@ -1,86 +0,0 @@
-AutoTS (deprecated)
-=====================
-
-.. warning::
-    The API on this page will be deprecated soon. Please refer to our new AutoTS API.
-
-AutoTSTrainer
-----------------------------------------
-
-AutoTSTrainer trains a time series pipeline (including data processing, feature engineering, and model) with AutoML.
-
-.. autoclass:: bigdl.chronos.autots.deprecated.forecast.AutoTSTrainer
-    :members:
-    :show-inheritance:
-
-
-TSPipeline
-----------------------------------------
-
-A pipeline for time series forecasting.
-
-.. autoclass:: bigdl.chronos.autots.deprecated.forecast.TSPipeline
-    :members:
-    :show-inheritance:
-
-
-Recipe
-----------------------------------------
-
-Recipe is used for search configuration for AutoTSTrainer.
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.SmokeRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.MTNetSmokeRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.TCNSmokeRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.PastSeqParamHandler
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.GridRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.LSTMSeq2SeqRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.LSTMGridRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.Seq2SeqRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.MTNetGridRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.TCNGridRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.RandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.BayesRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.XgbRegressorGridRandomRecipe
-    :members:
-    :show-inheritance:
-
-.. autoclass:: bigdl.chronos.autots.deprecated.config.recipe.XgbRegressorSkOptRecipe
-    :members:
-    :show-inheritance:
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/PythonAPI/Chronos/autotsestimator.rst b/docs/readthedocs/source/doc/PythonAPI/Chronos/autotsestimator.rst
deleted file mode 100644
index 83c29f28..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Chronos/autotsestimator.rst
+++ /dev/null
@@ -1,34 +0,0 @@
-AutoTS
-=====================
-
-AutoTSEstimator
--------------------------------------------
-
-Automated TimeSeries Estimator for time series forecasting tasks.
-AutoTSEstimator will replace AutoTSTrainer in a later version.
-
-.. tabs::
-
-    .. tab:: PyTorch/Tensorflow
-
-        .. automodule:: bigdl.chronos.autots.autotsestimator
-            :members:
-            :undoc-members:
-            :show-inheritance:
-
-
-TSPipeline
--------------------------------------------
-
-TSPipeline is an end-to-end solution for time series forecasting tasks.
-AutoTSEstimator will replace the original TSPipeline returned by AutoTSTrainer in a later version.
-
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.autots.tspipeline
-            :members:
-            :undoc-members:
-            :show-inheritance:
diff --git a/docs/readthedocs/source/doc/PythonAPI/Chronos/evaluator.rst b/docs/readthedocs/source/doc/PythonAPI/Chronos/evaluator.rst
deleted file mode 100644
index bee0866d..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Chronos/evaluator.rst
+++ /dev/null
@@ -1,10 +0,0 @@
-Evaluator
-====================================
-
-Evaluator
-------------------------------------
-
-.. automodule:: bigdl.chronos.metric.forecast_metrics
-    :members:
-    :undoc-members:
-    :show-inheritance:
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/PythonAPI/Chronos/forecasters.rst b/docs/readthedocs/source/doc/PythonAPI/Chronos/forecasters.rst
deleted file mode 100644
index 72eaeca9..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Chronos/forecasters.rst
+++ /dev/null
@@ -1,182 +0,0 @@
-Forecasters
-=====================
-
-LSTMForecaster
-----------------------------------------
-
-Long short-term memory (LSTM) is a special type of recurrent neural network (RNN). This forecaster implements the basic version of LSTM, VanillaLSTM, for time series forecasting tasks. It has two LSTM layers, two dropout layers, and a dense layer.
-
-For the detailed algorithm description, please refer to `here <https://github.com/intel-analytics/BigDL/blob/main/docs/docs/Chronos/Algorithm/LSTMAlgorithm.md>`__.
-
-
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.forecaster.lstm_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-    .. tab:: Tensorflow
-
-        .. automodule:: bigdl.chronos.forecaster.tf.lstm_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-
-Seq2SeqForecaster
--------------------------------------------
-
-Seq2SeqForecaster wraps a sequence-to-sequence model based on LSTM, and is suitable for multivariate and multistep time series forecasting.
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.forecaster.seq2seq_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-    .. tab:: Tensorflow
-
-        .. automodule:: bigdl.chronos.forecaster.tf.seq2seq_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-TCNForecaster
-----------------------------------------
-
-Temporal Convolutional Network (TCN) is a neural network that uses a convolutional architecture rather than a recurrent one. It supports multi-step and multivariate cases. Causal convolutions enable large-scale parallel computing, which gives TCN a shorter inference time than RNN-based models such as LSTM.
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.forecaster.tcn_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-    .. tab:: Tensorflow
-
-        .. automodule:: bigdl.chronos.forecaster.tf.tcn_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-AutoformerForecaster
-----------------------------------------
-
-Autoformer is a neural network that uses a transformer architecture with autocorrelation. It supports multi-step and multivariate cases. It shows a significant accuracy improvement, but has longer training/inference time than TCN.
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.forecaster.autoformer_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-NBeatsForecaster
-----------------------------------------
-
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        Neural basis expansion analysis for interpretable time series forecasting (`N-BEATS <https://arxiv.org/abs/1905.10437>`__) is a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. N-BEATS can solve univariate time series point forecasting problems, is interpretable, and is fast to train.
-
-        .. automodule:: bigdl.chronos.forecaster.nbeats_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-TCMFForecaster
-----------------------------------------
-
-Chronos TCMFForecaster provides an efficient way to forecast high dimensional time series.
-
-TCMFForecaster is based on the DeepGLO algorithm, a deep forecasting model that thinks globally and acts locally.
-You can refer to `the DeepGLO paper <https://arxiv.org/abs/1905.03806>`__ for more details.
-
-TCMFForecaster supports distributed training and inference. It is based on Orca PyTorch Estimator, which is an estimator for PyTorch training/evaluation/prediction on Spark in a distributed fashion. You can choose whether to enable distributed training and inference.
-
-**Remarks**:
-
-* You can refer to `TCMFForecaster installation <https://github.com/intel-analytics/BigDL/blob/main/docs/docs/Chronos/tutorials/TCMFForecaster.md#step-0-prepare-environment>`__ to install required packages.
-* Your operating system (OS) must be one of the following 64-bit systems: **Ubuntu 16.04 or later** or **macOS 10.12.6 or later**.
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.forecaster.tcmf_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-MTNetForecaster
-----------------------------------------
-
-MTNet is a memory-network based solution for multivariate time-series forecasting. In a multivariate time-series forecasting task, several variables are observed as time series and the goal is to forecast the values of some or all of the variables at a future timestamp.
-
-MTNet was proposed in the paper `A Memory-Network Based Solution for Multivariate Time-Series Forecasting <https://arxiv.org/abs/1809.02105>`__.
-
-For the detailed algorithm description, please refer to `here <https://github.com/intel-analytics/BigDL/blob/main/docs/docs/Chronos/Algorithm/MTNetAlgorithm.md>`__.
-
-.. tabs::
-
-    .. tab:: Tensorflow
-
-        .. automodule:: bigdl.chronos.forecaster.tf.mtnet_forecaster
-            :members:
-            :undoc-members:
-            :show-inheritance:
-            :inherited-members:
-
-
-ARIMAForecaster
-----------------------------------------
-
-AutoRegressive Integrated Moving Average (ARIMA) is a class of statistical models for analyzing and forecasting time series data. It consists of 3 components: AR (AutoRegressive), I (Integrated) and MA (Moving Average). In ARIMAForecaster we use the SARIMA model (Seasonal ARIMA), which is an extension of ARIMA that additionally supports the direct modeling of the seasonal component of the time series.
-
-.. automodule:: bigdl.chronos.forecaster.arima_forecaster
-    :members:
-    :undoc-members:
-    :show-inheritance:
-    :inherited-members:
-
-ProphetForecaster
-----------------------------------------
-
-Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
-
-For the detailed algorithm description, please refer to `here <https://github.com/facebook/prophet>`__.
-
-.. automodule:: bigdl.chronos.forecaster.prophet_forecaster
-    :members:
-    :undoc-members:
-    :show-inheritance:
-    :inherited-members:
diff --git a/docs/readthedocs/source/doc/PythonAPI/Chronos/index.rst b/docs/readthedocs/source/doc/PythonAPI/Chronos/index.rst
deleted file mode 100644
index 1ec68e22..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Chronos/index.rst
+++ /dev/null
@@ -1,19 +0,0 @@
-Chronos API
-==================
-
-.. toctree::
-    :maxdepth: 2
-
-    autotsestimator.rst
-    automodels.rst
-    forecasters.rst
-    anomaly_detectors.rst
-    tsdataset.rst
-    simulator.rst
-    evaluator.rst
-    aiops.rst
-
-.. toctree::
-    :maxdepth: 1
-
-    autots.rst
diff --git a/docs/readthedocs/source/doc/PythonAPI/Chronos/simulator.rst b/docs/readthedocs/source/doc/PythonAPI/Chronos/simulator.rst
deleted file mode 100644
index 013cef9b..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Chronos/simulator.rst
+++ /dev/null
@@ -1,15 +0,0 @@
-Simulator
-====================================
-
-DPGANSimulator
-------------------------------------
-
-
-.. tabs::
-
-    .. tab:: PyTorch
-
-        .. automodule:: bigdl.chronos.simulator.doppelganger_simulator
-            :members:
-            :undoc-members:
-            :show-inheritance:
diff --git a/docs/readthedocs/source/doc/PythonAPI/Chronos/tsdataset.rst b/docs/readthedocs/source/doc/PythonAPI/Chronos/tsdataset.rst
deleted file mode 100644
index 2be5f92f..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Chronos/tsdataset.rst
+++ /dev/null
@@ -1,33 +0,0 @@
-TSDataset
-===========
-
-TSDataset
-----------------------------------------
-
-Time series data is a special data formulation with specific operations. TSDataset is an abstraction of a time series dataset, which provides various data processing operations (e.g. impute, deduplicate, resample, scale/unscale, roll) and feature engineering methods (e.g. datetime feature, aggregation feature). Cascade call is supported for most of the methods.
-TSDataset can be initialized from a pandas dataframe and be converted to a pandas dataframe or numpy ndarray.
-
-.. automodule:: bigdl.chronos.data.tsdataset
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-XShardsTSDataset
-----------------------------------------
-
-Time series data is a special data formulation with specific operations. XShardsTSDataset is an abstraction of a time series dataset, which provides various data processing operations (e.g. impute, deduplicate, resample, scale/unscale, roll) and feature engineering methods (e.g. datetime feature, aggregation feature). Cascade call is supported for most of the methods.
-XShardsTSDataset can be initialized from xshards of pandas dataframe and be converted to xshards of numpy in a distributed and parallelized fashion.
-
-.. automodule:: bigdl.chronos.data.experimental.xshards_tsdataset
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-Built-in Dataset
---------------------------------------------
-
-Built-in datasets can be downloaded and preprocessed by this function. Train, validation, and test splits are also supported.
-
-.. automodule:: bigdl.chronos.data.repo_dataset
-    :members:
diff --git a/docs/readthedocs/source/doc/PythonAPI/DLlib/clipping.md b/docs/readthedocs/source/doc/PythonAPI/DLlib/clipping.md
deleted file mode 100644
index 038667db..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/DLlib/clipping.md
+++ /dev/null
@@ -1,24 +0,0 @@
-# Clipping
-
---------
-
-## ConstantGradientClipping ##
-
-Set constant gradient clipping during the training process.
-
-```scala
-model.setConstantGradientClipping(min, max)
-```
-param:
-   * min: The minimum value to clip by.
-   * max: The maximum value to clip by.
-
-## GradientClippingByL2Norm ##
-
-Clip gradient to a maximum L2-Norm during the training process.
-
-```scala
-model.setGradientClippingByL2Norm(clipNorm)
-```
-param:
-   * clipNorm: Gradient L2-Norm threshold
diff --git a/docs/readthedocs/source/doc/PythonAPI/DLlib/core_layers.md b/docs/readthedocs/source/doc/PythonAPI/DLlib/core_layers.md
deleted file mode 100644
index 9c83648a..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/DLlib/core_layers.md
+++ /dev/null
@@ -1,3328 +0,0 @@
-# Core Layers API
-
---------
-
-This section describes all the available layers in the Keras-like API.
-
-To set the name of a specific layer, call the `setName(name)` method of that layer.
-
-------------------
-
-### Masking
-
-Use a mask value to skip timesteps for a sequence.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Masking(maskValue = 0.0, inputShape = null)
-
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Masking(mask_value=0.0, input_shape=None, name=None)
-
-```
-
-**Parameters:**
-
-* `maskValue`: Mask value. For each timestep in the input (the second dimension), if all the values in the input at that timestep are equal to 'maskValue', then the timestep will be masked (skipped) in all downstream layers.
-* `inputShape`: Specify this argument only when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.Masking
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Masking[Float](inputShape = Shape(3)))
-            val input = Tensor[Float](2, 3).randn()
-            val output = model.forward(input)
-
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            1.4539868       1.5623108       -1.4101523
-            0.77073747      -0.18994702     2.2574463
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            1.4539868       1.5623108       -1.4101523
-            0.77073747      -0.18994702     2.2574463
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.nn.keras.topology import Sequential
-            from bigdl.nn.keras.layer import Masking
-
-            model = Sequential()
-            model.add(Masking(input_shape=(3, )))
-            input = np.random.random([2, 3])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[0.31542103 0.20640659 0.22282763]
-            [0.99352167 0.90135718 0.24504717]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[0.31542102 0.2064066  0.22282763]
-            [0.9935217  0.9013572  0.24504717]]
-
-```
-
----
-
-### SparseDense
-
-
-SparseDense is the sparse version of layer Dense. SparseDense differs from Dense in two ways:
-first, SparseDense's input Tensor is a SparseTensor. Second, SparseDense doesn't propagate the
-gradient to the next layer during backpropagation by default, as the gradInput of SparseDense is
-useless and very large in most cases.
-
-However, for models such as Wide&Deep, backwardStart and backwardLength are provided to propagate
-part of the gradient to the next layer.
-
-The most common input is 2D.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            SparseDense(outputDim, init = "glorot_uniform", activation = null, wRegularizer = null, bRegularizer = null, backwardStart = -1, backwardLength = -1, initWeight = null, initBias = null, initGradWeight = null, initGradBias = null, bias = true, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            SparseDense(output_dim, init="glorot_uniform", activation=None, W_regularizer=None, b_regularizer=None, backward_start=-1, backward_length=-1, init_weight=None, init_bias=None, init_grad_weight=None, init_grad_bias=None, bias=True, input_shape=None, name=None)
-
-```
-
-**Parameters:**
-
-* `outputDim`: The size of the output dimension.
-* `init`: String representation of the initialization method for the weights of the layer. Default is 'glorot_uniform'.
-* `activation`: String representation of the activation function to use. Default is null.
-* `wRegularizer`: An instance of [Regularizer], applied to the input weights matrices. Default is null.
-* `bRegularizer`: An instance of [Regularizer], applied to the bias. Default is null.
-* `bias`: Whether to include a bias (i.e. make the layer affine rather than linear). Default is true.
-* `backwardStart`: Backward start index, counting from 1.
-* `backwardLength`: Backward length.
-* `inputShape`: Specify this argument only when you use this layer as the first layer of a model. For Scala API, it should be a `Shape` object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-* `name`: String to set the name of the layer. If not specified, its name will default to a generated string.
-
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.SparseDense
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val layer = SparseDense[Float](outputDim = 5, inputShape = Shape(2, 4))
-            layer.build(Shape(-1, 2, 4))
-            val input = Tensor[Float](Array(2, 4)).rand()
-            input.setValue(1, 1, 1f)
-            input.setValue(2, 3, 3f)
-            val sparseInput = Tensor.sparse(input)
-            val output = layer.forward(sparseInput)
-
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input:
-            (0, 0) : 1.0
-            (0, 1) : 0.2992794
-            (0, 2) : 0.11227019
-            (0, 3) : 0.722947
-            (1, 0) : 0.6147614
-            (1, 1) : 0.4288646
-            (1, 2) : 3.0
-            (1, 3) : 0.7749917
-            [com.intel.analytics.bigdl.tensor.SparseTensor of size 2x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output:
-            0.053516	0.33429605	0.22587383	-0.8998945	0.24308181
-            0.76745665	-1.614114	0.5381658	-2.2226436	-0.15573677
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x5]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import *
-            from bigdl.dllib.keras.models import Sequential
-            from bigdl.dllib.utils.common import JTensor
-
-            model = Sequential()
-            model.add(SparseDense(output_dim=2, input_shape=(3, 4)))
-            input = JTensor.sparse(
-                a_ndarray=np.array([1, 3, 2, 4]),
-                i_ndarray = np.array([[0, 0, 1, 2],
-                        [0, 3, 2, 1]]),
-                shape = np.array([3, 4])
-            )
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            JTensor: storage: [1. 3. 2. 4.], shape: [3 4] ,indices [[0 0 1 2]
-            [0 3 2 1]], float
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[ 1.57136     2.29596   ]
-            [ 0.5791738  -1.6598101 ]
-            [ 2.331141   -0.84687066]]
-
-```
-
----
-
-### SoftShrink
-
-
-Applies the soft shrinkage function element-wise to the input.
-
-When you use this layer as the first layer of a model, you need to provide
-the argument inputShape (a single Shape that does not include the batch dimension).
-
-Remark: This layer is from Torch and wrapped in Keras style.
-
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            SoftShrink(value = 0.5, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            SoftShrink(value = 0.5, input_shape=None, name=None)
-
-```
-
-**Parameters:**
-
-* `value`: The threshold value. Default is 0.5.
-* `inputShape`: Specify this argument only when you use this layer as the first layer of a model. For Scala API, it should be a `Shape` object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-* `name`: String to set the name of the layer. If not specified, its name will default to a generated string.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.SoftShrink
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(SoftShrink[Float](0.6, inputShape = Shape(2, 3, 4)))
-            val input = Tensor[Float](2, 2, 3, 4).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,1,.,.) =
-            -0.36938807	0.023556225	-1.1655436	-0.34449077
-            0.9444338	-0.086538695	-1.0425501	1.364976
-            -1.2563878	-0.1842559	0.43428117	1.0756494
-
-            (1,2,.,.) =
-            -0.19888283	1.251872	0.114836805	-0.6208773
-            0.0051822234	-0.8998633	0.06937465	-0.3929931
-            -0.1058129	0.6945743	-0.40083578	-0.6252444
-
-            (2,1,.,.) =
-            -0.9899709	-0.77926594	-0.15497442	-0.15031165
-            -0.6028622	0.86623466	-2.1543107	0.41970536
-            -0.8215522	0.3014275	-0.32184362	0.14445356
-
-            (2,2,.,.) =
-            0.74701905	0.10044397	-0.40519297	0.03822808
-            0.30726334	0.27862388	1.731753	0.032177072
-            -1.3476961	-0.2294767	0.99794704	0.7398458
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,1,.,.) =
-            0.0	0.0	-0.56554353	0.0
-            0.34443378	0.0	-0.44255006	0.764976
-            -0.6563878	0.0	0.0	0.47564936
-
-            (1,2,.,.) =
-            0.0	0.6518719	0.0	-0.020877302
-            0.0	-0.29986328	0.0	0.0
-            0.0	0.09457427	0.0	-0.025244355
-
-            (2,1,.,.) =
-            -0.3899709	-0.17926592	0.0	0.0
-            -0.0028621554	0.26623464	-1.5543107	0.0
-            -0.2215522	0.0	0.0	0.0
-
-            (2,2,.,.) =
-            0.14701903	0.0	0.0	0.0
-            0.0	0.0	1.131753	0.0
-            -0.74769604	0.0	0.397947	0.13984579
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3x4]
-
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import *
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(SoftShrink(0.6, input_shape=(2, 3, 4)))
-            input = np.random.random([2, 2, 3, 4])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[[[ 0.43421006,  0.28394451,  0.15221226,  0.47268966],
-                    [ 0.22426224,  0.24855662,  0.790498  ,  0.67767582],
-                    [ 0.14879562,  0.56077882,  0.61470262,  0.94875862]],
-
-                    [[ 0.72404932,  0.89780875,  0.08456734,  0.01303937],
-                    [ 0.25023568,  0.45392504,  0.587254  ,  0.51164461],
-                    [ 0.12277567,  0.05571182,  0.17076456,  0.71660884]]],
-
-
-                [[[ 0.06369975,  0.85395557,  0.35752425,  0.606633  ],
-                    [ 0.67640252,  0.86861737,  0.18040722,  0.55467108],
-                    [ 0.24102058,  0.37580645,  0.81601612,  0.56513788]],
-
-                    [[ 0.8461435 ,  0.65668365,  0.17969807,  0.51602926],
-                    [ 0.86191073,  0.34245714,  0.62795207,  0.36706125],
-                    [ 0.80344028,  0.81056003,  0.80959083,  0.15366483]]]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[[ 0.        ,  0.        ,  0.        ,  0.        ],
-                    [ 0.        ,  0.        ,  0.19049799,  0.07767582],
-                    [ 0.        ,  0.        ,  0.01470262,  0.34875858]],
-
-                    [[ 0.12404931,  0.29780871,  0.        ,  0.        ],
-                    [ 0.        ,  0.        ,  0.        ,  0.        ],
-                    [ 0.        ,  0.        ,  0.        ,  0.1166088 ]]],
-
-
-                [[[ 0.        ,  0.25395554,  0.        ,  0.00663298],
-                    [ 0.07640249,  0.26861733,  0.        ,  0.        ],
-                    [ 0.        ,  0.        ,  0.21601611,  0.        ]],
-
-                    [[ 0.24614346,  0.05668366,  0.        ,  0.        ],
-                    [ 0.26191074,  0.        ,  0.02795208,  0.        ],
-                    [ 0.20344025,  0.21056002,  0.20959079,  0.        ]]]], dtype=float32)
-
- ```
-
----
-### Reshape
-Reshapes an output to a certain shape.
-
-Supports shape inference by allowing one -1 in the target shape. For example, if input shape is (2, 3, 4), target shape is (3, -1), then output shape will be (3, 8).
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Reshape(targetShape, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Reshape(target_shape, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `targetShape`: The desired target shape. Batch dimension should be excluded.
-* `inputShape`: Specify this argument only when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.Reshape
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Reshape(Array(3, 8), inputShape = Shape(2, 3, 4)))
-            val input = Tensor[Float](2, 2, 3, 4).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,1,.,.) =
-            -1.7092276	-1.3941092	-0.6348466	0.71309644
-            0.3605411	0.025597548	0.4287048	-0.548675
-            0.4623341	-2.3912702	0.22030865	-0.058272455
-
-            (1,2,.,.) =
-            -1.5049093	-1.8828062	0.8230564	-0.020209199
-            -0.3415721	1.1219939	1.1089007	-0.74697906
-            -1.503861	-1.616539	0.048006497	1.1613717
-
-            (2,1,.,.) =
-            0.21216023	1.0107462	0.8586909	-0.05644316
-            -0.31436008	1.6892323	-0.9961186	-0.08169463
-            0.3559391	0.010261055	-0.70408463	-1.2480727
-
-            (2,2,.,.) =
-            1.7663039	0.07122444	0.073556066	-0.7847014
-            0.17604464	-0.99110585	-1.0302067	-0.39024687
-            -0.0260166	-0.43142694	0.28443158	0.72679126
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            -1.7092276	-1.3941092	-0.6348466	0.71309644	    0.3605411	0.025597548	0.4287048	-0.548675
-            0.4623341	-2.3912702	0.22030865	-0.058272455	-1.5049093	-1.8828062	0.8230564	-0.020209199
-            -0.3415721	1.1219939	1.1089007	-0.74697906	    -1.503861	-1.616539	0.048006497	1.1613717
-
-            (2,.,.) =
-            0.21216023	1.0107462	0.8586909	-0.05644316	    -0.31436008	1.6892323	-0.9961186	-0.08169463
-            0.3559391	0.010261055	-0.70408463	-1.2480727	    1.7663039	0.07122444	0.073556066	-0.7847014
-            0.17604464	-0.99110585	-1.0302067	-0.39024687	    -0.0260166	-0.43142694	0.28443158	0.72679126
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3x8]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import *
-            from bigdl.dllib.keras.models import Sequential
-
-
-            model = Sequential()
-            model.add(Reshape(target_shape=(3, 8), input_shape=(2, 3, 4)))
-            input = np.random.random([2, 2, 3, 4])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[[[0.39260304 0.10383185 0.87490319 0.89167328]
-            [0.61649117 0.43285247 0.86851582 0.97743004]
-            [0.90018969 0.04303951 0.74263493 0.14208656]]
-            [[0.66193405 0.93432157 0.76160537 0.70437459]
-            [0.99953431 0.23016734 0.42293405 0.66078049]
-            [0.03357645 0.9695145  0.30111138 0.67109948]]]
-
-            [[[0.39640201 0.92930203 0.86027666 0.13958544]
-            [0.34584767 0.14743425 0.93804016 0.38053062]
-            [0.55068792 0.77375329 0.84161166 0.48131356]]
-            [[0.90116368 0.53253689 0.03332962 0.58278686]
-            [0.34935685 0.32599554 0.97641892 0.57696434]
-            [0.53974677 0.90682861 0.20027319 0.05962118]]]]
-
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[[0.39260304 0.10383185 0.8749032  0.89167327 0.6164912  0.43285248 0.86851585 0.97743005]
-            [0.9001897  0.04303951 0.74263495 0.14208655 0.661934   0.9343216  0.7616054  0.7043746 ]
-            [0.9995343  0.23016734 0.42293406 0.6607805  0.03357645 0.9695145  0.30111137 0.6710995 ]]
-
-            [[0.396402   0.92930204 0.86027664 0.13958544 0.34584767 0.14743425 0.93804014 0.38053063]
-            [0.5506879  0.7737533  0.8416117  0.48131356 0.9011637  0.53253686 0.03332962 0.58278686]
-            [0.34935686 0.32599553 0.9764189  0.5769643  0.53974676 0.9068286  0.20027319 0.05962119]]]
-```
-
----
-### Merge
-
-Merges a list of inputs into a single output according to the specified merge mode.
-
-Merge must have at least two input layers.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Merge(layers = null, mode = "sum", concatAxis = -1, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Merge(layers=None, mode="sum", concat_axis=-1, input_shape=None, name=None)
-
-```
-
-**Parameters:**
-
-* `layers`: A list of layer instances. Must be more than one layer.
-* `mode`: Merge mode. String, must be one of: 'sum', 'mul', 'concat', 'ave', 'cos', 'dot', 'max'. Default is 'sum'.
-* `concatAxis`: Integer, axis to use when concatenating layers. Only specify this when merge mode is 'concat'. Default is -1, meaning the last axis of the input.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`MultiShape`](../keras-api-scala/#shape) object. For Python API, it should be a list of shape tuples. Batch dimension should be excluded.
-
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.InputLayer
-            import com.intel.analytics.bigdl.dllib.keras.layers.Merge
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.{Shape, T}
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            val l1 = InputLayer[Float](inputShape = Shape(2, 3))
-            val l2 = InputLayer[Float](inputShape = Shape(2, 3))
-            val layer = Merge[Float](layers = List(l1, l2), mode = "sum")
-            model.add(layer)
-            val input1 = Tensor[Float](2, 2, 3).rand(0, 1)
-            val input2 = Tensor[Float](2, 2, 3).rand(0, 1)
-            val input = T(1 -> input1, 2 -> input2)
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.utils.Table =
-            {
-                2: (1,.,.) =
-                0.87815475	0.15025006	0.34412447
-                0.07909282	0.008027249	0.111715704
-
-                (2,.,.) =
-                0.52245367	0.2547527	0.35857987
-                0.7718501	0.26783863	0.8642062
-
-                [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-                1: (1,.,.) =
-                0.5377018	0.28364193	0.3424284
-                0.0075349305	0.9018168	0.9435114
-
-                (2,.,.) =
-                0.09112563	0.88585275	0.3100201
-                0.7910178	0.57497376	0.39764535
-
-                [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-            }
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            1.4158566	0.433892	0.6865529
-            0.08662775	0.90984404	1.0552272
-
-            (2,.,.) =
-            0.6135793	1.1406054	0.66859996
-            1.5628679	0.8428124	1.2618515
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import *
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            l1 = InputLayer(input_shape=(3, 4))
-            l2 = InputLayer(input_shape=(3, 4))
-            model.add(Merge(layers=[l1, l2], mode='sum'))
-            input = [np.random.random([2, 3, 4]), np.random.random([2, 3, 4])]
-            output = model.forward(input)
-
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[[[0.28764351, 0.0236015 , 0.78927442, 0.52646492],
-            [0.63922826, 0.45101604, 0.4555552 , 0.70105653],
-            [0.75790798, 0.78551523, 0.00686686, 0.61290369]],
-
-            [[0.00430865, 0.3303661 , 0.59915782, 0.90362298],
-            [0.26230717, 0.99383052, 0.50630521, 0.99119486],
-            [0.56138318, 0.68165639, 0.10644523, 0.51860127]]],
-
-            [[[0.84365767, 0.8854741 , 0.84183673, 0.96322321],
-            [0.49354248, 0.97936826, 0.2266097 , 0.88083622],
-            [0.11011776, 0.65762034, 0.17446099, 0.76658969]],
-
-            [[0.58266689, 0.86322199, 0.87122999, 0.19031255],
-            [0.42275118, 0.76379413, 0.21355413, 0.81132937],
-            [0.97294728, 0.68601731, 0.39871792, 0.63172344]]]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[[1.1313012  0.90907556 1.6311111  1.4896882 ]
-            [1.1327708  1.4303843  0.6821649  1.5818927 ]
-            [0.8680257  1.4431355  0.18132785 1.3794935 ]]
-
-            [[0.5869755  1.1935881  1.4703878  1.0939355 ]
-            [0.68505836 1.7576246  0.71985936 1.8025242 ]
-            [1.5343305  1.3676738  0.50516313 1.1503248 ]]]
-
-```
-
----
-### MaxoutDense
-
-A dense maxout layer that takes the element-wise maximum of linear layers.
-
-This allows the layer to learn a convex, piecewise linear activation function over the inputs.
-
-The input of this layer should be 2D.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            MaxoutDense(outputDim, nbFeature = 4, wRegularizer = null, bRegularizer = null, bias = true, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            MaxoutDense(output_dim, nb_feature=4, W_regularizer=None, b_regularizer=None, bias=True, input_dim=None, input_shape=None, name=None)
-```
-
-
-**Parameters:**
-
-* `outputDim`: The size of output dimension.
-* `nbFeature`: Number of Dense layers to use internally. Integer. Default is 4.
-* `wRegularizer`: An instance of [Regularizer](https://bigdl-project.github.io/master/#APIGuide/Regularizers/) (e.g. L1 or L2 regularization), applied to the input weight matrices. Default is null.
-* `bRegularizer`: An instance of [Regularizer](https://bigdl-project.github.io/master/#APIGuide/Regularizers/), applied to the bias. Default is null.
-* `bias`: Whether to include a bias (i.e. make the layer affine rather than linear). Default is true.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.MaxoutDense
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(MaxoutDense(2, inputShape = Shape(3)))
-            val input = Tensor[Float](2, 3).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            -1.3550005	-1.1668127	-1.2882779
-            0.83600295	-1.94683	1.323666
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            0.71675766	1.2987505
-            0.9871184	0.6634239
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import *
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(MaxoutDense(2, input_shape=(3, )))
-            input = np.random.random([2, 3])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[0.15996114 0.8391686  0.81922903]
-            [0.52929427 0.35061754 0.88167693]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[0.4479192  0.4842512]
-            [0.16833156 0.521764 ]]
-```
-
----
-### Squeeze
-Deletes the singleton dimension(s). The batch dimension must remain unchanged.
-
-For example, if input has size (2, 1, 3, 4, 1):
-
-Squeeze(1) will give output size (2, 3, 4, 1),
-
-Squeeze() will give output size (2, 3, 4)
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Squeeze(dims = null, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Squeeze(dim=None, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `dims`: The dimension(s) to squeeze. 0-based index. Cannot squeeze the batch dimension. The selected dimensions must be singleton, i.e. having size 1. Default is null, and in this case all the non-batch singleton dimensions will be deleted.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.Squeeze
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Squeeze[Float](1, inputShape = Shape(1, 1, 32)))
-            val input = Tensor[Float](1, 1, 1, 32).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,1,.,.) =
-            0.5521966       -1.2199087      0.365958        1.3845297       0.115254946     -0.20352958     2.4912808       0.987046        -0.2115477      3.0530396      -1.0043625      1.4688021       -1.2412603      -0.25383064     0.49164283      -0.40329486     0.26323202      0.7979045       0.025444122   0.47221214       1.3995043       0.48498031      -0.86961967     -0.058370713    -0.85965866     -1.2727696      0.45570874      0.73393697      0.2567143      1.4261572       -0.37773672     -0.7339463
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 1x1x1x32]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            0.5521966       -1.2199087      0.365958        1.3845297       0.115254946     -0.20352958     2.4912808       0.987046        -0.2115477      3.0530396      -1.0043625      1.4688021       -1.2412603      -0.25383064     0.49164283      -0.40329486     0.26323202      0.7979045       0.025444122   0.47221214       1.3995043       0.48498031      -0.86961967     -0.058370713    -0.85965866     -1.2727696      0.45570874      0.73393697      0.2567143      1.4261572       -0.37773672     -0.7339463
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 1x1x32]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import *
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(Squeeze(1, input_shape=(1, 1, 32)))
-            input = np.random.random([1, 1, 1, 32])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[[[0.20585343, 0.47011701, 0.14553177, 0.93915599, 0.57234281,
-                0.91631229, 0.32244256, 0.94243351, 0.86595631, 0.73916763,
-                0.35898731, 0.65208275, 0.07935983, 0.89313423, 0.68601269,
-                0.48919672, 0.28406399, 0.20962799, 0.88071757, 0.45501821,
-                0.60931183, 0.46709718, 0.14218838, 0.42517758, 0.9149958 ,
-                0.0843243 , 0.27302307, 0.75281922, 0.3688931 , 0.86913729,
-                0.89774196, 0.77838838]]]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[[0.20585343, 0.470117  , 0.14553176, 0.939156  , 0.5723428 ,
-            0.9163123 , 0.32244256, 0.94243354, 0.8659563 , 0.73916763,
-            0.3589873 , 0.65208274, 0.07935983, 0.89313424, 0.6860127 ,
-            0.48919672, 0.284064  , 0.20962799, 0.8807176 , 0.45501822,
-            0.6093118 , 0.46709716, 0.14218839, 0.42517757, 0.9149958 ,
-            0.0843243 , 0.27302307, 0.75281924, 0.36889312, 0.8691373 ,
-            0.897742  , 0.7783884 ]]]
-```
-
----
-### BinaryThreshold
-Threshold the input.
-
-If an input element is smaller than the threshold value, it will be replaced by 0; otherwise, it will be replaced by 1.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            BinaryThreshold(value = 1e-6, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            BinaryThreshold(value=1e-6, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `value`: The threshold value to compare with. Default is 1e-6.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.BinaryThreshold
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(BinaryThreshold[Float](inputShape = Shape(2, 3, 4)))
-            val input = Tensor[Float](2, 2, 3, 4).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,1,.,.) =
-            -1.1907398      -0.18995096     -2.0344417      -1.3789974
-            -1.8801064      -0.74757665     -0.4339697      0.0058485097
-            0.7012256       -0.6363152      2.0156987       -0.5512639
-
-            (1,2,.,.) =
-            -0.5251603      0.082127444     0.29550993      1.6357868
-            -1.3828015      -0.11842779     0.3316966       -0.14360528
-            0.21216457      -0.117370956    -0.12934707     -0.35854268
-
-            (2,1,.,.) =
-            -0.9071151      -2.8566089      -0.4796377      -0.915065
-            -0.8439908      -0.25404388     -0.39926198     -0.15191565
-            -1.0496653      -0.403675       -1.3591816      0.5311797
-
-            (2,2,.,.) =
-            0.53509855      -0.08892822     1.2196561       -0.62759316
-            -0.47476718     -0.43337926     -0.10406987     1.4035174
-            -1.7120812      1.1328355       0.9219375       1.3813454
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,1,.,.) =
-            0.0     0.0     0.0     0.0
-            0.0     0.0     0.0     1.0
-            1.0     0.0     1.0     0.0
-
-            (1,2,.,.) =
-            0.0     1.0     1.0     1.0
-            0.0     0.0     1.0     0.0
-            1.0     0.0     0.0     0.0
-
-            (2,1,.,.) =
-            0.0     0.0     0.0     0.0
-            0.0     0.0     0.0     0.0
-            0.0     0.0     0.0     1.0
-
-            (2,2,.,.) =
-            1.0     0.0     1.0     0.0
-            0.0     0.0     0.0     1.0
-            0.0     1.0     1.0     1.0
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3x4]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import *
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(BinaryThreshold(input_shape=(2, 3, 4)))
-            input = np.random.random([2, 2, 3, 4])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[[[0.30421481, 0.47800487, 0.54249411, 0.90109767],
-                    [0.72650405, 0.53096719, 0.66346109, 0.0589329 ],
-                    [0.12994731, 0.92181174, 0.43129874, 0.97306968]],
-
-                    [[0.3031087 , 0.20339982, 0.69034712, 0.40191   ],
-                    [0.57517034, 0.30159448, 0.4801747 , 0.75175084],
-                    [0.8599362 , 0.93523811, 0.34768628, 0.10840162]]],
-
-
-                [[[0.46102959, 0.33029002, 0.69340103, 0.32885719],
-                    [0.84405147, 0.03421879, 0.68242578, 0.03560338],
-                    [0.12244515, 0.3610654 , 0.01312785, 0.84485178]],
-
-                    [[0.73472287, 0.75707757, 0.77070527, 0.40863145],
-                    [0.01137898, 0.82896826, 0.1498069 , 0.22309423],
-                    [0.92737483, 0.36217222, 0.06679799, 0.33304362]]]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[[1., 1., 1., 1.],
-                    [1., 1., 1., 1.],
-                    [1., 1., 1., 1.]],
-
-                    [[1., 1., 1., 1.],
-                    [1., 1., 1., 1.],
-                    [1., 1., 1., 1.]]],
-
-
-                [[[1., 1., 1., 1.],
-                    [1., 1., 1., 1.],
-                    [1., 1., 1., 1.]],
-
-                    [[1., 1., 1., 1.],
-                    [1., 1., 1., 1.],
-                    [1., 1., 1., 1.]]]], dtype=float32)
-```
-
----
-### Sqrt
-Applies an element-wise square root operation to the input.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Sqrt(inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Sqrt(input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.Sqrt
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Sqrt[Float](inputShape = Shape(3)))
-            val input = Tensor[Float](2, 3).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            0.6950394       0.5234307       1.7375475
-            0.25833175      0.02685826      -0.6046901
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            0.8336902       0.7234851       1.3181607
-            0.50826347      0.16388491      NaN
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import *
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(Sqrt(input_shape=(3, )))
-            input = np.random.random([2, 3])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[0.2484558 , 0.65280218, 0.35286984],
-            [0.19616094, 0.30966802, 0.82148169]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[0.4984534 , 0.80796176, 0.5940285 ],
-            [0.4429006 , 0.55647826, 0.9063563 ]]
-```
-
----
-### Mul
-Multiplies the incoming data by a single scalar factor.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Mul(inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Mul(input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-* `name`: String to set the name of the layer. If not specified, a name is generated automatically.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.Mul
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Mul[Float](inputShape = Shape(3, 4)))
-            val input = Tensor[Float](2, 3, 4).randn()
-            val output = model.forward(input)
-
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,.,.) =
-            -1.2316265      -2.008802       -1.3908259      -0.61135375
-            -0.48992255     0.1786112       0.18872596      0.49621895
-            -0.6931602      -0.919745       -0.09019699     -0.41218707
-
-            (2,.,.) =
-            -0.3135355      -0.4385771      -0.3317269      1.0412029
-            -0.8859662      0.17758773      -0.73779273     -0.4445366
-            0.3921595       1.6923207       0.014470488     0.4044164
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            -0.59036994     -0.9629025      -0.6666808      -0.29304734
-            -0.2348403      0.0856158       0.09046422      0.23785843
-            -0.33226058     -0.44087213     -0.043235175    -0.19757845
-
-            (2,.,.) =
-            -0.15029064     -0.21022828     -0.15901053     0.49909195
-            -0.42468053     0.0851252       -0.3536548      -0.21308492
-            0.18797839      0.81119984      0.006936308     0.19385365
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3x4]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import *
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(Mul(input_shape=(3, 4)))
-            input = np.random.random([2, 3, 4])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[[ 0.22607292,  0.59806062,  0.19428923,  0.22928606],
-                    [ 0.13804536,  0.1615547 ,  0.52824658,  0.52794904],
-                    [ 0.4049169 ,  0.94109084,  0.58158453,  0.78368633]],
-
-                [[ 0.86233305,  0.47995805,  0.80430949,  0.9931171 ],
-                    [ 0.35179631,  0.33615276,  0.87756877,  0.73560288],
-                    [ 0.29775703,  0.11404466,  0.77695536,  0.97580018]]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[-0.22486402, -0.59486258, -0.1932503 , -0.22805998],
-                    [-0.13730718, -0.1606908 , -0.52542186, -0.52512592],
-                    [-0.40275168, -0.93605846, -0.57847458, -0.77949566]],
-
-                [[-0.85772187, -0.47739154, -0.80000854, -0.9878065 ],
-                    [-0.34991512, -0.33435524, -0.87287611, -0.73166931],
-                    [-0.29616481, -0.11343482, -0.77280068, -0.97058219]]], dtype=float32)
-```
-
----
-### MulConstant
-Multiply the input by a (non-learnable) scalar constant.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            MulConstant(constant, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            MulConstant(constant, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `constant`: The scalar constant to be multiplied.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.MulConstant
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(MulConstant[Float](2.2, inputShape = Shape(3, 4)))
-            val input = Tensor[Float](2, 3, 4).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,.,.) =
-            -0.16873977     1.0812985       1.0942211       -0.67091423
-            1.0086882       0.5915831       0.26184535      -1.361431
-            1.5616825       -0.037591368    1.2794676       1.0692137
-
-            (2,.,.) =
-            0.29868057      -0.23266982     -0.7679556      -2.209848
-            -0.13954644     -0.1368473      -0.54510623     1.8397199
-            -0.58691734     -0.56410027     -1.5567777      0.050648995
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            -0.3712275      2.3788567       2.4072864       -1.4760114
-            2.219114        1.3014828       0.57605976      -2.9951482
-            3.4357016       -0.08270101     2.8148286       2.3522704
-
-            (2,.,.) =
-            0.6570973       -0.5118736      -1.6895024      -4.8616657
-            -0.3070022      -0.30106407     -1.1992338      4.047384
-            -1.2912182      -1.2410206      -3.424911       0.11142779
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3x4]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import *
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(MulConstant(2.2, input_shape=(3, 4)))
-            input = np.random.random([2, 3, 4])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[[0.39874191, 0.66634984, 0.23907766, 0.31587494],
-            [0.78842014, 0.93057835, 0.80739529, 0.71541279],
-            [0.2231424 , 0.3372844 , 0.94678072, 0.52928034]],
-
-            [[0.60142458, 0.41221671, 0.00890549, 0.32069845],
-            [0.51122554, 0.76280426, 0.87579418, 0.17182832],
-            [0.54133184, 0.19814384, 0.92529327, 0.5616615 ]]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[[0.8772322 , 1.4659697 , 0.5259709 , 0.6949249 ],
-            [1.7345244 , 2.0472724 , 1.7762697 , 1.5739082 ],
-            [0.4909133 , 0.7420257 , 2.0829177 , 1.1644168 ]],
-
-            [[1.3231341 , 0.9068768 , 0.01959208, 0.7055366 ],
-            [1.1246961 , 1.6781695 , 1.9267472 , 0.37802234],
-            [1.19093   , 0.43591645, 2.0356452 , 1.2356553 ]]]
-```
-
----
-### Scale
-Scale is the combination of CMul and CAdd.
-
-It computes the element-wise product of the input and the weight, with the weight expanded to match the shape of the input.
-
-Similarly, the bias is expanded to match the shape of the input and then added element-wise.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Scale(size, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Scale(size, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `size`: Size of the weight and bias.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.Scale
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            val array = Array(1, 2)
-            model.add(Scale[Float](array, inputShape = Shape(3)))
-            val input = Tensor[Float](2, 3).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            -0.006399727    -0.06412822     -0.2334789
-            0.31029955      1.6557469       1.9614618
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            0.09936619      0.57585865      0.20324506
-            0.38537437      -0.8598822      -1.0186496
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import Scale
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(Scale((2, 1), input_shape=(3, )))
-            input = np.random.random([2, 3])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[0.7242994 , 0.77888884, 0.71470432],
-            [0.03058471, 0.00602764, 0.57513629]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[1.0946966 , 1.1255064 , 1.0892813 ],
-            [0.58151895, 0.5909191 , 0.37307182]]
-```
-
----
-### Log
-Applies a log transformation to the input.
-
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Log(inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Log(input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.Log
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Log[Float](inputShape = Shape(2, 4, 4)))
-            val input = Tensor[Float](1, 2, 4, 4).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,1,.,.) =
-            0.38405678      -0.5502389      -0.383079       -0.988537
-            -0.6294056      -0.7838047      0.8747865       -1.0659786
-            -2.2445498      -0.5488076      -0.42898977     0.6916364
-            1.6542299       -0.9966279      -0.38244298     1.6954672
-
-            (1,2,.,.) =
-            0.43478605      -0.6678534      1.9530942       -0.5209587
-            0.12899925      0.20572199      2.0359943       0.55223215
-            0.65247816      0.8792108       -0.38860792     0.48663738
-            -1.0084358      0.31141177      0.69208467      0.48385203
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 1x2x4x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,1,.,.) =
-            -0.95696485     NaN     NaN     NaN
-            NaN     NaN     -0.13377543     NaN
-            NaN     NaN     NaN     -0.36869493
-            0.5033356       NaN     NaN     0.5279584
-
-            (1,2,.,.) =
-            -0.83290124     NaN     0.6694149       NaN
-            -2.0479486      -1.5812296      0.7109843       -0.5937868
-            -0.4269776      -0.12873057     NaN     -0.720236
-            NaN     -1.1666392      -0.36804697     -0.72597617
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 1x2x4x4]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import Log
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(Log(input_shape=(2, 4, 4)))
-            input = np.random.random([1, 2, 4, 4])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[[[0.90127539, 0.9861594 , 0.04722941, 0.63719453],
-            [0.46529477, 0.81511804, 0.24435558, 0.45003562],
-            [0.15170845, 0.35157662, 0.0925214 , 0.63852947],
-            [0.27817508, 0.42572846, 0.44363004, 0.03536394]],
-
-            [[0.65027784, 0.00429838, 0.07434429, 0.18653305],
-            [0.19659183, 0.66647529, 0.77821197, 0.65894478],
-            [0.28212032, 0.52307663, 0.09589939, 0.71547588],
-            [0.84344158, 0.25291738, 0.52145649, 0.82982377]]]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[[[-0.10394441, -0.01393729, -3.0527387 , -0.45068032],
-            [-0.76508415, -0.20442237, -1.4091308 , -0.79842854],
-            [-1.8857948 , -1.0453277 , -2.3803153 , -0.44858742],
-            [-1.2795045 , -0.85395354, -0.8127643 , -3.3420627 ]],
-
-            [[-0.43035555, -5.4495163 , -2.5990484 , -1.6791469 ],
-            [-1.6266255 , -0.4057522 , -0.25075635, -0.41711554],
-            [-1.2654216 , -0.64802724, -2.3444557 , -0.33480743],
-            [-0.1702646 , -1.3746924 , -0.6511295 , -0.1865419 ]]]]
-```
-
----
-### Identity
-Identity simply returns its input as the output.
-
-It is useful inside a parallel container when one branch needs to pass the original input through unchanged.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Identity(inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Identity(input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.layers.Identity
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Identity[Float](inputShape = Shape(4, 4)))
-            val input = Tensor[Float](3, 4, 4).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,.,.) =
-            1.9601166       -0.86010313     0.0023731247    -0.81219757
-            1.1469674       -1.5375912      -1.5348053      -0.34829113
-            -1.236773       -0.7183283      -0.89256984     0.8605067
-            0.7937664       0.52992857      -1.6157389      0.36134166
-
-            (2,.,.) =
-            -0.44434744     -0.23848957     -0.01632014     -0.58109635
-            -0.19856784     -2.3421717      -0.5868049      -0.76775354
-            0.80254126      1.78778 -1.1835604      1.4489703
-            0.8731402       0.8906672       0.2800079       -0.6715317
-
-            (3,.,.) =
-            1.4093032       2.358169        -1.4620789      1.1904576
-            -0.18263042     -0.31869793     2.01061 1.2159953
-            -0.5801479      1.2949371       -0.7510707      -1.0707517
-            0.30815956      -1.161963       -0.26964024     -0.4759499
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 3x4x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            1.9601166       -0.86010313     0.0023731247    -0.81219757
-            1.1469674       -1.5375912      -1.5348053      -0.34829113
-            -1.236773       -0.7183283      -0.89256984     0.8605067
-            0.7937664       0.52992857      -1.6157389      0.36134166
-
-            (2,.,.) =
-            -0.44434744     -0.23848957     -0.01632014     -0.58109635
-            -0.19856784     -2.3421717      -0.5868049      -0.76775354
-            0.80254126      1.78778 -1.1835604      1.4489703
-            0.8731402       0.8906672       0.2800079       -0.6715317
-
-            (3,.,.) =
-            1.4093032       2.358169        -1.4620789      1.1904576
-            -0.18263042     -0.31869793     2.01061 1.2159953
-            -0.5801479      1.2949371       -0.7510707      -1.0707517
-            0.30815956      -1.161963       -0.26964024     -0.4759499
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 3x4x4]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import Identity
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(Identity(input_shape=(4, 4)))
-            input = np.random.random([3, 4, 4])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[[0.36751123, 0.92287101, 0.73894405, 0.33699379],
-            [0.69405782, 0.9653215 , 0.2617223 , 0.68205229],
-            [0.71455325, 0.99419333, 0.90886495, 0.10232991],
-            [0.1644055 , 0.30013138, 0.98921154, 0.26803146]],
-
-            [[0.35898357, 0.72067882, 0.13236563, 0.71935521],
-            [0.30865626, 0.71098844, 0.86718946, 0.12531168],
-            [0.84916882, 0.84221518, 0.52186664, 0.87239729],
-            [0.50637899, 0.10890469, 0.86832705, 0.93581179]],
-
-            [[0.19640105, 0.09341008, 0.12043328, 0.09261859],
-            [0.66019486, 0.07251262, 0.80929761, 0.39094486],
-            [0.63027391, 0.39537796, 0.55578905, 0.53933265],
-            [0.13885559, 0.56695373, 0.17036027, 0.4577097 ]]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[[0.36751124, 0.922871  , 0.73894405, 0.33699378],
-            [0.6940578 , 0.9653215 , 0.2617223 , 0.6820523 ],
-            [0.71455324, 0.9941933 , 0.908865  , 0.10232991],
-            [0.1644055 , 0.30013138, 0.98921156, 0.26803148]],
-
-            [[0.35898358, 0.7206788 , 0.13236563, 0.7193552 ],
-            [0.30865628, 0.71098846, 0.86718947, 0.12531169],
-            [0.84916884, 0.8422152 , 0.5218666 , 0.8723973 ],
-            [0.506379  , 0.10890469, 0.868327  , 0.9358118 ]],
-
-            [[0.19640104, 0.09341008, 0.12043328, 0.09261858],
-            [0.6601949 , 0.07251262, 0.8092976 , 0.39094487],
-            [0.63027394, 0.39537796, 0.55578905, 0.5393326 ],
-            [0.13885559, 0.5669537 , 0.17036027, 0.4577097 ]]]
-```
-
----
-### Select
-Selects an index of the input along the given dimension and returns the corresponding slice.
-
-The batch dimension must remain unchanged.
-
-For example, if input is:
-
-[[1, 2, 3],
- [4, 5, 6]]
-
-Select(1, 1) will give output [2 5]
-
-Select(1, -1) will give output [3 6]
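-
-The indexing in this example can be reproduced with plain NumPy. The sketch below is only an illustration of the semantics described above, not the BigDL implementation; `select` is a hypothetical helper name.
-
-```python
-import numpy as np
-
-def select(x, dim, index):
-    # Hypothetical helper: keep every position on all axes except `dim`,
-    # where a single index is taken (which drops that axis).
-    idx = [slice(None)] * x.ndim
-    idx[dim] = index
-    return x[tuple(idx)]
-
-x = np.array([[1, 2, 3],
-              [4, 5, 6]])
-first = select(x, 1, 1)    # element at index 1 of every row
-last = select(x, 1, -1)    # -1 picks the last element of every row
-```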
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Select(dim, index, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Select(dim, index, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `dim`: The dimension to select. 0-based index. Cannot select the batch dimension. -1 means the last dimension of the input.
-* `index`: The index to select along `dim`. 0-based index. -1 means the last index along that dimension.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.Select
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Select[Float](1, 2, inputShape = Shape(3, 1, 3)))
-            val input = Tensor[Float](1, 3, 1, 3).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,1,.,.) =
-            -0.67646945     -0.5485965      -0.11103154
-            (1,2,.,.) =
-            -0.13488655     0.43843046      -0.04482145
-            (1,3,.,.) =
-            -0.18094881     0.19431554      -1.7624844
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 1x3x1x3]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            -0.18094881     0.19431554      -1.7624844
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 1x1x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            from bigdl.dllib.keras.layers import Select
-            from bigdl.dllib.keras.models import Sequential
-            import numpy as np
-
-            model = Sequential()
-            model.add(Select(1, 2, input_shape=(3, 1, 3)))
-            input = np.random.random([1, 3, 1, 3])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[[[0.53306099, 0.95147881, 0.15222129]],
-                    [[0.89604861, 0.90160974, 0.5230576 ]],
-                    [[0.70779386, 0.14438568, 0.37601195]]]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[0.7077939 , 0.14438568, 0.37601194]]], dtype=float32)
-```
-
----
-### Dense
-A densely-connected NN layer.
-
-The most common input is 2D.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Dense(outputDim, init = "glorot_uniform", activation = null, wRegularizer = null, bRegularizer = null, bias = true, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Dense(output_dim, init="glorot_uniform", activation=None, W_regularizer=None, b_regularizer=None, bias=True, input_dim=None, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `outputDim`: The size of the output dimension.
-* `init`: Initialization method for the weights of the layer. Default is Xavier. You can also pass in corresponding string representations such as 'glorot_uniform' or 'normal', etc. for simple init methods in the factory method.
-* `activation`: Activation function to use. Default is null. You can also pass in corresponding string representations such as 'relu' or 'sigmoid', etc. for simple activations in the factory method.
-* `wRegularizer`: An instance of [Regularizer](https://bigdl-project.github.io/master/#APIGuide/Regularizers/), applied to the input weights matrices. Default is null.
-* `bRegularizer`: An instance of [Regularizer](https://bigdl-project.github.io/master/#APIGuide/Regularizers/), applied to the bias. Default is null.
-* `bias`: Whether to include a bias (i.e. make the layer affine rather than linear). Default is true.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.Dense
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Dense[Float](5, activation = "relu", inputShape = Shape(4)))
-            val input = Tensor[Float](2, 4).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            1.4289935       -1.7659454      -0.08306135     -1.0153456
-            1.0191492       0.37392816      1.3076705       -0.19495767
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            0.5421522       0.49008092      0.0     0.0     0.0
-            0.07940009      0.0     0.12953377      0.0     0.0
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x5]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import Dense
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(Dense(5, activation="relu", input_shape=(4, )))
-            input = np.random.random([2, 4])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[0.64593485, 0.67393322, 0.72505368, 0.04654095],
-                [0.19430753, 0.47800889, 0.00743648, 0.6412403 ]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[0.        , 0.        , 1.2698183 , 0.        , 0.10656227],
-                [0.        , 0.        , 0.6236721 , 0.00299606, 0.29664695]],
-                dtype=float32)
-```
-
----
-### Negative
-Computes the negative value of each element of the input.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Negative(inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Negative(input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.Negative
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Negative[Float](inputShape = Shape(2, 3)))
-            val input = Tensor[Float](2, 2, 3).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,.,.) =
-            1.031705        -0.5723963      1.998631
-            -0.32908052     2.4069138       -2.4111257
-            (2,.,.) =
-            0.5355049       -1.4404331      -0.38116863
-            -0.45641592     -1.1485358      0.94766915
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            -1.031705       0.5723963       -1.998631
-            0.32908052      -2.4069138      2.4111257
-            (2,.,.) =
-            -0.5355049      1.4404331       0.38116863
-            0.45641592      1.1485358       -0.94766915
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            from bigdl.dllib.keras.layers import Negative
-            from bigdl.dllib.keras.models import Sequential
-            import numpy as np
-
-            model = Sequential()
-            model.add(Negative(input_shape=(2, 3)))
-            input = np.random.random([2, 2, 3])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[[0.39261261, 0.03164615, 0.32179116],
-                    [0.11969367, 0.61610712, 0.42573733]],
-                [[0.36794656, 0.90912174, 0.540356  ],
-                    [0.42667627, 0.04154093, 0.84692964]]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[-0.3926126 , -0.03164615, -0.32179114],
-                    [-0.11969367, -0.6161071 , -0.42573732]],
-                [[-0.36794657, -0.90912175, -0.540356  ],
-                    [-0.42667627, -0.04154094, -0.84692967]]], dtype=float32)
-```
-
----
-### CAdd
-
-This layer has a bias of the given size.
-
-The bias is added element-wise to the input.
-
-If the number of elements in the bias matches the input, a simple element-wise addition is performed.
-
-Otherwise, the bias is expanded to the same size as the input.
-
-Expanding means repeating the bias along each unmatched singleton dimension (if an unmatched dimension is not a singleton dimension, an error is raised).
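-
-The expansion rule can be illustrated with NumPy broadcasting, which behaves the same way for singleton dimensions. This is a sketch of the arithmetic only, not the BigDL implementation; `cadd_forward` is a hypothetical helper and the values are made up.
-
-```python
-import numpy as np
-
-def cadd_forward(x, bias):
-    # Hypothetical helper: broadcast_to repeats the bias along its singleton
-    # dimensions and raises ValueError if a non-singleton dimension differs.
-    return x + np.broadcast_to(bias, x.shape)
-
-x = np.zeros((2, 2, 3), dtype=np.float32)
-bias = np.array([[1.0], [2.0]], dtype=np.float32)  # shape (2, 1)
-out = cadd_forward(x, bias)                        # shape (2, 2, 3)
-
-try:
-    # Last dimension is 2 in the bias but 3 in the input: not expandable.
-    cadd_forward(x, np.ones((2, 2), dtype=np.float32))
-    mismatch_raised = False
-except ValueError:
-    mismatch_raised = True
-```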
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            CAdd(size, bRegularizer = null, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            CAdd(size, b_regularizer=None, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `size`: The size of the bias.
-* `bRegularizer`: An instance of [Regularizer](https://bigdl-project.github.io/master/#APIGuide/Regularizers/), applied to the bias. Default is null.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.CAdd
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(CAdd[Float](Array(2, 3), inputShape = Shape(2, 3)))
-            val input = Tensor[Float](2, 2, 3).rand()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,.,.) =
-            0.2183351       0.32434112      0.89350265
-            0.3348259       0.78677046      0.24054797
-            (2,.,.) =
-            0.9945844       0.72363794      0.7737936
-            0.05522544      0.3517818       0.7417069
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            0.1358028       0.6956667       1.0837181
-            0.6767027       0.7955346       0.5063505
-            (2,.,.) =
-            0.9120521       1.0949634       0.96400905
-            0.3971022       0.36054593      1.0075095
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            from bigdl.dllib.keras.layers import CAdd
-            from bigdl.dllib.keras.models import Sequential
-            import numpy as np
-
-            model = Sequential()
-            model.add(CAdd([2, 1], input_shape=(2, 3)))
-            input = np.random.rand(2, 2, 3)
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[[0.4122004 , 0.73289359, 0.11500016],
-                    [0.26974491, 0.32166632, 0.91408442]],
-                [[0.66824327, 0.80271314, 0.75981145],
-                    [0.39271431, 0.07312566, 0.4966805 ]]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[ 0.06560206,  0.38629526, -0.23159817],
-                    [ 0.44287407,  0.4947955 ,  1.0872136 ]],
-                [[ 0.32164496,  0.45611483,  0.41321313],
-                    [ 0.56584346,  0.24625483,  0.6698097 ]]], dtype=float32)
-```
-
----
-### RepeatVector
-Repeats the input n times.
-
-The input of this layer should be 2D, i.e. (num_samples, features).
-The output of this layer is 3D, i.e. (num_samples, n, features).
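-
-The shape change can be sketched in NumPy as follows. This only mirrors the behavior described above and is not the BigDL implementation; `repeat_vector` is a hypothetical helper name.
-
-```python
-import numpy as np
-
-def repeat_vector(x, n):
-    # Hypothetical helper: insert a new axis after the batch axis and repeat
-    # each sample's feature vector n times along it.
-    return np.repeat(x[:, np.newaxis, :], n, axis=1)
-
-x = np.array([[1.0, 2.0, 3.0],
-              [4.0, 5.0, 6.0]])   # (num_samples, features) = (2, 3)
-out = repeat_vector(x, 4)         # (num_samples, n, features) = (2, 4, 3)
-```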
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            RepeatVector(n, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            RepeatVector(n, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `n`: Repetition factor. Integer.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-* `name`: String to set the name of the layer. If not specified, the name defaults to a generated string.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.RepeatVector
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(RepeatVector[Float](4, inputShape = Shape(3)))
-            val input = Tensor[Float](2, 3).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            -0.31839952 -0.3495366  0.542486
-            -0.54981124 -0.8428188  0.8225184
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x3]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            -0.31839952 -0.3495366  0.542486
-            -0.31839952 -0.3495366  0.542486
-            -0.31839952 -0.3495366  0.542486
-            -0.31839952 -0.3495366  0.542486
-
-            (2,.,.) =
-            -0.54981124 -0.8428188  0.8225184
-            -0.54981124 -0.8428188  0.8225184
-            -0.54981124 -0.8428188  0.8225184
-            -0.54981124 -0.8428188  0.8225184
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x4x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.layers import RepeatVector
-            from bigdl.dllib.keras.models import Sequential
-
-            model = Sequential()
-            model.add(RepeatVector(4, input_shape=(3, )))
-            input = np.random.random([2, 3])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[ 0.90715922,  0.54594769,  0.53952404],
-                [ 0.08989831,  0.07265549,  0.45830114]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[ 0.90715921,  0.54594767,  0.53952402],
-                    [ 0.90715921,  0.54594767,  0.53952402],
-                    [ 0.90715921,  0.54594767,  0.53952402],
-                    [ 0.90715921,  0.54594767,  0.53952402]],
-
-                [[ 0.08989831,  0.07265549,  0.45830116],
-                    [ 0.08989831,  0.07265549,  0.45830116],
-                    [ 0.08989831,  0.07265549,  0.45830116],
-                    [ 0.08989831,  0.07265549,  0.45830116]]], dtype=float32)
-```
----
-### GaussianSampler
-Takes {mean, log_variance} as input and samples from the Gaussian distribution.
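-
-A common way to sample from a Gaussian given a mean and a log-variance is the reparameterization `mean + exp(0.5 * log_variance) * eps`, with `eps` drawn from a standard normal. The NumPy sketch below assumes that formula; this page does not state the exact computation the layer performs, and `gaussian_sample` is a hypothetical helper name.
-
-```python
-import numpy as np
-
-def gaussian_sample(mean, log_variance, rng):
-    # Assumed formula: mean + std * eps, where std = exp(0.5 * log_variance)
-    # and eps is standard normal noise with the same shape as mean.
-    eps = rng.standard_normal(mean.shape)
-    return mean + np.exp(0.5 * log_variance) * eps
-
-rng = np.random.default_rng(0)
-mean = np.zeros((2, 2, 3))
-log_variance = np.zeros((2, 2, 3))   # variance 1 everywhere
-sample = gaussian_sample(mean, log_variance, rng)
-```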
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            GaussianSampler(inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            GaussianSampler(input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`MultiShape`](../keras-api-scala/#shape) object that consists of two identical single `Shape` objects. For Python API, it should be a list of two identical shape tuples. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.GaussianSampler
-            import com.intel.analytics.bigdl.utils.{Shape, MultiShape, T}
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            val shape1 = Shape(2, 3)
-            val shape2 = Shape(2, 3)
-            model.add(GaussianSampler[Float](inputShape = MultiShape(List(shape1,shape2))))
-            val input1 = Tensor[Float](2, 2, 3).rand(0, 1)
-            val input2 = Tensor[Float](2, 2, 3).rand(0, 1)
-            val input = T(1 -> input1, 2 -> input2)
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.utils.Table =
-            {
-                    2: (1,.,.) =
-                    0.9996127    0.8964211       0.7424038
-                    0.40628982   0.37035564      0.20108517
-
-                    (2,.,.) =
-                    0.6974727    0.60202897      0.1535999
-                    0.012422224  0.5993025       0.96206
-
-                    [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-                    1: (1,.,.) =
-                    0.21060324   0.576583        0.21633287
-                    0.1484059    0.2730577       0.25317845
-
-                    (2,.,.) =
-                    0.58513683   0.58095694      0.18811373
-                    0.7029449    0.41235915      0.44636542
-
-                    [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-            }
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            1.5258198       1.9536011       -1.8591263
-            -1.0618867      -0.751225       0.35412917
-
-            (2,.,.) =
-            1.3334517       -0.60312974     0.7324476
-            0.09502721      0.8094909       0.44807082
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.models import Sequential
-            from bigdl.dllib.keras.layers import GaussianSampler
-
-            model = Sequential()
-            model.add(GaussianSampler(input_shape=[(3,),(3,)]))
-            input1 = np.random.random([2, 3])
-            input2 = np.random.random([2, 3])
-            input = [input1, input2]
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[[0.79941342, 0.87462822, 0.9516901 ],
-            [0.20111287, 0.54634077, 0.83614511]],
-
-            [[0.31886989, 0.22829382, 0.84355419],
-            [0.51186641, 0.28043938, 0.29440057]]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[ 0.71405387  2.2944303  -0.41778684]
-            [ 0.84234     2.3337283  -0.18952972]]
-```
-
----
-### Exp
-Applies element-wise exp to the input.
-
-When you use this layer as the first layer of a model, you need to provide the argument inputShape (a Single Shape, does not include the batch dimension).
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Exp(inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Exp(input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `inputShape`: Only needs to be specified when this layer is the first layer of a model. For the Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For the Python API, it should be a shape tuple. The batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.Exp
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Exp[Float](inputShape = Shape(2, 3, 4)))
-            val input = Tensor[Float](2, 2, 3, 4).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,1,.,.) =
-            -1.5841372      -0.13795324     -2.144475       0.09272669
-            1.055668        -1.2310301      1.2145554       -0.6073714
-            0.9296467       0.2923885       1.3364213       0.1652137
-
-            (1,2,.,.) =
-            0.2099718       -0.3856573      -0.92586        -0.5317779
-            0.6618383       -0.9677452      -1.5014665      -0.35464883
-            2.045924        -0.317644       -1.812726       0.95438373
-
-            (2,1,.,.) =
-            -0.4536791      -0.34785584     1.6424289       -0.07981159
-            -0.8022624      -0.4211059      0.3461831       1.9598864
-            -0.84695745     -0.6115283      0.7729755       2.3077402
-
-            (2,2,.,.) =
-            -0.08438411     -0.908458       0.6688936       -0.7292123
-            -0.26337254     0.55425745      -0.14925817     -0.010179609
-            -0.62562865     -1.0517743      -0.23839666     -1.144982
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,1,.,.) =
-            0.20512469      0.8711394       0.11712951      1.0971619
-            2.8738942       0.29199165      3.3687959       0.544781
-            2.533614        1.3396233       3.8054006       1.1796452
-
-            (1,2,.,.) =
-            1.2336433       0.6800035       0.39619055      0.5875594
-            1.9383523       0.37993878      0.22280318      0.7014197
-            7.7363033       0.7278619       0.16320862      2.5970695
-
-            (2,1,.,.) =
-            0.63528657      0.70620066      5.167706        0.92329025
-            0.44831353      0.6563206       1.4136615       7.0985208
-            0.42871734      0.5425211       2.1662023       10.051684
-
-            (2,2,.,.) =
-            0.9190782       0.4031454       1.9520763       0.48228875
-            0.76845556      1.740648        0.8613467       0.98987204
-            0.53492504      0.34931743      0.7878901       0.31822965
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3x4]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.models import Sequential
-            from bigdl.dllib.keras.layers import Exp
-
-            model = Sequential()
-            model.add(Exp(input_shape=(2, 3, 4)))
-            input = np.random.random([2, 2, 3, 4])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[[[0.93104587 0.94000338 0.84870765 0.98645553]
-            [0.83708846 0.33375541 0.50119834 0.24879265]
-            [0.51966475 0.84514791 0.15496452 0.61538968]]
-
-            [[0.57250337 0.42520832 0.94850757 0.54317573]
-            [0.64228691 0.9904079  0.01008592 0.51365217]
-            [0.78640595 0.7717037  0.51277595 0.24245034]]]
-
-
-            [[[0.82184752 0.92537331 0.20632728 0.47539445]
-            [0.44604637 0.1507692  0.5437313  0.2074501 ]
-            [0.93661363 0.93962609 0.29230559 0.74850958]]
-
-            [[0.11659768 0.76177132 0.33194573 0.20695088]
-            [0.49636212 0.85987328 0.49767861 0.96774006]
-            [0.67669121 0.15542122 0.69981032 0.3349874 ]]]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[[[2.5371614 2.5599902 2.3366253 2.6817122]
-            [2.3096325 1.3962016 1.6506982 1.2824761]
-            [1.6814638 2.3283222 1.1676165 1.8503776]]
-
-            [[1.7726992 1.5299091 2.5818534 1.721465 ]
-            [1.9008229 2.6923325 1.010137  1.6713842]
-            [2.1954916 2.163449  1.6699204 1.2743679]]]
-
-
-            [[[2.2746985 2.52281   1.2291554 1.6086487]
-            [1.5621239 1.1627283 1.7224218 1.2305363]
-            [2.551327  2.5590243 1.3395122 2.1138473]]
-
-            [[1.1236672 2.1420672 1.3936772 1.2299222]
-            [1.6427343 2.3628614 1.6448984 2.6319895]
-            [1.9673574 1.16815   2.0133708 1.3979228]]]]
-
-```
-
----
-### Square
-Applies an element-wise square operation to the input.
-
-When you use this layer as the first layer of a model, you need to provide the argument inputShape (a Single Shape, does not include the batch dimension).
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Square(inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Square(input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `inputShape`: Only needs to be specified when this layer is the first layer of a model. For the Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For the Python API, it should be a shape tuple. The batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.Square
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Square[Float](inputShape = Shape(2, 3, 4)))
-            val input = Tensor[Float](2, 2, 3, 4).randn()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,1,.,.) =
-            -0.108013034    1.8879265       1.2232096       -1.5076439
-            1.4895755       -0.37966672     -0.34892964     0.15224025
-            -0.9296686      -1.1523775      0.14153497      -0.26954007
-
-            (1,2,.,.) =
-            -1.0875931      2.190617        -0.6903083      1.0039362
-            -0.1275677      -1.1096588      0.37359753      -0.17367937
-            0.23349741      0.14639114      -0.2330162      0.5343827
-
-            (2,1,.,.) =
-            0.3222191       0.21463287      -1.0157064      -0.22627507
-            1.1714277       0.43371263      1.069315        0.5122436
-            0.1958086       -1.4601041      2.5394423       -0.470833
-
-            (2,2,.,.) =
-            -0.38708544     -0.951611       -0.37234613     0.26813275
-            1.9477026       0.32779223      -1.2308712      -2.2376378
-            0.19652915      0.3304719       -1.7674786      -0.86961496
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,1,.,.) =
-            0.011666816     3.5642662       1.4962418       2.2729902
-            2.218835        0.14414681      0.1217519       0.023177093
-            0.86428374      1.3279738       0.020032147     0.07265185
-
-            (1,2,.,.) =
-            1.1828587       4.7988033       0.47652552      1.0078878
-            0.016273517     1.2313428       0.13957511      0.030164523
-            0.05452104      0.021430366     0.054296546     0.28556487
-
-            (2,1,.,.) =
-            0.10382515      0.046067268     1.0316595       0.05120041
-            1.3722429       0.18810664      1.1434345       0.26239353
-            0.038341008     2.131904        6.448767        0.22168371
-
-            (2,2,.,.) =
-            0.14983514      0.9055635       0.13864164      0.07189517
-            3.7935455       0.10744774      1.5150439       5.007023
-            0.038623706     0.109211676     3.1239805       0.7562302
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3x4]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            import numpy as np
-            from bigdl.dllib.keras.models import Sequential
-            from bigdl.dllib.keras.layers import Square
-
-            model = Sequential()
-            model.add(Square(input_shape=(2, 3, 4)))
-            input = np.random.random([2, 2, 3, 4])
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            [[[[0.8708819  0.2698243  0.55854849 0.71699472]
-            [0.66647234 0.72310216 0.8082119  0.66566951]
-            [0.6714764  0.61394108 0.35063125 0.60473593]]
-
-            [[0.37993365 0.64222557 0.96762005 0.18931697]
-            [0.00529722 0.99133455 0.09786619 0.28988077]
-            [0.60052911 0.83712995 0.59847519 0.54361243]]]
-
-
-            [[[0.32832672 0.83316023 0.41272485 0.01963383]
-            [0.89593955 0.73433713 0.67529323 0.69711912]
-            [0.81251711 0.56755577 0.31958151 0.09795917]]
-
-            [[0.46465895 0.22818875 0.31505317 0.41912166]
-            [0.87865447 0.3799063  0.091204   0.68144165]
-            [0.88274284 0.70479132 0.32074672 0.71771481]]]]
-
-        Output is:
-
-        ..  code-block:: python
-
-            [[[[7.5843531e-01 7.2805151e-02 3.1197643e-01 5.1408142e-01]
-            [4.4418535e-01 5.2287674e-01 6.5320653e-01 4.4311589e-01]
-            [4.5088059e-01 3.7692365e-01 1.2294226e-01 3.6570552e-01]]
-
-            [[1.4434958e-01 4.1245368e-01 9.3628860e-01 3.5840917e-02]
-            [2.8060573e-05 9.8274422e-01 9.5777912e-03 8.4030852e-02]
-            [3.6063525e-01 7.0078653e-01 3.5817260e-01 2.9551446e-01]]]
-
-
-            [[[1.0779844e-01 6.9415593e-01 1.7034180e-01 3.8548734e-04]
-            [8.0270761e-01 5.3925103e-01 4.5602092e-01 4.8597506e-01]
-            [6.6018403e-01 3.2211956e-01 1.0213234e-01 9.5959986e-03]]
-
-            [[2.1590793e-01 5.2070107e-02 9.9258497e-02 1.7566296e-01]
-            [7.7203369e-01 1.4432879e-01 8.3181690e-03 4.6436274e-01]
-            [7.7923489e-01 4.9673077e-01 1.0287846e-01 5.1511449e-01]]]]
-
-```
-
----
-### Power
-Applies an element-wise power operation with scale and shift to the input.
-
-f(x) = (shift + scale * x)^power
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Power(power, scale = 1, shift = 0, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Power(power, scale=1, shift=0, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `power`: The exponent
-* `scale`: The scale parameter. Default is 1.
-* `shift`: The shift parameter. Default is 0.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.Power
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Power[Float](2, inputShape = Shape(2, 3)))
-            val input = Tensor[Float](2, 2, 3).rand()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,.,.) =
-            0.24691099      0.7588585       0.5785183
-            0.10356348      0.2252714       0.3129436
-
-            (2,.,.) =
-            0.6277785       0.75136995      0.044648796
-            0.46396527      0.9793776       0.92727077
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            0.060965035     0.5758662       0.3346834
-            0.010725395     0.050747205     0.0979337
-
-            (2,.,.) =
-            0.39410582      0.5645568       0.001993515
-            0.21526377      0.95918053      0.8598311
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            from bigdl.dllib.keras.layers import Power
-            from bigdl.dllib.keras.models import Sequential
-            import numpy as np
-
-            model = Sequential()
-            model.add(Power(2, input_shape=(2, 3)))
-            input = np.random.rand(2, 2, 3)
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[[0.5300817 , 0.18128031, 0.19534253],
-                    [0.28380639, 0.78365165, 0.6893    ]],
-
-                [[0.05574091, 0.400077  , 0.77051193],
-                    [0.033559  , 0.61051396, 0.13970227]]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[0.2809866 , 0.03286255, 0.03815871],
-                    [0.08054607, 0.61410993, 0.4751345 ]],
-
-                [[0.00310705, 0.16006161, 0.5936886 ],
-                    [0.00112621, 0.37272733, 0.01951673]]], dtype=float32)
-```
-
----
-### AddConstant
-Adds a (non-learnable) scalar constant to the input.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            AddConstant(constant, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            AddConstant(constant, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `constant`: The scalar constant to be added.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.AddConstant
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(AddConstant[Float](1, inputShape = Shape(2, 3)))
-            val input = Tensor[Float](2, 2, 3).rand()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,.,.) =
-            0.5658301       0.3508225       0.4012322
-            0.1941942       0.18934165      0.6909284
-
-            (2,.,.) =
-            0.5985211       0.5485885       0.778548
-            0.16745302      0.10363362      0.92185616
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,.,.) =
-            1.5658301       1.3508224       1.4012322
-            1.1941942       1.1893417       1.6909285
-
-            (2,.,.) =
-            1.5985211       1.5485885       1.778548
-            1.167453        1.1036336       1.9218562
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            from bigdl.dllib.keras.layers import AddConstant
-            from bigdl.dllib.keras.models import Sequential
-            import numpy as np
-
-            model = Sequential()
-            model.add(AddConstant(1, input_shape=(2, 3)))
-            input = np.random.rand(2, 2, 3)
-            output = model.forward(input)
-
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[[0.71730919, 0.07752598, 0.10448237],
-                    [0.52319608, 0.38668494, 0.19588814]],
-
-                [[0.15496092, 0.48405899, 0.41441248],
-                    [0.13792111, 0.7523953 , 0.55991187]]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[1.7173092, 1.077526 , 1.1044824],
-                    [1.5231961, 1.3866849, 1.1958882]],
-
-                [[1.1549609, 1.484059 , 1.4144125],
-                    [1.1379211, 1.7523953, 1.5599118]]], dtype=float32)
-```
-
----
-### Narrow
-Narrows the input along a given dimension without reducing the number of dimensions.
-
-The batch dimension must remain unchanged.
-
-For example, if input is:
-
-[[1 2 3],
- [4 5 6]]
-
-Narrow(1, 1, 2) will give output
-
-[[2 3],
- [5 6]]
-
-Narrow(1, 2, -1) will give output
-
-[[3],
- [6]]
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Narrow(dim, offset, length = 1, inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Narrow(dim, offset, length=1, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `dim`: The dimension to narrow. 0-based index. Cannot narrow the batch dimension.
-         -1 means the last dimension of the input.
-* `offset`: Non-negative integer. The start index on the given dimension. 0-based index.
-* `length`: The length to narrow. Default is 1.
-            Can use a negative length such as -1 in the case where input size is unknown.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.Narrow
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Narrow[Float](1, 1, inputShape = Shape(2, 3, 4)))
-            val input = Tensor[Float](2, 2, 3, 4).rand()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,1,.,.) =
-            0.13770224      0.63719153      0.7776689       0.46612367
-            0.9026256       0.11982094      0.8282868       0.05095969
-            0.889799        0.6386537       0.35438475      0.298043
-
-            (1,2,.,.) =
-            0.5029727       0.20103335      0.20150806      0.06437344
-            0.2255908       0.5388977       0.59737855      0.5210477
-            0.4055072       0.11848069      0.7118382       0.9796308
-
-            (2,1,.,.) =
-            0.63957494      0.1921936       0.7749439       0.19744827
-            0.91683346      0.16140814      0.9753973       0.8161283
-            0.8481694       0.8802563       0.1233245       0.5732614
-
-            (2,2,.,.) =
-            0.275001        0.35905758      0.15939762      0.09233412
-            0.16610192      0.032060683     0.37298614      0.48936844
-            0.031097537     0.82767457      0.10246291      0.9951448
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x3x4]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,1,.,.) =
-            0.5029727       0.20103335      0.20150806      0.06437344
-            0.2255908       0.5388977       0.59737855      0.5210477
-            0.4055072       0.11848069      0.7118382       0.9796308
-
-            (2,1,.,.) =
-            0.275001        0.35905758      0.15939762      0.09233412
-            0.16610192      0.032060683     0.37298614      0.48936844
-            0.031097537     0.82767457      0.10246291      0.9951448
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x1x3x4]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            from bigdl.dllib.keras.layers import Narrow
-            from bigdl.dllib.keras.models import Sequential
-            import numpy as np
-
-            model = Sequential()
-            model.add(Narrow(1, 1, input_shape=(2, 3, 4)))
-            input = np.random.rand(2, 2, 3, 4)
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[[[0.74305305, 0.33925069, 0.31289333, 0.43703923],
-                    [0.28316902, 0.3004414 , 0.40298034, 0.37476436],
-                    [0.18825825, 0.38979411, 0.32963262, 0.37783457]],
-
-                    [[0.14824117, 0.43532988, 0.57077087, 0.91535978],
-                    [0.46375725, 0.90511296, 0.18859044, 0.92820822],
-                    [0.13675737, 0.48270908, 0.04260755, 0.97255687]]],
-                [[[0.4836805 , 0.45262542, 0.7233705 , 0.63486529],
-                    [0.07472717, 0.5715716 , 0.57029986, 0.26475783],
-                    [0.56757079, 0.27602746, 0.45799196, 0.74420842]],
-
-                    [[0.89048761, 0.08280716, 0.99030481, 0.35956427],
-                    [0.70802689, 0.14425212, 0.08320864, 0.82271697],
-                    [0.6915224 , 0.70490768, 0.41218963, 0.37024863]]]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[[0.14824118, 0.43532988, 0.57077086, 0.9153598 ],
-                    [0.46375725, 0.905113  , 0.18859044, 0.92820823],
-                    [0.13675737, 0.48270908, 0.04260755, 0.9725569 ]]],
-
-                [[[0.8904876 , 0.08280716, 0.9903048 , 0.35956427],
-                    [0.7080269 , 0.14425212, 0.08320864, 0.82271695],
-                    [0.6915224 , 0.70490766, 0.41218963, 0.37024862]]]],
-                dtype=float32)
-```
-
----
-### Permute
-Permutes the dimensions of the input according to a given pattern.
-
-Useful for connecting RNNs and convnets together.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            Permute(dims, inputShape = null)
-
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            Permute(dims, input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `dims`: Int array. Permutation pattern, does not include the batch dimension.
-          Indexing starts at 1.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.Permute
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential[Float]()
-            model.add(Permute[Float](Array(2, 1, 3), inputShape = Shape(2, 2, 3)))
-            val input = Tensor[Float](2, 2, 2, 3).rand()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,1,.,.) =
-            0.8451549       0.06361471      0.7324815
-            0.31086245      0.21210302      0.35112163
-
-            (1,2,.,.) =
-            0.61466074      0.50173014      0.8759959
-            0.19090249      0.671227        0.73089105
-
-            (2,1,.,.) =
-            0.47867084      0.9341955       0.063592255
-            0.24063066      0.502274        0.9114748
-
-            (2,2,.,.) =
-            0.93335986      0.25173688      0.88615775
-            0.5394321       0.330763        0.89036304
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x2x3]
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,1,.,.) =
-            0.8451549       0.06361471      0.7324815
-            0.61466074      0.50173014      0.8759959
-
-            (1,2,.,.) =
-            0.31086245      0.21210302      0.35112163
-            0.19090249      0.671227        0.73089105
-
-            (2,1,.,.) =
-            0.47867084      0.9341955       0.063592255
-            0.93335986      0.25173688      0.88615775
-
-            (2,2,.,.) =
-            0.24063066      0.502274        0.9114748
-            0.5394321       0.330763        0.89036304
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x2x3]
-
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            from bigdl.dllib.keras.layers import Permute
-            from bigdl.dllib.keras.models import Sequential
-            import numpy as np
-
-            model = Sequential()
-            model.add(Permute((2, 1, 3), input_shape=(2, 2, 3)))
-            input = np.random.rand(2, 2, 2, 3)
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[[[0.14016896, 0.7275626 , 0.79087092],
-                    [0.57259566, 0.97387138, 0.70001999]],
-
-                    [[0.9232002 , 0.07644555, 0.24705828],
-                    [0.17257354, 0.93951155, 0.46183983]]],
-                [[[0.79432476, 0.64299062, 0.33959594],
-                    [0.58608318, 0.338014  , 0.92602687]],
-
-                    [[0.32638575, 0.69032582, 0.25168083],
-                    [0.46813027, 0.95118373, 0.13145026]]]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[[0.14016896, 0.7275626 , 0.7908709 ],
-                    [0.9232002 , 0.07644555, 0.24705827]],
-
-                    [[0.57259566, 0.97387135, 0.70002   ],
-                    [0.17257354, 0.93951154, 0.46183982]]],
-                [[[0.79432476, 0.64299065, 0.33959594],
-                    [0.32638577, 0.6903258 , 0.25168082]],
-                    [[0.5860832 , 0.338014  , 0.9260269 ],
-                    [0.46813026, 0.95118374, 0.13145027]]]], dtype=float32)
-
-```
----
-### ResizeBilinear
-Resize the input image with bilinear interpolation. The input image must be a float tensor with NHWC or NCHW layout.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala API
-
-        ..  code-block:: scala
-
-            ResizeBilinear(outputHeight, outputWidth, alignCorners = false, dimOrdering = "th", inputShape = null)
-
-    .. tab:: Python API
-
-        ..  code-block:: python
-
-            ResizeBilinear(output_height, output_width, align_corner=False, dim_ordering="th", input_shape=None, name=None)
-```
-
-**Parameters:**
-
-* `outputHeight`: Height of the resized output.
-* `outputWidth`: Width of the resized output.
-* `alignCorners`: Whether to align the corners of the input and output. Default is false.
-* `dimOrdering`: Format of input data. Either DataFormat.NCHW (dimOrdering='th') or DataFormat.NHWC (dimOrdering='tf'). Default is NCHW.
-* `inputShape`: Only need to specify this argument when you use this layer as the first layer of a model. For Scala API, it should be a [`Shape`](../keras-api-scala/#shape) object. For Python API, it should be a shape tuple. Batch dimension should be excluded.
-
-```eval_rst
-.. tabs::
-
-    .. tab:: Scala Example
-
-        ..  code-block:: scala
-
-            import com.intel.analytics.bigdl.dllib.keras.models.Sequential
-            import com.intel.analytics.bigdl.dllib.keras.layers.ResizeBilinear
-            import com.intel.analytics.bigdl.dllib.utils.Shape
-            import com.intel.analytics.bigdl.tensor.Tensor
-
-            val model = Sequential()
-            model.add(ResizeBilinear[Float](2, 3, inputShape = Shape(2, 3, 5)))
-            val input = Tensor[Float](2, 2, 3, 5).rand()
-            val output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: scala
-
-            input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-            (1,1,.,.) =
-            0.6991891       0.007127314     0.73871046      0.95916307      0.9433856
-            0.41275907      0.37573513      0.99193203      0.06930728      0.5922364
-            0.024281504     0.2592453       0.3898136       0.6635241       0.85888565
-
-            (1,2,.,.) =
-            0.38028112      0.43709648      0.62538666      0.8468501       0.6445014
-            0.45252413      0.48801896      0.59471387      0.013207023     0.3567462
-            0.85187584      0.49279585      0.7973665       0.81287366      0.07852263
-
-            (2,1,.,.) =
-            0.1452374       0.6140467       0.36384684      0.066476084     0.96101314
-            0.54862195      0.66091377      0.86857307      0.6844842       0.7368217
-            0.25342992      0.71737933      0.12789607      0.21691357      0.7543404
-
-            (2,2,.,.) =
-            0.79176855      0.1204049       0.58971256      0.115073755     0.10459962
-            0.5225398       0.742363        0.7612815       0.9881919       0.13359445
-            0.9026869       0.13972941      0.92064524      0.9435532       0.5502235
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of...
-
-        Output is:
-
-        ..  code-block:: scala
-
-            output: com.intel.analytics.bigdl.nn.abstractnn.Activity =
-            (1,1,.,.) =
-            0.6991891       0.4948494       0.9539039
-            0.21852028      0.5664119       0.48613077
-
-            (1,2,.,.) =
-            0.38028112      0.56262326      0.7794005
-            0.6522  0.6274959       0.34790504
-
-            (2,1,.,.) =
-            0.1452374       0.4472468       0.36465502
-            0.40102595      0.5618719       0.54899293
-
-            (2,2,.,.) =
-            0.79176855      0.43327665      0.111582376
-            0.71261334      0.70765764      0.75788474
-
-            [com.intel.analytics.bigdl.tensor.DenseTensor of size 2x2x2x3]
-
-
-    .. tab:: Python Example
-
-        ..  code-block:: python
-
-            from bigdl.dllib.keras.layers import ResizeBilinear
-            from bigdl.dllib.keras.models import Sequential
-            import numpy as np
-
-            model = Sequential()
-            model.add(ResizeBilinear(2, 3, input_shape=(2, 3, 5)))
-            input = np.random.rand(2, 2, 3, 5)
-            output = model.forward(input)
-
-        Input is:
-
-        ..  code-block:: python
-
-            array([[[[0.43790358, 0.41882914, 0.71929122, 0.19673119, 0.36950189],
-                    [0.38808651, 0.34287751, 0.34076998, 0.02581254, 0.42406155],
-                    [0.84648848, 0.18411068, 0.97545126, 0.5468195 , 0.32136674]],
-
-                    [[0.32965599, 0.06883324, 0.17350748, 0.01181338, 0.59180775],
-                    [0.24667588, 0.36422516, 0.59648387, 0.48699443, 0.32323264],
-                    [0.67661373, 0.58779956, 0.55286771, 0.59629101, 0.69727522]]],
-
-
-                [[[0.09462238, 0.35658325, 0.6787812 , 0.78676645, 0.99019452],
-                    [0.81501527, 0.13348641, 0.71749101, 0.40543351, 0.3959018 ],
-                    [0.608378  , 0.10531177, 0.78000335, 0.51679768, 0.65067605]],
-
-                    [[0.12074634, 0.92682843, 0.52227042, 0.98856558, 0.28105255],
-                    [0.78411841, 0.19625097, 0.83108171, 0.03777509, 0.15700493],
-                    [0.95528158, 0.94003855, 0.61092905, 0.68651048, 0.57563719]]]])
-
-        Output is:
-
-        ..  code-block:: python
-
-            array([[[[0.43790358, 0.61913717, 0.2543214 ],
-                    [0.6172875 , 0.52657175, 0.3151154 ]],
-
-                    [[0.329656  , 0.13861606, 0.20514478],
-                    [0.46164483, 0.541788  , 0.5311798 ]]],
-
-
-                [[[0.09462238, 0.57138187, 0.8545758 ],
-                    [0.7116966 , 0.5389645 , 0.48184   ]],
-
-                    [[0.12074634, 0.6571231 , 0.752728  ],
-                    [0.86969995, 0.6700518 , 0.36353552]]]], dtype=float32)
-
-```
----
diff --git a/docs/readthedocs/source/doc/PythonAPI/DLlib/freeze.md b/docs/readthedocs/source/doc/PythonAPI/DLlib/freeze.md
deleted file mode 100644
index 28577490..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/DLlib/freeze.md
+++ /dev/null
@@ -1,108 +0,0 @@
-# Model Freeze
-
-To "freeze" a model means to exclude some layers of the model from training.
-
-```scala
-model.freeze("layer1", "layer2")
-model.unFreeze("layer1", "layer2")
-```
-* The model can be frozen by calling ```freeze()```. If a model is frozen,
-its parameters (weight/bias, if they exist) are not changed during training.
-If layer names are passed, then only the layers that match the given names are frozen.
-* The whole model can be unfrozen by calling ```unFreeze()```.
-If layer names are provided, then only the layers that match the given names are unfrozen.
-* Stop the input gradient of layers that match the given names: their input gradients are not computed,
-and they do not contribute to the input gradient computation of layers that depend on them.
-
-Original model without "freeze"
-```scala
-val reshape = Reshape(Array(4)).inputs()
-val fc1 = Linear(4, 2).setName("fc1").inputs()
-val fc2 = Linear(4, 2).setName("fc2").inputs(reshape)
-val cadd_1 = CAddTable().setName("cadd").inputs(fc1, fc2)
-val output1_1 = ReLU().inputs(cadd_1)
-val output2_1 = Threshold(10.0).inputs(cadd_1)
-
-val model = Graph(Array(reshape, fc1), Array(output1_1, output2_1))
-
-val input = T(Tensor(T(0.1f, 0.2f, -0.3f, -0.4f)),
-  Tensor(T(0.5f, 0.4f, -0.2f, -0.1f)))
-val gradOutput = T(Tensor(T(1.0f, 2.0f)), Tensor(T(3.0f, 4.0f)))
-
-fc1.element.getParameters()._1.apply1(_ => 1.0f)
-fc2.element.getParameters()._1.apply1(_ => 2.0f)
-model.zeroGradParameters()
-println("output1: \n", model.forward(input))
-model.backward(input, gradOutput)
-model.updateParameters(1)
-println("fc2 weight \n", fc2.element.parameters()._1(0))
-```
-```
-(output1:
-, {
-	2: 0.0
-	   0.0
-	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
-	1: 2.8
-	   2.8
-	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
- })
-(fc2 weight
-,1.9	1.8	2.3	2.4
-1.8	1.6	2.6	2.8
-[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x4])
-```
-
-"Freeze" ```fc2```, the parameters of ```fc2``` are not changed.
-```scala
-fc1.element.getParameters()._1.apply1(_ => 1.0f)
-fc2.element.getParameters()._1.apply1(_ => 2.0f)
-model.zeroGradParameters()
-model.freeze("fc2")
-println("output2: \n", model.forward(input))
-model.backward(input, gradOutput)
-model.updateParameters(1)
-println("fc2 weight \n", fc2.element.parameters()._1(0))
-```
-
-```
-(output2:
-, {
-	2: 0.0
-	   0.0
-	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
-	1: 2.8
-	   2.8
-	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
- })
-(fc2 weight
-,2.0	2.0	2.0	2.0
-2.0	2.0	2.0	2.0
-[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x4])
-```
-"unFreeze" ```fc2```, the parameters of ```fc2``` will be updated.
-```scala
-fc1.element.getParameters()._1.apply1(_ => 1.0f)
-fc2.element.getParameters()._1.apply1(_ => 2.0f)
-model.zeroGradParameters()
-model.unFreeze()
-println("output3: \n", model.forward(input))
-model.backward(input, gradOutput)
-model.updateParameters(1)
-println("fc2 weight \n", fc2.element.parameters()._1(0))
-```
-```
-(output3:
-, {
-	2: 0.0
-	   0.0
-	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
-	1: 2.8
-	   2.8
-	   [com.intel.analytics.bigdl.tensor.DenseTensor of size 2]
- })
-(fc2 weight
-,1.9	1.8	2.3	2.4
-1.8	1.6	2.6	2.8
-[com.intel.analytics.bigdl.tensor.DenseTensor of size 2x4])
-```
diff --git a/docs/readthedocs/source/doc/PythonAPI/DLlib/index.rst b/docs/readthedocs/source/doc/PythonAPI/DLlib/index.rst
deleted file mode 100644
index df15fdd0..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/DLlib/index.rst
+++ /dev/null
@@ -1,13 +0,0 @@
-DLlib API
-==================
-
-.. toctree::
-    :maxdepth: 1
-
-    model.rst
-    core_layers.md
-    optim-Methods.md
-    regularizers.md
-    learningrate-Scheduler.md
-    freeze.md
-    clipping.md
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/PythonAPI/DLlib/learningrate-Scheduler.md b/docs/readthedocs/source/doc/PythonAPI/DLlib/learningrate-Scheduler.md
deleted file mode 100644
index 06e1a3e5..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/DLlib/learningrate-Scheduler.md
+++ /dev/null
@@ -1,441 +0,0 @@
-# Learning Rate Scheduler
-
---------
-
-
-## Poly ##
-
-**Scala:**
-```scala
-val lrScheduler = Poly(power=0.5, maxIteration=1000)
-```
-**Python:**
-```python
-lr_scheduler = Poly(power=0.5, max_iteration=1000, bigdl_type="float")
-```
-
-A learning rate decay policy in which the effective learning rate follows a polynomial decay, reaching zero at `maxIteration`. Calculation: base_lr * (1 - iter/maxIteration) `^` (power)
-
- `power` coefficient of decay; see the calculation formula above
-
- `maxIteration` max iteration when lr becomes zero
-
-**Scala example:**
-```scala
-import com.intel.analytics.bigdl.dllib.optim.SGD._
-import com.intel.analytics.bigdl.dllib.optim._
-import com.intel.analytics.bigdl.dllib.tensor.{Storage, Tensor}
-import com.intel.analytics.bigdl.dllib.tensor.TensorNumericMath.TensorNumeric.NumericFloat
-import com.intel.analytics.bigdl.dllib.utils.T
-
-val optimMethod = new SGD[Double](0.1)
-optimMethod.learningRateSchedule = Poly(3, 100)
-def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
-  return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
-}
-val x = Tensor[Double](Storage(Array(10.0, 10.0)))
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.1
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.0970299
-```
-**Python example:**
-```python
-optim_method = SGD(learningrate=0.1,
-                   leaningrate_schedule=Poly(3, 100))
-```
-
-## Default ##
-
-It is the default learning rate schedule. At each iteration, the learning rate is updated with the following formula:
- l_{n + 1} = l / (1 + n * learning_rate_decay) where `l` is the initial learning rate
-
-**Scala:**
-```scala
-val lrScheduler = Default()
-```
-**Python:**
-```python
-lr_scheduler = Default()
-```
-
-**Scala example:**
-```scala
-val optimMethod = new SGD[Double](0.1, 0.1)
-def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
-  return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
-}
-val x = Tensor[Double](Storage(Array(10.0, 10.0)))
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.1
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.09090909090909091
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.08333333333333334
-```
-
-**Python example:**
-```python
-optimMethod = SGD(leaningrate_schedule=Default())
-```
-
-## NaturalExp ##
-
-A learning rate schedule that rescales the learning rate by exp ( -decay_rate * iter / decay_step ), following TensorFlow's natural_exp_decay learning rate decay.
-
- `decay_step` how often to apply decay
-
- `gamma` the decay rate. e.g. 0.96
-
-**Scala:**
-```scala
-val learningRateScheduler = NaturalExp(1, 1)
-```
-
-**Scala example:**
-```scala
-val optimMethod = new SGD[Double](0.1)
-optimMethod.learningRateSchedule = NaturalExp(1, 1)
-def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
-  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
-}
-val x = Tensor[Double](Storage(Array(10.0, 10.0)))
-val state = T("epoch" -> 0, "evalCounter" -> 0)
-optimMethod.state = state
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.1
-
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.036787944117144235
-
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.013533528323661271
-```
-
-## Exponential ##
-
-A learning rate schedule that rescales the learning rate by lr_{n + 1} = lr * decayRate `^` (iter / decayStep)
-
- `decayStep` the interval for lr decay
-
- `decayRate` decay rate
-
- `stairCase` if true, iter / decayStep is an integer division and the decayed learning rate follows a staircase function.
-
-**Scala:**
-```scala
-val learningRateSchedule = Exponential(10, 0.96)
-```
-
-**Python:**
-```python
-exponential = Exponential(100, 0.1)
-```
-
-**Scala example:**
-```scala
-val optimMethod = new SGD[Double](0.05)
-optimMethod.learningRateSchedule = Exponential(10, 0.96)
-def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
-  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
-}
-val x = Tensor[Double](Storage(Array(10.0, 10.0)))
-val state = T("epoch" -> 0, "evalCounter" -> 0)
-optimMethod.state = state
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.05
-
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.049796306069892535
-```
-
-**Python example:**
-```python
-optimMethod = SGD(leaningrate_schedule=Exponential(100, 0.1))
-```
-
-## Plateau ##
-
-Plateau is the learning rate schedule when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. It monitors a quantity and if no improvement is seen for a 'patience' number of epochs, the learning rate is reduced.
-
- `monitor` quantity to be monitored, can be Loss or score
-
- `factor` factor by which the learning rate will be reduced. new_lr = lr * factor
-
- `patience` number of epochs with no improvement after which learning rate will be reduced.
-
- `mode` one of {min, max}. In min mode, lr will be reduced when the quantity monitored has stopped decreasing;
- in max mode it will be reduced when the quantity monitored has stopped increasing
-
- `epsilon` threshold for measuring the new optimum, to only focus on significant changes.
-
- `cooldown` number of epochs to wait before resuming normal operation after lr has been reduced.
-
- `minLr` lower bound on the learning rate.
-
-**Scala:**
-```scala
-val learningRateSchedule = Plateau(monitor="score", factor=0.1, patience=10, mode="min", epsilon=1e-4f, cooldown=0, minLr=0)
-```
-
-**Python:**
-```python
-plateau = Plateau("score", factor=0.1, patience=10, mode="min", epsilon=1e-4, cooldown=0, min_lr=0.0)
-```
-
-**Scala example:**
-```scala
-val optimMethod = new SGD[Double](0.05)
-optimMethod.learningRateSchedule = Plateau(monitor="score", factor=0.1, patience=10, mode="min", epsilon=1e-4f, cooldown=0, minLr=0)
-def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
-  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
-}
-val x = Tensor[Double](Storage(Array(10.0, 10.0)))
-val state = T("epoch" -> 0, "evalCounter" -> 0)
-optimMethod.state = state
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
-
-
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
-
-```
-
-**Python example:**
-```python
-optimMethod = SGD(leaningrate_schedule=Plateau("score"))
-```
-
-## Warmup ##
-
-A gradual learning rate increase policy, where the effective learning rate increases by `delta` after each iteration. Calculation: base_lr + delta * iteration
-
- `delta` increase amount after each iteration
-
-**Scala:**
-```scala
-val learningRateSchedule = Warmup(delta = 0.05)
-```
-
-**Python:**
-```python
-warmup = Warmup(delta=0.05)
-```
-
-**Scala example:**
-```scala
-val lrSchedules = new SequentialSchedule(100)
-lrSchedules.add(Warmup(0.3), 3).add(Poly(3, 100), 100)
-val optimMethod = new SGD[Double](learningRate = 0.1, learningRateSchedule = lrSchedules)
-
-def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
-  return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
-}
-val x = Tensor[Double](Storage(Array(10.0, 10.0)))
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.1
-
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.4
-```
-
-**Python example:**
-```python
-optimMethod = SGD(leaningrate_schedule=Warmup(0.05))
-```
-
-## SequentialSchedule ##
-
-A learning rate scheduler which can stack several learning rate schedulers.
-
- `iterationPerEpoch` number of iterations per epoch
-
-**Scala:**
-```scala
-val learningRateSchedule = SequentialSchedule(iterationPerEpoch=100)
-```
-
-**Python:**
-```python
-sequentialSchedule = SequentialSchedule(iteration_per_epoch=5)
-```
-
-**Scala example:**
-```scala
-val lrSchedules = new SequentialSchedule(100)
-lrSchedules.add(Warmup(0.3), 3).add(Poly(3, 100), 100)
-val optimMethod = new SGD[Double](learningRate = 0.1, learningRateSchedule = lrSchedules)
-
-def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
-  return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
-}
-val x = Tensor[Double](Storage(Array(10.0, 10.0)))
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.1
-
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.4
-
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.7
-
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--1.0
-
-optimMethod.optimize(feval, x)
-> print(optimMethod.learningRateSchedule.currentRate)
--0.9702989999999999
-```
-
-**Python example:**
-```python
-sequentialSchedule = SequentialSchedule(5)
-poly = Poly(0.5, 2)
-sequentialSchedule.add(poly, 5)
-```
-
-## EpochDecay ##
-
-**Scala:**
-```scala
-def decay(epoch: Int): Double =
-  if (epoch == 1) 2.0 else if (epoch == 2) 1.0 else 0.0
-
-val learningRateSchedule = EpochDecay(decay)
-```
-
-It is an epoch decay learning rate schedule. The learning rate decays according to a user-supplied function of the number of epochs run: l_{n + 1} = l_{n} * 0.1 `^` decayType(epoch)
-
- `decayType` is a function with number of run epochs as the argument
-
-**Scala example:**
-```scala
-def decay(epoch: Int): Double =
-  if (epoch == 1) 2.0 else if (epoch == 2) 1.0 else 0.0
-
-val optimMethod = new SGD[Double](1000)
-optimMethod.learningRateSchedule = EpochDecay(decay)
-def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
-  return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
-}
-val x = Tensor[Double](Storage(Array(10.0, 10.0)))
-val state = T("epoch" -> 0)
-for(e <- 1 to 3) {
-  state("epoch") = e
-  optimMethod.state = state
-  optimMethod.optimize(feval, x)
-  if(e <= 1) {
-    assert(optimMethod.learningRateSchedule.currentRate==10)
-  } else if (e <= 2) {
-    assert(optimMethod.learningRateSchedule.currentRate==100)
-  } else {
-    assert(optimMethod.learningRateSchedule.currentRate==1000)
-  }
-}
-```
-
-## Regime ##
-
-A structure to specify hyperparameters by start epoch and end epoch. Usually used with `EpochSchedule`.
-
- `startEpoch` start epoch
-
- `endEpoch` end epoch
-
- `config` config table that contains the hyperparameters
-
-## EpochSchedule ##
-
-A learning rate schedule that configures the learning rate according to pre-defined `Regime`s. If the running epoch is within the interval of a regime `r` [r.startEpoch, r.endEpoch], then the learning
- rate takes the "learningRate" value in r.config.
-
- `regimes` an array of pre-defined `Regime`s.
-
-**Scala:**
-```scala
-val regimes: Array[Regime] = Array(
-  Regime(1, 3, T("learningRate" -> 1e-2, "weightDecay" -> 2e-4)),
-  Regime(4, 7, T("learningRate" -> 5e-3, "weightDecay" -> 2e-4)),
-  Regime(8, 10, T("learningRate" -> 1e-3, "weightDecay" -> 0.0))
-)
-val learningRateScheduler = EpochSchedule(regimes)
-```
-
-**Scala example:**
-```scala
-val regimes: Array[Regime] = Array(
-  Regime(1, 3, T("learningRate" -> 1e-2, "weightDecay" -> 2e-4)),
-  Regime(4, 7, T("learningRate" -> 5e-3, "weightDecay" -> 2e-4)),
-  Regime(8, 10, T("learningRate" -> 1e-3, "weightDecay" -> 0.0))
-)
-
-val state = T("epoch" -> 0)
-val optimMethod = new SGD[Double](0.1)
-optimMethod.learningRateSchedule = EpochSchedule(regimes)
-def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
-  return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
-}
-val x = Tensor[Double](Storage(Array(10.0, 10.0)))
-for(e <- 1 to 10) {
-  state("epoch") = e
-  optimMethod.state = state
-  optimMethod.optimize(feval, x)
-  if(e <= 3) {
-    assert(optimMethod.learningRateSchedule.currentRate==-1e-2)
-    assert(optimMethod.weightDecay==2e-4)
-  } else if (e <= 7) {
-    assert(optimMethod.learningRateSchedule.currentRate==-5e-3)
-    assert(optimMethod.weightDecay==2e-4)
-  } else if (e <= 10) {
-    assert(optimMethod.learningRateSchedule.currentRate==-1e-3)
-    assert(optimMethod.weightDecay==0.0)
-  }
-}
-```
-
-## EpochStep ##
-
-A learning rate schedule that rescales the learning rate by `gamma` every `stepSize` epochs.
-
- `stepSize` the number of epochs between learning rate updates
-
- `gamma` the rescale factor
-
- **Scala:**
- ```scala
- val learningRateScheduler = EpochStep(1, 0.5)
- ```
-
- **Scala example:**
- ```scala
- val optimMethod = new SGD[Double](0.1)
- optimMethod.learningRateSchedule = EpochStep(1, 0.5)
- def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
-   (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
- }
- val x = Tensor[Double](Storage(Array(10.0, 10.0)))
- val state = T("epoch" -> 0)
- for(e <- 1 to 10) {
-   state("epoch") = e
-   optimMethod.state = state
-   optimMethod.optimize(feval, x)
-   assert(optimMethod.learningRateSchedule.currentRate==(-0.1 * Math.pow(0.5, e)))
- }
- ```
diff --git a/docs/readthedocs/source/doc/PythonAPI/DLlib/model.rst b/docs/readthedocs/source/doc/PythonAPI/DLlib/model.rst
deleted file mode 100644
index 0a7c2182..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/DLlib/model.rst
+++ /dev/null
@@ -1,17 +0,0 @@
-Model/Sequential
-==================
-
-dllib.keras.models.Model
----------------------------
-
-.. autoclass:: bigdl.dllib.keras.models.Model
-    :members:
-    :undoc-members:
-
-
-dllib.keras.models.Sequential
----------------------------
-
-.. autoclass:: bigdl.dllib.keras.models.Sequential
-    :members:
-    :undoc-members:
diff --git a/docs/readthedocs/source/doc/PythonAPI/DLlib/optim-Methods.md b/docs/readthedocs/source/doc/PythonAPI/DLlib/optim-Methods.md
deleted file mode 100644
index 305e085d..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/DLlib/optim-Methods.md
+++ /dev/null
@@ -1,380 +0,0 @@
-# Optimizer
-
---------
-
-## Adam ##
-
-**Scala:**
-```scala
-val optim = new Adam(learningRate=1e-3, learningRateDecay=0.0, beta1=0.9, beta2=0.999, Epsilon=1e-8)
-```
-**Python:**
-```python
-optim = Adam(learningrate=1e-3, learningrate_decay=0.0, beta1=0.9, beta2=0.999, epsilon=1e-8, bigdl_type="float")
-```
-
-An implementation of Adam, a method for first-order gradient-based optimization of stochastic objective functions. http://arxiv.org/pdf/1412.6980.pdf
-
- `learningRate` learning rate. Default value is 1e-3.
-
- `learningRateDecay` learning rate decay. Default value is 0.0.
-
- `beta1` first moment coefficient. Default value is 0.9.
-
- `beta2` second moment coefficient. Default value is 0.999.
-
- `Epsilon` for numerical stability. Default value is 1e-8.
-
-
-**Scala example:**
-```scala
-import com.intel.analytics.bigdl.dllib.optim._
-import com.intel.analytics.bigdl.dllib.tensor.Tensor
-import com.intel.analytics.bigdl.dllib.tensor.TensorNumericMath.TensorNumeric.NumericFloat
-import com.intel.analytics.bigdl.dllib.utils.T
-
-val optm = new Adam(learningRate=0.002)
-def rosenBrock(x: Tensor[Float]): (Float, Tensor[Float]) = {
-    // (1) compute f(x)
-    val d = x.size(1)
-
-    // x1 = x(i)
-    val x1 = Tensor[Float](d - 1).copy(x.narrow(1, 1, d - 1))
-    // x(i + 1) - x(i)^2
-    x1.cmul(x1).mul(-1).add(x.narrow(1, 2, d - 1))
-    // 100 * (x(i + 1) - x(i)^2)^2
-    x1.cmul(x1).mul(100)
-
-    // x0 = x(i)
-    val x0 = Tensor[Float](d - 1).copy(x.narrow(1, 1, d - 1))
-    // 1-x(i)
-    x0.mul(-1).add(1)
-    x0.cmul(x0)
-    // 100*(x(i+1) - x(i)^2)^2 + (1-x(i))^2
-    x1.add(x0)
-
-    val fout = x1.sum()
-
-    // (2) compute df(x)/dx
-    val dxout = Tensor[Float]().resizeAs(x).zero()
-    // df(1:D-1) = - 400*x(1:D-1).*(x(2:D)-x(1:D-1).^2) - 2*(1-x(1:D-1));
-    x1.copy(x.narrow(1, 1, d - 1))
-    x1.cmul(x1).mul(-1).add(x.narrow(1, 2, d - 1)).cmul(x.narrow(1, 1, d - 1)).mul(-400)
-    x0.copy(x.narrow(1, 1, d - 1)).mul(-1).add(1).mul(-2)
-    x1.add(x0)
-    dxout.narrow(1, 1, d - 1).copy(x1)
-
-    // df(2:D) = df(2:D) + 200*(x(2:D)-x(1:D-1).^2);
-    x0.copy(x.narrow(1, 1, d - 1))
-    x0.cmul(x0).mul(-1).add(x.narrow(1, 2, d - 1)).mul(200)
-    dxout.narrow(1, 2, d - 1).add(x0)
-
-    (fout, dxout)
-  }
-val x = Tensor(2).fill(0)
-> print(optm.optimize(rosenBrock, x))
-(0.0019999996
-0.0
-[com.intel.analytics.bigdl.tensor.DenseTensor$mcD$sp of size 2],[D@302d88d8)
-```
-**Python example:**
-```python
-optim_method = Adam(learningrate=0.002)
-
-optimizer = Optimizer(
-    model=mlp_model,
-    training_rdd=train_data,
-    criterion=ClassNLLCriterion(),
-    optim_method=optim_method,
-    end_trigger=MaxEpoch(20),
-    batch_size=32)
-
-```
-## SGD ##
-
-**Scala:**
-```scala
-val optimMethod = new SGD(learningRate= 1e-3,learningRateDecay=0.0,
-                      weightDecay=0.0,momentum=0.0,dampening=Double.MaxValue,
-                      nesterov=false,learningRateSchedule=Default(),
-                      learningRates=null,weightDecays=null)
-```
-
-**Python:**
-```python
-optim_method = SGD(learningrate=1e-3,learningrate_decay=0.0,weightdecay=0.0,
-                   momentum=0.0,dampening=DOUBLEMAX,nesterov=False,
-                   leaningrate_schedule=None,learningrates=None,
-                   weightdecays=None,bigdl_type="float")
-```
-
-A plain implementation of SGD that provides the optimize method. Once it is set as the
-optimization method when creating an Optimizer, the Optimizer calls it at the end of
-each iteration.
-
-**Scala example:**
-```scala
-val optimMethod = new SGD[Float](learningRate= 1e-3,learningRateDecay=0.0,
-                               weightDecay=0.0,momentum=0.0,dampening=Double.MaxValue,
-                               nesterov=false,learningRateSchedule=Default(),
-                               learningRates=null,weightDecays=null)
-optimizer.setOptimMethod(optimMethod)
-```
-
-**Python example:**
-```python
-optim_method = SGD(learningrate=1e-3,learningrate_decay=0.0,weightdecay=0.0,
-                  momentum=0.0,dampening=DOUBLEMAX,nesterov=False,
-                  leaningrate_schedule=None,learningrates=None,
-                  weightdecays=None,bigdl_type="float")
-
-optimizer = Optimizer(
-    model=mlp_model,
-    training_rdd=train_data,
-    criterion=ClassNLLCriterion(),
-    optim_method=optim_method,
-    end_trigger=MaxEpoch(20),
-    batch_size=32)
-```
-
-## Adadelta ##
-
-
-An *AdaDelta* implementation for *SGD*.
-It was proposed in `ADADELTA: An Adaptive Learning Rate Method`,
-http://arxiv.org/abs/1212.5701.
-
-**Scala:**
-```scala
-val optimMethod = new Adadelta(decayRate = 0.9, Epsilon = 1e-10)
-```
-**Python:**
-```python
-optim_method = Adadelta(decayrate = 0.9, epsilon = 1e-10)
-```
-
-
-**Scala example:**
-```scala
-optimizer.setOptimMethod(new Adadelta(0.9, 1e-10))
-
-```
-
-
-**Python example:**
-```python
-optimizer = Optimizer(
-    model=mlp_model,
-    training_rdd=train_data,
-    criterion=ClassNLLCriterion(),
-    optim_method=Adadelta(0.9, 0.00001),
-    end_trigger=MaxEpoch(20),
-    batch_size=32)
-```
-
-## RMSprop ##
-
-An implementation of RMSprop (Reference: http://arxiv.org/pdf/1308.0850v5.pdf, Sec 4.2)
-
-* learningRate : learning rate
-* learningRateDecay : learning rate decay
-* decayRate : decay rate, also called rho
-* Epsilon : for numerical stability
-
-## Adamax ##
-
-An implementation of Adamax http://arxiv.org/pdf/1412.6980.pdf
-
-Arguments:
-
-* learningRate : learning rate
-* beta1 : first moment coefficient
-* beta2 : second moment coefficient
-* Epsilon : for numerical stability
-
-Returns:
-
-the new x vector and the function list {fx}, evaluated before the update
-
-## Adagrad ##
-
-**Scala:**
-```scala
-val adagrad = new Adagrad(learningRate = 1e-3,
-                          learningRateDecay = 0.0,
-                          weightDecay = 0.0)
-
-```
-
- An implementation of Adagrad. See the original paper:
- <http://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf>
-
-**Scala example:**
-```scala
-import com.intel.analytics.bigdl.dllib.tensor.TensorNumericMath.TensorNumeric.NumericFloat
-import com.intel.analytics.bigdl.dllib.optim._
-import com.intel.analytics.bigdl.dllib.tensor._
-import com.intel.analytics.bigdl.dllib.utils.T
-
-val adagrad = new Adagrad(0.01, 0.0, 0.0)
-    def feval(x: Tensor[Float]): (Float, Tensor[Float]) = {
-      // (1) compute f(x)
-      val d = x.size(1)
-      // x1 = x(i)
-      val x1 = Tensor[Float](d - 1).copy(x.narrow(1, 1, d - 1))
-      // x(i + 1) - x(i)^2
-      x1.cmul(x1).mul(-1).add(x.narrow(1, 2, d - 1))
-      // 100 * (x(i + 1) - x(i)^2)^2
-      x1.cmul(x1).mul(100)
-      // x0 = x(i)
-      val x0 = Tensor[Float](d - 1).copy(x.narrow(1, 1, d - 1))
-      // 1-x(i)
-      x0.mul(-1).add(1)
-      x0.cmul(x0)
-      // 100*(x(i+1) - x(i)^2)^2 + (1-x(i))^2
-      x1.add(x0)
-      val fout = x1.sum()
-      // (2) compute df(x)/dx
-      val dxout = Tensor[Float]().resizeAs(x).zero()
-      // df(1:D-1) = - 400*x(1:D-1).*(x(2:D)-x(1:D-1).^2) - 2*(1-x(1:D-1));
-      x1.copy(x.narrow(1, 1, d - 1))
-      x1.cmul(x1).mul(-1).add(x.narrow(1, 2, d - 1)).cmul(x.narrow(1, 1, d - 1)).mul(-400)
-      x0.copy(x.narrow(1, 1, d - 1)).mul(-1).add(1).mul(-2)
-      x1.add(x0)
-      dxout.narrow(1, 1, d - 1).copy(x1)
-      // df(2:D) = df(2:D) + 200*(x(2:D)-x(1:D-1).^2);
-      x0.copy(x.narrow(1, 1, d - 1))
-      x0.cmul(x0).mul(-1).add(x.narrow(1, 2, d - 1)).mul(200)
-      dxout.narrow(1, 2, d - 1).add(x0)
-      (fout, dxout)
-    }
-val x = Tensor(2).fill(0)
-val config = T("learningRate" -> 1e-1)
-for (i <- 1 to 10) {
-  adagrad.optimize(feval, x, config, config)
-}
-x after optimize: 0.27779138
-0.07226955
-[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 2]
-```
-
-## LBFGS ##
-
-**Scala:**
-```scala
-val optimMethod = new LBFGS(maxIter=20, maxEval=Double.MaxValue,
-                            tolFun=1e-5, tolX=1e-9, nCorrection=100,
-                            learningRate=1.0, lineSearch=None, lineSearchOptions=None)
-```
-
-**Python:**
-```python
-optim_method = LBFGS(max_iter=20, max_eval=DOUBLEMAX, \
-                 tol_fun=1e-5, tol_x=1e-9, n_correction=100, \
-                 learning_rate=1.0, line_search=None, line_search_options=None)
-```
-
-This implementation of L-BFGS relies on a user-provided line search function
-(state.lineSearch). If this function is not provided, then a simple learningRate
-is used to produce fixed size steps. Fixed size steps are much less costly than line
-searches, and can be useful for stochastic problems.
-
-The learning rate is used even when a line search is provided. This is also useful for
-large-scale stochastic problems, where opfunc is a noisy approximation of f(x). In that
-case, the learning rate allows a reduction of confidence in the step size.
-
-**Parameters:**
-
-* maxIter - Maximum number of iterations allowed. Default: 20
-* maxEval - Maximum number of function evaluations. Default: Double.MaxValue
-* tolFun - Termination tolerance on the first-order optimality. Default: 1e-5
-* tolX - Termination tolerance on progress in terms of function/parameter changes. Default: 1e-9
-* learningRate - The learning rate. Default: 1.0
-* lineSearch - A line search function. Default: None
-* lineSearchOptions - If no line search is provided, then a fixed step size is used. Default: None
-
-**Scala example:**
-```scala
-val optimMethod = new LBFGS(maxIter=20, maxEval=Double.MaxValue,
-                            tolFun=1e-5, tolX=1e-9, nCorrection=100,
-                            learningRate=1.0, lineSearch=None, lineSearchOptions=None)
-optimizer.setOptimMethod(optimMethod)
-```
-
-**Python example:**
-```python
-optim_method = LBFGS(max_iter=20, max_eval=DOUBLEMAX, \
-                 tol_fun=1e-5, tol_x=1e-9, n_correction=100, \
-                 learning_rate=1.0, line_search=None, line_search_options=None)
-
-optimizer = Optimizer(
-    model=mlp_model,
-    training_rdd=train_data,
-    criterion=ClassNLLCriterion(),
-    optim_method=optim_method,
-    end_trigger=MaxEpoch(20),
-    batch_size=32)
-```
-
-## Ftrl ##
-
-**Scala:**
-```scala
-val optimMethod = new Ftrl(
-  learningRate = 1e-3, learningRatePower = -0.5,
-  initialAccumulatorValue = 0.1, l1RegularizationStrength = 0.0,
-  l2RegularizationStrength = 0.0, l2ShrinkageRegularizationStrength = 0.0)
-```
-
-**Python:**
-```python
-optim_method = Ftrl(learningrate = 1e-3, learningrate_power = -0.5, \
-                 initial_accumulator_value = 0.1, l1_regularization_strength = 0.0, \
-                 l2_regularization_strength = 0.0, l2_shrinkage_regularization_strength = 0.0)
-```
-
-An implementation of [Ftrl](https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf).
-Supports L1 penalty, L2 penalty and shrinkage-type L2 penalty.
-
-**Parameters:**
-
-* learningRate: double, the learning rate.
-* learningRatePower: double, must be less than or equal to zero. Default is -0.5.
-* initialAccumulatorValue: double, the starting value for accumulators; must be zero or positive. Default is 0.1.
-* l1RegularizationStrength: double, must be greater than or equal to zero. Default is zero.
-* l2RegularizationStrength: double, must be greater than or equal to zero. Default is zero.
-* l2ShrinkageRegularizationStrength: double, must be greater than or equal to zero. Default is zero. This differs from l2RegularizationStrength above: that one is a stabilization penalty, whereas this one is a magnitude penalty.
-
-**Scala example:**
-```scala
-val optimMethod = new Ftrl(learningRate = 5e-3, learningRatePower = -0.5,
-  initialAccumulatorValue = 0.01)
-optimizer.setOptimMethod(optimMethod)
-```
-
-**Python example:**
-```python
-optim_method = Ftrl(learningrate = 5e-3, \
-    learningrate_power = -0.5, \
-    initial_accumulator_value = 0.01)
-
-optimizer = Optimizer(
-    model=mlp_model,
-    training_rdd=train_data,
-    criterion=ClassNLLCriterion(),
-    optim_method=optim_method,
-    end_trigger=MaxEpoch(20),
-    batch_size=32)
-```
-
-## ParallelAdam ##
-Multi-threaded version of [Adam](#adam).
-
-**Scala:**
-```scala
-val optim = new ParallelAdam(learningRate=1e-3, learningRateDecay=0.0, beta1=0.9, beta2=0.999, Epsilon=1e-8, parallelNum=Engine.coreNumber())
-```
-**Python:**
-```python
-optim = ParallelAdam(learningrate=1e-3, learningrate_decay=0.0, beta1=0.9, beta2=0.999, epsilon=1e-8, parallel_num=get_node_and_core_number()[1], bigdl_type="float")
-```
diff --git a/docs/readthedocs/source/doc/PythonAPI/DLlib/regularizers.md b/docs/readthedocs/source/doc/PythonAPI/DLlib/regularizers.md
deleted file mode 100644
index 636f73c8..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/DLlib/regularizers.md
+++ /dev/null
@@ -1,270 +0,0 @@
-# Regularizer
-
---------
-
-## L1 Regularizer ##
-
-**Scala:**
-```scala
-val l1Regularizer = L1Regularizer(rate)
-```
-**Python:**
-```python
-regularizerl1 = L1Regularizer(rate)
-```
-
-L1 regularizer is used to add penalty to the gradWeight to avoid overfitting.
-
-In our code implementation, gradWeight = gradWeight + alpha * sign(weight)
-
-For more details, please refer to [wiki](https://en.wikipedia.org/wiki/Regularization_(mathematics)).
-
-**Scala example:**
-```scala
-
-import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
-import com.intel.analytics.bigdl.dllib.tensor._
-import com.intel.analytics.bigdl.dllib.optim._
-import com.intel.analytics.bigdl.numeric.NumericFloat
-import com.intel.analytics.bigdl.dllib.nn._
-
-RNG.setSeed(100)
-
-val input = Tensor(3, 5).rand
-val gradOutput = Tensor(3, 5).rand
-val linear = Linear(5, 5, wRegularizer = L1Regularizer(0.2), bRegularizer = L1Regularizer(0.2))
-
-val output = linear.forward(input)
-val gradInput = linear.backward(input, gradOutput)
-
-scala> input
-input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-0.54340494      0.67115563      0.2783694       0.4120464       0.4245176
-0.52638245      0.84477615      0.14860484      0.004718862     0.15671109
-0.12156912      0.18646719      0.67074907      0.21010774      0.82585275
-[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
-
-scala> gradOutput
-gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-0.4527399       0.13670659      0.87014264      0.5750933       0.063681036
-0.89132196      0.62431186      0.20920213      0.52334774      0.18532822
-0.5622963       0.10837689      0.0058171963    0.21969749      0.3074232
-[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
-
-scala> linear.gradWeight
-res2: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-0.9835552       1.3616763       0.83564335      0.108898684     0.59625006
-0.21608911      0.8393639       0.0035243928    -0.11795368     0.4453743
-0.38366735      0.9618148       0.47721142      0.5607486       0.6069793
-0.81469804      0.6690552       0.18522228      0.08559488      0.7075894
--0.030468717    0.056625083     0.051471338     0.2917061       0.109963015
-[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]
-
-```
-
-**Python example:**
-```python
-
-from bigdl.dllib.nn.layer import *
-from bigdl.dllib.nn.criterion import *
-from bigdl.dllib.optim.optimizer import *
-from bigdl.dllib.util.common import *
-
-input = np.random.uniform(0, 1, (3, 5)).astype("float32")
-gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
-linear = Linear(5, 5, wRegularizer = L1Regularizer(0.2), bRegularizer = L1Regularizer(0.2))
-output = linear.forward(input)
-gradInput = linear.backward(input, gradOutput)
-
-> linear.parameters()
-{u'Linear@596d857b': {u'bias': array([ 0.3185505 , -0.02004393,  0.34620118, -0.09206461,  0.40776938], dtype=float32),
-  u'gradBias': array([ 2.14087653,  1.82181644,  1.90674937,  1.37307787,  0.81534696], dtype=float32),
-  u'gradWeight': array([[ 0.34909648,  0.85083449,  1.44904375,  0.90150446,  0.57136625],
-         [ 0.3745544 ,  0.42218602,  1.53656614,  1.1836741 ,  1.00702667],
-         [ 0.30529332,  0.26813674,  0.85559171,  0.61224306,  0.34721529],
-         [ 0.22859855,  0.8535381 ,  1.19809723,  1.37248564,  0.50041491],
-         [ 0.36197871,  0.03069445,  0.64837945,  0.12765063,  0.12872688]], dtype=float32),
-  u'weight': array([[-0.12423037,  0.35694697,  0.39038274, -0.34970999, -0.08283543],
-         [-0.4186025 , -0.33235055,  0.34948507,  0.39953214,  0.16294235],
-         [-0.25171402, -0.28955361, -0.32243955, -0.19771226, -0.29320192],
-         [-0.39263198,  0.37766701,  0.14673658,  0.24882999, -0.0779015 ],
-         [ 0.0323218 , -0.31266898,  0.31543773, -0.0898933 , -0.33485892]], dtype=float32)}}
-```
-
-
-
-
-## L2 Regularizer ##
-
-**Scala:**
-```scala
-val l2Regularizer = L2Regularizer(rate)
-```
-**Python:**
-```python
-regularizerl2 = L2Regularizer(rate)
-```
-
-L2 regularizer is used to add penalty to the gradWeight to avoid overfitting.
-
-In our code implementation, gradWeight = gradWeight + alpha * weight
-
-For more details, please refer to [wiki](https://en.wikipedia.org/wiki/Regularization_(mathematics)).
-
-**Scala example:**
-```scala
-
-import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
-import com.intel.analytics.bigdl.dllib.tensor._
-import com.intel.analytics.bigdl.dllib.optim._
-import com.intel.analytics.bigdl.numeric.NumericFloat
-import com.intel.analytics.bigdl.dllib.nn._
-
-RNG.setSeed(100)
-
-val input = Tensor(3, 5).rand
-val gradOutput = Tensor(3, 5).rand
-val linear = Linear(5, 5, wRegularizer = L2Regularizer(0.2), bRegularizer = L2Regularizer(0.2))
-
-val output = linear.forward(input)
-val gradInput = linear.backward(input, gradOutput)
-
-scala> input
-input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-0.54340494      0.67115563      0.2783694       0.4120464       0.4245176
-0.52638245      0.84477615      0.14860484      0.004718862     0.15671109
-0.12156912      0.18646719      0.67074907      0.21010774      0.82585275
-[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
-
-scala> gradOutput
-gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-0.4527399       0.13670659      0.87014264      0.5750933       0.063681036
-0.89132196      0.62431186      0.20920213      0.52334774      0.18532822
-0.5622963       0.10837689      0.0058171963    0.21969749      0.3074232
-[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
-
-scala> linear.gradWeight
-res0: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-1.0329735       0.047239657     0.8979603       0.53614384      1.2781229
-0.5621818       0.29772854      0.69706535      0.30559152      0.8352279
-1.3044653       0.43065858      0.9896795       0.7435816       1.6003494
-0.94218314      0.6793372       0.97101355      0.62892824      1.3458569
-0.73134506      0.5975239       0.9109101       0.59374434      1.1656629
-[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]
-
-```
-
-**Python example:**
-```python
-from bigdl.dllib.nn.layer import *
-from bigdl.dllib.nn.criterion import *
-from bigdl.dllib.optim.optimizer import *
-from bigdl.dllib.util.common import *
-
-input = np.random.uniform(0, 1, (3, 5)).astype("float32")
-gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
-linear = Linear(5, 5, wRegularizer = L2Regularizer(0.2), bRegularizer = L2Regularizer(0.2))
-output = linear.forward(input)
-gradInput = linear.backward(input, gradOutput)
-
-> linear.parameters()
-{u'Linear@787aab5e': {u'bias': array([-0.43960261, -0.12444571,  0.22857292, -0.43216187,  0.27770036], dtype=float32),
-  u'gradBias': array([ 0.51726723,  1.32883406,  0.57567948,  1.7791357 ,  1.2887038 ], dtype=float32),
-  u'gradWeight': array([[ 0.45477036,  0.22262168,  0.21923628,  0.26152173,  0.19836383],
-         [ 1.12261093,  0.72921795,  0.08405925,  0.78192139,  0.48798928],
-         [ 0.34581488,  0.21195598,  0.26357424,  0.18987852,  0.2465664 ],
-         [ 1.18659711,  1.11271608,  0.72589797,  1.19098675,  0.33769298],
-         [ 0.82314551,  0.71177536,  0.4428404 ,  0.764337  ,  0.3500182 ]], dtype=float32),
-  u'weight': array([[ 0.03727285, -0.39697152,  0.42733836, -0.34291714, -0.13833708],
-         [ 0.09232076, -0.09720675, -0.33625153,  0.06477787, -0.34739712],
-         [ 0.17145753,  0.10128133,  0.16679128, -0.33541158,  0.40437087],
-         [-0.03005157, -0.36412898,  0.0629965 ,  0.13443278, -0.38414535],
-         [-0.16630849,  0.06934392,  0.40328237,  0.22299488, -0.1178569 ]], dtype=float32)}}
-```
-
-## L1L2 Regularizer ##
-
-**Scala:**
-```scala
-val l1l2Regularizer = L1L2Regularizer(l1rate, l2rate)
-```
-**Python:**
-```python
-regularizerl1l2 = L1L2Regularizer(l1rate, l2rate)
-```
-
-L1L2 regularizer is used to add penalty to the gradWeight to avoid overfitting.
-
-In our code implementation, we apply the L1 regularizer and the L2 regularizer sequentially.
-
-For more details, please refer to [wiki](https://en.wikipedia.org/wiki/Regularization_(mathematics)).
-
-**Scala example:**
-```scala
-
-import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
-import com.intel.analytics.bigdl.dllib.tensor._
-import com.intel.analytics.bigdl.dllib.optim._
-import com.intel.analytics.bigdl.numeric.NumericFloat
-import com.intel.analytics.bigdl.dllib.nn._
-
-RNG.setSeed(100)
-
-val input = Tensor(3, 5).rand
-val gradOutput = Tensor(3, 5).rand
-val linear = Linear(5, 5, wRegularizer = L1L2Regularizer(0.2, 0.2), bRegularizer = L1L2Regularizer(0.2, 0.2))
-
-val output = linear.forward(input)
-val gradInput = linear.backward(input, gradOutput)
-
-scala> input
-input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-0.54340494      0.67115563      0.2783694       0.4120464       0.4245176
-0.52638245      0.84477615      0.14860484      0.004718862     0.15671109
-0.12156912      0.18646719      0.67074907      0.21010774      0.82585275
-[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
-
-scala> gradOutput
-gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-0.4527399       0.13670659      0.87014264      0.5750933       0.063681036
-0.89132196      0.62431186      0.20920213      0.52334774      0.18532822
-0.5622963       0.10837689      0.0058171963    0.21969749      0.3074232
-[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
-
-scala> linear.gradWeight
-res1: com.intel.analytics.bigdl.tensor.Tensor[Float] =
-1.069174        1.4422078       0.8913989       0.042112567     0.53756505
-0.14077617      0.8959319       -0.030221784    -0.1583686      0.4690558
-0.37145022      0.99747723      0.5559263       0.58614403      0.66380215
-0.88983417      0.639738        0.14924419      0.027530536     0.71988696
--0.053217214    -8.643427E-4    -0.036953792    0.29753304      0.06567569
-[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]
-```
-
-**Python example:**
-```python
-from bigdl.dllib.nn.layer import *
-from bigdl.dllib.nn.criterion import *
-from bigdl.dllib.optim.optimizer import *
-from bigdl.dllib.util.common import *
-
-input = np.random.uniform(0, 1, (3, 5)).astype("float32")
-gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
-linear = Linear(5, 5, wRegularizer = L1L2Regularizer(0.2, 0.2), bRegularizer = L1L2Regularizer(0.2, 0.2))
-output = linear.forward(input)
-gradInput = linear.backward(input, gradOutput)
-
-> linear.parameters()
-{u'Linear@1356aa91': {u'bias': array([-0.05799473, -0.0548001 ,  0.00408955, -0.22004321, -0.07143869], dtype=float32),
-  u'gradBias': array([ 0.89119786,  1.09953558,  1.03394508,  1.19511735,  2.02241182], dtype=float32),
-  u'gradWeight': array([[ 0.89061081,  0.58810186, -0.10087357,  0.19108151,  0.60029608],
-         [ 0.95275503,  0.2333075 ,  0.46897018,  0.74429053,  1.16038764],
-         [ 0.22894514,  0.60031962,  0.3836292 ,  0.15895618,  0.83136207],
-         [ 0.49079862,  0.80913013,  0.55491877,  0.69608945,  0.80458677],
-         [ 0.98890561,  0.49226439,  0.14861123,  1.37666655,  1.47615671]], dtype=float32),
-  u'weight': array([[ 0.44654208,  0.16320795, -0.36029238, -0.25365737, -0.41974261],
-         [ 0.18809238, -0.28065765,  0.27677274, -0.29904234,  0.41338971],
-         [-0.03731538,  0.22493915,  0.10021331, -0.19495697,  0.25470355],
-         [-0.30836752,  0.12083009,  0.3773002 ,  0.24059358, -0.40325543],
-         [-0.13601269, -0.39310011, -0.05292636,  0.20001481, -0.08444868]], dtype=float32)}}
-```
diff --git a/docs/readthedocs/source/doc/PythonAPI/Friesian/feature.rst b/docs/readthedocs/source/doc/PythonAPI/Friesian/feature.rst
deleted file mode 100644
index b382d03a..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Friesian/feature.rst
+++ /dev/null
@@ -1,11 +0,0 @@
-Friesian Feature API
-=====================
-
-friesian.feature.table
----------------------------
-
-.. automodule:: bigdl.friesian.feature.table
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
diff --git a/docs/readthedocs/source/doc/PythonAPI/Friesian/index.rst b/docs/readthedocs/source/doc/PythonAPI/Friesian/index.rst
deleted file mode 100644
index 43ec676b..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Friesian/index.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-Friesian API
-==================
-
-.. toctree::
-    :maxdepth: 2
-
-    feature.rst
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/PythonAPI/LLM/langchain.rst b/docs/readthedocs/source/doc/PythonAPI/LLM/langchain.rst
index 307c6d14..77add609 100644
--- a/docs/readthedocs/source/doc/PythonAPI/LLM/langchain.rst
+++ b/docs/readthedocs/source/doc/PythonAPI/LLM/langchain.rst
@@ -37,7 +37,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, yo
 
     .. tab:: Llama
 
-        .. autoclass:: ipex_llm.langchain.llms.ipexllm.LlamaLLM
+        .. autoclass:: ipex_llm.langchain.llms.LlamaLLM
             :members:
             :undoc-members:
             :show-inheritance:
@@ -49,7 +49,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, yo
 
     .. tab:: ChatGLM
 
-        .. autoclass:: ipex_llm.langchain.llms.ipexllm.ChatGLMLLM
+        .. autoclass:: ipex_llm.langchain.llms.ChatGLMLLM
             :members:
             :undoc-members:
             :show-inheritance:
@@ -61,7 +61,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, yo
 
     .. tab:: Bloom
 
-        .. autoclass:: ipex_llm.langchain.llms.ipexllm.BloomLLM
+        .. autoclass:: ipex_llm.langchain.llms.BloomLLM
             :members:
             :undoc-members:
             :show-inheritance:
@@ -73,7 +73,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, yo
 
     .. tab:: Gptneox
 
-        .. autoclass:: ipex_llm.langchain.llms.ipexllm.GptneoxLLM
+        .. autoclass:: ipex_llm.langchain.llms.GptneoxLLM
             :members:
             :undoc-members:
             :show-inheritance:
@@ -85,7 +85,7 @@ For ``llama``/``chatglm``/``bloom``/``gptneox``/``starcoder`` model families, yo
 
     .. tab:: Starcoder
 
-        .. autoclass:: ipex_llm.langchain.llms.ipexllm.StarcoderLLM
+        .. autoclass:: ipex_llm.langchain.llms.StarcoderLLM
             :members:
             :undoc-members:
             :show-inheritance:
@@ -117,7 +117,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also
 
     .. tab:: Llama
 
-        .. autoclass:: ipex_llm.langchain.embeddings.ipexllm.LlamaEmbeddings
+        .. autoclass:: ipex_llm.langchain.embeddings.LlamaEmbeddings
             :members:
             :undoc-members:
             :show-inheritance:
@@ -129,7 +129,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also
 
     .. tab:: Bloom
 
-        .. autoclass:: ipex_llm.langchain.embeddings.ipexllm.BloomEmbeddings
+        .. autoclass:: ipex_llm.langchain.embeddings.BloomEmbeddings
             :members:
             :undoc-members:
             :show-inheritance:
@@ -141,7 +141,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also
 
     .. tab:: Gptneox
 
-        .. autoclass:: ipex_llm.langchain.embeddings.ipexllm.GptneoxEmbeddings
+        .. autoclass:: ipex_llm.langchain.embeddings.GptneoxEmbeddings
             :members:
             :undoc-members:
             :show-inheritance:
@@ -153,7 +153,7 @@ For ``llama``/``bloom``/``gptneox``/``starcoder`` model families, you could also
 
     .. tab:: Starcoder
 
-        .. autoclass:: ipex_llm.langchain.embeddings.ipexllm.StarcoderEmbeddings
+        .. autoclass:: ipex_llm.langchain.embeddings.StarcoderEmbeddings
             :members:
             :undoc-members:
             :show-inheritance:
diff --git a/docs/readthedocs/source/doc/PythonAPI/Nano/hpo_api.rst b/docs/readthedocs/source/doc/PythonAPI/Nano/hpo_api.rst
deleted file mode 100644
index e19d43ba..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Nano/hpo_api.rst
+++ /dev/null
@@ -1,42 +0,0 @@
-Nano HPO API
-==================
-
-Search Space
----------------------------
-
-.. autoclass:: bigdl.nano.automl.hpo.space.Categorical
-
-.. autoclass:: bigdl.nano.automl.hpo.space.Real
-
-.. autoclass:: bigdl.nano.automl.hpo.space.Int
-
-.. autoclass:: bigdl.nano.automl.hpo.space.Bool
-
-
-HPO for Tensorflow
----------------------------
-
-bigdl.nano.automl.tf.keras.Model
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. autoclass:: bigdl.nano.automl.tf.keras.Model
-    :members: search, search_summary
-
-
-bigdl.nano.automl.tf.keras.Sequential
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. autoclass:: bigdl.nano.automl.tf.keras.Sequential
-    :members: search, search_summary
-
-
-HPO for PyTorch
----------------------------
-
-bigdl.nano.pytorch.Trainer
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. autoclass:: bigdl.nano.pytorch.Trainer
-    :members: search, search_summary
-    :undoc-members:
-
diff --git a/docs/readthedocs/source/doc/PythonAPI/Nano/index.rst b/docs/readthedocs/source/doc/PythonAPI/Nano/index.rst
deleted file mode 100644
index 4fefc0e0..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Nano/index.rst
+++ /dev/null
@@ -1,19 +0,0 @@
-Nano API
-==================
-
-.. toctree::
-    :maxdepth: 2
-
-    pytorch.rst
-
-
-.. toctree::
-    :maxdepth: 2
-    
-    tensorflow.rst
-
-
-.. toctree::
-    :maxdepth: 3
-
-    hpo_api.rst
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/PythonAPI/Nano/pytorch.rst b/docs/readthedocs/source/doc/PythonAPI/Nano/pytorch.rst
deleted file mode 100644
index c267da73..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Nano/pytorch.rst
+++ /dev/null
@@ -1,44 +0,0 @@
-Nano PyTorch API
-==================
-
-bigdl.nano.pytorch.Trainer
----------------------------
-
-.. autoclass:: bigdl.nano.pytorch.Trainer
-    :members:
-    :undoc-members:
-    :exclude-members: accelerator_connector, checkpoint_connector, reload_dataloaders_every_n_epochs, limit_val_batches, logger, logger_connector, state
-
-bigdl.nano.pytorch.InferenceOptimizer
----------------------------------------
-
-.. autoclass:: bigdl.nano.pytorch.InferenceOptimizer
-    :members:
-    :undoc-members:
-    :exclude-members: ALL_INFERENCE_ACCELERATION_METHOD, DEFAULT_INFERENCE_ACCELERATION_METHOD, method
-    :inherited-members:
-
-TorchNano API
----------------------------
-
-.. autoclass:: bigdl.nano.pytorch.TorchNano
-    :members:
-    :undoc-members:
-    :exclude-members: run
-
-.. autofunction:: bigdl.nano.pytorch.nano
-
-Patch API
----------------------------
-
-.. autofunction:: bigdl.nano.pytorch.patch_torch
-
-.. autofunction:: bigdl.nano.pytorch.unpatch_torch
-
-.. autofunction:: bigdl.nano.pytorch.patching.patch_cuda
-
-.. autofunction:: bigdl.nano.pytorch.patching.unpatch_cuda
-
-.. autofunction:: bigdl.nano.pytorch.patching.patch_dtype
-    
-.. autofunction:: bigdl.nano.pytorch.patching.patch_encryption
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/PythonAPI/Nano/tensorflow.rst b/docs/readthedocs/source/doc/PythonAPI/Nano/tensorflow.rst
deleted file mode 100644
index c484700a..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Nano/tensorflow.rst
+++ /dev/null
@@ -1,41 +0,0 @@
-Nano Tensorflow API
-===================
-
-bigdl.nano.tf.keras
----------------------------
-
-.. autoclass:: bigdl.nano.tf.keras.Model
-    :members: fit, quantize, trace
-    :undoc-members:
-
-.. autoclass:: bigdl.nano.tf.keras.Sequential
-    :members:
-    :undoc-members:
-    :inherited-members: Sequential
-
-.. autoclass:: bigdl.nano.tf.keras.layers.Embedding
-    :members:
-    :undoc-members:
-
-bigdl.nano.tf.optimizers
----------------------------
-.. autoclass:: bigdl.nano.tf.optimizers.SparseAdam
-    :members: 
-    :undoc-members:
-
-bigdl.nano.tf.keras.InferenceOptimizer
----------------------------------------
-.. autoclass:: bigdl.nano.tf.keras.InferenceOptimizer
-    :members:
-    :undoc-members:
-    :exclude-members: ALL_INFERENCE_ACCELERATION_METHOD
-    :inherited-members:
-
-Patch API
----------------------------
-
-.. autofunction:: bigdl.nano.tf.patch_tensorflow
-
-.. autofunction:: bigdl.nano.tf.unpatch_tensorflow
-
-.. autofunction:: bigdl.nano.tf.keras.nano_bf16
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/PythonAPI/Orca/automl.rst b/docs/readthedocs/source/doc/PythonAPI/Orca/automl.rst
deleted file mode 100644
index f5f0cf95..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Orca/automl.rst
+++ /dev/null
@@ -1,41 +0,0 @@
-Orca AutoML
-============================
-
-orca.automl.auto_estimator
----------------------------
-
-A general estimator that supports automatic model tuning. It allows users to fit a model and search for the best hyperparameters.
-
-.. automodule:: bigdl.orca.automl.auto_estimator
-    :members:
-    :show-inheritance:
-
-
-orca.automl.hp
-----------------------------------------
-
-Sampling specs to be used in search space configuration.
-
-.. automodule:: bigdl.orca.automl.hp
-    :members:
-    :show-inheritance:
-
-orca.automl.metrics
-----------------------------
-
-Evaluate unscaled metrics between y true value and y predicted value.
-
-.. automodule:: bigdl.orca.automl.metrics
-    :members:
-    :show-inheritance:
-
-orca.automl.auto_xgb
----------------------------
-
-Automatic hyperparameter optimization for XGBoost models.
-
-AutoXGBoost inherits from AutoEstimator. Refer to the `AutoEstimator API Guide <https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/AutoML/automl.html#orca-automl-auto-estimator>`__ for more APIs.
-
-.. automodule:: bigdl.orca.automl.xgboost.auto_xgb
-    :members:
-    :show-inheritance:
diff --git a/docs/readthedocs/source/doc/PythonAPI/Orca/context.rst b/docs/readthedocs/source/doc/PythonAPI/Orca/context.rst
deleted file mode 100644
index b0bc3687..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Orca/context.rst
+++ /dev/null
@@ -1,15 +0,0 @@
-Orca Context
-============
-
-orca.init_orca_context
--------------------------
-
-
-.. automodule:: bigdl.orca.common
-    :members: init_orca_context
-    :undoc-members:
-    :show-inheritance:
-
-
-
-
diff --git a/docs/readthedocs/source/doc/PythonAPI/Orca/data.rst b/docs/readthedocs/source/doc/PythonAPI/Orca/data.rst
deleted file mode 100644
index b4935178..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Orca/data.rst
+++ /dev/null
@@ -1,20 +0,0 @@
-Orca Data
-=========
-
-orca.data.XShards
----------------------------
-
-.. autoclass:: bigdl.orca.data.XShards
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-orca.data.pandas
----------------------------
-
-.. automodule:: bigdl.orca.data.pandas.preprocessing
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
diff --git a/docs/readthedocs/source/doc/PythonAPI/Orca/index.rst b/docs/readthedocs/source/doc/PythonAPI/Orca/index.rst
deleted file mode 100644
index 80a9e1df..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Orca/index.rst
+++ /dev/null
@@ -1,10 +0,0 @@
-Orca API
-==================
-
-.. toctree::
-    :maxdepth: 2
-
-    context.rst
-    data.rst
-    orca.rst
-    automl.rst
diff --git a/docs/readthedocs/source/doc/PythonAPI/Orca/orca.rst b/docs/readthedocs/source/doc/PythonAPI/Orca/orca.rst
deleted file mode 100644
index bc525446..00000000
--- a/docs/readthedocs/source/doc/PythonAPI/Orca/orca.rst
+++ /dev/null
@@ -1,87 +0,0 @@
-Orca Learn
-==========
-
-orca.learn.bigdl.estimator
----------------------------
-
-.. automodule:: bigdl.orca.learn.bigdl.estimator
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-orca.learn.tf.estimator
-------------------------
-
-.. automodule:: bigdl.orca.learn.tf.estimator
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-orca.learn.tf2.estimator
--------------------------
-
-.. automodule:: bigdl.orca.learn.tf2.estimator
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-orca.learn.tf2.tf2_ray_estimator
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Orca TF2Estimator with backend of "horovod" or "ray".
-
-.. autoclass:: bigdl.orca.learn.tf2.ray_estimator.TensorFlow2Estimator
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-orca.learn.tf2.tf2_spark_estimator
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Orca TF2Estimator with backend of "spark".
-
-.. autoclass:: bigdl.orca.learn.tf2.pyspark_estimator.SparkTFEstimator
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-orca.learn.pytorch.estimator
------------------------------
-
-.. automodule:: bigdl.orca.learn.pytorch.estimator
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-orca.learn.pytorch.pytorch_ray_estimator
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Orca Pytorch Estimator with backend of "horovod" or "ray".
-
-.. autoclass:: bigdl.orca.learn.pytorch.pytorch_ray_estimator.PyTorchRayEstimator
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-orca.learn.openvino.estimator
-------------------------------
-
-.. automodule:: bigdl.orca.learn.openvino.estimator
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-orca.learn.mpi.mpi_estimator
-------------------------------
-
-.. autoclass:: bigdl.orca.learn.mpi.MPIEstimator
-    :members:
-    :undoc-members:
-    :show-inheritance:
diff --git a/docs/readthedocs/source/index.rst b/docs/readthedocs/source/index.rst
index e22b0cf7..f664973a 100644
--- a/docs/readthedocs/source/index.rst
+++ b/docs/readthedocs/source/index.rst
@@ -2,14 +2,8 @@
    :google-site-verification: S66K6GAclKw1RroxU0Rka_2d1LZFVe27M0gRneEsIVI
 
 ################################################
-The IPEX-LLM Project
-################################################
-
-------
-
-************************************************
 IPEX-LLM
-************************************************
+################################################
 
 .. raw:: html
 
@@ -21,9 +15,9 @@ IPEX-LLM
 
    It is built on top of the excellent work of `llama.cpp <https://github.com/ggerganov/llama.cpp>`_, `gptq <https://github.com/IST-DASLab/gptq>`_, `bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`_, `qlora <https://github.com/artidoro/qlora>`_, etc.
 
-============================================
+************************************************
 Latest update 🔥
-============================================
+************************************************
 - [2024/03] **LangChain** added support for ``ipex-llm``; see the details `here <https://python.langchain.com/docs/integrations/llms/ipex>`_.
 - [2024/02] ``ipex-llm`` now supports directly loading model from `ModelScope <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/ModelScope-Models>`_ (`魔搭 <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/ModelScope-Models>`_).
 - [2024/02] ``ipex-llm`` added initial **INT2** support (based on llama.cpp `IQ2 <https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF-IQ2>`_ mechanism), which makes it possible to run large-size LLM (e.g., Mixtral-8x7B) on Intel GPU with 16GB VRAM.
@@ -44,9 +38,9 @@ Latest update 🔥
 - [2023/09] ``ipex-llm`` `tutorial <https://github.com/intel-analytics/ipex-llm-tutorial>`_ is released.
 - Over 30 models have been verified on ``ipex-llm``, including *LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, WizardCoder, Dolly, Whisper, Baichuan/Baichuan2, InternLM, Skywork, QWen/Qwen-VL, Aquila, MOSS* and more; see the complete list `here <https://github.com/intel-analytics/ipex#verified-models>`_.
 
-============================================
+************************************************
 ``ipex-llm`` demos
-============================================
+************************************************
 
 See the **optimized performance** of ``chatglm2-6b`` and ``llama-2-13b-chat`` models on 12th Gen Intel Core CPU and Intel Arc GPU below.
 
@@ -79,9 +73,9 @@ See the **optimized performance** of ``chatglm2-6b`` and ``llama-2-13b-chat`` mo
       </tr>
    </table>
 
-============================================
+************************************************
 ``ipex-llm`` quickstart
-============================================
+************************************************
 
 - `Windows GPU installation <doc/LLM/Quickstart/install_windows_gpu.html>`_
 - `Run IPEX-LLM in Text-Generation-WebUI <doc/LLM/Quickstart/webui_quickstart.html>`_
@@ -89,9 +83,9 @@ See the **optimized performance** of ``chatglm2-6b`` and ``llama-2-13b-chat`` mo
 - `CPU quickstart <#cpu-quickstart>`_
 - `GPU quickstart <#gpu-quickstart>`_
 
---------------------------------------------
+============================================
 CPU Quickstart
---------------------------------------------
+============================================
 
 You may install ``ipex-llm`` on Intel CPU as follows:
 
@@ -122,9 +116,9 @@ You can then apply INT4 optimizations to any Hugging Face *Transformers* models
    output_ids = model.generate(input_ids, ...)
    output = tokenizer.batch_decode(output_ids)
 
---------------------------------------------
+============================================
 GPU Quickstart
---------------------------------------------
+============================================
 
 You may install ``ipex-llm`` on Intel GPU as follows: