diff --git a/README.md b/README.md
index e0009776..0bb7b964 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,8 @@ BigDL makes it easy for data scientists and data engineers to build end-to-end,
 * [Chronos](#getting-started-with-chronos): scalable time series analysis using AutoML
 * [PPML](#ppml-privacy-preserving-machine-learning): privacy preserving big data analysis and machine learning (*experimental*)
+
+ * [Serving](#getting-started-with-serving): distributed and automated model inference on Big Data streaming frameworks
 
 For more information, you may [read the docs](https://bigdl.readthedocs.io/).
 
@@ -199,6 +201,13 @@ See the Chronos [user guide](https://bigdl.readthedocs.io/en/latest/doc/Chronos/
 
 See the [PPML user guide](https://bigdl.readthedocs.io/en/latest/doc/PPML/Overview/ppml.html) for more details.
 
+## Getting Started with Serving
+BigDL Cluster Serving is an end-to-end pipeline for scaling out local applications. We recommend the following example as a starting point:
+
+[Migrate a Keras application to Serving](https://bigdl.readthedocs.io/en/latest/doc/Serving/Example/keras-to-cluster-serving-example.ipynb)
+
+See the Serving [user guide](https://bigdl.readthedocs.io/en/latest/doc/Serving/Overview/serving.html) and [quickstart](https://bigdl.readthedocs.io/en/latest/doc/Serving/QuickStart/serving-quickstart.html) for more details.
+
 ## More information
 
 - [Document Website](https://bigdl.readthedocs.io/)
 
diff --git a/docs/readthedocs/source/conf.py b/docs/readthedocs/source/conf.py
index 2f44b320..ff627567 100644
--- a/docs/readthedocs/source/conf.py
+++ b/docs/readthedocs/source/conf.py
@@ -25,6 +25,8 @@ sys.path.insert(0, os.path.abspath("../../../python/friesian/src/"))
 sys.path.insert(0, os.path.abspath("../../../python/chronos/src/"))
 sys.path.insert(0, os.path.abspath("../../../python/dllib/src/"))
 sys.path.insert(0, os.path.abspath("../../../python/orca/src/"))
+sys.path.insert(0, os.path.abspath("../../../python/serving/src/"))
+
 
 # -- Project information -----------------------------------------------------
 
diff --git a/docs/readthedocs/source/doc/Serving/Example/example.md b/docs/readthedocs/source/doc/Serving/Example/example.md
index 94ec156b..743a7b1e 100644
--- a/docs/readthedocs/source/doc/Serving/Example/example.md
+++ b/docs/readthedocs/source/doc/Serving/Example/example.md
@@ -1,4 +1,4 @@
-# BigDL Cluster Serving Example
+# Cluster Serving Example
 There are some examples provided for new users and existing TensorFlow users.
 ## End-to-end Example
 
diff --git a/docs/readthedocs/source/doc/Serving/FAQ/contribute-guide.md b/docs/readthedocs/source/doc/Serving/FAQ/contribute-guide.md
new file mode 100644
index 00000000..99436daf
--- /dev/null
+++ b/docs/readthedocs/source/doc/Serving/FAQ/contribute-guide.md
@@ -0,0 +1,118 @@
+# Contribute to Cluster Serving
+
+This is the guide to contributing your code to Cluster Serving.
+
+Cluster Serving takes advantage of the Analytics Zoo core, integrates deep learning frameworks (e.g. TensorFlow, OpenVINO, PyTorch), implements the inference logic on top of them, and parallelizes the computation with Flink and Redis by default. To contribute more features to Cluster Serving, refer to the following sections accordingly.
+## Dev Environment
+
+### Get Code and Prepare Branch
+Go to the Analytics Zoo main repo https://github.com/intel-analytics/analytics-zoo, press Fork to copy it to your GitHub account, and git clone the forked repo to local.
+Use `git checkout -b your_branch_name` to create a new branch; you can then write code on this branch and open a pull request to Analytics Zoo from it.
+### Environment Setup
+Cluster Serving is an Analytics Zoo Scala module. You could refer to the [Analytics Zoo Scala Developer Guide](https://analytics-zoo.readthedocs.io/en/latest/doc/UserGuide/develop.html#scala) to set up the development environment.
+
+### Debug in IDE
+Cluster Serving depends on Flink and Redis. To install Redis and start the Redis server:
+```
+$ export REDIS_VERSION=5.0.5
+$ wget http://download.redis.io/releases/redis-${REDIS_VERSION}.tar.gz && \
+    tar xzf redis-${REDIS_VERSION}.tar.gz && \
+    rm redis-${REDIS_VERSION}.tar.gz && \
+    cd redis-${REDIS_VERSION} && \
+    make
+$ ./src/redis-server
+```
+In the IDE, an embedded Flink is used, so no extra Flink dependency is needed.
+
+Once set up, copy `/path/to/analytics-zoo/scripts/cluster-serving/config.yaml` to `/path/to/analytics-zoo/config.yaml`, and run `zoo/src/main/scala/com/intel/analytics/zoo/serving/ClusterServing.scala` in the IDE. Since the IDE treats `/path/to/analytics-zoo/` as the current directory, the config file will be read from there.
+
+Run `zoo/src/main/scala/com/intel/analytics/zoo/serving/http/Frontend2.scala` if you use the HTTP frontend.
+
+Once started, you can run Python client code to complete an end-to-end test, just as you would when running Cluster Serving per the [Programming Guide](https://github.com/intel-analytics/analytics-zoo/blob/master/docs/docs/ClusterServingGuide/ProgrammingGuide.md#4-model-inference).
+### Test Package
+Once you have written the code and completed the tests in the IDE, you can package the jar and test it.
+
+To package:
+```
+cd /path/to/analytics-zoo/zoo
+./make-dist.sh
+```
+Then, from the `target` folder, copy `analytics-zoo-xxx-flink-udf.jar` to your test directory and rename it to `zoo.jar`; also copy `config.yaml` to your test directory.
+
+You could copy `/path/to/analytics-zoo/scripts/cluster-serving/cluster-serving-start` to start Cluster Serving; this script starts the Redis server for you and submits the Flink job. If you prefer to manage Redis yourself, you could use the command from the script, `${FLINK_HOME}/bin/flink run -c com.intel.analytics.zoo.serving.ClusterServing zoo.jar`, to start Cluster Serving.
+
+To run the frontend, call `java -cp zoo.jar com.intel.analytics.zoo.serving.http.Frontend2`.
+
+The remaining steps are the same as testing in the IDE.
+
+## Add Features
+### Data Connector
+The data connector is the producer of Cluster Serving; remote clients put data into the data pipeline.
+#### Scala Code (The Server)
+
+To define a new data connector to, e.g., Kafka, Redis, or another database, you first have to define a Flink Source.
+
+You could refer to `com/intel/analytics/zoo/serving/engine/FlinkRedisSource.scala` as an example.
+
+```
+class FlinkRedisSource(params: ClusterServingHelper)
+  extends RichParallelSourceFunction[List[(String, String)]] {
+  @volatile var isRunning = true
+
+  override def open(parameters: Configuration): Unit = {
+    // initialize the connector
+  }
+
+  override def run(sourceContext: SourceFunction
+    .SourceContext[List[(String, String)]]): Unit = while (isRunning) {
+    // get data from data pipeline
+  }
+
+  override def cancel(): Unit = {
+    // close the connector
+  }
+}
+```
+Then you could refer to `com/intel/analytics/zoo/serving/engine/FlinkInference.scala` for the inference method of your new connector. Usually it can be used directly without a new implementation; however, you could still define your own method if needed.
+
+Finally, you have to define a Flink Sink to write data back to the data pipeline.
+
+You could refer to `com/intel/analytics/zoo/serving/engine/FlinkRedisSink.scala` as an example.
+
+```
+class FlinkRedisSink(params: ClusterServingHelper)
+  extends RichSinkFunction[List[(String, String)]] {
+
+  override def open(parameters: Configuration): Unit = {
+    // initialize the connector
+  }
+
+  override def close(): Unit = {
+    // close the connector
+  }
+
+  override def invoke(value: List[(String, String)], context: SinkFunction.Context[_]): Unit = {
+    // write data to data pipeline
+  }
+}
+```
+Please note that you should normally handle the space (memory or disk) management of your data pipeline in your own code.
+
+Please place the Flink Source and Flink Sink code in `com/intel/analytics/zoo/serving/engine/`.
+
+If you have methods that need to be wrapped in a class, you could place them in `com/intel/analytics/zoo/serving/pipeline/`.
+#### Python Code (The Client)
+You could refer to `pyzoo/zoo/serving/client.py` to define your client code according to your data connector.
+
+Please place this part of the code in `pyzoo/zoo/serving/data_pipeline_name/`, e.g. `pyzoo/zoo/serving/kafka/` if you create a Kafka connector.
+##### Put to data pipeline
+It is recommended to refer to the `InputQueue.enqueue()` and `InputQueue.predict()` methods. These methods first call `self.data_to_b64` and then add the data to the data pipeline. You could define a similar enqueue method to work with your data connector.
+##### Get from data pipeline
+It is recommended to refer to the `OutputQueue.query()` and `OutputQueue.dequeue()` methods. These methods get results from the data pipeline and call `self.get_ndarray_from_b64` to decode them. You could define a similar dequeue method to work with your data connector, as sketched below.
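+
+For illustration, here is a minimal sketch of what the client side of a hypothetical Kafka connector might look like, mirroring the `InputQueue`/`OutputQueue` pattern above. The class names, topic names, broker address, and message layout are assumptions made for this sketch (which uses the `kafka-python` package), not an existing Analytics Zoo API:
+
+```
+import base64
+import json
+
+from kafka import KafkaConsumer, KafkaProducer  # assumes the kafka-python package
+
+
+class KafkaInputQueue:
+    """Hypothetical Kafka counterpart of InputQueue; all names are illustrative."""
+
+    def __init__(self, host="localhost:9092", topic="serving_stream"):
+        self.topic = topic
+        self.producer = KafkaProducer(bootstrap_servers=host)
+
+    def enqueue(self, uri, data):
+        # Mirror InputQueue.enqueue(): base64-encode the tensor first, then
+        # add the record to the data pipeline (here, a Kafka topic).
+        # `data` is assumed to be a numpy ndarray.
+        b64 = base64.b64encode(data.tobytes()).decode("utf-8")
+        record = json.dumps({"uri": uri, "shape": list(data.shape), "data": b64})
+        self.producer.send(self.topic, record.encode("utf-8"))
+        self.producer.flush()
+
+
+class KafkaOutputQueue:
+    """Hypothetical Kafka counterpart of OutputQueue; all names are illustrative."""
+
+    def __init__(self, host="localhost:9092", topic="serving_result"):
+        self.consumer = KafkaConsumer(topic, bootstrap_servers=host,
+                                      auto_offset_reset="earliest")
+
+    def dequeue(self):
+        # Mirror OutputQueue.dequeue(): read one result record from the data
+        # pipeline and base64-decode it before returning it to the caller.
+        message = next(self.consumer)
+        result = json.loads(message.value.decode("utf-8"))
+        result["data"] = base64.b64decode(result["data"])
+        return result
+```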
+ +Finally, you have to define a Flink Sink, to write data back to data pipeline. + +You could refer to `com/intel/analytics/zoo/serving/engine/FlinkRedisSink.scala` as an example. + +``` +class FlinkRedisSink(params: ClusterServingHelper) + extends RichSinkFunction[List[(String, String)]] { + + override def open(parameters: Configuration): Unit = { + // initialize the connector + } + + override def close(): Unit = { + // close the connector + } + + override def invoke(value: List[(String, String)], context: SinkFunction.Context[_]): Unit = { + // write data to data pipeline + } +} +``` +Please note that normally you should do the space (memory or disk) control of your data pipeline in your code. + + +Please locate Flink Source and Flink Sink code to `com/intel/analytics/zoo/serving/engine/` + +If you have some method which need to be wrapped as a class, you could locate them in `com/intel/analytics/zoo/serving/pipeline/` +#### Python Code (The Client) +You could refer to `pyzoo/zoo/serving/client.py` to define your client code according to your data connector. + +Please locate this part of code in `pyzoo/zoo/serving/data_pipeline_name/`, e.g. `pyzoo/zoo/serving/kafka/` if you create a Kafka connector. +##### put to data pipeline +It is recommended to refer to `InputQueue.enqueue()` and `InputQueue.predict()` method. This method calls `self.data_to_b64` method first and add data to data pipeline. You could define a similar enqueue method to work with your data connector. +##### get from data pipeline +It is recommended to refer to `OutputQueue.query()` and `OutputQueue.dequeue()` method. This method gets result from data pipeline and calls `self.get_ndarray_from_b64` method to decode. You could define a similar dequeue method to work with your data connector. + +## Benchmark Test +You could use `zoo/src/main/scala/com/intel/analytics/zoo/serving/engine/Operations.scala` to test the inference time of your model. + +The script takes two arguments, run it with `-m modelPath` and `-j jsonPath` to indicate the path to the model and the path to the prepared json format operation template of the model. + +The model will output the inference time stats of preprocessing, prediction and postprocessing processes, which varies with the different preprocessing/postprocessing time and thread numbers. diff --git a/docs/readthedocs/source/doc/Serving/FAQ/faq.md b/docs/readthedocs/source/doc/Serving/FAQ/faq.md index 812d07e3..350331eb 100644 --- a/docs/readthedocs/source/doc/Serving/FAQ/faq.md +++ b/docs/readthedocs/source/doc/Serving/FAQ/faq.md @@ -1,4 +1,4 @@ -# BigDL Cluster Serving FAQ +# Cluster Serving FAQ ## General Debug Guide You could use following guide to debug if serving is not working properly. diff --git a/docs/readthedocs/source/doc/Serving/Overview/serving-overview.md b/docs/readthedocs/source/doc/Serving/Overview/serving.md similarity index 69% rename from docs/readthedocs/source/doc/Serving/Overview/serving-overview.md rename to docs/readthedocs/source/doc/Serving/Overview/serving.md index f2d8757b..39326ae6 100644 --- a/docs/readthedocs/source/doc/Serving/Overview/serving-overview.md +++ b/docs/readthedocs/source/doc/Serving/Overview/serving.md @@ -1,4 +1,4 @@ -# BigDL Cluster Serving Overview +# Cluster Serving User Guide BigDL Cluster Serving is a lightweight distributed, real-time serving solution that supports a wide range of deep learning models (such as TensorFlow, PyTorch, Caffe, BigDL and OpenVINO models). 
 It provides a simple pub/sub API, so that users can easily send their inference requests to the input queue (using a simple Python API); Cluster Serving will then automatically manage the scale-out and real-time model inference across a large cluster (using distributed streaming frameworks such as Apache Spark Streaming, Apache Flink, etc.).
 The overall architecture of the BigDL Cluster Serving solution is illustrated below:
@@ -26,3 +26,24 @@ You can launch the Cluster Serving service by running the startup script on the
 Cluster Serving provides a simple pub/sub API to the users, so that you can easily send the inference requests to an input queue (currently Redis Streams is used) using a simple Python API.
 
 Cluster Serving will then read the requests from the Redis stream, run the distributed real-time inference across the cluster (using Flink), and return the results back through Redis. As a result, you may get the inference results again using a simple Python API.
+
+## Next Steps
+### Deploy Cluster Serving
+To deploy Cluster Serving, follow the steps below:
+
+[1. Install Cluster Serving](https://bigdl.readthedocs.io/en/latest/doc/Serving/ProgrammingGuide/serving-installation.html)
+
+[2. Start Cluster Serving](https://bigdl.readthedocs.io/en/latest/doc/Serving/ProgrammingGuide/serving-start.html)
+
+[3. Inference by Cluster Serving](https://bigdl.readthedocs.io/en/latest/doc/Serving/ProgrammingGuide/serving-inference.html)
+
+### Examples
+You could find some end-to-end examples on how to build a serving application from scratch or how to migrate an existing local application to serving.
+
+[Example link](https://bigdl.readthedocs.io/en/latest/doc/Serving/Example/example.html)
+### Troubleshooting
+Some frequently asked questions are answered in the [FAQ](https://bigdl.readthedocs.io/en/latest/doc/Serving/FAQ/faq.html).
+
+
+### Contribute Guide
+For contributors, check the [Contribute Guide](https://bigdl.readthedocs.io/en/latest/doc/Serving/FAQ/contribute-guide.html)
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-inference.md b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-inference.md
index 097d1e00..da92b872 100644
--- a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-inference.md
+++ b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-inference.md
@@ -1,4 +1,4 @@
-# BigDL Cluster Serving Programming Guide
+# Inference by Cluster Serving
 ## Model Inference
 Once you finish the installation and service launch, you could do inference using the Cluster Serving client API.
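+
+As a quick illustration, a minimal client-side sketch might look as follows. The import path follows the `pyzoo/zoo/serving/client.py` module referenced in the contribute guide (it may differ in your version, e.g. a `bigdl`-prefixed package), and the keyword-style tensor argument passed to `enqueue` is an assumption for illustration:
+
+```
+import numpy as np
+
+# Import path assumed from pyzoo/zoo/serving/client.py; adjust to your install.
+from zoo.serving.client import InputQueue, OutputQueue
+
+# Put a request into the input queue (Redis Streams by default).
+input_api = InputQueue()
+dummy_image = np.random.rand(224, 224, 3)  # placeholder input tensor
+input_api.enqueue("my-image-1", t=dummy_image)  # keyword name 't' is illustrative
+
+# Fetch the corresponding result from the output queue once inference finishes.
+output_api = OutputQueue()
+result = output_api.query("my-image-1")
+print(result)
+```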
diff --git a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-installation.md b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-installation.md
index 70029f7a..85443568 100644
--- a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-installation.md
+++ b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-installation.md
@@ -1,4 +1,4 @@
-# BigDL Cluster Serving Programming Guide
+# Install Cluster Serving
 ## Installation
 It is recommended to install Cluster Serving by pulling the pre-built Docker image to your local node, which has all the required dependencies packaged. Alternatively, you may also manually install Cluster Serving (through either pip or direct download) and Redis on the local node.
diff --git a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-start.md b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-start.md
index 195355c1..f1906a9f 100644
--- a/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-start.md
+++ b/docs/readthedocs/source/doc/Serving/ProgrammingGuide/serving-start.md
@@ -1,4 +1,4 @@
-# BigDL Cluster Serving Programming Guide
+# Start Cluster Serving
 
 ## Launching Service of Serving
diff --git a/docs/readthedocs/source/doc/Serving/QuickStart/serving-quickstart.md b/docs/readthedocs/source/doc/Serving/QuickStart/serving-quickstart.md
index a25e67bd..930757ea 100644
--- a/docs/readthedocs/source/doc/Serving/QuickStart/serving-quickstart.md
+++ b/docs/readthedocs/source/doc/Serving/QuickStart/serving-quickstart.md
@@ -1,4 +1,4 @@
-# BigDL Cluster Serving Quick Start
+# Cluster Serving Quick Start
 This section provides a quick start example for you to run BigDL Cluster Serving. To simplify the example, we use Docker to run Cluster Serving. If you do not have Docker installed, [install docker](https://docs.docker.com/install/) first. The quick start example contains all the necessary components, so that first-time users can get it up and running within minutes:
diff --git a/docs/readthedocs/source/index.rst b/docs/readthedocs/source/index.rst
index 92a1ee0b..5021a662 100644
--- a/docs/readthedocs/source/index.rst
+++ b/docs/readthedocs/source/index.rst
@@ -10,7 +10,7 @@ BigDL Documentation
 * `RayOnSpark `_: run Ray programs directly on Big Data clusters
 * `Chronos `_: scalable time series analysis using AutoML
 * `PPML `_: privacy preserving big data analysis and machine learning (*experimental*)
-
+* `Serving `_: distributed and automated model inference on Big Data streaming frameworks
 
 -------
 
@@ -85,6 +85,19 @@ BigDL Documentation
     doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes.md
     doc/PPML/QuickStart/trusted-serving-on-k8s-guide.md
 
+.. toctree::
+   :maxdepth: 1
+   :caption: Serving Overview
+
+   doc/Serving/Overview/serving.md
+   doc/Serving/QuickStart/serving-quickstart.md
+   doc/Serving/ProgrammingGuide/serving-installation.md
+   doc/Serving/ProgrammingGuide/serving-start.md
+   doc/Serving/ProgrammingGuide/serving-inference.md
+   doc/Serving/Example/example.md
+   doc/Serving/FAQ/faq.md
+   doc/Serving/FAQ/contribute-guide.md
+
 .. toctree::
    :maxdepth: 1
    :caption: Common Use Case