diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml
index 5cd7a972..27ce4f9f 100644
--- a/docs/readthedocs/source/_toc.yml
+++ b/docs/readthedocs/source/_toc.yml
@@ -115,6 +115,7 @@ subtrees:
 - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_onnx
 - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_openvino
 - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_jit_ipex
+- file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
 - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
 - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
 - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference.nblink
new file mode 100644
index 00000000..878428b7
--- /dev/null
+++ b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference.nblink
@@ -0,0 +1,3 @@
+{
+    "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/multi_instance_pytorch_inference.ipynb"
+}
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/Nano/Howto/index.rst b/docs/readthedocs/source/doc/Nano/Howto/index.rst
index e7f74736..1c78af3f 100644
--- a/docs/readthedocs/source/doc/Nano/Howto/index.rst
+++ b/docs/readthedocs/source/doc/Nano/Howto/index.rst
@@ -50,6 +50,7 @@ PyTorch
 * `How to accelerate a PyTorch inference pipeline through ONNXRuntime <Inference/PyTorch/accelerate_pytorch_inference_onnx.html>`_
 * `How to accelerate a PyTorch inference pipeline through OpenVINO <Inference/PyTorch/accelerate_pytorch_inference_openvino.html>`_
 * `How to accelerate a PyTorch inference pipeline through JIT/IPEX <Inference/PyTorch/accelerate_pytorch_inference_jit_ipex.html>`_
+* `How to accelerate a PyTorch inference pipeline through multiple instances <Inference/PyTorch/multi_instance_pytorch_inference.html>`_
 * `How to quantize your PyTorch model for inference using Intel Neural Compressor <Inference/PyTorch/quantize_pytorch_inference_inc.html>`_
 * `How to quantize your PyTorch model for inference using OpenVINO Post-training Optimization Tools <Inference/PyTorch/quantize_pytorch_inference_pot.html>`_
 * `How to save and load optimized IPEX model <Inference/PyTorch/pytorch_save_and_load_ipex.html>`_
diff --git a/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md b/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
index 04371ee2..2e0151fa 100644
--- a/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
+++ b/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
@@ -227,4 +227,41 @@
 InferenceOptimizer.quantize(model, approach='static', max_trials=10,
                             ):
-```
\ No newline at end of file
+```
+
+## Multi-instance Acceleration
+
+BigDL-Nano also provides multi-instance inference. To use it, first call `multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=n)`, where `num_processes` specifies the number of processes you want to use.
+
+After this call, `multi_model` will accept a `DataLoader` or a list of batches instead of a single batch, and will produce a list of inference results instead of a single result.
+You can use it as follows:
+```python
+multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4)
+
+# run inference on a whole DataLoader
+y_hat_list = multi_model(dataloader)
+
+# or run inference on a list of batches instead of an entire DataLoader
+it = iter(dataloader)
+batch_list = []
+for i in range(10):
+    batch = next(it)
+    batch_list.append(batch)
+y_hat_list = multi_model(batch_list)
+
+# y_hat_list is a list of inference results; use it like this
+for y_hat in y_hat_list:
+    do_something(y_hat)
+```
+
+`InferenceOptimizer.to_multi_instance` also has a parameter named `cores_per_process` to specify the number of CPU cores used by each process, and a parameter named `cpu_for_each_process` to specify the exact CPU cores used by each process. Normally you don't need to set them manually; BigDL-Nano will find a suitable configuration automatically. But if you want, you can use them as follows:
+```python
+# Use 4 processes to run inference,
+# each process will use 2 CPU cores
+multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores_per_process=2)
+
+# Use 4 processes to run inference,
+# the first process will use core 0, the second process will use core 1,
+# the third process will use cores 2 and 3, and the fourth process will use cores 4 and 5
+multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2, 3], [4, 5]])
+```
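+
+For reference, below is a minimal end-to-end sketch of the list-of-batches usage. The toy model, the input shapes, and the synthetic batch list are illustrative stand-ins rather than part of BigDL-Nano; only `InferenceOptimizer.to_multi_instance` comes from the library:
+```python
+import torch
+from torch import nn
+from bigdl.nano.pytorch import InferenceOptimizer
+
+if __name__ == '__main__':
+    # a toy model, only to make the sketch self-contained
+    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
+    model.eval()
+
+    # a synthetic "list of batches": 8 batches of 16 samples each
+    batch_list = [torch.rand(16, 32) for _ in range(8)]
+
+    # spread the batches over 2 processes
+    multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=2)
+    y_hat_list = multi_model(batch_list)
+
+    print(len(y_hat_list))      # expected: 8, one result per batch
+    print(y_hat_list[0].shape)  # expected: torch.Size([16, 10])
+```
+The `if __name__ == '__main__':` guard is used because multi-instance inference runs in worker processes, which may re-import the launching script.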