diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml
index 27ce4f9f..a9528341 100644
--- a/docs/readthedocs/source/_toc.yml
+++ b/docs/readthedocs/source/_toc.yml
@@ -118,6 +118,7 @@ subtrees:
           - file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
           - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
           - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
+          - file: doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager
           - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
           - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_jit
           - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_onnx
diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager.nblink
new file mode 100644
index 00000000..57788ed6
--- /dev/null
+++ b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager.nblink
@@ -0,0 +1,3 @@
+{
+    "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/pytorch_context_manager.ipynb"
+}
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/Nano/Howto/index.rst b/docs/readthedocs/source/doc/Nano/Howto/index.rst
index 1c78af3f..a9cd4c17 100644
--- a/docs/readthedocs/source/doc/Nano/Howto/index.rst
+++ b/docs/readthedocs/source/doc/Nano/Howto/index.rst
@@ -53,12 +53,16 @@ PyTorch
 * `How to accelerate a PyTorch inference pipeline through multiple instances `_
 * `How to quantize your PyTorch model for inference using Intel Neural Compressor `_
 * `How to quantize your PyTorch model for inference using OpenVINO Post-training Optimization Tools `_
+* |pytorch_inference_context_manager_link|_
 * `How to save and load optimized IPEX model `_
 * `How to save and load optimized JIT model `_
 * `How to save and load optimized ONNXRuntime model `_
 * `How to save and load optimized OpenVINO model `_
 * `How to find accelerated method with minimal latency using InferenceOptimizer `_
 
+.. |pytorch_inference_context_manager_link| replace:: How to use context manager through ``get_context``
+.. _pytorch_inference_context_manager_link: Inference/PyTorch/pytorch_context_manager.html
+
 Install
 -------------------------
 * `How to install BigDL-Nano in Google Colab `_
diff --git a/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md b/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
index 2e0151fa..5e669164 100644
--- a/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
+++ b/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
@@ -265,3 +265,46 @@ multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores
 # the third process will use core 2 and 3, the fourth process will use core 4 and 5
 multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2,3], [4,5]])
 ```

## Automatic Context Management
BigDL-Nano provides the ``InferenceOptimizer.get_context(model=...)`` API to enable automatic context management for PyTorch inference. With only one line of code change, BigDL-Nano will automatically provide a suitable context manager for each accelerated model. It usually consists of some or all of the following three types of context managers (a rough manual equivalent of their combined effect is sketched after the list):

1. ``torch.no_grad()`` to disable gradient calculation, which will be used for all models

2. ``torch.cpu.amp.autocast(dtype=torch.bfloat16)`` to run inference in mixed precision, which will be provided for bf16-related models

3. ``torch.set_num_threads()`` to control the number of threads, which will be used only if you specify ``thread_num`` when applying ``InferenceOptimizer.trace``/``quantize``/``optimize``
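For intuition, here is a rough sketch of what the combined effect of these context managers amounts to, written with plain PyTorch APIs. This is an illustration only (it assumes a bf16-related model and ``thread_num=4``), not BigDL-Nano's actual implementation:
```python
import torch
from torch import nn

model = nn.Linear(1000, 10)  # a toy model, just for illustration
x = torch.rand(2, 1000)

torch.set_num_threads(4)                                 # (3) thread control
with torch.no_grad():                                    # (1) disable gradient calculation
    with torch.cpu.amp.autocast(dtype=torch.bfloat16):   # (2) bf16 mixed precision
        output = model(x)
```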
For a model accelerated by ``InferenceOptimizer.trace``, the usage looks like the code below; here we just take ``ipex`` as an example:
```python
import torch
from bigdl.nano.pytorch import InferenceOptimizer

# suppose model and x (an input sample) are defined as in the previous sections
ipex_model = InferenceOptimizer.trace(model,
                                      use_ipex=True,
                                      thread_num=4)

with InferenceOptimizer.get_context(ipex_model):
    output = ipex_model(x)
    assert torch.get_num_threads() == 4  # just to show that Nano has applied thread control automatically
```

For ``InferenceOptimizer.quantize`` and ``InferenceOptimizer.optimize``, the usage is the same (a sketch for ``quantize`` is given at the end of this section).

``InferenceOptimizer.get_context(model=...)`` can also be used with multiple models. If you have a model pipeline, you can get a common context manager by passing multiple models to ``get_context``.
```python
import torch
from torch import nn

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1000, 1)

    def forward(self, x):
        return self.linear(x)

classifier = Classifier()

with InferenceOptimizer.get_context(ipex_model, classifier):
    # a pipeline consisting of a backbone and a classifier
    x = ipex_model(input_sample)
    output = classifier(x)
    assert torch.get_num_threads() == 4  # just to show that Nano has applied thread control automatically
```
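Finally, as mentioned above, a model obtained from ``InferenceOptimizer.quantize`` is used in exactly the same way. The minimal sketch below assumes that bf16 quantization is requested via ``precision='bf16'`` together with ``thread_num=4``; these arguments are illustrative, please refer to the API document for the exact signature:
```python
import torch
from torch import nn
from bigdl.nano.pytorch import InferenceOptimizer

model = nn.Sequential(nn.Linear(1000, 100), nn.ReLU(), nn.Linear(100, 10))
x = torch.rand(2, 1000)

# assumed arguments for illustration: bf16 mixed precision plus thread control
bf16_model = InferenceOptimizer.quantize(model, precision='bf16', thread_num=4)

with InferenceOptimizer.get_context(bf16_model):
    output = bf16_model(x)
    assert not torch.is_grad_enabled()   # torch.no_grad() is provided automatically
    assert torch.get_num_threads() == 4  # thread control is provided automatically
```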