diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml
index 27ce4f9f..a9528341 100644
--- a/docs/readthedocs/source/_toc.yml
+++ b/docs/readthedocs/source/_toc.yml
@@ -118,6 +118,7 @@ subtrees:
           - file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
           - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
           - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
+          - file: doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager
           - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
           - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_jit
           - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_onnx
diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager.nblink
new file mode 100644
index 00000000..57788ed6
--- /dev/null
+++ b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager.nblink
@@ -0,0 +1,3 @@
+{
+    "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/pytorch_context_manager.ipynb"
+}
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/Nano/Howto/index.rst b/docs/readthedocs/source/doc/Nano/Howto/index.rst
index 1c78af3f..a9cd4c17 100644
--- a/docs/readthedocs/source/doc/Nano/Howto/index.rst
+++ b/docs/readthedocs/source/doc/Nano/Howto/index.rst
@@ -53,12 +53,16 @@ PyTorch
 * `How to accelerate a PyTorch inference pipeline through multiple instances `_
 * `How to quantize your PyTorch model for inference using Intel Neural Compressor `_
 * `How to quantize your PyTorch model for inference using OpenVINO Post-training Optimization Tools `_
+* |pytorch_inference_context_manager_link|_
 * `How to save and load optimized IPEX model `_
 * `How to save and load optimized JIT model `_
 * `How to save and load optimized ONNXRuntime model `_
 * `How to save and load optimized OpenVINO model `_
 * `How to find accelerated method with minimal latency using InferenceOptimizer `_
 
+.. |pytorch_inference_context_manager_link| replace:: How to use context manager through ``get_context``
+.. _pytorch_inference_context_manager_link: Inference/PyTorch/pytorch_context_manager.html
+
 Install
 -------------------------
 * `How to install BigDL-Nano in Google Colab `_
diff --git a/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md b/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
index 2e0151fa..5e669164 100644
--- a/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
+++ b/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
@@ -265,3 +265,46 @@ multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores
 # the third process will use core 2 and 3, the fourth process will use core 4 and 5
 multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2,3], [4,5]])
 ```

## Automatic Context Management
BigDL-Nano provides the ``InferenceOptimizer.get_context(model=...)`` API to enable automatic context management for PyTorch inference. With only one line of code change, BigDL-Nano will automatically provide a suitable context manager for each accelerated model. It usually consists of some or all of the following three types of context managers (a rough manual equivalent of their combined effect is sketched after the list):

1. ``torch.no_grad()`` to disable gradient calculation, which will be used for all models

2. ``torch.cpu.amp.autocast(dtype=torch.bfloat16)`` to run inference in mixed precision, which will be provided for bf16-related models

3. ``torch.set_num_threads()`` to control the number of threads, which will be used only if you specify ``thread_num`` when applying ``InferenceOptimizer.trace``/``quantize``/``optimize``
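For intuition, here is a rough sketch of what the combined effect of these context managers amounts to, written with plain PyTorch APIs. This is an illustration only (it assumes a bf16-related model and ``thread_num=4``), not BigDL-Nano's actual implementation:
```python
import torch
from torch import nn

model = nn.Linear(1000, 10)  # a toy model, just for illustration
x = torch.rand(2, 1000)

torch.set_num_threads(4)                                 # (3) thread control
with torch.no_grad():                                    # (1) disable gradient calculation
    with torch.cpu.amp.autocast(dtype=torch.bfloat16):   # (2) bf16 mixed precision
        output = model(x)
```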
For a model accelerated by ``InferenceOptimizer.trace``, the usage looks like the code below; here we just take ``ipex`` as an example:
```python
import torch
from bigdl.nano.pytorch import InferenceOptimizer

# suppose model and x (an input sample) are defined as in the previous sections
ipex_model = InferenceOptimizer.trace(model,
                                      use_ipex=True,
                                      thread_num=4)

with InferenceOptimizer.get_context(ipex_model):
    output = ipex_model(x)
    assert torch.get_num_threads() == 4  # just to show that Nano has applied thread control automatically
```

For ``InferenceOptimizer.quantize`` and ``InferenceOptimizer.optimize``, the usage is the same (a sketch for ``quantize`` is given at the end of this section).

``InferenceOptimizer.get_context(model=...)`` can also be used with multiple models. If you have a model pipeline, you can get a common context manager by passing multiple models to ``get_context``.
```python
import torch
from torch import nn

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1000, 1)

    def forward(self, x):
        return self.linear(x)

classifier = Classifier()

with InferenceOptimizer.get_context(ipex_model, classifier):
    # a pipeline consisting of a backbone and a classifier
    x = ipex_model(input_sample)
    output = classifier(x)
    assert torch.get_num_threads() == 4  # just to show that Nano has applied thread control automatically
```
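Finally, as mentioned above, a model obtained from ``InferenceOptimizer.quantize`` is used in exactly the same way. The minimal sketch below assumes that bf16 quantization is requested via ``precision='bf16'`` together with ``thread_num=4``; these arguments are illustrative, please refer to the API document for the exact signature:
```python
import torch
from torch import nn
from bigdl.nano.pytorch import InferenceOptimizer

model = nn.Sequential(nn.Linear(1000, 100), nn.ReLU(), nn.Linear(100, 10))
x = torch.rand(2, 1000)

# assumed arguments for illustration: bf16 mixed precision plus thread control
bf16_model = InferenceOptimizer.quantize(model, precision='bf16', thread_num=4)

with InferenceOptimizer.get_context(bf16_model):
    output = bf16_model(x)
    assert not torch.is_grad_enabled()   # torch.no_grad() is provided automatically
    assert torch.get_num_threads() == 4  # thread control is provided automatically
```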