Nano: add key feature and how to guide for context manager (#6897)

* add key feature and how to guide for context manager

* update key feature for multi models

* update based on comment

* update

* update based on comments

* update

* update
This commit is contained in:
Ruonan Wang 2022-12-13 16:57:52 +08:00 committed by GitHub
parent de88f1e6b3
commit b3feb53c4a
4 changed files with 51 additions and 0 deletions

View file

@@ -118,6 +118,7 @@ subtrees:
- file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
- file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
- file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_jit
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_onnx

View file

@@ -0,0 +1,3 @@
{
"path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/pytorch_context_manager.ipynb"
}

View file

@@ -53,12 +53,16 @@ PyTorch
* `How to accelerate a PyTorch inference pipeline through multiple instances <Inference/PyTorch/multi_instance_pytorch_inference.html>`_
* `How to quantize your PyTorch model for inference using Intel Neural Compressor <Inference/PyTorch/quantize_pytorch_inference_inc.html>`_
* `How to quantize your PyTorch model for inference using OpenVINO Post-training Optimization Tools <Inference/PyTorch/quantize_pytorch_inference_pot.html>`_
* |pytorch_inference_context_manager_link|_
* `How to save and load optimized IPEX model <Inference/PyTorch/pytorch_save_and_load_ipex.html>`_
* `How to save and load optimized JIT model <Inference/PyTorch/pytorch_save_and_load_jit.html>`_
* `How to save and load optimized ONNXRuntime model <Inference/PyTorch/pytorch_save_and_load_onnx.html>`_
* `How to save and load optimized OpenVINO model <Inference/PyTorch/pytorch_save_and_load_openvino.html>`_
* `How to find accelerated method with minimal latency using InferenceOptimizer <Inference/PyTorch/inference_optimizer_optimize.html>`_
.. |pytorch_inference_context_manager_link| replace:: How to use context manager through ``get_context``
.. _pytorch_inference_context_manager_link: Inference/PyTorch/pytorch_context_manager.html
Install
-------------------------
* `How to install BigDL-Nano in Google Colab <install_in_colab.html>`_

View file

@@ -265,3 +265,46 @@ multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores
# the third process will use cores 2 and 3, the fourth process will use cores 4 and 5
multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2,3], [4,5]])
```
## Automatic Context Management
BigDL-Nano provides the ``InferenceOptimizer.get_context(model=...)`` API to enable automatic context management for PyTorch inference. With only one line of code change, BigDL-Nano automatically provides a suitable context manager for each accelerated model. It usually combines some or all of the following three context managers (a conceptual sketch follows this list):

1. ``torch.no_grad()`` to disable gradients, which is used for all models
2. ``torch.cpu.amp.autocast(dtype=torch.bfloat16)`` to run inference in mixed precision, which is provided for BF16-related models
3. ``torch.set_num_threads()`` to control the number of threads, which is used only if you specify ``thread_num`` when applying ``InferenceOptimizer.trace``/``quantize``/``optimize``
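Conceptually, the context manager returned by ``get_context`` behaves roughly like the sketch below. This is only an illustration of how the three pieces compose, not Nano's actual implementation; the ``bf16`` and ``thread_num`` arguments are hypothetical placeholders for the properties Nano infers from the accelerated model.

```python
import contextlib

import torch


@contextlib.contextmanager
def rough_get_context(bf16=False, thread_num=None):
    """Illustrative stand-in for InferenceOptimizer.get_context, not the real implementation."""
    with contextlib.ExitStack() as stack:
        # always: disable gradient tracking for inference
        stack.enter_context(torch.no_grad())
        if bf16:
            # only for bf16-related models: run inference in mixed precision
            stack.enter_context(torch.cpu.amp.autocast(dtype=torch.bfloat16))
        if thread_num is not None:
            # only if thread_num was specified during trace/quantize/optimize
            original_threads = torch.get_num_threads()
            torch.set_num_threads(thread_num)
        try:
            yield
        finally:
            if thread_num is not None:
                torch.set_num_threads(original_threads)
```

In practice you never write this yourself; ``InferenceOptimizer.get_context`` picks the right combination based on the model(s) you pass in.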
For a model accelerated by ``InferenceOptimizer.trace``, the usage looks like the code below; here we take ``ipex`` as an example:
```python
import torch
from bigdl.nano.pytorch import InferenceOptimizer

# `model` and `x` are assumed to be the model and input sample defined earlier in this guide
ipex_model = InferenceOptimizer.trace(model,
                                      use_ipex=True,
                                      thread_num=4)

with InferenceOptimizer.get_context(ipex_model):
    output = ipex_model(x)
    assert torch.get_num_threads() == 4  # this line is just to show that Nano has provided thread control automatically :)
```
For ``InferenceOptimizer.quantize`` and ``InferenceOptimizer.optimize``, the usage is the same.
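For example, here is a minimal sketch of the same pattern with a quantized model. It assumes BF16 quantization is requested via ``precision='bf16'`` and reuses ``model`` and ``x`` from the example above:

```python
from bigdl.nano.pytorch import InferenceOptimizer

# assumption for this sketch: quantize the model to bf16 precision with thread control
bf16_model = InferenceOptimizer.quantize(model,
                                         precision='bf16',
                                         thread_num=4)

with InferenceOptimizer.get_context(bf16_model):
    output = bf16_model(x)
    assert torch.get_num_threads() == 4  # thread control is applied here as well
    # for a bf16-related model, the context also wraps the call in
    # torch.cpu.amp.autocast(dtype=torch.bfloat16)
```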
``InferenceOptimizer.get_context(model=...)`` can also be used for multiple models. If you have a model pipeline, you can get a common context manager by passing multiple models to ``get_context``.
```python
from torch import nn

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1000, 1)

    def forward(self, x):
        return self.linear(x)

classifier = Classifier()

with InferenceOptimizer.get_context(ipex_model, classifier):
    # a pipeline consisting of a backbone (ipex_model) and a classifier
    x = ipex_model(input_sample)
    output = classifier(x)
    assert torch.get_num_threads() == 4  # this line is just to show that Nano has provided thread control automatically :)
```