Nano: add key feature and how to guide for context manager (#6897)
* add key feature and how to guide for context manager
* update key feature for multi models
* update based on comment
* update
* update based on comments
* update
* update
parent de88f1e6b3
commit b3feb53c4a
4 changed files with 51 additions and 0 deletions

@@ -118,6 +118,7 @@ subtrees:
- file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
- file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
- file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_jit
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_onnx

@@ -0,0 +1,3 @@
{
"path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/pytorch_context_manager.ipynb"
}

@@ -53,12 +53,16 @@ PyTorch
* `How to accelerate a PyTorch inference pipeline through multiple instances <Inference/PyTorch/multi_instance_pytorch_inference.html>`_
* `How to quantize your PyTorch model for inference using Intel Neural Compressor <Inference/PyTorch/quantize_pytorch_inference_inc.html>`_
* `How to quantize your PyTorch model for inference using OpenVINO Post-training Optimization Tools <Inference/PyTorch/quantize_pytorch_inference_pot.html>`_
* |pytorch_inference_context_manager_link|_
* `How to save and load optimized IPEX model <Inference/PyTorch/pytorch_save_and_load_ipex.html>`_
* `How to save and load optimized JIT model <Inference/PyTorch/pytorch_save_and_load_jit.html>`_
* `How to save and load optimized ONNXRuntime model <Inference/PyTorch/pytorch_save_and_load_onnx.html>`_
* `How to save and load optimized OpenVINO model <Inference/PyTorch/pytorch_save_and_load_openvino.html>`_
* `How to find accelerated method with minimal latency using InferenceOptimizer <Inference/PyTorch/inference_optimizer_optimize.html>`_

.. |pytorch_inference_context_manager_link| replace:: How to use context manager through ``get_context``
.. _pytorch_inference_context_manager_link: Inference/PyTorch/pytorch_context_manager.html

Install
-------------------------
* `How to install BigDL-Nano in Google Colab <install_in_colab.html>`_

@@ -265,3 +265,46 @@ multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores
# the third process will use core 2 and 3, the fourth process will use core 4 and 5
multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2,3], [4,5]])
```

## Automatic Context Management

BigDL-Nano provides the ``InferenceOptimizer.get_context(model=...)`` API to enable automatic context management for PyTorch inference. With only one line of code change, BigDL-Nano automatically provides a suitable context manager for each accelerated model; it usually consists of some or all of the following three types of context managers (a plain-PyTorch sketch of their combined effect follows this list):

1. ``torch.no_grad()`` to disable gradients, which will be used for all models

2. ``torch.cpu.amp.autocast(dtype=torch.bfloat16)`` to run in mixed precision, which will be provided for bf16-related models

3. ``torch.set_num_threads()`` to control the thread number, which will be used only if you specify ``thread_num`` when applying ``InferenceOptimizer.trace``/``quantize``/``optimize``
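
As a rough illustration only, the combined effect of these managers corresponds to something like the sketch below; ``model``, ``x`` and the thread number are hypothetical placeholders, and which managers actually apply depends on how the model was accelerated:

```python
import torch
from torch import nn

model = nn.Linear(1000, 1)   # hypothetical placeholder model
x = torch.rand(2, 1000)      # hypothetical input sample

torch.set_num_threads(4)                                  # 3. thread control (only if thread_num was specified)
with torch.no_grad():                                     # 1. disable gradients (all models)
    with torch.cpu.amp.autocast(dtype=torch.bfloat16):    # 2. bf16 mixed precision (bf16-related models only)
        output = model(x)
```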

For a model accelerated by ``InferenceOptimizer.trace``, the usage looks like the code below; here we just take ``ipex`` as an example:
```python
import torch
from bigdl.nano.pytorch import InferenceOptimizer

ipex_model = InferenceOptimizer.trace(model,
                                      use_ipex=True,
                                      thread_num=4)

with InferenceOptimizer.get_context(ipex_model):
    output = ipex_model(x)
    assert torch.get_num_threads() == 4  # this line is just to let you know that Nano has provided thread control automatically :)
```

For ``InferenceOptimizer.quantize`` and ``InferenceOptimizer.optimize``, the usage is the same.
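
For instance, a quantized model can be wrapped in exactly the same way. The sketch below reuses ``model`` and ``x`` from the previous examples and assumes bf16 quantization; the ``precision='bf16'`` argument is an illustrative assumption rather than a required setting:

```python
from bigdl.nano.pytorch import InferenceOptimizer

# a minimal sketch, assuming bf16 quantization is applicable to this model;
# the precision and thread_num values here are for illustration only
bf16_model = InferenceOptimizer.quantize(model,
                                         precision='bf16',
                                         thread_num=4)

with InferenceOptimizer.get_context(bf16_model):
    output = bf16_model(x)
```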

``InferenceOptimizer.get_context(model=...)`` can also be used with multiple models. If you have a model pipeline, you can get a common context manager by passing multiple models to ``get_context``.
```python
from torch import nn

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1000, 1)

    def forward(self, x):
        return self.linear(x)

classifier = Classifier()

with InferenceOptimizer.get_context(ipex_model, classifier):
    # a pipeline consisting of a backbone and a classifier
    x = ipex_model(input_sample)
    output = classifier(x)
    assert torch.get_num_threads() == 4  # this line is just to let you know that Nano has provided thread control automatically :)
```