Nano: add key feature and how to guide for context manager (#6897)
* add key feature and how to guide for context manager
* update key feature for multi models
* update based on comment
* update
* update based on comments
* update
* update
parent de88f1e6b3
commit b3feb53c4a
4 changed files with 51 additions and 0 deletions
@ -118,6 +118,7 @@ subtrees:
- file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
- file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
- file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_jit
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_onnx
@ -0,0 +1,3 @@
{
    "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/pytorch_context_manager.ipynb"
}
@ -53,12 +53,16 @@ PyTorch
* `How to accelerate a PyTorch inference pipeline through multiple instances <Inference/PyTorch/multi_instance_pytorch_inference.html>`_
* `How to quantize your PyTorch model for inference using Intel Neural Compressor <Inference/PyTorch/quantize_pytorch_inference_inc.html>`_
* `How to quantize your PyTorch model for inference using OpenVINO Post-training Optimization Tools <Inference/PyTorch/quantize_pytorch_inference_pot.html>`_
* |pytorch_inference_context_manager_link|_
* `How to save and load optimized IPEX model <Inference/PyTorch/pytorch_save_and_load_ipex.html>`_
* `How to save and load optimized JIT model <Inference/PyTorch/pytorch_save_and_load_jit.html>`_
* `How to save and load optimized ONNXRuntime model <Inference/PyTorch/pytorch_save_and_load_onnx.html>`_
* `How to save and load optimized OpenVINO model <Inference/PyTorch/pytorch_save_and_load_openvino.html>`_
* `How to find accelerated method with minimal latency using InferenceOptimizer <Inference/PyTorch/inference_optimizer_optimize.html>`_

.. |pytorch_inference_context_manager_link| replace:: How to use context manager through ``get_context``
.. _pytorch_inference_context_manager_link: Inference/PyTorch/pytorch_context_manager.html

Install
-------------------------
* `How to install BigDL-Nano in Google Colab <install_in_colab.html>`_
@ -265,3 +265,46 @@ multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores
# the third process will use core 2 and 3, the fourth process will use core 4 and 5
multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2,3], [4,5]])
```
## Automatic Context Management
BigDL-Nano provides the ``InferenceOptimizer.get_context(model=...)`` API to enable automatic context management for PyTorch inference. With only one line of code change, BigDL-Nano automatically provides a suitable context manager for each accelerated model. It usually combines some or all of the following three types of context managers (a rough sketch of the combined behavior follows the list):

1. ``torch.no_grad()`` to disable gradient computation, which is used for all models

2. ``torch.cpu.amp.autocast(dtype=torch.bfloat16)`` to run inference in mixed precision, which is provided for BF16-related models

3. ``torch.set_num_threads()`` to control the number of threads, which is used only if you specify ``thread_num`` when applying ``InferenceOptimizer.trace``/``quantize``/``optimize``
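
In practice you never write this combination yourself; ``InferenceOptimizer.get_context`` builds it for you. The sketch below is only an illustration of the combined behavior described above, not Nano's actual implementation; the ``rough_inference_context`` helper and its ``thread_num``/``bf16`` arguments are hypothetical.

```python
import contextlib
import torch

@contextlib.contextmanager
def rough_inference_context(thread_num=None, bf16=False):
    """Hypothetical illustration of what get_context combines for one model."""
    with contextlib.ExitStack() as stack:
        # 1. disable gradient computation for inference
        stack.enter_context(torch.no_grad())
        # 2. run in BF16 mixed precision for BF16-related models
        if bf16:
            stack.enter_context(torch.cpu.amp.autocast(dtype=torch.bfloat16))
        # 3. thread control, only when a thread number was specified
        if thread_num is not None:
            previous_threads = torch.get_num_threads()
            torch.set_num_threads(thread_num)
            stack.callback(torch.set_num_threads, previous_threads)  # restore on exit
        yield
```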
For a model accelerated by ``InferenceOptimizer.trace``, the usage looks like the code below; here we take ``ipex`` as an example:
```python
from bigdl.nano.pytorch import InferenceOptimizer

ipex_model = InferenceOptimizer.trace(model,
                                      use_ipex=True,
                                      thread_num=4)

with InferenceOptimizer.get_context(ipex_model):
    output = ipex_model(x)
    assert torch.get_num_threads() == 4  # this line is just to show that Nano has applied thread control automatically :)
```
For ``InferenceOptimizer.quantize`` and ``InferenceOptimizer.optimize``, the usage is the same.
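
As an illustration, the same pattern applied to a BF16 model from ``InferenceOptimizer.quantize`` would look roughly like the sketch below; the ``precision='bf16'`` and ``thread_num=4`` arguments are assumptions for this example, and ``model``/``x`` are the same objects as in the example above.
```python
bf16_model = InferenceOptimizer.quantize(model,
                                         precision='bf16',
                                         thread_num=4)

with InferenceOptimizer.get_context(bf16_model):
    # for a BF16 model, the context is expected to also enable
    # torch.cpu.amp.autocast(dtype=torch.bfloat16) on top of torch.no_grad()
    output = bf16_model(x)
```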
``InferenceOptimizer.get_context(model=...)`` can also be used with multiple models. If you have a model pipeline, you can obtain a common context manager by passing multiple models to ``get_context``:
```python
from torch import nn

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1000, 1)

    def forward(self, x):
        return self.linear(x)

classifier = Classifier()

with InferenceOptimizer.get_context(ipex_model, classifier):
    # a pipeline consisting of a backbone and a classifier
    x = ipex_model(input_sample)
    output = classifier(x)
    assert torch.get_num_threads() == 4  # this line is just to show that Nano has applied thread control automatically :)
```