Nano: add key feature and how to guide for context manager (#6897)
* add key feature and how to guide for context manager
* update key feature for multi models
* update based on comment
* update
* update based on comments
* update
* update
parent de88f1e6b3
commit b3feb53c4a

4 changed files with 51 additions and 0 deletions
@@ -118,6 +118,7 @@ subtrees:
                   - file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
                   - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
                   - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
+                  - file: doc/Nano/Howto/Inference/PyTorch/pytorch_context_manager
                   - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
                   - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_jit
                   - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_onnx
@@ -0,0 +1,3 @@
+{
+    "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/pytorch_context_manager.ipynb"
+}
@@ -53,12 +53,16 @@ PyTorch
 * `How to accelerate a PyTorch inference pipeline through multiple instances <Inference/PyTorch/multi_instance_pytorch_inference.html>`_
 * `How to quantize your PyTorch model for inference using Intel Neural Compressor <Inference/PyTorch/quantize_pytorch_inference_inc.html>`_
 * `How to quantize your PyTorch model for inference using OpenVINO Post-training Optimization Tools <Inference/PyTorch/quantize_pytorch_inference_pot.html>`_
+* |pytorch_inference_context_manager_link|_
 * `How to save and load optimized IPEX model <Inference/PyTorch/pytorch_save_and_load_ipex.html>`_
 * `How to save and load optimized JIT model <Inference/PyTorch/pytorch_save_and_load_jit.html>`_
 * `How to save and load optimized ONNXRuntime model <Inference/PyTorch/pytorch_save_and_load_onnx.html>`_
 * `How to save and load optimized OpenVINO model <Inference/PyTorch/pytorch_save_and_load_openvino.html>`_
 * `How to find accelerated method with minimal latency using InferenceOptimizer <Inference/PyTorch/inference_optimizer_optimize.html>`_
 
+.. |pytorch_inference_context_manager_link| replace:: How to use context manager through ``get_context``
+.. _pytorch_inference_context_manager_link: Inference/PyTorch/pytorch_context_manager.html
+
 Install
 -------------------------
 * `How to install BigDL-Nano in Google Colab <install_in_colab.html>`_
@@ -265,3 +265,46 @@ multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores
 # the third process will use core 2 and 3, the fourth process will use core 4 and 5
 multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2,3], [4,5]])
 ```
+
+## Automatic Context Management
+BigDL-Nano provides the ``InferenceOptimizer.get_context(model=...)`` API to enable automatic context management for PyTorch inference. With only one line of code change, BigDL-Nano automatically provides a suitable context manager for each accelerated model; it usually consists of some or all of the following three types of context managers:
+
+1. ``torch.no_grad()`` to disable gradient computation, which is used for all models
+
+2. ``torch.cpu.amp.autocast(dtype=torch.bfloat16)`` to run in mixed precision, which is provided for BF16-related models
+
+3. ``torch.set_num_threads()`` to control the number of threads, which is used only if you specify ``thread_num`` when applying ``InferenceOptimizer.trace``/``quantize``/``optimize``
+
+For a model accelerated by ``InferenceOptimizer.trace``, usage looks like the code below; here we take ``ipex`` as an example:
+```python
+from bigdl.nano.pytorch import InferenceOptimizer
+ipex_model = InferenceOptimizer.trace(model,
+                                      use_ipex=True,
+                                      thread_num=4)
+
+with InferenceOptimizer.get_context(ipex_model):
+    output = ipex_model(x)
+    assert torch.get_num_threads() == 4  # this line just shows that Nano has applied thread control automatically :)
+```
+
+For ``InferenceOptimizer.quantize`` and ``InferenceOptimizer.optimize``, the usage is the same.
+
+``InferenceOptimizer.get_context(model=...)`` can also be used for multiple models. If you have a model pipeline, you can get a common context manager by passing multiple models to ``get_context``.
+```python
+from torch import nn
+class Classifier(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.linear = nn.Linear(1000, 1)
+
+    def forward(self, x):
+        return self.linear(x)
+
+classifier = Classifier()
+
+with InferenceOptimizer.get_context(ipex_model, classifier):
+    # a pipeline consisting of a backbone and a classifier
+    x = ipex_model(input_sample)
+    output = classifier(x)
+    assert torch.get_num_threads() == 4  # this line just shows that Nano has applied thread control automatically :)
+```
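The how-to above states that ``InferenceOptimizer.quantize`` and ``InferenceOptimizer.optimize`` work the same way with ``get_context``. Below is a minimal sketch of that, reusing ``model`` and ``x`` from the snippets in the diff; ``precision='bf16'`` is an illustrative assumption (not shown in this commit), chosen because it would also exercise the BF16 autocast context listed as type 2 above:

```python
import torch
from bigdl.nano.pytorch import InferenceOptimizer

# hypothetical: quantize the same `model` to bfloat16, with thread control
# (`precision='bf16'` is assumed here; check the actual quantize signature)
bf16_model = InferenceOptimizer.quantize(model, precision='bf16', thread_num=4)

with InferenceOptimizer.get_context(bf16_model):
    output = bf16_model(x)
    assert torch.get_num_threads() == 4    # same automatic thread control as with trace
    assert output.dtype == torch.bfloat16  # autocast(dtype=torch.bfloat16) is active for BF16 models
```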
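Similarly, since ``torch.no_grad()`` is listed as part of the context for every model, no autograd graph should be built inside the ``with`` block. A small sketch of how that could be checked, reusing the ``ipex_model`` from the diff above (the input shape is an illustrative assumption):

```python
import torch

x = torch.rand(1, 1000)  # illustrative input; the shape depends on the real model

with InferenceOptimizer.get_context(ipex_model):
    output = ipex_model(x)
    # torch.no_grad() is active inside the context, so the output carries no gradient history
    assert not output.requires_grad
```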