Nano: Add key feature and how to guide for multi-instance inference (#6901)
parent 99d6decfeb
commit 664dfbe7ef

4 changed files with 43 additions and 1 deletion

@@ -115,6 +115,7 @@ subtrees:
                   - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_onnx
                   - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_openvino
                   - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_jit_ipex
+                  - file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
                   - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
                   - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
                   - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex

@@ -0,0 +1,3 @@
+{
+    "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/multi_instance_pytorch_inference.ipynb"
+}

@@ -50,6 +50,7 @@ PyTorch
 * `How to accelerate a PyTorch inference pipeline through ONNXRuntime <Inference/PyTorch/accelerate_pytorch_inference_onnx.html>`_
 * `How to accelerate a PyTorch inference pipeline through OpenVINO <Inference/PyTorch/accelerate_pytorch_inference_openvino.html>`_
 * `How to accelerate a PyTorch inference pipeline through JIT/IPEX <Inference/PyTorch/accelerate_pytorch_inference_jit_ipex.html>`_
+* `How to accelerate a PyTorch inference pipeline through multiple instances <Inference/PyTorch/multi_instance_pytorch_inference.html>`_
 * `How to quantize your PyTorch model for inference using Intel Neural Compressor <Inference/PyTorch/quantize_pytorch_inference_inc.html>`_
 * `How to quantize your PyTorch model for inference using OpenVINO Post-training Optimization Tools <Inference/PyTorch/quantize_pytorch_inference_pot.html>`_
 * `How to save and load optimized IPEX model <Inference/PyTorch/pytorch_save_and_load_ipex.html>`_

@@ -228,3 +228,40 @@ InferenceOptimizer.quantize(model,
                             max_trials=10,
                             ):
 ```
+
+## Multi-instance Acceleration
+
+BigDL-Nano also provides multi-instance inference. To use it, first call `multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=n)`, where `num_processes` specifies the number of processes you want to use.
+
+After this call, `multi_model` accepts a `DataLoader` or a list of batches instead of a single batch, and produces a list of inference results instead of a single result. You can use it as follows:
+
+```python
+multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4)
+
+# predict a DataLoader
+y_hat_list = multi_model(dataloader)
+
+# or predict a list of batches instead of an entire DataLoader
+it = iter(dataloader)
+batch_list = []
+for i in range(10):
+    batch = next(it)
+    batch_list.append(batch)
+y_hat_list = multi_model(batch_list)
+
+# y_hat_list is a list of inference results; you can use it like this
+for y_hat in y_hat_list:
+    do_something(y_hat)
+```
+
+`InferenceOptimizer.to_multi_instance` also has a parameter named `cores_per_process` to specify the number of CPU cores used by each process, and a parameter named `cpu_for_each_process` to specify exactly which CPU cores each process uses. Normally you don't need to set them manually; BigDL-Nano will find a suitable configuration automatically. If you do want to control them, use them as follows:
+
+```python
+# Use 4 processes to run inference,
+# each process will use 2 CPU cores
+multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores_per_process=2)
+
+# Use 4 processes to run inference,
+# the first process will use core 0, the second process will use core 1,
+# the third process will use cores 2 and 3, the fourth process will use cores 4 and 5
+multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2,3], [4,5]])
+```
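
The first snippet added in this hunk assumes that `model` and `dataloader` already exist. As a purely illustrative sketch (not part of the commit), here is one way a plain PyTorch model and `DataLoader` could be prepared before calling `InferenceOptimizer.to_multi_instance`; the toy model, tensor shapes, and batch size are arbitrary placeholders:

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

# Toy stand-ins for a real network and dataset (illustrative only)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.randn(256, 3, 32, 32)                  # 256 fake RGB 32x32 images
y = torch.randint(0, 10, (256,))                 # fake labels
dataloader = DataLoader(TensorDataset(x, y), batch_size=32)
```

If each element of `y_hat_list` turns out to be a batched output tensor, the per-batch results can be combined into a single tensor with `torch.cat(y_hat_list)` when that is more convenient than iterating.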
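For readers deciding how to set `num_processes` and `cores_per_process` explicitly, a minimal sketch of one possible heuristic is shown below; it simply divides the logical cores reported by the OS among a chosen number of processes. This is an editor's illustration, not BigDL-Nano's own auto-configuration logic, and it reuses the `InferenceOptimizer` and `model` from the section above:

```python
import os

num_processes = 4                                  # hypothetical choice for this example
total_cores = os.cpu_count() or 1                  # logical cores reported by the OS
cores_per_process = max(1, total_cores // num_processes)

# `model` is the PyTorch model being accelerated (see the section above)
multi_model = InferenceOptimizer.to_multi_instance(
    model,
    num_processes=num_processes,
    cores_per_process=cores_per_process,
)
```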