Nano: Add key feature and how to guide for multi-instance inference (#6901)
parent 99d6decfeb
commit 664dfbe7ef
4 changed files with 43 additions and 1 deletion
@@ -115,6 +115,7 @@ subtrees:
- file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_onnx
- file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_openvino
- file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_jit_ipex
- file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
- file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
- file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
@@ -0,0 +1,3 @@
{
    "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/multi_instance_pytorch_inference.ipynb"
}
@@ -50,6 +50,7 @@ PyTorch
* `How to accelerate a PyTorch inference pipeline through ONNXRuntime <Inference/PyTorch/accelerate_pytorch_inference_onnx.html>`_
* `How to accelerate a PyTorch inference pipeline through OpenVINO <Inference/PyTorch/accelerate_pytorch_inference_openvino.html>`_
* `How to accelerate a PyTorch inference pipeline through JIT/IPEX <Inference/PyTorch/accelerate_pytorch_inference_jit_ipex.html>`_
* `How to accelerate a PyTorch inference pipeline through multiple instances <Inference/PyTorch/multi_instance_pytorch_inference.html>`_
* `How to quantize your PyTorch model for inference using Intel Neural Compressor <Inference/PyTorch/quantize_pytorch_inference_inc.html>`_
* `How to quantize your PyTorch model for inference using OpenVINO Post-training Optimization Tools <Inference/PyTorch/quantize_pytorch_inference_pot.html>`_
* `How to save and load optimized IPEX model <Inference/PyTorch/pytorch_save_and_load_ipex.html>`_
@@ -227,4 +227,41 @@ InferenceOptimizer.quantize(model,
                            approach='static',
                            max_trials=10,
                            ):
```
## Multi-instance Acceleration

BigDL-Nano also provides multi-instance inference. To use it, first call `multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=n)`, where `num_processes` specifies the number of processes you want to use.

After this call, `multi_model` accepts a `DataLoader` or a list of batches instead of a single batch, and produces a list of inference results instead of a single result. You can use it as follows:
```python
multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4)

# predict a DataLoader
y_hat_list = multi_model(dataloader)

# or predict a list of batches instead of an entire DataLoader
it = iter(dataloader)
batch_list = []
for i in range(10):
    batch = next(it)
    batch_list.append(batch)
y_hat_list = multi_model(batch_list)

# y_hat_list is a list of inference results; use it like this
for y_hat in y_hat_list:
    do_something(y_hat)
```

`InferenceOptimizer.to_multi_instance` also has a parameter named `cores_per_process` to specify the number of CPU cores used by each process, and a parameter named `cpu_for_each_process` to specify exactly which CPU cores each process runs on. Normally you don't need to set them manually; BigDL-Nano will find the best configuration automatically. But if you want, you can use them as follows:
```python
# Use 4 processes to run inference,
# each process will use 2 CPU cores
multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores_per_process=2)

# Use 4 processes to run inference,
# the first process will use core 0, the second process will use core 1,
# the third process will use cores 2 and 3, the fourth process will use cores 4 and 5
multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2, 3], [4, 5]])
```
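To make the new how-to concrete, here is a minimal end-to-end sketch of the workflow above. Only `InferenceOptimizer.to_multi_instance` and its `num_processes` parameter come from this commit's guide; the `resnet18` model, the random input tensors, and the batch/process counts are illustrative assumptions.

```python
import torch
from torchvision.models import resnet18

from bigdl.nano.pytorch import InferenceOptimizer

if __name__ == '__main__':
    # An illustrative model; any torch.nn.Module in eval mode should work similarly.
    model = resnet18(pretrained=False)
    model.eval()

    # A stand-in for batches drawn from a real DataLoader: 4 batches of 2 images each.
    batch_list = [torch.rand(2, 3, 224, 224) for _ in range(4)]

    # Spread inference over 2 processes; BigDL-Nano picks the CPU core assignment.
    multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=2)
    y_hat_list = multi_model(batch_list)

    # One result per input batch.
    for y_hat in y_hat_list:
        print(y_hat.shape)  # expected torch.Size([2, 1000]) for resnet18
```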