Nano: Add key feature and how to guide for multi-instance inference (#6901)
parent 99d6decfeb
commit 664dfbe7ef
4 changed files with 43 additions and 1 deletion
@@ -115,6 +115,7 @@ subtrees:
- file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_onnx
- file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_openvino
- file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_jit_ipex
- file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
- file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
- file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
- file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
@@ -0,0 +1,3 @@
{
    "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/multi_instance_pytorch_inference.ipynb"
}
@@ -50,6 +50,7 @@ PyTorch
* `How to accelerate a PyTorch inference pipeline through ONNXRuntime <Inference/PyTorch/accelerate_pytorch_inference_onnx.html>`_
* `How to accelerate a PyTorch inference pipeline through OpenVINO <Inference/PyTorch/accelerate_pytorch_inference_openvino.html>`_
* `How to accelerate a PyTorch inference pipeline through JIT/IPEX <Inference/PyTorch/accelerate_pytorch_inference_jit_ipex.html>`_
* `How to accelerate a PyTorch inference pipeline through multiple instances <Inference/PyTorch/multi_instance_pytorch_inference.html>`_
* `How to quantize your PyTorch model for inference using Intel Neural Compressor <Inference/PyTorch/quantize_pytorch_inference_inc.html>`_
* `How to quantize your PyTorch model for inference using OpenVINO Post-training Optimization Tools <Inference/PyTorch/quantize_pytorch_inference_pot.html>`_
* `How to save and load optimized IPEX model <Inference/PyTorch/pytorch_save_and_load_ipex.html>`_
@@ -227,4 +227,41 @@ InferenceOptimizer.quantize(model,
                            approach='static',
                            max_trials=10,
                            ):
```
## Multi-instance Acceleration

BigDL-Nano also provides multi-instance inference. To use it, first call `multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=n)`, where `num_processes` specifies the number of processes you want to use.

After this call, `multi_model` accepts a `DataLoader` or a list of batches instead of a single batch, and produces a list of inference results instead of a single result. You can use it as follows:
```python
multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4)

# predict a DataLoader
y_hat_list = multi_model(dataloader)

# or predict a list of batches instead of an entire DataLoader
it = iter(dataloader)
batch_list = []
for i in range(10):
    batch = next(it)
    batch_list.append(batch)
y_hat_list = multi_model(batch_list)

# y_hat_list is a list of inference results; use it like this
for y_hat in y_hat_list:
    do_something(y_hat)
```

`InferenceOptimizer.to_multi_instance` also has a parameter named `cores_per_process` to specify the number of CPU cores used by each process, and a parameter named `cpu_for_each_process` to specify exactly which CPU cores each process runs on. Normally you don't need to set them manually; BigDL-Nano will find the best configuration automatically. But if you want, you can use them as follows:
```python
# Use 4 processes to run inference,
# each process will use 2 CPU cores
multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores_per_process=2)

# Use 4 processes to run inference,
# the first process will use core 0, the second process will use core 1,
# the third process will use cores 2 and 3, the fourth process will use cores 4 and 5
multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2, 3], [4, 5]])
```
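To make the new how-to concrete, here is a minimal end-to-end sketch of the workflow above. Only `InferenceOptimizer.to_multi_instance` and its `num_processes` parameter come from this commit's guide; the `resnet18` model, the random input tensors, and the batch/process counts are illustrative assumptions.

```python
import torch
from torchvision.models import resnet18

from bigdl.nano.pytorch import InferenceOptimizer

if __name__ == '__main__':
    # An illustrative model; any torch.nn.Module in eval mode should work similarly.
    model = resnet18(pretrained=False)
    model.eval()

    # A stand-in for batches drawn from a real DataLoader: 4 batches of 2 images each.
    batch_list = [torch.rand(2, 3, 224, 224) for _ in range(4)]

    # Spread inference over 2 processes; BigDL-Nano picks the CPU core assignment.
    multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=2)
    y_hat_list = multi_model(batch_list)

    # One result per input batch.
    for y_hat in y_hat_list:
        print(y_hat.shape)  # expected torch.Size([2, 1000]) for resnet18
```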