diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml
index 5cd7a972..27ce4f9f 100644
--- a/docs/readthedocs/source/_toc.yml
+++ b/docs/readthedocs/source/_toc.yml
@@ -115,6 +115,7 @@ subtrees:
 - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_onnx
 - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_openvino
 - file: doc/Nano/Howto/Inference/PyTorch/accelerate_pytorch_inference_jit_ipex
+- file: doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference
 - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_inc
 - file: doc/Nano/Howto/Inference/PyTorch/quantize_pytorch_inference_pot
 - file: doc/Nano/Howto/Inference/PyTorch/pytorch_save_and_load_ipex
diff --git a/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference.nblink b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference.nblink
new file mode 100644
index 00000000..878428b7
--- /dev/null
+++ b/docs/readthedocs/source/doc/Nano/Howto/Inference/PyTorch/multi_instance_pytorch_inference.nblink
@@ -0,0 +1,3 @@
+{
+    "path": "../../../../../../../../python/nano/tutorial/notebook/inference/pytorch/multi_instance_pytorch_inference.ipynb"
+}
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/Nano/Howto/index.rst b/docs/readthedocs/source/doc/Nano/Howto/index.rst
index e7f74736..1c78af3f 100644
--- a/docs/readthedocs/source/doc/Nano/Howto/index.rst
+++ b/docs/readthedocs/source/doc/Nano/Howto/index.rst
@@ -50,6 +50,7 @@ PyTorch
 * `How to accelerate a PyTorch inference pipeline through ONNXRuntime <Inference/PyTorch/accelerate_pytorch_inference_onnx.html>`_
 * `How to accelerate a PyTorch inference pipeline through OpenVINO <Inference/PyTorch/accelerate_pytorch_inference_openvino.html>`_
 * `How to accelerate a PyTorch inference pipeline through JIT/IPEX <Inference/PyTorch/accelerate_pytorch_inference_jit_ipex.html>`_
+* `How to accelerate a PyTorch inference pipeline through multiple instances <Inference/PyTorch/multi_instance_pytorch_inference.html>`_
 * `How to quantize your PyTorch model for inference using Intel Neural Compressor <Inference/PyTorch/quantize_pytorch_inference_inc.html>`_
 * `How to quantize your PyTorch model for inference using OpenVINO Post-training Optimization Tools <Inference/PyTorch/quantize_pytorch_inference_pot.html>`_
 * `How to save and load optimized IPEX model <Inference/PyTorch/pytorch_save_and_load_ipex.html>`_
diff --git a/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md b/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
index 04371ee2..2e0151fa 100644
--- a/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
+++ b/docs/readthedocs/source/doc/Nano/Overview/pytorch_inference.md
@@ -227,4 +227,41 @@
 InferenceOptimizer.quantize(model, approach='static', max_trials=10,
                             ):
-```
\ No newline at end of file
+```
+
+## Multi-instance Acceleration
+
+BigDL-Nano also provides multi-instance inference. To use it, first call `multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=n)`, where `num_processes` specifies the number of processes you want to use.
+
+After this call, `multi_model` will accept a `DataLoader` or a list of batches instead of a single batch, and will produce a list of inference results instead of a single result.
+You can use it as follows:
+```python
+multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4)
+
+# run inference on a whole DataLoader
+y_hat_list = multi_model(dataloader)
+
+# or run inference on a list of batches instead of an entire DataLoader
+it = iter(dataloader)
+batch_list = []
+for i in range(10):
+    batch = next(it)
+    batch_list.append(batch)
+y_hat_list = multi_model(batch_list)
+
+# y_hat_list is a list of inference results; use it like this
+for y_hat in y_hat_list:
+    do_something(y_hat)
+```
+
+`InferenceOptimizer.to_multi_instance` also has a parameter named `cores_per_process` to specify the number of CPU cores used by each process, and a parameter named `cpu_for_each_process` to specify the exact CPU cores used by each process. Normally you don't need to set them manually; BigDL-Nano will find a suitable configuration automatically. But if you want, you can use them as follows:
+```python
+# Use 4 processes to run inference,
+# each process will use 2 CPU cores
+multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=4, cores_per_process=2)
+
+# Use 4 processes to run inference,
+# the first process will use core 0, the second process will use core 1,
+# the third process will use cores 2 and 3, and the fourth process will use cores 4 and 5
+multi_model = InferenceOptimizer.to_multi_instance(model, cpu_for_each_process=[[0], [1], [2, 3], [4, 5]])
+```
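+
+For reference, below is a minimal end-to-end sketch of the list-of-batches usage. The toy model, the input shapes, and the synthetic batch list are illustrative stand-ins rather than part of BigDL-Nano; only `InferenceOptimizer.to_multi_instance` comes from the library:
+```python
+import torch
+from torch import nn
+from bigdl.nano.pytorch import InferenceOptimizer
+
+if __name__ == '__main__':
+    # a toy model, only to make the sketch self-contained
+    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
+    model.eval()
+
+    # a synthetic "list of batches": 8 batches of 16 samples each
+    batch_list = [torch.rand(16, 32) for _ in range(8)]
+
+    # spread the batches over 2 processes
+    multi_model = InferenceOptimizer.to_multi_instance(model, num_processes=2)
+    y_hat_list = multi_model(batch_list)
+
+    print(len(y_hat_list))      # expected: 8, one result per batch
+    print(y_hat_list[0].shape)  # expected: torch.Size([16, 10])
+```
+The `if __name__ == '__main__':` guard is used because multi-instance inference runs in worker processes, which may re-import the launching script.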