From 210677a29538dd3da622beab5714653ee80a2e92 Mon Sep 17 00:00:00 2001 From: Yang Wang Date: Mon, 16 May 2022 14:14:30 +0800 Subject: [PATCH] Add nano documentation for pytorch training quickstart (#4577) * Add nano documentation for pytorch training quickstart * fix some typo * break pytorch training and inference to two files * simplify trainer import * fix train doc --- .../source/doc/Nano/Overview/nano.md | 2 +- .../source/doc/Nano/QuickStart/pytorch.md | 27 ------ .../doc/Nano/QuickStart/pytorch_inference.md | 8 ++ .../doc/Nano/QuickStart/pytorch_train.md | 83 +++++++++++++++++++ 4 files changed, 92 insertions(+), 28 deletions(-) delete mode 100644 docs/readthedocs/source/doc/Nano/QuickStart/pytorch.md create mode 100644 docs/readthedocs/source/doc/Nano/QuickStart/pytorch_inference.md create mode 100644 docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train.md diff --git a/docs/readthedocs/source/doc/Nano/Overview/nano.md b/docs/readthedocs/source/doc/Nano/Overview/nano.md index 6f164b28..12f10933 100644 --- a/docs/readthedocs/source/doc/Nano/Overview/nano.md +++ b/docs/readthedocs/source/doc/Nano/Overview/nano.md @@ -59,7 +59,7 @@ trainer = Trainer(max_epochs=1, use_ipex=True, num_processes=4) trainer.fit(net, train_loader) ``` -For more details on the BigDL-Nano's PyTorch usage, please refer to the [PyTorch](../QuickStart/pytorch.md) page. +For more details on the BigDL-Nano's PyTorch usage, please refer to the [PyTorch Training](../QuickStart/pytorch_train.md) and [PyTorch Inference](../QuickStart/pytorch_inference.md) page. ### **3.2 TensorFlow** diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch.md deleted file mode 100644 index 76965009..00000000 --- a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch.md +++ /dev/null @@ -1,27 +0,0 @@ -# BigDL-Nano PyTorch Overview - -BigDL-Nano can be used to accelerate PyTorch or PyTorch-Lightning applications on both training and inference workloads. The optimizations in BigDL-Nano are delivered through a extended version of PyTorch-Lightning `Trainer`. These optimizations are either enabled by default, or can be easily turned on by setting a parameter or calling a method. - -## PyTorch Training - -### Best Known Configurations - -### BigDL-Nano PyTorch Trainer - -#### IntelĀ® Extension for PyTorch - -#### Multi-instance Training - -### Optimized Data pipeline - -### Optimizers - -### Notebooks - -## PyTorch Inference - -### Runtime Acceleration - -### Quantization - -### Notebooks \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_inference.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_inference.md new file mode 100644 index 00000000..4ad70fa5 --- /dev/null +++ b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_inference.md @@ -0,0 +1,8 @@ +# BigDL-Nano PyTorch Inference Overview + +add a link for examples here. + +### Runtime Acceleration +onnx runtime, openvino + +### Quantization \ No newline at end of file diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train.md new file mode 100644 index 00000000..60549be5 --- /dev/null +++ b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train.md @@ -0,0 +1,83 @@ +# BigDL-Nano PyTorch Training Overview + +BigDL-Nano can be used to accelerate PyTorch or PyTorch-Lightning applications on training workloads. 
The optimizations in BigDL-Nano are delivered through an extended version of the PyTorch-Lightning `Trainer`. These optimizations are either enabled by default or can be easily turned on by setting a parameter or calling a method.

We briefly describe below the major features in BigDL-Nano for PyTorch training. You can find complete examples here [links to be added]().

### Best Known Configurations

When you run `source bigdl-nano-init`, BigDL-Nano will export a few environment variables, such as `OMP_NUM_THREADS` and `KMP_AFFINITY`, according to your current hardware. Empirically, these environment variables work best for most PyTorch applications. After setting these environment variables, you can just run your applications as usual (`python app.py`) and no additional changes are required.

### BigDL-Nano PyTorch Trainer

The PyTorch Trainer (`bigdl.nano.pytorch.Trainer`) is the place where we integrate most optimizations. It extends PyTorch Lightning's Trainer and has a few more parameters and methods specific to BigDL-Nano. The Trainer can be directly used to train a `LightningModule`.

For example,

```python
from pytorch_lightning import LightningModule
from bigdl.nano.pytorch import Trainer

class MyModule(LightningModule):
    # your LightningModule definition (model, training_step, configure_optimizers, ...)
    ...

lightning_module = MyModule()
trainer = Trainer(max_epochs=10)
# train_loader is a regular PyTorch DataLoader for your training data
trainer.fit(lightning_module, train_loader)
```

For regular PyTorch modules, we also provide a `compile` method that takes in a PyTorch module, a loss, an optimizer, and other PyTorch objects and "compiles" them into a `LightningModule`.

For example,

```python
from bigdl.nano.pytorch import Trainer

lightning_module = Trainer.compile(pytorch_module, loss, optimizer, scheduler)
trainer = Trainer(max_epochs=10)
trainer.fit(lightning_module, train_loader)
```

#### Intel® Extension for PyTorch

Intel® Extension for PyTorch (a.k.a. IPEX) extends PyTorch with optimizations for an extra performance boost on Intel hardware. BigDL-Nano integrates IPEX through the `Trainer`. Users can turn on IPEX by setting `use_ipex=True`.

```python
from bigdl.nano.pytorch import Trainer

trainer = Trainer(max_epochs=10, use_ipex=True)
```

#### Multi-instance Training

When training on a server with dozens of CPU cores, it is often beneficial to run multiple training instances in a data-parallel fashion to make full use of the CPU cores. However, using PyTorch's DDP API directly is a little cumbersome and error-prone, and if not configured correctly, it can even slow training down.

BigDL-Nano makes it very easy to conduct multi-instance training. You can just set the `num_processes` parameter in the `Trainer` constructor and BigDL-Nano will launch the specified number of processes to perform data-parallel training. Each process will be automatically pinned to a different subset of CPU cores to avoid contention and maximize training throughput.

```python
from bigdl.nano.pytorch import Trainer

trainer = Trainer(max_epochs=10, num_processes=4)
```

Note that the effective batch size of multi-instance training is the `batch_size` in your `dataloader` times `num_processes`, so the number of iterations per epoch will be reduced by a factor of `num_processes`. A common practice to compensate for that is to gradually increase the learning rate to `num_processes` times the original value. You can find more details of this trick in the [Facebook paper](https://arxiv.org/abs/1706.02677).
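
For illustration only, a minimal sketch of that compensation might look like the following. The placeholder model, the base learning rate of 0.01, and the 5-epoch warmup are assumptions for this example, not part of BigDL-Nano's API.

```python
import torch
from torch import nn

pytorch_module = nn.Linear(10, 2)     # placeholder model; use your real module here

num_processes = 4                     # the value you pass to Trainer(num_processes=...)
base_lr = 0.01                        # learning rate tuned for single-process training
scaled_lr = base_lr * num_processes   # linear scaling rule

optimizer = torch.optim.SGD(pytorch_module.parameters(), lr=scaled_lr, momentum=0.9)

# Optionally warm the learning rate up from base_lr to scaled_lr over the first
# few epochs instead of switching to the scaled value immediately.
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=base_lr / scaled_lr, total_iters=5
)
```

The resulting `optimizer` and `scheduler` can then be handed to `Trainer.compile` together with your module, as in the example above.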
### Optimized Data pipeline

Computer vision tasks often need a data processing pipeline that sometimes constitutes a non-trivial part of the whole training pipeline. Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing drop-in replacements for torchvision's `datasets` and `transforms`.

```python
from torch.utils.data import DataLoader

from bigdl.nano.pytorch.vision.datasets import ImageFolder
from bigdl.nano.pytorch.vision import transforms

data_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.ColorJitter(),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.Resize(128),
    transforms.ToTensor()
])

train_set = ImageFolder(train_path, data_transform)   # train_path points to your image folder
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
trainer.fit(module, train_loader)
```
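
To give a sense of how the pieces fit together, here is a rough end-to-end sketch that combines the snippets above. It is only a sketch: the `MyNet` model, dataset path, batch size, and optimizer settings are placeholders for illustration, and the `Trainer.compile` call follows the usage shown in the Trainer section above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

from bigdl.nano.pytorch import Trainer
from bigdl.nano.pytorch.vision.datasets import ImageFolder
from bigdl.nano.pytorch.vision import transforms

# Placeholder network; replace with your own model definition.
class MyNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)

# Accelerated data pipeline (drop-in replacement for torchvision).
data_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = ImageFolder("path/to/train_data", data_transform)   # assumed dataset location
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# "Compile" the regular PyTorch module into a LightningModule.
net = MyNet()
loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
module = Trainer.compile(net, loss, optimizer)

# Train with IPEX and 4 data-parallel processes enabled.
trainer = Trainer(max_epochs=1, use_ipex=True, num_processes=4)
trainer.fit(module, train_loader)
```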