From 0f78a568c74a4148416490918e35c71d281bab52 Mon Sep 17 00:00:00 2001
From: Yishuo Wang
Date: Tue, 2 Aug 2022 09:15:50 +0800
Subject: [PATCH] add tutorial, notebook and API doc for LightningLite (#5114)

* add tutorial, notebook and API doc for LightningLite
* increase the timeout when running tutorial notebook
* change docs and tutorial to use new API
---
 .../source/doc/Nano/Overview/nano.md          |  15 +-
 .../source/doc/Nano/QuickStart/index.md       |  13 +-
 .../doc/Nano/QuickStart/pytorch_nano.md       | 174 ++++++++++++++++++
 .../doc/Nano/QuickStart/pytorch_train.md      |  32 ++++
 .../QuickStart/pytorch_train_quickstart.md    |   4 +-
 .../source/doc/PythonAPI/Nano/pytorch.rst     |  10 +-
 6 files changed, 242 insertions(+), 6 deletions(-)
 create mode 100644 docs/readthedocs/source/doc/Nano/QuickStart/pytorch_nano.md

diff --git a/docs/readthedocs/source/doc/Nano/Overview/nano.md b/docs/readthedocs/source/doc/Nano/Overview/nano.md
index 7b7e9594..ec072373 100644
--- a/docs/readthedocs/source/doc/Nano/Overview/nano.md
+++ b/docs/readthedocs/source/doc/Nano/Overview/nano.md
@@ -49,7 +49,7 @@ BigDL-Nano supports both PyTorch and PyTorch Lightning models and most optimizat
 
 BigDL-Nano uses a extended version of PyTorch Lightning trainer for integrating our optimizations.
 
-For example, if you are using a LightingModule, you can use the following code enable intel-extension-for-pytorch and multi-instance training.
+For example, if you are using a LightningModule, you can use the following code to enable intel-extension-for-pytorch and multi-instance training.
 
 ```python
 from bigdl.nano.pytorch import Trainer
@@ -59,6 +59,19 @@ trainer = Trainer(max_epochs=1, use_ipex=True, num_processes=4)
 trainer.fit(net, train_loader)
 ```
 
+If you are using a custom training loop, you can use the following code to enable intel-extension-for-pytorch, multi-instance training and other BigDL-Nano optimizations.
+
+```python
+from bigdl.nano.pytorch import TorchNano
+
+class MyNano(TorchNano):
+    def train(self, ...):
+        # copy your train loop here and make a few changes
+        ...
+
+MyNano(use_ipex=True, num_processes=2).train()
+```
+
 For more details on the BigDL-Nano's PyTorch usage, please refer to the [PyTorch Training](../QuickStart/pytorch_train.md) and [PyTorch Inference](../QuickStart/pytorch_inference.md) page.
 
 ### **3.2 TensorFlow**
diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/index.md b/docs/readthedocs/source/doc/Nano/QuickStart/index.md
index f2c9bd73..7417a562 100644
--- a/docs/readthedocs/source/doc/Nano/QuickStart/index.md
+++ b/docs/readthedocs/source/doc/Nano/QuickStart/index.md
@@ -1,9 +1,17 @@
 # Nano Tutorial
 
-- [**BigDL-Nano PyTorch Training Quickstart**](./pytorch_train_quickstart.html)
+- [**BigDL-Nano PyTorch Trainer Quickstart**](./pytorch_train_quickstart.html)
 
   > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_pytorch_training]
 
-  In this guide we will describe how to scale out PyTorch programs using Nano
+  In this guide we will describe how to scale out PyTorch programs using Nano Trainer
+
+---------------------------
+
+- [**BigDL-Nano PyTorch TorchNano Quickstart**](./pytorch_nano.html)
+
+  > ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub][Nano_pytorch_nano]
+
+  In this guide we'll describe how to use BigDL-Nano to accelerate a custom training loop easily with very few changes
 
 ---------------------------
@@ -66,6 +74,7 @@
 [Nano_pytorch_training]:
+[Nano_pytorch_nano]:
 [Nano_pytorch_onnxruntime]:
 [Nano_pytorch_openvino]:
 [Nano_pytorch_Quantization_inc]:
diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_nano.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_nano.md
new file mode 100644
index 00000000..613c9dc5
--- /dev/null
+++ b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_nano.md
@@ -0,0 +1,174 @@
+# BigDL-Nano PyTorch TorchNano Quickstart
+
+**In this guide we'll demonstrate how to use BigDL-Nano to accelerate a custom training loop easily with very few changes.**
+
+### **Step 0: Prepare Environment**
+
+We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details.
+
+```bash
+conda create -n py37 python==3.7.10 setuptools==58.0.4
+conda activate py37
+# nightly built version
+pip install --pre --upgrade bigdl-nano[pytorch]
+# set env variables for your conda environment
+source bigdl-nano-init
+```
+
+### **Step 1: Load the Data**
+
+Import the CIFAR10 dataset from torchvision and modify the train transform. You can refer to [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) for an overview of the whole dataset.
+
+Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing a drop-in replacement of torchvision's `datasets` and `transforms`.
+
+```python
+from torch.utils.data import DataLoader, Subset
+
+from bigdl.nano.pytorch.vision import transforms
+from bigdl.nano.pytorch.vision.datasets import CIFAR10
+
+def create_dataloader(data_path, batch_size):
+    train_transform = transforms.Compose([
+        transforms.Resize(256),
+        transforms.ColorJitter(),
+        transforms.RandomCrop(224),
+        transforms.RandomHorizontalFlip(),
+        transforms.Resize(128),
+        transforms.ToTensor()
+    ])
+
+    full_dataset = CIFAR10(root=data_path, train=True,
+                           download=True, transform=train_transform)
+
+    # use a subset of the full dataset to shorten the training time
+    train_dataset = Subset(dataset=full_dataset, indices=list(range(len(full_dataset) // 40)))
+
+    train_loader = DataLoader(train_dataset, batch_size=batch_size,
+                              shuffle=True, num_workers=0)
+
+    return train_loader
+```
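+
+As an optional sanity check, you could build the dataloader and inspect a single batch. The snippet below is only illustrative; the `./data` path and the batch size of 32 are arbitrary choices, not values required by this tutorial:
+
+```python
+train_loader = create_dataloader("./data", batch_size=32)
+images, labels = next(iter(train_loader))
+# with the transforms above, each image is resized to a 3 x 128 x 128 tensor
+print(images.shape, labels.shape)  # e.g. torch.Size([32, 3, 128, 128]) torch.Size([32])
+```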
+
+### **Step 2: Define the Model**
+
+You may define your model in the same way as the standard PyTorch models.
+
+```python
+from torch import nn
+
+from bigdl.nano.pytorch.vision.models import vision
+
+class ResNet18(nn.Module):
+    def __init__(self, num_classes, pretrained=True, include_top=False, freeze=True):
+        super().__init__()
+        backbone = vision.resnet18(pretrained=pretrained, include_top=include_top, freeze=freeze)
+        output_size = backbone.get_output_size()
+        head = nn.Linear(output_size, num_classes)
+        self.model = nn.Sequential(backbone, head)
+
+    def forward(self, x):
+        return self.model(x)
+```
+
+### **Step 3: Define the Train Loop**
+
+Suppose the custom train loop is as follows:
+
+```python
+import os
+import torch
+
+data_path = os.environ.get("DATA_PATH", ".")
+batch_size = 256
+max_epochs = 10
+lr = 0.01
+
+model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
+loss_func = nn.CrossEntropyLoss()
+optimizer = torch.optim.Adam(model.parameters(), lr=lr)
+train_loader = create_dataloader(data_path, batch_size)
+
+model.train()
+
+for _i in range(max_epochs):
+    total_loss, num = 0, 0
+    for X, y in train_loader:
+        optimizer.zero_grad()
+        loss = loss_func(model(X), y)
+        loss.backward()
+        optimizer.step()
+
+        total_loss += loss.sum()
+        num += 1
+    print(f'avg_loss: {total_loss / num}')
+```
+
+The `TorchNano` (`bigdl.nano.pytorch.TorchNano`) class is what we use to accelerate raw PyTorch code. By using it, we only need to make very few changes to accelerate a custom training loop.
+
+We only need the following steps:
+
+- define a class `MyNano` derived from our `TorchNano`
+- copy all lines of code into the `train` method of `MyNano`
+- add one line to set up the model, optimizer and dataloader
+- replace `loss.backward()` with `self.backward(loss)`
+
+```python
+import os
+import torch
+
+from bigdl.nano.pytorch import TorchNano
+
+class MyNano(TorchNano):
+    def train(self):
+        # copy all lines of code into this method
+        data_path = os.environ.get("DATA_PATH", ".")
+        batch_size = 256
+        max_epochs = 10
+        lr = 0.01
+
+        model = ResNet18(10, pretrained=False, include_top=False, freeze=True)
+        loss_func = nn.CrossEntropyLoss()
+        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
+        train_loader = create_dataloader(data_path, batch_size)
+
+        # add this line to set up the model, optimizer and dataloaders
+        model, optimizer, train_loader = self.setup(model, optimizer, train_loader)
+
+        model.train()
+
+        for _i in range(max_epochs):
+            total_loss, num = 0, 0
+            for X, y in train_loader:
+                optimizer.zero_grad()
+                loss = loss_func(model(X), y)
+                self.backward(loss)  # modify this line
+                optimizer.step()
+
+                total_loss += loss.sum()
+                num += 1
+            print(f'avg_loss: {total_loss / num}')
+```
+
+### **Step 4: Run with Nano TorchNano**
+
+```python
+MyNano().train()
+```
+
+At this stage, you may already experience some speedup due to the optimized environment variables set by `source bigdl-nano-init`. Besides, you can also enable optimizations delivered by BigDL-Nano by setting a parameter or calling a method to accelerate PyTorch applications on training workloads.
+
+#### Increase the number of processes in distributed training to accelerate training
+
+```python
+MyNano(num_processes=2, strategy="subprocess").train()
+```
+
+- Note: BigDL-Nano now supports 'spawn', 'subprocess' and 'ray' strategies for distributed training, but only the 'subprocess' strategy can be used in an interactive environment.
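+
+Since `train` is just a method you define yourself, you can also give it parameters and pass values at call time. The snippet below is only an illustrative sketch of that pattern; the `lr` and `max_epochs` parameter names are our own choices, not part of the BigDL-Nano API:
+
+```python
+class MyConfigurableNano(TorchNano):
+    def train(self, lr=0.01, max_epochs=10):
+        # same training code as above, driven by the arguments
+        ...
+
+MyConfigurableNano(num_processes=2, strategy="subprocess").train(lr=0.02, max_epochs=5)
+```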
+
+#### Intel Extension for PyTorch (a.k.a. [IPEX](https://github.com/intel/intel-extension-for-pytorch))
+
+IPEX extends PyTorch with optimizations for Intel hardware. BigDL-Nano also integrates IPEX into the `TorchNano`; you can turn on IPEX optimization by setting `use_ipex=True`.
+
+```python
+MyNano(use_ipex=True, num_processes=2, strategy="subprocess").train()
+```
diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train.md
index 874dd4a4..0ffd4c23 100644
--- a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train.md
+++ b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train.md
@@ -60,6 +60,38 @@ trainer = Trainer(max_epoch=10, num_processes=4)
 
 Note that the effective batch size multi-instance training is the `batch_size` in your `dataloader` times `num_processes` so the number of iterations of each epoch will be reduced `num_processes` fold. A common practice to compensate for that is to gradually increase the learning rate to `num_processes` times. You can find more details of this trick in the [Facebook paper](https://arxiv.org/abs/1706.02677).
 
+### BigDL-Nano PyTorch TorchNano
+
+The `TorchNano` (`bigdl.nano.pytorch.TorchNano`) class is what we use to accelerate raw PyTorch code. By using it, we only need to make very few changes to accelerate a custom training loop. For example,
+
+```python
+from bigdl.nano.pytorch import TorchNano
+
+class MyNano(TorchNano):
+    def train(self, ...):
+        # copy your train loop here and make a few changes
+
+MyNano().train(...)
+```
+
+- Note: see [this tutorial](./pytorch_nano.html) for details about our `TorchNano`.
+
+Our `TorchNano` also integrates IPEX and distributed training optimizations. For example,
+
+```python
+from bigdl.nano.pytorch import TorchNano
+
+class MyNano(TorchNano):
+    def train(self, ...):
+        # define train loop
+
+# enable IPEX optimization
+MyNano(use_ipex=True).train(...)
+
+# enable IPEX and distributed training, using the subprocess strategy
+MyNano(use_ipex=True, num_processes=2, strategy="subprocess").train(...)
+```
+
 ### Optimized Data pipeline
 
 Computer Vision task often needs a data processing pipeline that sometimes constitutes a non-trivial part of the whole training pipeline. Leveraging OpenCV and libjpeg-turbo, BigDL-Nano can accelerate computer vision data pipelines by providing a drop-in replacement of torch_vision's `datasets` and `transforms`.
diff --git a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train_quickstart.md b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train_quickstart.md
index f009c3a7..1a2d3113 100644
--- a/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train_quickstart.md
+++ b/docs/readthedocs/source/doc/Nano/QuickStart/pytorch_train_quickstart.md
@@ -1,6 +1,6 @@
-# BigDL-Nano PyTorch Training Quickstart
+# BigDL-Nano PyTorch Trainer Quickstart
 
-**In this guide we will describe how to scale out PyTorch programs using Nano in 5 simple steps**
+**In this guide we will describe how to scale out PyTorch programs using Nano Trainer in 5 simple steps**
 
 ### **Step 0: Prepare Environment**
 
diff --git a/docs/readthedocs/source/doc/PythonAPI/Nano/pytorch.rst b/docs/readthedocs/source/doc/PythonAPI/Nano/pytorch.rst
index e5552362..52a87cd2 100644
--- a/docs/readthedocs/source/doc/PythonAPI/Nano/pytorch.rst
+++ b/docs/readthedocs/source/doc/PythonAPI/Nano/pytorch.rst
@@ -1,10 +1,18 @@
 Nano PyTorch API
 ==================
 
-bigdl.nano.pytorch
+bigdl.nano.pytorch.Trainer
 ---------------------------
 
 .. autoclass:: bigdl.nano.pytorch.Trainer
     :members:
     :undoc-members:
     :exclude-members: accelerator_connector, checkpoint_connector, reload_dataloaders_every_n_epochs, limit_val_batches, logger, logger_connector, state
+
+bigdl.nano.pytorch.TorchNano
+----------------------------
+
+.. autoclass:: bigdl.nano.pytorch.TorchNano
+    :members:
+    :undoc-members:
+    :exclude-members: run