[Nano] How-To Guides: Training - TensorFlow (#5836)

* Add basic guides structure of Training - TensorFlow

* Add how-to guides: How to accelerate a TensorFlow Keras application on training workloads through multiple instances

* Change import order and add pip install for tensorflow-dataset

* Disable other nano tests for now

* Add github action tests for how-to guides Tensorflow training

* Use jupyter nbconvert to test notebooks for training tensorflow instead to avoid errors

* Add how-to guide: How to optimize your model with a sparse Embedding layer and SparseAdam optimizer

* Enable other nano tests again

* Small Revision: fix typos

* Small Revision: refactor some sentences

* Revision: refactor contents based on comments

* Add How-to guides: How to choose the number of processes for multi-instance training

* Small Revision: fix typos and refactor some sentences

* Make timeout time for github action longer for TensorFlow, 600s->700s
This commit is contained in:
Yuwen Hu 2022-09-26 15:40:22 +08:00 committed by GitHub
parent 0cabc79133
commit 331c3054d9
4 changed files with 76 additions and 0 deletions


@ -0,0 +1,44 @@
# Choose the Number of Processes for Multi-Instance Training
BigDL-Nano supports multi-instance training on a server with multiple CPU cores or sockets. With Nano, you can launch a self-defined number of processes to perform data-parallel training. When choosing the number of processes, there are three empirical recommendations for better training performance:
1. There should be at least 7 CPU cores assigned to each process.
2. For multiple sockets, the CPU cores assigned to each process should belong to the same socket (due to NUMA issues). That is, the number of CPU cores per process should be a divisor of the number of CPU cores in each socket.
3. Only physical CPU cores should be considered (do not count the logical cores created by hyperthreading).
```eval_rst
.. note::
By default, Nano will distribute CPU cores evenly among processes.
```
Here is an example. Suppose we have a server with 2 sockets, each with 28 physical CPU cores. In this case, the number of CPU cores per process c should satisfy:
```eval_rst
.. math::
\begin{cases}
    c \text{ is a divisor of } 28 \\
c \ge 7 \\
\end{cases} \Rightarrow
c \in \{7, 14, 28\}
```
Based on that, the number of processes np can be calculated as:
```eval_rst
.. math::
\begin{cases}
np = \frac{28+28}{c}\ , c \in \{7, 14, 28\} \\
np > 1 \\
\end{cases} \Rightarrow np = \text{8 or 4 or 2}
```
That is, empirically, we could set the number of processes to 2, 4 or 8 here for good training performance.
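The three recommendations above can be sketched as a small helper. Note that `candidate_num_processes` is a hypothetical function written for this guide, not part of the BigDL-Nano API:

```python
def candidate_num_processes(num_sockets: int, cores_per_socket: int,
                            min_cores_per_process: int = 7):
    """Enumerate empirically recommended process counts for
    multi-instance training, following the three rules above."""
    total_cores = num_sockets * cores_per_socket  # physical cores only
    candidates = []
    for c in range(min_cores_per_process, cores_per_socket + 1):
        # cores per process must divide the cores on one socket,
        # so each process stays within a single NUMA node
        if cores_per_socket % c == 0:
            num_processes = total_cores // c
            if num_processes > 1:  # multi-instance needs > 1 process
                candidates.append(num_processes)
    return sorted(candidates)

# The 2-socket, 28-cores-per-socket example from above:
print(candidate_num_processes(2, 28))  # -> [2, 4, 8]
```

This reproduces the derivation above: c ∈ {7, 14, 28} gives 8, 4, or 2 processes.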
```eval_rst
.. card::
**Related Readings**
^^^
* `How to accelerate a PyTorch Lightning application on training workloads through multiple instances <../PyTorchLightning/accelerate_pytorch_lightning_training_multi_instance.html>`_
* `How to accelerate a TensorFlow Keras application on training workloads through multiple instances <../TensorFlow/accelerate_tensorflow_training_multi_instance.html>`_
```


@ -0,0 +1,3 @@
{
"path": "../../../../../../../../python/nano/tutorial/notebook/training/tensorflow/accelerate_tensorflow_training_multi_instance.ipynb"
}


@ -0,0 +1,3 @@
{
"path": "../../../../../../../../python/nano/tutorial/notebook/training/tensorflow/tensorflow_training_embedding_sparseadam.ipynb"
}


@ -26,6 +26,32 @@ PyTorch Lightning
    Training/PyTorchLightning/pytorch_lightning_training_bf16
    Training/PyTorchLightning/pytorch_lightning_cv_data_pipeline

TensorFlow
~~~~~~~~~~~~~~~~~~~~~~~~~

* `How to accelerate a TensorFlow Keras application on training workloads through multiple instances <Training/TensorFlow/accelerate_tensorflow_training_multi_instance.html>`_
* |tensorflow_training_embedding_sparseadam_link|_

.. |tensorflow_training_embedding_sparseadam_link| replace:: How to optimize your model with a sparse ``Embedding`` layer and ``SparseAdam`` optimizer
.. _tensorflow_training_embedding_sparseadam_link: Training/TensorFlow/tensorflow_training_embedding_sparseadam.html

.. toctree::
    :maxdepth: 1
    :hidden:

    Training/TensorFlow/accelerate_tensorflow_training_multi_instance
    Training/TensorFlow/tensorflow_training_embedding_sparseadam

General
~~~~~~~~~~~~~~~~~~~~~~~~~

* `How to choose the number of processes for multi-instance training <Training/General/choose_num_processes_training.html>`_

.. toctree::
    :maxdepth: 1
    :hidden:

    Training/General/choose_num_processes_training
Inference Optimization
-------------------------