[Nano] How-To Guides: Training - TensorFlow (#5836)
* Add basic guides structure of Training - TensorFlow
* Add how-to guides: How to accelerate a TensorFlow Keras application on training workloads through multiple instances
* Change import order and add pip install for tensorflow-dataset
* Disable other nano tests for now
* Add github action tests for how-to guides TensorFlow training
* Use jupyter nbconvert to test notebooks for training TensorFlow instead to avoid errors
* Add how-to guide: How to optimize your model with a sparse Embedding layer and SparseAdam optimizer
* Enable other nano tests again
* Small Revision: fix typos
* Small Revision: refactor some sentences
* Revision: refactor contents based on comments
* Add How-to guides: How to choose the number of processes for multi-instance training
* Small Revision: fix typos and refactor some sentences
* Make timeout time for github action longer for TensorFlow, 600s->700s
parent 0cabc79133
commit 331c3054d9

4 changed files with 76 additions and 0 deletions
@@ -0,0 +1,44 @@
# Choose the Number of Processes for Multi-Instance Training

BigDL-Nano supports multi-instance training on a server with multiple CPU cores or sockets. With Nano, you can launch a self-defined number of processes to perform data-parallel training. When choosing the number of processes, there are 3 empirical recommendations for better training performance:

1. There should be at least 7 CPU cores assigned to each process.

2. For servers with multiple sockets, the CPU cores assigned to each process should belong to the same socket (to avoid NUMA issues). That is, the number of CPU cores per process should be a divisor of the number of CPU cores in each socket.

3. Only physical CPU cores should be considered (do not count the logical cores introduced by hyper-threading).

```eval_rst
.. note::

    By default, Nano will distribute CPU cores evenly among processes.
```

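In practice, the chosen number of processes is simply passed to Nano when launching training. Below is a minimal sketch (not part of this commit), assuming BigDL-Nano's drop-in Keras `Sequential` class and the `num_processes` argument of `fit` described in the related how-to guide:

```python
# A minimal sketch, assuming BigDL-Nano's drop-in Keras classes and
# the `num_processes` argument of `fit`.
import tensorflow as tf
from bigdl.nano.tf.keras import Sequential  # Nano's drop-in replacement

model = Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Toy data, just to make the sketch self-contained
x = tf.random.normal((1024, 10))
y = tf.random.normal((1024, 1))

# Launch 4 training processes; by default Nano distributes the
# physical CPU cores evenly among them.
model.fit(x, y, epochs=2, batch_size=32, num_processes=4)
```
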
Here is an example. Suppose we have a server with 2 sockets, and each socket has 28 physical CPU cores. For this case, the number of CPU cores per process, c, should satisfy:

```eval_rst
.. math::

    \begin{cases}
        c \text{ is a divisor of } 28 \\
        c \ge 7
    \end{cases}
    \Rightarrow c \in \{7, 14, 28\}
```

Based on that, the number of processes, np, can be calculated as:

```eval_rst
.. math::

    \begin{cases}
        np = \frac{28 + 28}{c}\,, \quad c \in \{7, 14, 28\} \\
        np > 1
    \end{cases}
    \Rightarrow np \in \{8, 4, 2\}
```

That is, empirically, we could set the number of processes to 2, 4, or 8 here for good training performance.

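These sizing rules are easy to automate. Here is a small illustrative helper (hypothetical, not part of Nano's API) that enumerates the reasonable choices for a given machine:

```python
# A hypothetical helper that applies the three recommendations above.
def candidate_num_processes(cores_per_socket: int, num_sockets: int,
                            min_cores_per_process: int = 7) -> list:
    """List reasonable process counts for multi-instance training."""
    total_cores = cores_per_socket * num_sockets  # physical cores only
    candidates = []
    for c in range(min_cores_per_process, cores_per_socket + 1):
        # The cores per process must evenly divide a single socket,
        # so that no process spans two sockets.
        if cores_per_socket % c == 0:
            num_processes = total_cores // c
            if num_processes > 1:  # otherwise there is no data parallelism
                candidates.append(num_processes)
    return candidates

print(candidate_num_processes(28, 2))  # [8, 4, 2]
```
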
```eval_rst
.. card::

    **Related Readings**
    ^^^
    * `How to accelerate a PyTorch Lightning application on training workloads through multiple instances <../PyTorchLightning/accelerate_pytorch_lightning_training_multi_instance.html>`_
    * `How to accelerate a TensorFlow Keras application on training workloads through multiple instances <../TensorFlow/accelerate_tensorflow_training_multi_instance.html>`_
```

@@ -0,0 +1,3 @@
{
    "path": "../../../../../../../../python/nano/tutorial/notebook/training/tensorflow/accelerate_tensorflow_training_multi_instance.ipynb"
}

@@ -0,0 +1,3 @@
{
    "path": "../../../../../../../../python/nano/tutorial/notebook/training/tensorflow/tensorflow_training_embedding_sparseadam.ipynb"
}

@@ -26,6 +26,32 @@ PyTorch Lightning
    Training/PyTorchLightning/pytorch_lightning_training_bf16
    Training/PyTorchLightning/pytorch_lightning_cv_data_pipeline

TensorFlow
~~~~~~~~~~~~~~~~~~~~~~~~~
* `How to accelerate a TensorFlow Keras application on training workloads through multiple instances <Training/TensorFlow/accelerate_tensorflow_training_multi_instance.html>`_
* |tensorflow_training_embedding_sparseadam_link|_

.. |tensorflow_training_embedding_sparseadam_link| replace:: How to optimize your model with a sparse ``Embedding`` layer and ``SparseAdam`` optimizer
.. _tensorflow_training_embedding_sparseadam_link: Training/TensorFlow/tensorflow_training_embedding_sparseadam.html

.. toctree::
    :maxdepth: 1
    :hidden:

    Training/TensorFlow/accelerate_tensorflow_training_multi_instance
    Training/TensorFlow/tensorflow_training_embedding_sparseadam

General
~~~~~~~~~~~~~~~~~~~~~~~~~
* `How to choose the number of processes for multi-instance training <Training/General/choose_num_processes_training.html>`_

.. toctree::
    :maxdepth: 1
    :hidden:

    Training/General/choose_num_processes_training


Inference Optimization
-------------------------