Add Nano Known issues and some notebook link (#4667)

* Add Nano Known issues and some notebook link
* fix typo
This commit is contained in:
parent 353e7ffbdf
commit 55fa3e13e6
3 changed files with 62 additions and 2 deletions
docs/readthedocs/source/doc/Nano/Overview/known_issues.md (Normal file, 53 additions)

@@ -0,0 +1,53 @@
# Nano Known Issues

## **PyTorch Issues**

### **AttributeError: module 'distutils' has no attribute 'version'**

This is usually because the latest setuptools is not compatible with PyTorch 1.9.

You can downgrade setuptools to 58.0.4 to solve this problem.

For example, if your `setuptools` is installed by conda, you can run:

```bash
conda install setuptools==58.0.4
```
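To confirm which `setuptools` version is actually active after the downgrade, a quick check (a minimal sketch; it only assumes `setuptools` is importable in your environment):

```python
# Print the active setuptools version; after the downgrade above
# this should read 58.0.4.
import setuptools

print(setuptools.__version__)
```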
### **error while loading shared libraries: libunwind.so.8**

You may see this error message when running `source bigdl-nano-init`:

```
sed: error while loading shared libraries: libunwind.so.8: cannot open shared object file: No such file or directory.
```

You can use the following command to fix this issue:

* `apt-get install libunwind8-dev`
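To check whether `libunwind` is visible to the dynamic loader before (or after) installing it, a small diagnostic sketch using only the Python standard library:

```python
# Look up libunwind with the same search rules the dynamic loader uses.
# Prints a library name such as "libunwind.so.8" when found, or None
# when the library is missing (the condition behind the error above).
import ctypes.util

print(ctypes.util.find_library("unwind"))
```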
### **Bus error (core dumped) in multi-instance training with spawn distributed backend**

This is usually because the shared memory size in your docker container is not large enough.

You can pass a larger `--shm-size` value, e.g. a few GB, to your `docker run` command, or use `--ipc=host`.
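To see how much shared memory the container actually has, you can inspect the `/dev/shm` mount (a Linux-only sketch using the standard library; the 64 MB figure below is Docker's default `--shm-size`):

```python
# Report the size of the shared-memory mount; a few GB is usually
# enough for multi-instance training, while Docker's default of
# 64 MB is what typically triggers the bus error.
import os

st = os.statvfs("/dev/shm")
total_gib = st.f_frsize * st.f_blocks / 2**30
print(f"/dev/shm size: {total_gib:.2f} GiB")
```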
If you are running in k8s, you can mount larger storage at `/dev/shm`. For example, you can add the following `volume` and `volumeMount` to your pod and container definitions.

```yaml
spec:
  containers:
    ...
    volumeMounts:
    - mountPath: /dev/shm
      name: cache-volume
  volumes:
  - emptyDir:
      medium: Memory
      sizeLimit: 8Gi
    name: cache-volume
```

## **TensorFlow Issues**

### **Nano keras multi-instance training currently does not support `dataset.from_generator`, `numpy_function`, or `py_function`**

Nano keras multi-instance training serializes the TensorFlow dataset object into a `graph.pb` file, which does not work with `dataset.from_generator`, `dataset.numpy_function`, or `dataset.py_function` due to limitations in TensorFlow.
@@ -2,7 +2,7 @@

BigDL-Nano can be used to accelerate PyTorch or PyTorch-Lightning applications on training workloads. The optimizations in BigDL-Nano are delivered through an extended version of PyTorch-Lightning `Trainer`. These optimizations are either enabled by default or can be easily turned on by setting a parameter or calling a method.

-We will briefly describe here the major features in BigDL-Nano for PyTorch training. You can find complete examples here [links to be added]().
+We will briefly describe here the major features in BigDL-Nano for PyTorch training. You can find complete examples [here](https://github.com/intel-analytics/BigDL/tree/main/python/nano/notebooks/pytorch).

### Best Known Configurations
@@ -40,13 +40,19 @@ trainer.fit(lightning_module, train_loader)

#### Intel® Extension for PyTorch

-Intel Extension for Pytorch (a.k.a. IPEX) extends PyTorch with optimizations for an extra performance boost on Intel hardware. BigDL-Nano integrates IPEX through the `Trainer`. Users can turn on IPEX by setting `use_ipex=True`.
+[Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) (a.k.a. IPEX) extends PyTorch with optimizations for an extra performance boost on Intel hardware. BigDL-Nano integrates IPEX through the `Trainer`. Users can turn on IPEX by setting `use_ipex=True`.
```python
from bigdl.nano.pytorch import Trainer
trainer = Trainer(max_epochs=10, use_ipex=True)
```

Note: BigDL-Nano does not install IPEX by default. You can install IPEX using the following command:

```bash
python -m pip install torch_ipex==1.9.0 -f https://software.intel.com/ipex-whl-stable
```

#### Multi-instance Training

When training on a server with dozens of CPU cores, it is often beneficial to use multiple training instances in a data-parallel fashion to make full use of the CPU cores. However, using PyTorch's DDP API is a little cumbersome and error-prone, and if not configured correctly, it can even make training slower.
			
			
 | 
			
		|||
| 
						 | 
				
			
			@ -52,6 +52,7 @@ BigDL Documentation
 | 
			
		|||
   doc/Nano/QuickStart/tensorflow_train.md
   doc/Nano/QuickStart/tensorflow_inference.md
   doc/Nano/QuickStart/hpo.md
+   doc/Nano/Overview/known_issues.md

.. toctree::
   :maxdepth: 1