parent 3543a58723
commit 8e4e643f53
2 changed files with 37 additions and 36 deletions
		| 
						 | 
					@ -2,7 +2,7 @@
 | 
				
			||||||
 | 
					
 | 
				
			||||||
This tutorial provides a step-by-step guide on how to run BigDL-Orca programs on Kubernetes (K8s) clusters, using a [PyTorch Fashion-MNIST program](https://github.com/intel-analytics/BigDL/tree/main/python/orca/tutorial/pytorch/FashionMNIST) as a working example.
 | 
					This tutorial provides a step-by-step guide on how to run BigDL-Orca programs on Kubernetes (K8s) clusters, using a [PyTorch Fashion-MNIST program](https://github.com/intel-analytics/BigDL/tree/main/python/orca/tutorial/pytorch/FashionMNIST) as a working example.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The **Client Container** that appears in this tutorial refers to the docker container where you launch or submit your applications. The __Develop Node__ is the host machine where you launch the client container.
 | 
					The **Client Container** that appears in this tutorial refers to the Docker container where you launch or submit your applications. The __Develop Node__ is the host machine where you launch the client container.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
---
 | 
					---
 | 
				
			||||||
## 1. Basic Concepts
 | 
					## 1. Basic Concepts
 | 
				
			||||||
| 
						 | 
					@ -20,12 +20,12 @@ init_orca_context(cluster_mode, master, container_image,
 | 
				
			||||||
In `init_orca_context`, you may specify necessary runtime configurations for running the example on K8s, including:
 | 
					In `init_orca_context`, you may specify necessary runtime configurations for running the example on K8s, including:
 | 
				
			||||||
* `cluster_mode`: one of `"k8s-client"`, `"k8s-cluster"` or `"spark-submit"` when you run on K8s clusters.
 | 
					* `cluster_mode`: one of `"k8s-client"`, `"k8s-cluster"` or `"spark-submit"` when you run on K8s clusters.
 | 
				
			||||||
* `master`: a URL format to specify the master address of the K8s cluster.
 | 
					* `master`: a URL format to specify the master address of the K8s cluster.
 | 
				
			||||||
* `container_image`: a string that specifies the name of docker container image for executors.
 | 
					* `container_image`: a string that specifies the name of Docker container image for executors. The Docker container image for BigDL is `intelanalytics/bigdl-k8s`.
 | 
				
			||||||
* `cores`: an integer that specifies the number of cores for each executor (default to be `2`).
 | 
					* `cores`: an integer that specifies the number of cores for each executor (default to be `2`).
 | 
				
			||||||
* `memory`: a string that specifies the memory for each executor (default to be `"2g"`).
 | 
					* `memory`: a string that specifies the memory for each executor (default to be `"2g"`).
 | 
				
			||||||
* `num_nodes`: an integer that specifies the number of executors (default to be `1`).
 | 
					* `num_nodes`: an integer that specifies the number of executors (default to be `1`).
 | 
				
			||||||
* `driver_cores`: an integer that specifies the number of cores for the driver node (default to be `4`).
 | 
					* `driver_cores`: an integer that specifies the number of cores for the driver node (default to be `4`).
 | 
				
			||||||
* `driver_memory`: a string that specifies the memory for the driver node (default to be `"1g"`).
 | 
					* `driver_memory`: a string that specifies the memory for the driver node (default to be `"2g"`).
 | 
				
			||||||
* `extra_python_lib`: a string that specifies the path to extra Python packages, separated by comma (default to be `None`). `.py`, `.zip` or `.egg` files are supported.
 | 
					* `extra_python_lib`: a string that specifies the path to extra Python packages, separated by comma (default to be `None`). `.py`, `.zip` or `.egg` files are supported.
 | 
				
			||||||
* `penv_archive`: a string that specifies the path to a packed conda archive (default to be `None`).
 | 
					* `penv_archive`: a string that specifies the path to a packed conda archive (default to be `None`).
 | 
				
			||||||
* `conf`: a dictionary to append extra conf for Spark (default to be `None`).
 | 
					* `conf`: a dictionary to append extra conf for Spark (default to be `None`).
 | 
				
			||||||
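As a rough illustration of the arguments listed in this hunk, the sketch below collects them into a keyword dictionary that could be passed to `init_orca_context`. The helper name `build_orca_k8s_kwargs` and the API server host and port are hypothetical placeholders; the defaults mirror the values shown on the updated side of the diff, and the master URL follows the `k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>` format used elsewhere in the tutorial.

```python
def build_orca_k8s_kwargs(apiserver_host, apiserver_port,
                          container_image="intelanalytics/bigdl-k8s:latest",
                          cluster_mode="k8s-client", **overrides):
    """Collect init_orca_context arguments for K8s (hypothetical helper)."""
    allowed_modes = ("k8s-client", "k8s-cluster", "spark-submit")
    if cluster_mode not in allowed_modes:
        raise ValueError(f"cluster_mode must be one of {allowed_modes}")

    kwargs = {
        "cluster_mode": cluster_mode,
        # Master URL format taken from the tutorial text.
        "master": f"k8s://https://{apiserver_host}:{apiserver_port}",
        "container_image": container_image,
        # Defaults as documented on the updated side of the diff.
        "cores": 2,
        "memory": "2g",
        "num_nodes": 1,
        "driver_cores": 4,
        "driver_memory": "2g",
        "extra_python_lib": None,
        "penv_archive": None,
        "conf": None,
    }

    # Reject argument names that are not in the documented list.
    unknown = set(overrides) - set(kwargs)
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")
    kwargs.update(overrides)
    return kwargs
```

With BigDL Orca installed, the result would presumably be used as `init_orca_context(**build_orca_k8s_kwargs("<host>", 6443, num_nodes=2))`; the port here is only an example value.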
| 
						 | 
					@ -50,7 +50,7 @@ For k8s-client, the Spark driver runs in the client process (outside the K8s clu
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Please see more details in [K8s-Cluster](https://spark.apache.org/docs/latest/running-on-kubernetes.html#cluster-mode) and [K8s-Client](https://spark.apache.org/docs/latest/running-on-kubernetes.html#client-mode).
 | 
					Please see more details in [K8s-Cluster](https://spark.apache.org/docs/latest/running-on-kubernetes.html#cluster-mode) and [K8s-Client](https://spark.apache.org/docs/latest/running-on-kubernetes.html#client-mode).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
For **k8s-cluster** mode, a `driver pod name` will be returned when the application is completed. You can retrieve the results on the __Develop Node__ following the commands below:
 | 
					For **k8s-cluster** mode, a `driver-pod-name` will be returned when the application is completed. You can retrieve the results on the __Develop Node__ following the commands below:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* Retrieve the logs on the driver pod:
 | 
					* Retrieve the logs on the driver pod:
 | 
				
			||||||
```bash
 | 
					```bash
 | 
				
			||||||
| 
						 | 
					@ -69,14 +69,14 @@ When you are running programs on K8s, please load data from [Volumes](https://ku
 | 
				
			||||||
To load data from Volumes, please set the corresponding Volume configurations for spark using `--conf` option in Spark scripts or specifying `conf` in `init_orca_context`. Here we list the configurations for using NFS as Volume.
 | 
					To load data from Volumes, please set the corresponding Volume configurations for spark using `--conf` option in Spark scripts or specifying `conf` in `init_orca_context`. Here we list the configurations for using NFS as Volume.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
For **k8s-client** mode:
 | 
					For **k8s-client** mode:
 | 
				
			||||||
* `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName`: specify the claim name of `persistentVolumeClaim` with volumeName `nfsvolumeclaim` to mount into executor pods.
 | 
					* `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName`: specify the claim name of persistentVolumeClaim with volumeName `nfsvolumeclaim` to mount into executor pods.
 | 
				
			||||||
* `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path`: specify the NFS path to be mounted as `nfsvolumeclaim` to executor pods.
 | 
					* `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path`: specify the NFS path (`/bigdl/nfsdata` in our example) to be mounted as nfsvolumeclaim into executor pods.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Besides the above two configurations, you need to additionally set the following configurations for **k8s-cluster** mode:
 | 
					Besides the above two configurations, you need to additionally set the following configurations for **k8s-cluster** mode:
 | 
				
			||||||
* `spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName`: specify the claim name of `persistentVolumeClaim` with volumeName `nfsvolumeclaim` to mount into the driver pod.
 | 
					* `spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName`: specify the claim name of persistentVolumeClaim with volumeName `nfsvolumeclaim` to mount into the driver pod.
 | 
				
			||||||
* `spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path`: specify the NFS path to be mounted as `nfsvolumeclaim` to the driver pod.
 | 
					* `spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path`: specify the NFS path (`/bigdl/nfsdata` in our example) to be mounted as nfsvolumeclaim into the driver pod.
 | 
				
			||||||
* `spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
 | 
					* `spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
 | 
				
			||||||
* `spark.kubernetes.file.upload.path`: the path to store files at spark submit side in k8s-cluster mode.
 | 
					* `spark.kubernetes.file.upload.path`: the path to store files at spark submit side in k8s-cluster mode. In this example we use the NFS path as well.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
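The configuration keys listed above can be assembled programmatically. The following is a minimal sketch, not the tutorial's own sample: the helper name `nfs_volume_conf` and its parameters are hypothetical, and it assumes the upload path defaults to the NFS mount path, as the updated text suggests for this example.

```python
def nfs_volume_conf(claim_name, mount_path, cluster_mode="k8s-client",
                    service_account=None, upload_path=None,
                    volume_name="nfsvolumeclaim"):
    """Build the Spark conf entries for an NFS-backed volume (hypothetical helper)."""

    def volume_keys(role):
        # role is "executor" or "driver"; the key layout follows the list above.
        prefix = (f"spark.kubernetes.{role}.volumes."
                  f"persistentVolumeClaim.{volume_name}")
        return {
            f"{prefix}.options.claimName": claim_name,
            f"{prefix}.mount.path": mount_path,
        }

    # Executor entries are needed in both k8s-client and k8s-cluster modes.
    conf = volume_keys("executor")

    if cluster_mode == "k8s-cluster":
        if service_account is None:
            raise ValueError("k8s-cluster mode needs a driver service account")
        conf.update(volume_keys("driver"))
        conf["spark.kubernetes.authenticate.driver.serviceAccountName"] = service_account
        # Assumption: reuse the NFS path for uploads unless told otherwise.
        conf["spark.kubernetes.file.upload.path"] = upload_path or mount_path
    return conf
```

The returned dictionary could be passed as `conf` to `init_orca_context`, or each entry turned into a `--conf key=value` option in a Spark script.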
Sample conf for NFS in the Fashion-MNIST example provided by this tutorial is as follows:
 | 
					Sample conf for NFS in the Fashion-MNIST example provided by this tutorial is as follows:
 | 
				
			||||||
```python
 | 
					```python
 | 
				
			||||||
| 
						 | 
					@ -112,11 +112,13 @@ def train_data_creator(config, batch_size):
 | 
				
			||||||
    return trainloader
 | 
					    return trainloader
 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					You are recommended to put your working directory in the Volume (NFS) as well.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
---
 | 
					---
 | 
				
			||||||
## 2. Create BigDL K8s Container 
 | 
					## 2. Create BigDL K8s Container 
 | 
				
			||||||
### 2.1 Pull Docker Image
 | 
					### 2.1 Pull Docker Image
 | 
				
			||||||
Please pull the BigDL [`bigdl-k8s`]((https://hub.docker.com/r/intelanalytics/bigdl-k8s/tags)) image (built on top of Spark 3.1.3) from Docker Hub as follows:
 | 
					Please pull the BigDL [`bigdl-k8s`](https://hub.docker.com/r/intelanalytics/bigdl-k8s/tags) image (built on top of Spark 3.1.3) from Docker Hub as follows:
 | 
				
			||||||
```bash
 | 
					```bash
 | 
				
			||||||
# For the latest nightly build version
 | 
					# For the latest nightly build version
 | 
				
			||||||
sudo docker pull intelanalytics/bigdl-k8s:latest
 | 
					sudo docker pull intelanalytics/bigdl-k8s:latest
 | 
				
			||||||
| 
						 | 
					@ -159,14 +161,14 @@ In the script:
 | 
				
			||||||
* `-v /etc/kubernetes:/etc/kubernetes`: specify the path of Kubernetes configurations to mount into the Docker container.
 | 
					* `-v /etc/kubernetes:/etc/kubernetes`: specify the path of Kubernetes configurations to mount into the Docker container.
 | 
				
			||||||
* `-v /root/.kube:/root/.kube`: specify the path of Kubernetes installation to mount into the Docker container.
 | 
					* `-v /root/.kube:/root/.kube`: specify the path of Kubernetes installation to mount into the Docker container.
 | 
				
			||||||
* `-v /path/to/nfsdata:/bigdl/nfsdata`: mount NFS path on the host into the container as the specified path (e.g. "/bigdl/nfsdata").
 | 
					* `-v /path/to/nfsdata:/bigdl/nfsdata`: mount NFS path on the host into the container as the specified path (e.g. "/bigdl/nfsdata").
 | 
				
			||||||
* `NOTEBOOK_PORT`: an integer that specifies the port number for the Notebook (only required if you use notebook).
 | 
					* `NOTEBOOK_PORT`: an integer that specifies the port number for the Notebook. This is not necessary if you don't use notebook.
 | 
				
			||||||
* `NOTEBOOK_TOKEN`: a string that specifies the token for Notebook (only required if you use notebook).
 | 
					* `NOTEBOOK_TOKEN`: a string that specifies the token for Notebook. This is not necessary if you don't use notebook.
 | 
				
			||||||
* `RUNTIME_SPARK_MASTER`: a URL format that specifies the Spark master: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>.
 | 
					* `RUNTIME_SPARK_MASTER`: a URL format that specifies the Spark master: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>.
 | 
				
			||||||
* `RUNTIME_K8S_SERVICE_ACCOUNT`: a string that specifies the service account for driver pod.
 | 
					* `RUNTIME_K8S_SERVICE_ACCOUNT`: a string that specifies the service account for the driver pod.
 | 
				
			||||||
* `RUNTIME_K8S_SPARK_IMAGE`: the name of the BigDL K8s docker image.
 | 
					* `RUNTIME_K8S_SPARK_IMAGE`: the name of the BigDL K8s Docker image.
 | 
				
			||||||
* `RUNTIME_PERSISTENT_VOLUME_CLAIM`: a string that specifies the Kubernetes volumeName (e.g. "nfsvolumeclaim").
 | 
					* `RUNTIME_PERSISTENT_VOLUME_CLAIM`: a string that specifies the Kubernetes volumeName (e.g. "nfsvolumeclaim").
 | 
				
			||||||
* `RUNTIME_DRIVER_HOST`: a URL format that specifies the driver localhost (only required by k8s-client mode).
 | 
					* `RUNTIME_DRIVER_HOST`: a URL format that specifies the driver localhost (only required if you use k8s-client mode).
 | 
				
			||||||
* `RUNTIME_DRIVER_PORT`: a string that specifies the driver port (only required by k8s-client mode).
 | 
					* `RUNTIME_DRIVER_PORT`: a string that specifies the driver port (only required if you use k8s-client mode).
 | 
				
			||||||
* `RUNTIME_EXECUTOR_INSTANCES`: an integer that specifies the number of executors.
 | 
					* `RUNTIME_EXECUTOR_INSTANCES`: an integer that specifies the number of executors.
 | 
				
			||||||
* `RUNTIME_EXECUTOR_CORES`: an integer that specifies the number of cores for each executor.
 | 
					* `RUNTIME_EXECUTOR_CORES`: an integer that specifies the number of cores for each executor.
 | 
				
			||||||
* `RUNTIME_EXECUTOR_MEMORY`: a string that specifies the memory for each executor.
 | 
					* `RUNTIME_EXECUTOR_MEMORY`: a string that specifies the memory for each executor.
 | 
				
			||||||
| 
						 | 
					@ -184,7 +186,7 @@ Once the container is created, a `containerID` would be returned and with which
 | 
				
			||||||
```bash
 | 
					```bash
 | 
				
			||||||
sudo docker exec -it <containerID> bash
 | 
					sudo docker exec -it <containerID> bash
 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
In the remaining part of this tutorial, you are supposed to operate and run commands inside this __Client Container__.
 | 
					In the remaining part of this tutorial, you are supposed to operate and run commands *__inside__* this __Client Container__.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
---
 | 
					---
 | 
				
			||||||
| 
						 | 
					@ -195,19 +197,17 @@ In the launched BigDL K8s **Client Container**, please setup the environment fol
 | 
				
			||||||
 | 
					
 | 
				
			||||||
- See [here](../Overview/install.md#to-install-orca-for-spark3) to install BigDL Orca in the created conda environment.
 | 
					- See [here](../Overview/install.md#to-install-orca-for-spark3) to install BigDL Orca in the created conda environment.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
- You should install all the other Python libraries that you need in your program in the conda environment as well. `torch` and `torchvision` are needed to run the Fashion-MNIST example:
 | 
					- You should install all the other Python libraries that you need in your program in the conda environment as well. `torch` and `torchvision` are needed to run the Fashion-MNIST example we provide:
 | 
				
			||||||
```bash
 | 
					```bash
 | 
				
			||||||
pip install torch torchvision
 | 
					pip install torch torchvision tqdm
 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
- For more details, please see [Python User Guide](https://bigdl.readthedocs.io/en/latest/doc/UserGuide/python.html).
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
---
 | 
					---
 | 
				
			||||||
## 4. Prepare Dataset
 | 
					## 4. Prepare Dataset
 | 
				
			||||||
To run the Fashion-MNIST example provided by this tutorial on K8s, you should upload the dataset to a K8s Volume (e.g. NFS).
 | 
					To run the Fashion-MNIST example provided by this tutorial on K8s, you should upload the dataset to a K8s Volume (e.g. NFS).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Please download the Fashion-MNIST dataset manually on your __Develop Node__ and put the data into the Volume. Note that PyTorch `FashionMNIST Dataset` requires unzipped files located in `FashionMNIST/raw/` under the root folder.
 | 
					Please download the Fashion-MNIST dataset manually on your __Develop Node__ and put the data into the Volume. Note that PyTorch `FashionMNIST Dataset` requires unzipped files located in `FashionMNIST/raw/` under the dataset folder.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
```bash
 | 
					```bash
 | 
				
			||||||
# PyTorch official dataset download link
 | 
					# PyTorch official dataset download link
 | 
				
			||||||
| 
						 | 
					@ -217,6 +217,7 @@ git clone https://github.com/zalandoresearch/fashion-mnist.git
 | 
				
			||||||
mv /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw
 | 
					mv /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw
 | 
				
			||||||
 | 
					
 | 
				
			||||||
# Extract FashionMNIST archives
 | 
					# Extract FashionMNIST archives
 | 
				
			||||||
 | 
					# May need to upgrade gzip before running the command
 | 
				
			||||||
gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
 | 
					gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
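If the installed `gzip` does not accept the `-k` option, the extraction step can be done in Python instead. This is a sketch under the assumption that every archive in the raw folder ends in `.gz`; the function name `extract_gz_keep_originals` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def extract_gz_keep_originals(raw_dir):
    """Decompress every *.gz file in raw_dir, leaving the archives in place."""
    extracted = []
    for archive in sorted(Path(raw_dir).glob("*.gz")):
        # Drop only the trailing ".gz", e.g. "train-images-idx3-ubyte.gz"
        # becomes "train-images-idx3-ubyte".
        target = archive.with_suffix("")
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        extracted.append(target)
    return extracted
```

For the paths used in this tutorial, the call would be `extract_gz_keep_originals("/bigdl/nfsdata/dataset/FashionMNIST/raw")`.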
| 
						 | 
					@ -227,7 +228,7 @@ In the given example, you can specify the argument `--remote_dir` to be the dire
 | 
				
			||||||
## 5. Prepare Custom Modules
 | 
					## 5. Prepare Custom Modules
 | 
				
			||||||
Spark allows you to upload Python files (`.py`) and zipped Python packages (`.zip`) across the cluster by setting the `--py-files` option in Spark scripts or specifying `extra_python_lib` in `init_orca_context`.
 | 
					Spark allows you to upload Python files (`.py`) and zipped Python packages (`.zip`) across the cluster by setting the `--py-files` option in Spark scripts or specifying `extra_python_lib` in `init_orca_context`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The FashionMNIST example needs to import modules from `model.py`.
 | 
					The FashionMNIST example needs to import the modules from [`model.py`](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/model.py).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
__Note:__ Please upload the extra Python dependency files to the Volume (e.g. NFS) when running the program on k8s-cluster mode (see __[Section 6.2.2](#id2)__ for more details).
 | 
					__Note:__ Please upload the extra Python dependency files to the Volume (e.g. NFS) when running the program on k8s-cluster mode (see __[Section 6.2.2](#id2)__ for more details).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					@ -253,20 +254,20 @@ For more details, please see [Spark Python Dependencies](https://spark.apache.or
 | 
				
			||||||
from model import model_creator, optimizer_creator
 | 
					from model import model_creator, optimizer_creator
 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
__Note__:
 | 
					__Notes__:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
If your program depends on a nested directory of Python files, you are recommended to follow the steps below to use a zipped package instead.
 | 
					If your program depends on a nested directory of Python files, you are recommended to follow the steps below to use a zipped package instead.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
1. Compress the directory into a zipped package.
 | 
					1. Compress the directory into a zipped package.
 | 
				
			||||||
```bash
 | 
					    ```bash
 | 
				
			||||||
zip -q -r FashionMNIST_zipped.zip FashionMNIST
 | 
					    zip -q -r FashionMNIST_zipped.zip FashionMNIST
 | 
				
			||||||
```
 | 
					    ```
 | 
				
			||||||
2. Upload the zipped package (`FashionMNIST_zipped.zip`) to K8s by setting `--py-files` or specifying `extra_python_lib` as discussed above.
 | 
					2. Upload the zipped package (`FashionMNIST_zipped.zip`) to K8s by setting `--py-files` or specifying `extra_python_lib` as discussed above.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
3. You can then import the custom modules from the unzipped file in your program as follows:
 | 
					3. You can then import the custom modules from the unzipped file in your program as follows:
 | 
				
			||||||
```python
 | 
					    ```python
 | 
				
			||||||
from FashionMNIST.model import model_creator, optimizer_creator
 | 
					    from FashionMNIST.model import model_creator, optimizer_creator
 | 
				
			||||||
```
 | 
					    ```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
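The zip-and-import steps above can also be sketched in Python. This is an illustration only: the helper name `zip_package` is hypothetical, the package directory is assumed to contain an `__init__.py`, and adding the archive to `sys.path` by hand stands in for what Spark is expected to do with files passed through `--py-files` or `extra_python_lib`.

```python
import shutil
from pathlib import Path


def zip_package(package_dir, archive_stem):
    """Zip package_dir so the archive contains the directory itself.

    Roughly mirrors `zip -q -r <archive_stem>.zip <package_dir>` run from the
    parent folder. Returns the path of the created .zip file.
    """
    package_dir = Path(package_dir).resolve()
    return shutil.make_archive(
        str(archive_stem),                 # ".zip" is appended automatically
        "zip",
        root_dir=str(package_dir.parent),  # archive paths are relative to this
        base_dir=package_dir.name,         # keep the top-level package folder
    )
```

To check the archive locally before submitting, one could run `sys.path.insert(0, zip_package("FashionMNIST", "FashionMNIST_zipped"))` and then `from FashionMNIST.model import model_creator, optimizer_creator`.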
---
 | 
					---
 | 
				
			||||||
| 
						 | 
					@ -339,7 +340,7 @@ Some runtime configurations for Spark are as follows:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* `--master`: a URL format that specifies the Spark master: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>.
 | 
					* `--master`: a URL format that specifies the Spark master: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>.
 | 
				
			||||||
* `--name`: the name of the Spark application.
 | 
					* `--name`: the name of the Spark application.
 | 
				
			||||||
* `--conf spark.kubernetes.container.image`: the name of the BigDL K8s docker image.
 | 
					* `--conf spark.kubernetes.container.image`: the name of the BigDL K8s Docker image.
 | 
				
			||||||
* `--conf spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
 | 
					* `--conf spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
 | 
				
			||||||
* `--conf spark.executor.instances`: the number of executors.
 | 
					* `--conf spark.executor.instances`: the number of executors.
 | 
				
			||||||
* `--executor-memory`: the memory for each executor.
 | 
					* `--executor-memory`: the memory for each executor.
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -22,7 +22,7 @@ In `init_orca_context`, you may specify necessary runtime configurations for run
 | 
				
			||||||
* `memory`: a string that specifies the memory for each executor (default to be `"2g"`).
 | 
					* `memory`: a string that specifies the memory for each executor (default to be `"2g"`).
 | 
				
			||||||
* `num_nodes`: an integer that specifies the number of executors (default to be `1`).
 | 
					* `num_nodes`: an integer that specifies the number of executors (default to be `1`).
 | 
				
			||||||
* `driver_cores`: an integer that specifies the number of cores for the driver node (default to be `4`).
 | 
					* `driver_cores`: an integer that specifies the number of cores for the driver node (default to be `4`).
 | 
				
			||||||
* `driver_memory`: a string that specifies the memory for the driver node (default to be `"1g"`).
 | 
					* `driver_memory`: a string that specifies the memory for the driver node (default to be `"2g"`).
 | 
				
			||||||
* `extra_python_lib`: a string that specifies the path to extra Python packages, separated by comma (default to be `None`). `.py`, `.zip` or `.egg` files are supported.
 | 
					* `extra_python_lib`: a string that specifies the path to extra Python packages, separated by comma (default to be `None`). `.py`, `.zip` or `.egg` files are supported.
 | 
				
			||||||
* `conf`: a dictionary to append extra conf for Spark (default to be `None`).
 | 
					* `conf`: a dictionary to append extra conf for Spark (default to be `None`).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					@ -95,10 +95,9 @@ __Note__:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
- You should install all the other Python libraries that you need in your program in the conda environment as well. `torch` and `torchvision` are needed to run the Fashion-MNIST example:
 | 
					- You should install all the other Python libraries that you need in your program in the conda environment as well. `torch` and `torchvision` are needed to run the Fashion-MNIST example:
 | 
				
			||||||
    ```bash
 | 
					    ```bash
 | 
				
			||||||
    pip install torch torchvision
 | 
					    pip install torch torchvision tqdm
 | 
				
			||||||
    ```
 | 
					    ```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
- For more details, please see [Python User Guide](https://bigdl.readthedocs.io/en/latest/doc/UserGuide/python.html).
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
### 2.3 Run on CDH
 | 
					### 2.3 Run on CDH
 | 
				
			||||||
* For [CDH](https://www.cloudera.com/products/open-source/apache-hadoop/key-cdh-components.html) users, the environment variable `HADOOP_CONF_DIR` should be `/etc/hadoop/conf` by default.
 | 
					* For [CDH](https://www.cloudera.com/products/open-source/apache-hadoop/key-cdh-components.html) users, the environment variable `HADOOP_CONF_DIR` should be `/etc/hadoop/conf` by default.
 | 
				
			||||||
| 
						 | 
					@ -113,7 +112,7 @@ __Note__:
 | 
				
			||||||
## 3. Prepare Dataset 
 | 
					## 3. Prepare Dataset 
 | 
				
			||||||
To run the Fashion-MNIST example provided by this tutorial on YARN, you should upload the Fashion-MNIST dataset to a distributed storage (such as HDFS or S3).   
 | 
					To run the Fashion-MNIST example provided by this tutorial on YARN, you should upload the Fashion-MNIST dataset to a distributed storage (such as HDFS or S3).   
 | 
				
			||||||
 | 
					
 | 
				
			||||||
First, download the Fashion-MNIST dataset manually on your __Client Node__. Note that PyTorch `FashionMNIST Dataset` requires unzipped files located in `FashionMNIST/raw/` under the root folder.
 | 
					First, download the Fashion-MNIST dataset manually on your __Client Node__. Note that PyTorch `FashionMNIST Dataset` requires unzipped files located in `FashionMNIST/raw/` under the dataset folder.
 | 
				
			||||||
```bash
 | 
					```bash
 | 
				
			||||||
# PyTorch official dataset download link
 | 
					# PyTorch official dataset download link
 | 
				
			||||||
git clone https://github.com/zalandoresearch/fashion-mnist.git
 | 
					git clone https://github.com/zalandoresearch/fashion-mnist.git
 | 
				
			||||||
| 
						 | 
					@ -122,6 +121,7 @@ git clone https://github.com/zalandoresearch/fashion-mnist.git
 | 
				
			||||||
mv /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
 | 
					mv /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
 | 
				
			||||||
 | 
					
 | 
				
			||||||
# Extract FashionMNIST archives
 | 
					# Extract FashionMNIST archives
 | 
				
			||||||
 | 
					# May need to upgrade gzip before running the command
 | 
				
			||||||
gzip -dk /path/to/local/data/FashionMNIST/raw/*
 | 
					gzip -dk /path/to/local/data/FashionMNIST/raw/*
 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
Then upload it to a distributed storage. Sample command to upload data to HDFS is as follows:
 | 
					Then upload it to a distributed storage. Sample command to upload data to HDFS is as follows:
 | 
				
			||||||
| 
						 | 
					@ -134,7 +134,7 @@ In the given example, you can specify the argument `--remote_dir` to be the dire
 | 
				
			||||||
## 4. Prepare Custom Modules
 | 
					## 4. Prepare Custom Modules
 | 
				
			||||||
Spark allows you to upload Python files (`.py`) and zipped Python packages (`.zip`) across the cluster by setting the `--py-files` option in Spark scripts or specifying `extra_python_lib` in `init_orca_context`.
 | 
					Spark allows you to upload Python files (`.py`) and zipped Python packages (`.zip`) across the cluster by setting the `--py-files` option in Spark scripts or specifying `extra_python_lib` in `init_orca_context`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The FashionMNIST example needs to import modules from [`model.py`](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/model.py).
 | 
					The FashionMNIST example needs to import the modules from [`model.py`](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/model.py).
 | 
				
			||||||
* When using [`python` command](#use-python-command), please specify `extra_python_lib` in `init_orca_context`.
 | 
					* When using [`python` command](#use-python-command), please specify `extra_python_lib` in `init_orca_context`.
 | 
				
			||||||
    ```python
 | 
					    ```python
 | 
				
			||||||
    init_orca_context(..., extra_python_lib="model.py")
 | 
					    init_orca_context(..., extra_python_lib="model.py")
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||