Update K8s tutorial (part4) (#6724)
* update python
* update spark-submit

This tutorial provides a step-by-step guide on how to run BigDL-Orca programs on Kubernetes (K8s) clusters, using a [PyTorch Fashion-MNIST program](https://github.com/intel-analytics/BigDL/tree/main/python/orca/tutorial/pytorch/FashionMNIST) as a working example.

The **Client Container** that appears in this tutorial refers to the Docker container where you launch or submit your applications. The __Develop Node__ is the host machine where you launch the client container.
---
## 1. Basic Concepts

For k8s-client, the Spark driver runs in the client process (outside the K8s cluster).

Please see more details in [K8s-Cluster](https://spark.apache.org/docs/latest/running-on-kubernetes.html#cluster-mode) and [K8s-Client](https://spark.apache.org/docs/latest/running-on-kubernetes.html#client-mode).

For **k8s-cluster** mode, a `driver pod name` will be returned when the application completes. You can retrieve the results on the __Develop Node__ with the commands below:

* Retrieve the logs on the driver pod:
```bash
kubectl logs <driver-pod-name>
```

* Check the pod status or get basic information about the driver pod:
```bash
kubectl describe pod <driver-pod-name>
```

### 1.3 Load Data from Volumes

When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use a Network File System (NFS) with the path `/bigdl/nfsdata` in this tutorial as an example.

To load data from Volumes, please set the corresponding Volume configurations for Spark using the `--conf` option in Spark scripts or by specifying `conf` in `init_orca_context`. Here we list the configurations for using NFS as the Volume.

For **k8s-client** mode:
* `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName`: specify the claim name of the `persistentVolumeClaim` with volumeName `nfsvolumeclaim` to mount into executor pods.
* `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path`: specify the NFS path to be mounted as `nfsvolumeclaim` into executor pods.

Besides the above two configurations, you need to additionally set the following configurations for **k8s-cluster** mode:
* `spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName`: specify the claim name of the `persistentVolumeClaim` with volumeName `nfsvolumeclaim` to mount into the driver pod.
* `spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path`: specify the NFS path to be mounted as `nfsvolumeclaim` into the driver pod.
* `spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
* `spark.kubernetes.file.upload.path`: the path to store files on the spark-submit side in k8s-cluster mode.

A sample `conf` for NFS in the Fashion-MNIST example of this tutorial is as follows:
```python
{
    # For k8s-client mode
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName": "nfsvolumeclaim",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path": "/bigdl/nfsdata",

    # Additionally for k8s-cluster mode
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName": "nfsvolumeclaim",
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path": "/bigdl/nfsdata",
    "spark.kubernetes.authenticate.driver.serviceAccountName": "spark",
    "spark.kubernetes.file.upload.path": "/bigdl/nfsdata/"
}
```
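
Before submitting applications, you can check that the `persistentVolumeClaim` assumed by this tutorial exists in your cluster (a quick sanity check with the standard `kubectl` CLI):
```bash
kubectl get pvc nfsvolumeclaim
```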

After mounting the Volume (NFS) into the BigDL container (see __[Section 2.2](#create-a-k8s-client-container)__ for more details), the Fashion-MNIST example can load data from NFS as local storage.

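To verify the mount, you can list the dataset from inside a running pod (a sketch using the standard `kubectl exec`; substitute the name of any pod that mounts the volume):
```bash
kubectl exec <driver-pod-name> -- ls /bigdl/nfsdata/dataset/FashionMNIST/raw
```
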
### 2.2 Create a K8s Client Container

Please create the __Client Container__ using the script below:

```bash
sudo docker run -itd --net=host \
    -v /etc/kubernetes:/etc/kubernetes \
    -v /root/.kube:/root/.kube \
    -v /path/to/nfsdata:/bigdl/nfsdata \
    -e NOTEBOOK_PORT=12345 \
    -e NOTEBOOK_TOKEN="your-token" \
    -e RUNTIME_SPARK_MASTER=k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \
    -e RUNTIME_K8S_SPARK_IMAGE=intelanalytics/bigdl-k8s:latest \
    -e RUNTIME_PERSISTENT_VOLUME_CLAIM=nfsvolumeclaim \
    -e RUNTIME_DRIVER_HOST=x.x.x.x \
    -e RUNTIME_DRIVER_PORT=54321 \
    -e RUNTIME_EXECUTOR_INSTANCES=2 \
    -e RUNTIME_EXECUTOR_CORES=4 \
    -e RUNTIME_EXECUTOR_MEMORY=2g \
    -e RUNTIME_TOTAL_EXECUTOR_CORES=8 \
    -e RUNTIME_DRIVER_CORES=2 \
    -e RUNTIME_DRIVER_MEMORY=2g \
    intelanalytics/bigdl-k8s:latest bash
```

In the script:
* **Please switch the tag according to the BigDL image you pull.**
* **Please make sure you are mounting the correct Volume path (e.g. NFS) into the container.**
* `--net=host`: use the host network stack for the Docker container.
* `-v /etc/kubernetes:/etc/kubernetes`: specify the path of Kubernetes configurations to mount into the Docker container.
* `-v /root/.kube:/root/.kube`: specify the path of the Kubernetes installation to mount into the Docker container.
* `-v /path/to/nfsdata:/bigdl/nfsdata`: mount the NFS path on the host into the container as the specified path (e.g. "/bigdl/nfsdata").
* `NOTEBOOK_PORT`: an integer that specifies the port number for the Notebook (only required if you use a notebook).
* `NOTEBOOK_TOKEN`: a string that specifies the token for the Notebook (only required if you use a notebook).
* `RUNTIME_SPARK_MASTER`: a URL format that specifies the Spark master: `k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>`.
* `RUNTIME_K8S_SERVICE_ACCOUNT`: a string that specifies the service account for the driver pod.
* `RUNTIME_K8S_SPARK_IMAGE`: the name of the BigDL K8s docker image.
* `RUNTIME_PERSISTENT_VOLUME_CLAIM`: a string that specifies the Kubernetes volumeName (e.g. "nfsvolumeclaim").
* `RUNTIME_DRIVER_HOST`: specifies the driver host (only required by k8s-client mode).
* `RUNTIME_DRIVER_PORT`: a string that specifies the driver port (only required by k8s-client mode).
* `RUNTIME_EXECUTOR_INSTANCES`: an integer that specifies the number of executors.
* `RUNTIME_EXECUTOR_CORES`: an integer that specifies the number of cores for each executor.
* `RUNTIME_EXECUTOR_MEMORY`: a string that specifies the memory for each executor.
* `RUNTIME_TOTAL_EXECUTOR_CORES`: an integer that specifies the total number of executor cores.
* `RUNTIME_DRIVER_CORES`: an integer that specifies the number of cores for the driver node.
* `RUNTIME_DRIVER_MEMORY`: a string that specifies the memory for the driver node.

__Notes:__
* The __Client Container__ contains all the required environment except K8s configurations.
* You don't need to create Spark executor containers manually; they are scheduled by K8s at runtime.

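You can confirm that the container is up on the __Develop Node__ (a standard `docker ps` check):
```bash
sudo docker ps | grep bigdl-k8s
```
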
### 2.3 Launch the K8s Client Container

Once the container is created, a `containerID` will be returned, with which you can enter the container using the command below:

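A typical command (assuming the standard `docker exec` CLI; `<containerID>` is the ID returned when the container was created) is:
```bash
sudo docker exec -it <containerID> bash
```
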
## 4. Prepare Dataset

To run the Fashion-MNIST example provided by this tutorial on K8s, you should upload the dataset to a K8s Volume (e.g. NFS).

Please download the Fashion-MNIST dataset manually on your __Develop Node__ and put the data into the Volume. Note that the PyTorch `FashionMNIST Dataset` requires unzipped files located in `FashionMNIST/raw/` under the root folder.

```bash
# PyTorch official dataset download link
mv /path/to/fashion-mnist/data/fashion /bigdl/nfsdata/dataset/FashionMNIST/raw
gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
```
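
After these steps, the raw files should be visible under the dataset root (a quick check; the expected names follow the standard Fashion-MNIST layout):
```bash
ls /bigdl/nfsdata/dataset/FashionMNIST/raw
# expect: train-images-idx3-ubyte, train-labels-idx1-ubyte,
#         t10k-images-idx3-ubyte, t10k-labels-idx1-ubyte (plus the .gz copies)
```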

In the given example, you can specify the argument `--remote_dir` to be the directory on NFS for the Fashion-MNIST dataset.

---
## 5. Prepare Custom Modules

Spark allows uploading Python files (`.py`) and zipped Python packages (`.zip`) across the cluster by setting the `--py-files` option in Spark scripts or specifying `extra_python_lib` in `init_orca_context`.

The Fashion-MNIST example needs to import modules from `model.py`.

In the following part, we will illustrate four ways to submit and run BigDL Orca applications on K8s.

You can choose one of them based on your preference or cluster settings.

We provide the running command for the [Fashion-MNIST example](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/) in the __Client Container__ in this section.

### 6.1 Use `python` command

This is the easiest and most recommended way to run BigDL Orca on K8s as a normal Python program.

See [here](#init-orca-context) for the runtime configurations.

#### 6.1.1 K8s-Client
Before running the example in `k8s-client` mode, you should:
* In the __Client Container__:
    1. Please call `init_orca_context` at the very beginning of each Orca program.
        ```python
        from bigdl.orca import init_orca_context, stop_orca_context

        conf = {
            "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName": "nfsvolumeclaim",
            "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path": "/bigdl/nfsdata",
        }

        init_orca_context(cluster_mode="k8s-client", num_nodes=2, cores=2, memory="2g",
                          master="k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>",
                          container_image="intelanalytics/bigdl-k8s:2.1.0",
                          extra_python_lib="/path/to/model.py", conf=conf)
        ```

        To load data from NFS, please use the following configuration properties:
        * `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName`: specify the claim name of the `persistentVolumeClaim` with volumeName `nfsvolumeclaim` to mount the `persistentVolume` into executor pods;
        * `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path`: mount the volume `nfsvolumeclaim` of type `persistentVolumeClaim` into executor pods at the NFS path specified in the value.
    2. Use Conda to install BigDL and the needed Python dependency libraries (see __[Section 3](#3-prepare-environment)__).

Run the example with the following command by setting the cluster_mode to "k8s-client":
```bash
python train.py --cluster_mode k8s-client --remote_dir file:///bigdl/nfsdata/dataset
```

In the script:
* `cluster_mode`: set the cluster_mode in `init_orca_context`.
* `remote_dir`: directory on NFS for loading the dataset.

#### 6.1.2 K8s-Cluster
Before running the example in `k8s-cluster` mode, you should:
* In the __Client Container__:
    1. Please call `init_orca_context` at the very beginning of each Orca program.
        ```python
        from bigdl.orca import init_orca_context, stop_orca_context

        conf = {
            "spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName": "nfsvolumeclaim",
            "spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path": "/bigdl/nfsdata",
            "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName": "nfsvolumeclaim",
            "spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path": "/bigdl/nfsdata",
            "spark.kubernetes.authenticate.driver.serviceAccountName": "spark",
            "spark.kubernetes.file.upload.path": "/bigdl/nfsdata/"
        }

        init_orca_context(cluster_mode="k8s-cluster", num_nodes=2, cores=2, memory="2g",
                          master="k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>",
                          container_image="intelanalytics/bigdl-k8s:2.1.0",
                          penv_archive="file:///bigdl/nfsdata/environment.tar.gz",
                          extra_python_lib="/bigdl/nfsdata/model.py", conf=conf)
        ```

        When running Orca programs in `k8s-cluster` mode, please use the following additional configuration properties:
        * `spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName`: specify the claim name of the `persistentVolumeClaim` with volumeName `nfsvolumeclaim` to mount the `persistentVolume` into the driver pod;
        * `spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path`: mount the volume `nfsvolumeclaim` of type `persistentVolumeClaim` into the driver pod at the NFS path specified in the value;
        * `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName`: specify the claim name of the `persistentVolumeClaim` with volumeName `nfsvolumeclaim` to mount the `persistentVolume` into executor pods;
        * `spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path`: mount the volume `nfsvolumeclaim` of type `persistentVolumeClaim` into executor pods at the NFS path specified in the value;
        * `spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod;
        * `spark.kubernetes.file.upload.path`: the path to store files on the spark-submit side in cluster mode.
    2. Use Conda to install BigDL and the needed Python dependency libraries (see __[Section 3](#3-prepare-environment)__), then pack the current active Conda environment to an archive:
        ```bash
        conda pack -o /path/to/environment.tar.gz
        ```
* On the __Develop Node__:
    1. Upload the Conda archive to NFS.
        ```bash
        docker cp <containerID>:/path/to/environment.tar.gz /bigdl/nfsdata
        ```
    2. Upload the example Python file to NFS.
        ```bash
        cp /path/to/train.py /bigdl/nfsdata
        ```
    3. Upload the extra Python dependency files to NFS.
        ```bash
        cp /path/to/model.py /bigdl/nfsdata
        ```

Run the example with the following command by setting the cluster_mode to "k8s-cluster":
```bash
python /bigdl/nfsdata/train.py --cluster_mode k8s-cluster --remote_dir /bigdl/nfsdata/dataset
```

In the script:
* `cluster_mode`: set the cluster_mode in `init_orca_context`.
* `remote_dir`: directory on NFS for loading the dataset.

__Note:__ A `driver pod name` will be returned when the application completes.

Please retrieve the training stats on the __Develop Node__ with the commands below:
* Retrieve the training logs on the driver pod:
    ```bash
    kubectl logs <driver-pod-name>
    ```

* Check the pod status or get basic information about the driver pod:
    ```bash
    kubectl describe pod <driver-pod-name>
    ```

### 6.2 Use `spark-submit`

Set the cluster_mode to "spark-submit" in `init_orca_context`.
```python
init_orca_context(cluster_mode="spark-submit")
```

Pack the current active conda environment to an archive in the __Client Container__:
```bash
conda pack -o environment.tar.gz
```

Some runtime configurations for Spark are as follows:

* `--master`: a URL format that specifies the Spark master: `k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>`.
* `--name`: the name of the Spark application.
* `--conf spark.kubernetes.container.image`: the name of the BigDL K8s docker image.
* `--conf spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
* `--conf spark.executor.instances`: the number of executors.
* `--executor-memory`: the memory for each executor.
* `--driver-memory`: the memory for the driver node.
* `--executor-cores`: the number of cores for each executor.
* `--total-executor-cores`: the total number of executor cores.
* `--properties-file`: the BigDL configuration properties to be uploaded to K8s.
* `--py-files`: the extra Python dependency files to be uploaded to K8s.
* `--archives`: the conda archive to be uploaded to K8s.
* `--conf spark.driver.extraClassPath`: upload and register the BigDL jar files to the driver's classpath.
* `--conf spark.executor.extraClassPath`: upload and register the BigDL jar files to the executors' classpath.
* `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName`: specify the claim name of the `persistentVolumeClaim` to mount the `persistentVolume` into executor pods.
* `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as the `persistentVolumeClaim` into executor pods.

#### 6.2.1 K8s Client
Before submitting the example in `k8s-client` mode, you should:
* In the __Client Container__:
    1. Please call `init_orca_context` at the very beginning of each Orca program.
        ```python
        from bigdl.orca import init_orca_context

        init_orca_context(cluster_mode="spark-submit")
        ```
    2. Use Conda to install BigDL and the needed Python dependency libraries (see __[Section 3](#3-prepare-environment)__), then pack the current active Conda environment to an archive:
        ```bash
        conda pack -o environment.tar.gz
        ```

Submit and run the program for `k8s-client` mode following the `spark-submit` script below:
```bash
${SPARK_HOME}/bin/spark-submit \
    --master ${RUNTIME_SPARK_MASTER} \
    --deploy-mode client \
    --conf spark.driver.host=${RUNTIME_DRIVER_HOST} \
    --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
    --conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
    --executor-cores ${RUNTIME_EXECUTOR_CORES} \
    --executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
    --driver-cores ${RUNTIME_DRIVER_CORES} \
    --driver-memory ${RUNTIME_DRIVER_MEMORY} \
    --total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
    --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
    --conf spark.pyspark.driver.python=python \
    --conf spark.pyspark.python=./environment/bin/python \
    --archives /path/to/environment.tar.gz#environment \
    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/path/to/train.py,/path/to/model.py \
    --conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \
    --conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl/nfsdata \
    train.py --cluster_mode spark-submit --remote_dir /bigdl/nfsdata/dataset
```
In the `spark-submit` script:
* `--deploy-mode`: set it to `client` when running programs in k8s-client mode.
* `--conf spark.driver.host`: the host of the driver (only required in client mode).
* `--conf spark.pyspark.driver.python`: set the active Python location in the __Client Container__ as the driver's Python environment.
* `--conf spark.pyspark.python`: set the Python location in the conda archive as each executor's Python environment.

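While an application runs in client mode, the executor pods can be watched from the __Develop Node__ (a standard `kubectl` check; pod names depend on your Spark application name):
```bash
kubectl get pods -w
```
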
#### 6.2.2 K8s Cluster
Before submitting the example in `k8s-cluster` mode, you should:
* In the __Client Container__:
    1. Please call `init_orca_context` at the very beginning of each Orca program.
        ```python
        from bigdl.orca import init_orca_context

        init_orca_context(cluster_mode="spark-submit")
        ```
    2. Use Conda to install BigDL and the needed Python dependency libraries (see __[Section 3](#3-prepare-environment)__), then pack the Conda environment to an archive:
        ```bash
        conda pack -o environment.tar.gz
        ```
* On the __Develop Node__:
    1. Upload the Conda archive to NFS.
        ```bash
        docker cp <containerID>:/path/to/environment.tar.gz /bigdl/nfsdata
        ```
    2. Upload the example Python file to NFS.
        ```bash
        cp /path/to/train.py /bigdl/nfsdata
        ```
    3. Upload the extra Python dependency files to NFS.
        ```bash
        cp /path/to/model.py /bigdl/nfsdata
        ```

Submit and run the program for `k8s-cluster` mode following the `spark-submit` script below:
```bash
${SPARK_HOME}/bin/spark-submit \
    --master ${RUNTIME_SPARK_MASTER} \
    --deploy-mode cluster \
    --name orca-k8s-cluster-tutorial \
    --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
    --conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
    --archives file:///bigdl/nfsdata/environment.tar.gz#environment \
    --conf spark.pyspark.python=environment/bin/python \
    --conf spark.executorEnv.PYTHONHOME=environment \
    --conf spark.kubernetes.file.upload.path=/bigdl/nfsdata \
    --executor-cores ${RUNTIME_EXECUTOR_CORES} \
    --executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
    --total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
    --driver-cores ${RUNTIME_DRIVER_CORES} \
    --driver-memory ${RUNTIME_DRIVER_MEMORY} \
    --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,file:///bigdl/nfsdata/model.py \
    --conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \
    --conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl/nfsdata \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl/nfsdata \
    file:///bigdl/nfsdata/train.py --cluster_mode spark-submit --remote_dir /bigdl/nfsdata/dataset
```
In the `spark-submit` script:
* `--deploy-mode`: set it to `cluster` when running programs in k8s-cluster mode.
* `--conf spark.pyspark.python`: set the Python location in the conda archive as each executor's Python environment.
* `--conf spark.executorEnv.PYTHONHOME`: the search path of Python libraries on executor pods.
* `--conf spark.kubernetes.file.upload.path`: the path to store files on the spark-submit side in k8s-cluster mode.

### 6.3 Use Kubernetes Deployment (with Conda Archive)
---
## 4. Prepare Custom Modules

Spark allows uploading Python files (`.py`) and zipped Python packages (`.zip`) across the cluster by setting the `--py-files` option in Spark scripts or specifying `extra_python_lib` in `init_orca_context`.

The Fashion-MNIST example needs to import modules from [`model.py`](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/model.py).
* When using the [`python` command](#use-python-command), please specify `extra_python_lib` in `init_orca_context`.

In the following part, we will illustrate three ways to submit and run BigDL Orca applications on YARN.

You can choose one of them based on your preference or cluster settings.

We provide the running command for the [Fashion-MNIST example](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/) on the __Client Node__ in this section.

### 5.1 Use `python` Command

This is the easiest and most recommended way to run BigDL Orca on YARN as a normal Python program. With this approach, you only need to prepare the environment on the __Client Node__, and the environment will be automatically packaged and distributed to the YARN cluster.

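Preparing the environment on the __Client Node__ typically means creating a conda environment with BigDL Orca and the example's dependencies installed (a sketch; the environment name and Python version are illustrative, and `bigdl-orca` is the PyPI package for Orca):
```bash
conda create -n bigdl python=3.7
conda activate bigdl
pip install bigdl-orca torch torchvision
```
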
You can copy the code in [train.py](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/FashionMNIST/train.py) to the notebook and run the cells. Set the cluster_mode to "yarn-client" in `init_orca_context`.
```python
sc = init_orca_context(cluster_mode="yarn-client", cores=4, memory="2g", num_nodes=2,
                       driver_cores=2, driver_memory="2g",
                       extra_python_lib="model.py")
```

Note that Jupyter Notebook cannot run in `yarn-cluster` mode, as the driver is not running on the __Client Node__ (where you run the notebook).

Submit and run the example for `yarn-client` mode following the `bigdl-submit` script below:
```bash
bigdl-submit \
    --master yarn \
    --deploy-mode client \
    --executor-memory 2g \
    --driver-memory 2g \
    --executor-cores 4 \
    --num-executors 2 \
    --py-files model.py \
    --conf spark.pyspark.driver.python=/path/to/python \
    --conf spark.pyspark.python=environment/bin/python \
    train.py --cluster_mode bigdl-submit --remote_dir hdfs://path/to/remote/data
```
In the `bigdl-submit` script:
* `--master`: the spark master, set it to "yarn".
* `--deploy-mode`: set it to `client` when running programs in yarn-client mode.
* `--conf spark.pyspark.driver.python`: set the active Python location on the __Client Node__ as the driver's Python environment. You can find it by running `which python`.
* `--conf spark.pyspark.python`: set the Python location in the conda archive as each executor's Python environment.

Submit and run the program for `yarn-cluster` mode following the `bigdl-submit` script below:
```bash
bigdl-submit \
    --master yarn \
    --deploy-mode cluster \
    --executor-memory 2g \
    --driver-memory 2g \
    --executor-cores 4 \
    --num-executors 2 \
    --py-files model.py \
    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
    --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \
    train.py --cluster_mode bigdl-submit --remote_dir hdfs://path/to/remote/data
```
In the `bigdl-submit` script:
* `--master`: the spark master, set it to "yarn".
* `--deploy-mode`: set it to `cluster` when running programs in yarn-cluster mode.
* `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in the conda archive as the Python environment of the Application Master.
* `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in the conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment.

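In `yarn-cluster` mode the driver log is collected by YARN rather than printed to your terminal; you can fetch it with the standard YARN CLI (the application ID is printed during submission):
```bash
yarn logs -applicationId <application-id>
```
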
Some runtime configurations for Spark are as follows:
* `--num-executors`: the number of executors.
* `--py-files`: the extra Python dependency files to be uploaded to YARN.
* `--archives`: the conda archive to be uploaded to YARN.

#### 5.3.1 Yarn Cluster
Submit and run the program for `yarn-cluster` mode following the `spark-submit` script below:
```bash
${SPARK_HOME}/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --executor-memory 2g \
    --driver-memory 2g \
    --executor-cores 4 \
    --num-executors 2 \
    --archives /path/to/environment.tar.gz#environment \
    --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
    --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \
    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
    --jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
    train.py --cluster_mode spark-submit --remote_dir hdfs://path/to/remote/data
```
In the `spark-submit` script:
* `--master`: the spark master, set it to "yarn".
* `--deploy-mode`: set it to `cluster` when running programs in yarn-cluster mode.
* `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in the conda archive as the Python environment of the Application Master.
* `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in the conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment.
* `--jars`: upload and register BigDL jars to YARN.

#### 5.3.2 Yarn Client
Submit and run the program for `yarn-client` mode following the `spark-submit` script below:
```bash
${SPARK_HOME}/bin/spark-submit \
    --master yarn \
    --deploy-mode client \
    --executor-memory 2g \
    --driver-memory 2g \
    --executor-cores 4 \
    --num-executors 2 \
    --archives /path/to/environment.tar.gz#environment \
    --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
    --conf spark.pyspark.driver.python=/path/to/python \
    --conf spark.pyspark.python=environment/bin/python \
    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
    --jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
    train.py --cluster_mode spark-submit --remote_dir hdfs://path/to/remote/data
```
In the `spark-submit` script:
* `--master`: the spark master, set it to "yarn".
* `--deploy-mode`: set it to `client` when running programs in yarn-client mode.
* `--conf spark.pyspark.driver.python`: set the active Python location on the __Client Node__ as the driver's Python environment. You can find the location by running `which python`.
* `--conf spark.pyspark.python`: set the Python location in the conda archive as each executor's Python environment.