Update k8s command (#7532)
* remove redundant conf
* update command
* update command
* format
* change to num executor
* fix
* minor
* fix
* modify cores
* remove pythonhome
* meet review
* minor
* rephase
* minor
* minor
* update yarn master
* update args
This commit is contained in:
parent 7727b4c9ba
commit 2e1d977e08

2 changed files with 78 additions and 77 deletions
@@ -159,12 +159,6 @@ sudo docker run -itd --net=host \
     -e RUNTIME_PERSISTENT_VOLUME_CLAIM=nfsvolumeclaim \
     -e RUNTIME_DRIVER_HOST=x.x.x.x \
     -e RUNTIME_DRIVER_PORT=54321 \
-    -e RUNTIME_EXECUTOR_INSTANCES=2 \
-    -e RUNTIME_EXECUTOR_CORES=4 \
-    -e RUNTIME_EXECUTOR_MEMORY=2g \
-    -e RUNTIME_TOTAL_EXECUTOR_CORES=8 \
-    -e RUNTIME_DRIVER_CORES=2 \
-    -e RUNTIME_DRIVER_MEMORY=2g \
     intelanalytics/bigdl-k8s:latest bash
 ```

@@ -177,22 +171,16 @@ In the script:
 * `-v /path/to/nfsdata:/bigdl/nfsdata`: mount NFS path on the host into the container as the specified path (e.g. "/bigdl/nfsdata").
 * `NOTEBOOK_PORT`: an integer that specifies the port number for the Notebook. This is not necessary if you don't use notebook.
 * `NOTEBOOK_TOKEN`: a string that specifies the token for Notebook. This is not necessary if you don't use notebook.
-* `RUNTIME_SPARK_MASTER`: a URL format that specifies the Spark master: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>.
+* `RUNTIME_SPARK_MASTER`: a URL format that specifies the Spark master: `k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>`.
 * `RUNTIME_K8S_SERVICE_ACCOUNT`: a string that specifies the service account for the driver pod.
 * `RUNTIME_K8S_SPARK_IMAGE`: the name of the BigDL K8s Docker image.
 * `RUNTIME_PERSISTENT_VOLUME_CLAIM`: a string that specifies the Kubernetes volumeName (e.g. "nfsvolumeclaim").
 * `RUNTIME_DRIVER_HOST`: a URL format that specifies the driver localhost (only required if you use k8s-client mode).
 * `RUNTIME_DRIVER_PORT`: a string that specifies the driver port (only required if you use k8s-client mode).
-* `RUNTIME_EXECUTOR_INSTANCES`: an integer that specifies the number of executors.
-* `RUNTIME_EXECUTOR_CORES`: an integer that specifies the number of cores for each executor.
-* `RUNTIME_EXECUTOR_MEMORY`: a string that specifies the memory for each executor.
-* `RUNTIME_TOTAL_EXECUTOR_CORES`: an integer that specifies the number of cores for all executors.
-* `RUNTIME_DRIVER_CORES`: an integer that specifies the number of cores for the driver node.
-* `RUNTIME_DRIVER_MEMORY`: a string that specifies the memory for the driver node.

 __Notes:__
-* The __Client Container__ contains all the required environment except K8s configurations.
-* You don't need to create Spark executor containers manually, which are scheduled by K8s at runtime.
+* The __Client Container__ already contains all the required environment configurations for Spark and BigDL Orca.
+* Spark executor containers are scheduled by K8s at runtime and you don't need to create them manually.


 ### 2.3 Launch the K8s Client Container
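The net effect of the two hunks above is that executor and driver resources are no longer baked into the container launch command; they are passed to `spark-submit` instead (see the later hunks). As a hedged sketch of the trimmed launch command, only the lines visible in the first hunk are verbatim; the NFS mount and the values in angle brackets are assumptions based on the surrounding tutorial text:

```bash
# Hedged sketch, not part of the commit: flags above the shown hunk and the
# <placeholder> values are assumptions drawn from the surrounding tutorial.
sudo docker run -itd --net=host \
    -v /path/to/nfsdata:/bigdl/nfsdata \
    -e RUNTIME_SPARK_MASTER=k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    -e RUNTIME_K8S_SERVICE_ACCOUNT=<service-account> \
    -e RUNTIME_K8S_SPARK_IMAGE=intelanalytics/bigdl-k8s:latest \
    -e RUNTIME_PERSISTENT_VOLUME_CLAIM=nfsvolumeclaim \
    -e RUNTIME_DRIVER_HOST=x.x.x.x \
    -e RUNTIME_DRIVER_PORT=54321 \
    intelanalytics/bigdl-k8s:latest bash
```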
@@ -209,7 +197,7 @@ In the launched BigDL K8s **Client Container**, please setup the environment fol

 - See [here](../Overview/install.md#install-anaconda) to install conda and prepare the Python environment.

-- See [here](../Overview/install.md#to-install-orca-for-spark3) to install BigDL Orca in the created conda environment.
+- See [here](../Overview/install.md#to-install-orca-for-spark3) to install BigDL Orca in the created conda environment. *Note that if you use [`spark-submit`](#use-spark-submit), please __skip__ this step and __DO NOT__ install BigDL Orca with pip install command in the conda environment.*

 - You should install all the other Python libraries that you need in your program in the conda environment as well. `torch` and `torchvision` are needed to run the Fashion-MNIST example we provide:
 ```bash
@@ -339,34 +327,42 @@ python /bigdl/nfsdata/train.py --cluster_mode k8s-cluster --data_dir /bigdl/nfsd

 ### 6.2 Use `spark-submit`

-Set the cluster_mode to "bigdl-submit" in `init_orca_context`.
-```python
-init_orca_context(cluster_mode="spark-submit")
-```
+If you prefer to use `spark-submit`, please follow the steps below to prepare the environment in the __Client Container__.

-Pack the current activate conda environment to an archive in the __Client Container__:
-```bash
-conda pack -o environment.tar.gz
-```
+1. Set the cluster_mode to "spark-submit" in `init_orca_context`.
+    ```python
+    sc = init_orca_context(cluster_mode="spark-submit")
+    ```

+2. Download the requirement file(s) from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs.
+    ```bash
+    pip install -r /path/to/requirements.txt
+    ```
+    Note that you are recommended **NOT** to install BigDL Orca with pip install command in the conda environment if you use spark-submit to avoid possible conflicts.
+
+3. Pack the current activate conda environment to an archive before submitting the example:
+    ```bash
+    conda pack -o environment.tar.gz
+    ```
+
 Some runtime configurations for Spark are as follows:

 * `--master`: a URL format that specifies the Spark master: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>.
 * `--name`: the name of the Spark application.
 * `--conf spark.kubernetes.container.image`: the name of the BigDL K8s Docker image.
-* `--conf spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
-* `--conf spark.executor.instances`: the number of executors.
-* `--executor-memory`: the memory for each executor.
-* `--driver-memory`: the memory for the driver node.
+* `--num-executors`: the number of executors.
 * `--executor-cores`: the number of cores for each executor.
 * `--total-executor-cores`: the total number of executor cores.
+* `--executor-memory`: the memory for each executor.
+* `--driver-cores`: the number of cores for the driver.
+* `--driver-memory`: the memory for the driver.
 * `--properties-file`: the BigDL configuration properties to be uploaded to K8s.
 * `--py-files`: the extra Python dependency files to be uploaded to K8s.
 * `--archives`: the conda archive to be uploaded to K8s.
 * `--conf spark.driver.extraClassPath`: upload and register BigDL jars files to the driver's classpath.
 * `--conf spark.executor.extraClassPath`: upload and register BigDL jars files to the executors' classpath.
 * `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName`: specify the claim name of `persistentVolumeClaim` to mount `persistentVolume` into executor pods.
-* `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as `persistentVolumeClaim` to executor pods.
+* `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as `persistentVolumeClaim` into executor pods.


 #### 6.2.1 K8s Client
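A detail implicit in the new steps 1–3 and the option list above: the `#environment` suffix on `--archives` controls the directory name the conda archive is unpacked into on each pod, which is why the interpreter is later addressed as `./environment/bin/python`. A minimal sketch of the convention (paths are illustrative):

```bash
# Pack the active conda environment, as in the new step 3 above.
conda pack -o environment.tar.gz

# Passing "--archives /path/to/environment.tar.gz#environment" to spark-submit
# unpacks the archive into a working-directory folder named "environment" on
# each pod, so the executor interpreter resolves to ./environment/bin/python.
```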
@@ -378,19 +374,17 @@ ${SPARK_HOME}/bin/spark-submit \
     --name orca-k8s-client-tutorial \
     --conf spark.driver.host=${RUNTIME_DRIVER_HOST} \
     --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
-    --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
-    --conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
-    --driver-cores ${RUNTIME_DRIVER_CORES} \
-    --driver-memory ${RUNTIME_DRIVER_MEMORY} \
-    --executor-cores ${RUNTIME_EXECUTOR_CORES} \
-    --executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
-    --total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
-    --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
+    --num-executors 2 \
+    --executor-cores 4 \
+    --total-executor-cores 8 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
+    --archives /path/to/environment.tar.gz#environment \
     --conf spark.pyspark.driver.python=python \
     --conf spark.pyspark.python=./environment/bin/python \
-    --archives /path/to/environment.tar.gz#environment \
     --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
-    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/path/to/train.py,/path/to/model.py \
+    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/path/to/model.py \
     --conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \
     --conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \
     --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
@@ -429,18 +423,18 @@ ${SPARK_HOME}/bin/spark-submit \
     --name orca-k8s-cluster-tutorial \
     --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
     --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
-    --conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
+    --num-executors 2 \
+    --executor-cores 4 \
+    --total-executor-cores 8 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
     --archives file:///bigdl/nfsdata/environment.tar.gz#environment \
+    --conf spark.pyspark.driver.python=environment/bin/python \
     --conf spark.pyspark.python=environment/bin/python \
-    --conf spark.executorEnv.PYTHONHOME=environment \
     --conf spark.kubernetes.file.upload.path=/bigdl/nfsdata \
-    --executor-cores ${RUNTIME_EXECUTOR_CORES} \
-    --executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
-    --total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
-    --driver-cores ${RUNTIME_DRIVER_CORES} \
-    --driver-memory ${RUNTIME_DRIVER_MEMORY} \
     --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
-    --py-files local://${BIGDL_HOME}/python/bigdl-spark_3.1.2-2.1.0-SNAPSHOT-python-api.zip,file:///bigdl/nfsdata/train.py,file:///bigdl/nfsdata/model.py \
+    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,file:///bigdl/nfsdata/train.py,file:///bigdl/nfsdata/model.py \
     --conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/* \
     --conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/* \
     --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
@@ -452,9 +446,12 @@ ${SPARK_HOME}/bin/spark-submit \

 In the `spark-submit` script:
 * `deploy-mode`: set it to `cluster` when running programs on k8s-cluster mode.
-* `spark.pyspark.python`: sset the Python location in conda archive as each executor's Python environment.
-* `spark.executorEnv.PYTHONHOME`: the search path of Python libraries on executor pods.
-* `spark.kubernetes.file.upload.path`: the path to store files at spark submit side in k8s-cluster mode.
+* `--conf spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
+* `--conf spark.pyspark.driver.python`: set the Python location in conda archive as the driver's Python environment.
+* `--conf spark.pyspark.python`: also set the Python location in conda archive as each executor's Python environment.
+* `--conf spark.kubernetes.file.upload.path`: the path to store files at spark submit side in k8s-cluster mode.
+* `--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName`: specify the claim name of `persistentVolumeClaim` to mount `persistentVolume` into the driver pod.
+* `--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as `persistentVolumeClaim` into the driver pod.


 ### 6.3 Use Kubernetes Deployment (with Conda Archive)
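For the two driver-side `persistentVolumeClaim` options added at the end of this hunk, the values used elsewhere in the tutorial suggest a fragment like the following (a hedged illustration, not a complete `spark-submit` command; the mount path `/bigdl/nfsdata` is borrowed from the NFS examples above):

```bash
# Fragment only: these lines would sit inside the k8s-cluster spark-submit
# command shown earlier; the mount path is an assumption based on the tutorial.
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl/nfsdata \
```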
@@ -91,7 +91,7 @@ __Note__:
 ### 2.2 Install Python Libraries
 - See [here](../Overview/install.md#install-anaconda) to install conda and prepare the Python environment on the __Client Node__.

-- See [here](../Overview/install.md#install-bigdl-orca) to install BigDL Orca in the created conda environment. Note that if you use [`spark-submit`](#use-spark-submit), please skip this step and __DO NOT__ install BigDL Orca.
+- See [here](../Overview/install.md#install-bigdl-orca) to install BigDL Orca in the created conda environment. *Note that if you use [`spark-submit`](#use-spark-submit), please __skip__ this step and __DO NOT__ install BigDL Orca with pip install command in the conda environment.*

 - You should install all the other Python libraries that you need in your program in the conda environment as well. `torch` and `torchvision` are needed to run the Fashion-MNIST example:
     ```bash
@@ -233,10 +233,12 @@ conda pack -o environment.tar.gz

 Some runtime configurations for Spark are as follows:

-* `--executor-memory`: the memory for each executor.
-* `--driver-memory`: the memory for the driver node.
-* `--executor-cores`: the number of cores for each executor.
+* `--master`: the spark master, set it to "yarn".
 * `--num_executors`: the number of executors.
+* `--executor-cores`: the number of cores for each executor.
+* `--executor-memory`: the memory for each executor.
+* `--driver-cores`: the number of cores for the driver.
+* `--driver-memory`: the memory for the driver.
 * `--py-files`: the extra Python dependency files to be uploaded to YARN.
 * `--archives`: the conda archive to be uploaded to YARN.

@@ -246,10 +248,11 @@ Submit and run the example for `yarn-client` mode following the `bigdl-submit` s
 bigdl-submit \
     --master yarn \
     --deploy-mode client \
-    --executor-memory 2g \
-    --driver-memory 2g \
-    --executor-cores 4 \
     --num-executors 2 \
+    --executor-cores 4 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
     --py-files model.py \
     --archives /path/to/environment.tar.gz#environment \
     --conf spark.pyspark.driver.python=/path/to/python \
@@ -257,7 +260,6 @@ bigdl-submit \
     train.py --cluster_mode bigdl-submit --data_dir hdfs://path/to/remote/data
 ```
 In the `bigdl-submit` script:
-* `--master`: the spark master, set it to "yarn".
 * `--deploy-mode`: set it to `client` when running programs on yarn-client mode.
 * `--conf spark.pyspark.driver.python`: set the activate Python location on __Client Node__ as the driver's Python environment. You can find it by running `which python`.
 * `--conf spark.pyspark.python`: set the Python location in conda archive as each executor's Python environment.
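Since `spark.pyspark.driver.python` expects the path of the Python interpreter in the activated conda environment on the __Client Node__, one way to fill it in is sketched below (shell usage is illustrative, not part of the commit):

```bash
# Resolve the active conda environment's interpreter on the Client Node.
driver_python=$(which python)
echo "Driver Python: ${driver_python}"
# Then pass it as: --conf spark.pyspark.driver.python=${driver_python}
```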
@@ -269,10 +271,11 @@ Submit and run the program for `yarn-cluster` mode following the `bigdl-submit`
 bigdl-submit \
     --master yarn \
     --deploy-mode cluster \
-    --executor-memory 2g \
-    --driver-memory 2g \
-    --executor-cores 4 \
     --num-executors 2 \
+    --executor-cores 4 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
     --py-files model.py \
     --archives /path/to/environment.tar.gz#environment \
     --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
@@ -280,7 +283,6 @@ bigdl-submit \
     train.py --cluster_mode bigdl-submit --data_dir hdfs://path/to/remote/data
 ```
 In the `bigdl-submit` script:
-* `--master`: the spark master, set it to "yarn".
 * `--deploy-mode`: set it to `cluster` when running programs on yarn-cluster mode.
 * `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in conda archive as the Python environment of the Application Master.
 * `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment.
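The two settings described above point both the Application Master and the executors at the Python inside the unpacked conda archive. A hedged fragment (the `spark.executorEnv.PYSPARK_PYTHON` value is inferred from the description, since it is not visible in the hunks shown here):

```bash
# Fragment only: these lines sit inside the yarn-cluster bigdl-submit command.
    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
    --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \
```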
@@ -294,11 +296,11 @@ If you prefer to use `spark-submit` instead of `bigdl-submit`, please follow the
     sc = init_orca_context(cluster_mode="spark-submit")
     ```

-2. Download the requirement file from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs.
+2. Download the requirement file(s) from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs.
     ```bash
     pip install -r /path/to/requirements.txt
     ```
-    Note that you are recommended **NOT** to install BigDL Orca in the conda environment if you use spark-submit to avoid possible conflicts.
+    Note that you are recommended **NOT** to install BigDL Orca with pip install command in the conda environment if you use spark-submit to avoid possible conflicts.

 3. Pack the current activate conda environment to an archive before submitting the example:
     ```bash
@@ -307,22 +309,24 @@ If you prefer to use `spark-submit` instead of `bigdl-submit`, please follow the

 4. Download the BigDL assembly package from [here](../Overview/install.html#download-bigdl-orca) and unzip it. Then setup the environment variables `${BIGDL_HOME}` and `${BIGDL_VERSION}`.
     ```bash
-    export BIGDL_HOME=/path/to/unzipped_BigDL  # the folder path where you extract the BigDL package
     export BIGDL_VERSION="downloaded BigDL version"
+    export BIGDL_HOME=/path/to/unzipped_BigDL  # the folder path where you extract the BigDL package
     ```

 5. Download and extract [Spark](https://archive.apache.org/dist/spark/). BigDL is currently released for [Spark 2.4](https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz) and [Spark 3.1](https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz). Make sure the version of your downloaded Spark matches the one that your downloaded BigDL is released with. Then setup the environment variables `${SPARK_HOME}` and `${SPARK_VERSION}`.
     ```bash
-    export SPARK_HOME=/path/to/uncompressed_spark  # the folder path where you extract the Spark package
     export SPARK_VERSION="downloaded Spark version"
+    export SPARK_HOME=/path/to/uncompressed_spark  # the folder path where you extract the Spark package
     ```

 Some runtime configurations for Spark are as follows:

-* `--executor-memory`: the memory for each executor.
-* `--driver-memory`: the memory for the driver node.
-* `--executor-cores`: the number of cores for each executor.
+* `--master`: the spark master, set it to "yarn".
 * `--num_executors`: the number of executors.
+* `--executor-cores`: the number of cores for each executor.
+* `--executor-memory`: the memory for each executor.
+* `--driver-cores`: the number of cores for the driver.
+* `--driver-memory`: the memory for the driver.
 * `--py-files`: the extra Python dependency files to be uploaded to YARN.
 * `--archives`: the conda archive to be uploaded to YARN.
 * `--properties-file`: the BigDL configuration properties to be uploaded to YARN.
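For steps 4 and 5 in this hunk, concrete values would look roughly like the following (a hedged illustration: the extract locations are hypothetical, and the versions must match what you actually downloaded; the tutorial links Spark 3.1.3, for instance):

```bash
# Illustrative values only; substitute your own versions and extract locations.
export BIGDL_VERSION=2.1.0                                   # hypothetical BigDL version
export BIGDL_HOME=/opt/bigdl-${BIGDL_VERSION}                # hypothetical extract location
export SPARK_VERSION=3.1.3                                   # one of the Spark versions BigDL is released for
export SPARK_HOME=/opt/spark-${SPARK_VERSION}-bin-hadoop2.7  # hypothetical extract location
```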
@@ -334,10 +338,11 @@ Submit and run the program for `yarn-client` mode following the `spark-submit` s
 ${SPARK_HOME}/bin/spark-submit \
     --master yarn \
     --deploy-mode client \
-    --executor-memory 2g \
-    --driver-memory 2g \
-    --executor-cores 4 \
     --num-executors 2 \
+    --executor-cores 4 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
     --archives /path/to/environment.tar.gz#environment \
     --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
     --conf spark.pyspark.driver.python=/path/to/python \
@@ -347,7 +352,6 @@ ${SPARK_HOME}/bin/spark-submit \
     train.py --cluster_mode spark-submit --data_dir hdfs://path/to/remote/data
 ```
 In the `spark-submit` script:
-* `--master`: the spark master, set it to "yarn".
 * `--deploy-mode`: set it to `client` when running programs on yarn-client mode.
 * `--conf spark.pyspark.driver.python`: set the activate Python location on __Client Node__ as the driver's Python environment. You can find the location by running `which python`.
 * `--conf spark.pyspark.python`: set the Python location in conda archive as each executor's Python environment.
@@ -358,10 +362,11 @@ Submit and run the program for `yarn-cluster` mode following the `spark-submit`
 ${SPARK_HOME}/bin/spark-submit \
     --master yarn \
     --deploy-mode cluster \
-    --executor-memory 2g \
-    --driver-memory 2g \
-    --executor-cores 4 \
     --num-executors 2 \
+    --executor-cores 4 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
     --archives /path/to/environment.tar.gz#environment \
     --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
     --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
@@ -371,7 +376,6 @@ ${SPARK_HOME}/bin/spark-submit \
     train.py --cluster_mode spark-submit --data_dir hdfs://path/to/remote/data
 ```
 In the `spark-submit` script:
-* `--master`: the spark master, set it to "yarn".
 * `--deploy-mode`: set it to `cluster` when running programs on yarn-cluster mode.
 * `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in conda archive as the Python environment of the Application Master.
 * `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment.