Update k8s command (#7532)

* remove redundant conf

* update command

* update command

* format

* change to num executor

* fix

* minor

* fix

* modify cores

* remove pythonhome

* meet review

* minor

* rephrase

* minor

* minor

* update yarn master

* update args
Kai Huang 2023-02-15 19:09:27 +08:00 committed by GitHub
parent 7727b4c9ba
commit 2e1d977e08
2 changed files with 78 additions and 77 deletions


@@ -159,12 +159,6 @@ sudo docker run -itd --net=host \
-e RUNTIME_PERSISTENT_VOLUME_CLAIM=nfsvolumeclaim \
-e RUNTIME_DRIVER_HOST=x.x.x.x \
-e RUNTIME_DRIVER_PORT=54321 \
--e RUNTIME_EXECUTOR_INSTANCES=2 \
--e RUNTIME_EXECUTOR_CORES=4 \
--e RUNTIME_EXECUTOR_MEMORY=2g \
--e RUNTIME_TOTAL_EXECUTOR_CORES=8 \
--e RUNTIME_DRIVER_CORES=2 \
--e RUNTIME_DRIVER_MEMORY=2g \
intelanalytics/bigdl-k8s:latest bash
```
@@ -177,22 +171,16 @@ In the script:
* `-v /path/to/nfsdata:/bigdl/nfsdata`: mount the NFS path on the host into the container at the specified path (e.g. "/bigdl/nfsdata").
* `NOTEBOOK_PORT`: an integer that specifies the port number for the Notebook. This is not necessary if you don't use a notebook.
* `NOTEBOOK_TOKEN`: a string that specifies the token for the Notebook. This is not necessary if you don't use a notebook.
-* `RUNTIME_SPARK_MASTER`: a URL format that specifies the Spark master: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>.
+* `RUNTIME_SPARK_MASTER`: the URL of the Spark master in the format `k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>`.
* `RUNTIME_K8S_SERVICE_ACCOUNT`: a string that specifies the service account for the driver pod.
* `RUNTIME_K8S_SPARK_IMAGE`: the name of the BigDL K8s Docker image.
* `RUNTIME_PERSISTENT_VOLUME_CLAIM`: a string that specifies the Kubernetes volumeName (e.g. "nfsvolumeclaim").
* `RUNTIME_DRIVER_HOST`: the IP address or hostname of the driver (only required if you use k8s-client mode).
* `RUNTIME_DRIVER_PORT`: a string that specifies the driver port (only required if you use k8s-client mode).
-* `RUNTIME_EXECUTOR_INSTANCES`: an integer that specifies the number of executors.
-* `RUNTIME_EXECUTOR_CORES`: an integer that specifies the number of cores for each executor.
-* `RUNTIME_EXECUTOR_MEMORY`: a string that specifies the memory for each executor.
-* `RUNTIME_TOTAL_EXECUTOR_CORES`: an integer that specifies the number of cores for all executors.
-* `RUNTIME_DRIVER_CORES`: an integer that specifies the number of cores for the driver node.
-* `RUNTIME_DRIVER_MEMORY`: a string that specifies the memory for the driver node.
__Notes:__
-* The __Client Container__ contains all the required environment except K8s configurations.
-* You don't need to create Spark executor containers manually, which are scheduled by K8s at runtime.
+* The __Client Container__ already contains all the required environment configurations for Spark and BigDL Orca.
+* Spark executor containers are scheduled by K8s at runtime and you don't need to create them manually.
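As a quick way to confirm that the executors were indeed created for you, a minimal sketch is shown below (it assumes `kubectl` is configured for the target cluster; Spark on K8s labels executor pods with `spark-role=executor`):

```bash
# List the executor pods that K8s scheduled for the Spark application.
kubectl get pods -l spark-role=executor
```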
### 2.3 Launch the K8s Client Container
@@ -209,7 +197,7 @@ In the launched BigDL K8s **Client Container**, please setup the environment fol
- See [here](../Overview/install.md#install-anaconda) to install conda and prepare the Python environment.
-- See [here](../Overview/install.md#to-install-orca-for-spark3) to install BigDL Orca in the created conda environment.
+- See [here](../Overview/install.md#to-install-orca-for-spark3) to install BigDL Orca in the created conda environment. *Note that if you use [`spark-submit`](#use-spark-submit), please __skip__ this step and __DO NOT__ install BigDL Orca with the pip install command in the conda environment.*
- You should install all the other Python libraries that you need in your program in the conda environment as well. `torch` and `torchvision` are needed to run the Fashion-MNIST example we provide:
```bash
@@ -339,34 +327,42 @@ python /bigdl/nfsdata/train.py --cluster_mode k8s-cluster --data_dir /bigdl/nfsd
### 6.2 Use `spark-submit`
-Set the cluster_mode to "bigdl-submit" in `init_orca_context`.
-```python
-init_orca_context(cluster_mode="spark-submit")
-```
If you prefer to use `spark-submit`, please follow the steps below to prepare the environment in the __Client Container__.
-Pack the current activate conda environment to an archive in the __Client Container__:
-```bash
-conda pack -o environment.tar.gz
-```
+1. Set `cluster_mode` to "spark-submit" in `init_orca_context`.
+```python
+sc = init_orca_context(cluster_mode="spark-submit")
+```
+2. Download the requirements file(s) from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs.
+```bash
+pip install -r /path/to/requirements.txt
+```
+Note that we recommend you **NOT** install BigDL Orca with the pip install command in the conda environment if you use spark-submit, to avoid possible conflicts.
+3. Pack the currently active conda environment into an archive before submitting the example:
+```bash
+conda pack -o environment.tar.gz
+```
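Before uploading the archive with `--archives`, a quick sanity check can catch a broken pack early; a sketch (the path checked assumes the standard conda-pack layout):

```bash
# Verify the packed environment ships its own Python interpreter.
tar -tzf environment.tar.gz | grep -m1 'bin/python'
```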
Some runtime configurations for Spark are as follows:
* `--master`: a URL format that specifies the Spark master: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>.
* `--name`: the name of the Spark application.
* `--conf spark.kubernetes.container.image`: the name of the BigDL K8s Docker image.
* `--conf spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
-* `--conf spark.executor.instances`: the number of executors.
-* `--executor-memory`: the memory for each executor.
-* `--driver-memory`: the memory for the driver node.
+* `--num-executors`: the number of executors.
+* `--executor-cores`: the number of cores for each executor.
+* `--total-executor-cores`: the total number of executor cores.
+* `--executor-memory`: the memory for each executor.
+* `--driver-cores`: the number of cores for the driver.
+* `--driver-memory`: the memory for the driver.
* `--properties-file`: the BigDL configuration properties to be uploaded to K8s.
* `--py-files`: the extra Python dependency files to be uploaded to K8s.
* `--archives`: the conda archive to be uploaded to K8s.
* `--conf spark.driver.extraClassPath`: upload and register the BigDL jar files to the driver's classpath.
* `--conf spark.executor.extraClassPath`: upload and register the BigDL jar files to the executors' classpath.
* `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName`: specify the claim name of `persistentVolumeClaim` to mount `persistentVolume` into executor pods.
-* `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as `persistentVolumeClaim` to executor pods.
+* `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as `persistentVolumeClaim` into executor pods.
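For reference, a hypothetical claim matching the tutorial's `nfsvolumeclaim` name is sketched below; the access mode and size are assumptions to adjust for your cluster:

```bash
# Create a PVC named "nfsvolumeclaim" (size and access mode are illustrative only).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfsvolumeclaim
spec:
  accessModes:
    - ReadWriteMany        # NFS-backed volumes are typically shared across pods
  resources:
    requests:
      storage: 10Gi
EOF
```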
#### 6.2.1 K8s Client
@@ -378,19 +374,17 @@ ${SPARK_HOME}/bin/spark-submit \
--name orca-k8s-client-tutorial \
--conf spark.driver.host=${RUNTIME_DRIVER_HOST} \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
---conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
---driver-cores ${RUNTIME_DRIVER_CORES} \
---driver-memory ${RUNTIME_DRIVER_MEMORY} \
---executor-cores ${RUNTIME_EXECUTOR_CORES} \
---executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
---total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
---properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
+--num-executors 2 \
+--executor-cores 4 \
+--total-executor-cores 8 \
+--executor-memory 2g \
+--driver-cores 2 \
+--driver-memory 2g \
+--archives /path/to/environment.tar.gz#environment \
--conf spark.pyspark.driver.python=python \
--conf spark.pyspark.python=./environment/bin/python \
---archives /path/to/environment.tar.gz#environment \
+--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
---py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/path/to/train.py,/path/to/model.py \
+--py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/path/to/model.py \
--conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \
--conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
@@ -429,18 +423,18 @@ ${SPARK_HOME}/bin/spark-submit \
--name orca-k8s-cluster-tutorial \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
---conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
+--num-executors 2 \
+--executor-cores 4 \
+--total-executor-cores 8 \
+--executor-memory 2g \
+--driver-cores 2 \
+--driver-memory 2g \
--archives file:///bigdl/nfsdata/environment.tar.gz#environment \
--conf spark.pyspark.driver.python=environment/bin/python \
--conf spark.pyspark.python=environment/bin/python \
---conf spark.executorEnv.PYTHONHOME=environment \
--conf spark.kubernetes.file.upload.path=/bigdl/nfsdata \
---executor-cores ${RUNTIME_EXECUTOR_CORES} \
---executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
---total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
---driver-cores ${RUNTIME_DRIVER_CORES} \
---driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
---py-files local://${BIGDL_HOME}/python/bigdl-spark_3.1.2-2.1.0-SNAPSHOT-python-api.zip,file:///bigdl/nfsdata/train.py,file:///bigdl/nfsdata/model.py \
+--py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,file:///bigdl/nfsdata/train.py,file:///bigdl/nfsdata/model.py \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/* \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/* \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
@@ -452,9 +446,12 @@ ${SPARK_HOME}/bin/spark-submit \
In the `spark-submit` script:
* `deploy-mode`: set it to `cluster` when running programs on k8s-cluster mode.
-* `spark.pyspark.python`: sset the Python location in conda archive as each executor's Python environment.
-* `spark.executorEnv.PYTHONHOME`: the search path of Python libraries on executor pods.
-* `spark.kubernetes.file.upload.path`: the path to store files at spark submit side in k8s-cluster mode.
+* `--conf spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
+* `--conf spark.pyspark.driver.python`: set the Python location in the conda archive as the driver's Python environment.
+* `--conf spark.pyspark.python`: also set the Python location in the conda archive as each executor's Python environment.
+* `--conf spark.kubernetes.file.upload.path`: the path to store files on the spark-submit side in k8s-cluster mode.
+* `--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName`: specify the claim name of `persistentVolumeClaim` to mount `persistentVolume` into the driver pod.
+* `--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as `persistentVolumeClaim` into the driver pod.
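Since the driver itself runs in a pod in k8s-cluster mode, its output is no longer printed in the Client Container; a sketch for retrieving it (the `spark-role=driver` label is set by Spark on K8s, and the pod name is a placeholder taken from the first command):

```bash
# Find the driver pod, then stream its logs.
kubectl get pods -l spark-role=driver
kubectl logs -f <driver-pod-name>   # substitute the real pod name
```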
### 6.3 Use Kubernetes Deployment (with Conda Archive)


@@ -91,7 +91,7 @@ __Note__:
### 2.2 Install Python Libraries
- See [here](../Overview/install.md#install-anaconda) to install conda and prepare the Python environment on the __Client Node__.
-- See [here](../Overview/install.md#install-bigdl-orca) to install BigDL Orca in the created conda environment. Note that if you use [`spark-submit`](#use-spark-submit), please skip this step and __DO NOT__ install BigDL Orca.
+- See [here](../Overview/install.md#install-bigdl-orca) to install BigDL Orca in the created conda environment. *Note that if you use [`spark-submit`](#use-spark-submit), please __skip__ this step and __DO NOT__ install BigDL Orca with the pip install command in the conda environment.*
- You should install all the other Python libraries that you need in your program in the conda environment as well. `torch` and `torchvision` are needed to run the Fashion-MNIST example:
```bash
@@ -233,10 +233,12 @@ conda pack -o environment.tar.gz
Some runtime configurations for Spark are as follows:
-* `--executor-memory`: the memory for each executor.
-* `--driver-memory`: the memory for the driver node.
-* `--executor-cores`: the number of cores for each executor.
+* `--master`: the Spark master, set it to "yarn".
+* `--num-executors`: the number of executors.
+* `--executor-cores`: the number of cores for each executor.
+* `--executor-memory`: the memory for each executor.
+* `--driver-cores`: the number of cores for the driver.
+* `--driver-memory`: the memory for the driver.
* `--py-files`: the extra Python dependency files to be uploaded to YARN.
* `--archives`: the conda archive to be uploaded to YARN.
@@ -246,10 +248,11 @@ Submit and run the example for `yarn-client` mode following the `bigdl-submit` s
bigdl-submit \
--master yarn \
--deploy-mode client \
---executor-memory 2g \
---driver-memory 2g \
---executor-cores 4 \
--num-executors 2 \
+--executor-cores 4 \
+--executor-memory 2g \
+--driver-cores 2 \
+--driver-memory 2g \
--py-files model.py \
--archives /path/to/environment.tar.gz#environment \
--conf spark.pyspark.driver.python=/path/to/python \
@@ -257,7 +260,6 @@ bigdl-submit \
train.py --cluster_mode bigdl-submit --data_dir hdfs://path/to/remote/data
```
In the `bigdl-submit` script:
-* `--master`: the spark master, set it to "yarn".
* `--deploy-mode`: set it to `client` when running programs on yarn-client mode.
* `--conf spark.pyspark.driver.python`: set the active Python location on the __Client Node__ as the driver's Python environment. You can find it by running `which python`.
* `--conf spark.pyspark.python`: set the Python location in the conda archive as each executor's Python environment.
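To avoid hard-coding the driver-side path, the active interpreter can be resolved at submit time; a sketch (`my_env` is a hypothetical conda environment name):

```bash
# Resolve the active environment's interpreter once, then reuse it
# as --conf spark.pyspark.driver.python=${DRIVER_PYTHON}.
conda activate my_env
DRIVER_PYTHON=$(which python)
echo "driver python: ${DRIVER_PYTHON}"
```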
@@ -269,10 +271,11 @@ Submit and run the program for `yarn-cluster` mode following the `bigdl-submit`
bigdl-submit \
--master yarn \
--deploy-mode cluster \
---executor-memory 2g \
---driver-memory 2g \
---executor-cores 4 \
--num-executors 2 \
+--executor-cores 4 \
+--executor-memory 2g \
+--driver-cores 2 \
+--driver-memory 2g \
--py-files model.py \
--archives /path/to/environment.tar.gz#environment \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
@@ -280,7 +283,6 @@ bigdl-submit \
train.py --cluster_mode bigdl-submit --data_dir hdfs://path/to/remote/data
```
In the `bigdl-submit` script:
-* `--master`: the spark master, set it to "yarn".
* `--deploy-mode`: set it to `cluster` when running programs on yarn-cluster mode.
* `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in the conda archive as the Python environment of the Application Master.
* `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in the conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment.
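Because the driver runs inside the YARN Application Master in this mode, its output ends up in the YARN logs rather than on the Client Node; a sketch for retrieving it (the application ID is a placeholder printed when the job is submitted):

```bash
# Fetch the aggregated logs, including the driver's stdout and stderr.
yarn logs -applicationId application_1676000000000_0001 | less
```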
@@ -294,11 +296,11 @@ If you prefer to use `spark-submit` instead of `bigdl-submit`, please follow the
sc = init_orca_context(cluster_mode="spark-submit")
```
-2. Download the requirement file from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs.
+2. Download the requirements file(s) from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs.
```bash
pip install -r /path/to/requirements.txt
```
-Note that you are recommended **NOT** to install BigDL Orca in the conda environment if you use spark-submit to avoid possible conflicts.
+Note that we recommend you **NOT** install BigDL Orca with the pip install command in the conda environment if you use spark-submit, to avoid possible conflicts.
3. Pack the current activate conda environment to an archive before submitting the example:
```bash
@@ -307,22 +309,24 @@ If you prefer to use `spark-submit` instead of `bigdl-submit`, please follow the
4. Download the BigDL assembly package from [here](../Overview/install.html#download-bigdl-orca) and unzip it. Then set up the environment variables `${BIGDL_HOME}` and `${BIGDL_VERSION}`.
```bash
-export BIGDL_HOME=/path/to/unzipped_BigDL # the folder path where you extract the BigDL package
export BIGDL_VERSION="downloaded BigDL version"
+export BIGDL_HOME=/path/to/unzipped_BigDL # the folder path where you extract the BigDL package
```
5. Download and extract [Spark](https://archive.apache.org/dist/spark/). BigDL is currently released for [Spark 2.4](https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz) and [Spark 3.1](https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz). Make sure the version of your downloaded Spark matches the one that your downloaded BigDL is released with. Then set up the environment variables `${SPARK_HOME}` and `${SPARK_VERSION}`.
```bash
-export SPARK_HOME=/path/to/uncompressed_spark # the folder path where you extract the Spark package
export SPARK_VERSION="downloaded Spark version"
+export SPARK_HOME=/path/to/uncompressed_spark # the folder path where you extract the Spark package
```
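A quick check, sketched below, confirms that the Spark build on the submit side matches the version your BigDL release was built against:

```bash
# Print the Spark build version used at submit time.
${SPARK_HOME}/bin/spark-submit --version
```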
Some runtime configurations for Spark are as follows:
-* `--executor-memory`: the memory for each executor.
-* `--driver-memory`: the memory for the driver node.
-* `--executor-cores`: the number of cores for each executor.
+* `--master`: the Spark master, set it to "yarn".
+* `--num-executors`: the number of executors.
+* `--executor-cores`: the number of cores for each executor.
+* `--executor-memory`: the memory for each executor.
+* `--driver-cores`: the number of cores for the driver.
+* `--driver-memory`: the memory for the driver.
* `--py-files`: the extra Python dependency files to be uploaded to YARN.
* `--archives`: the conda archive to be uploaded to YARN.
* `--properties-file`: the BigDL configuration properties to be uploaded to YARN.
@@ -334,10 +338,11 @@ Submit and run the program for `yarn-client` mode following the `spark-submit` s
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode client \
---executor-memory 2g \
---driver-memory 2g \
---executor-cores 4 \
--num-executors 2 \
+--executor-cores 4 \
+--executor-memory 2g \
+--driver-cores 2 \
+--driver-memory 2g \
--archives /path/to/environment.tar.gz#environment \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--conf spark.pyspark.driver.python=/path/to/python \
@@ -347,7 +352,6 @@ ${SPARK_HOME}/bin/spark-submit \
train.py --cluster_mode spark-submit --data_dir hdfs://path/to/remote/data
```
In the `spark-submit` script:
-* `--master`: the spark master, set it to "yarn".
* `--deploy-mode`: set it to `client` when running programs on yarn-client mode.
* `--conf spark.pyspark.driver.python`: set the active Python location on the __Client Node__ as the driver's Python environment. You can find the location by running `which python`.
* `--conf spark.pyspark.python`: set the Python location in the conda archive as each executor's Python environment.
@@ -358,10 +362,11 @@ Submit and run the program for `yarn-cluster` mode following the `spark-submit`
${SPARK_HOME}/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
---executor-memory 2g \
---driver-memory 2g \
---executor-cores 4 \
--num-executors 2 \
+--executor-cores 4 \
+--executor-memory 2g \
+--driver-cores 2 \
+--driver-memory 2g \
--archives /path/to/environment.tar.gz#environment \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
@@ -371,7 +376,6 @@ ${SPARK_HOME}/bin/spark-submit \
train.py --cluster_mode spark-submit --data_dir hdfs://path/to/remote/data
```
In the `spark-submit` script:
-* `--master`: the spark master, set it to "yarn".
* `--deploy-mode`: set it to `cluster` when running programs on yarn-cluster mode.
* `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in the conda archive as the Python environment of the Application Master.
* `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in the conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment.