Update k8s command (#7532)
* remove redundant conf
* update command
* update command
* format
* change to num executor
* fix
* minor
* fix
* modify cores
* remove pythonhome
* meet review
* minor
* rephase
* minor
* minor
* update yarn master
* update args
This commit is contained in:
parent 7727b4c9ba
commit 2e1d977e08

2 changed files with 78 additions and 77 deletions
@@ -159,12 +159,6 @@ sudo docker run -itd --net=host \
     -e RUNTIME_PERSISTENT_VOLUME_CLAIM=nfsvolumeclaim \
     -e RUNTIME_DRIVER_HOST=x.x.x.x \
     -e RUNTIME_DRIVER_PORT=54321 \
-    -e RUNTIME_EXECUTOR_INSTANCES=2 \
-    -e RUNTIME_EXECUTOR_CORES=4 \
-    -e RUNTIME_EXECUTOR_MEMORY=2g \
-    -e RUNTIME_TOTAL_EXECUTOR_CORES=8 \
-    -e RUNTIME_DRIVER_CORES=2 \
-    -e RUNTIME_DRIVER_MEMORY=2g \
     intelanalytics/bigdl-k8s:latest bash
 ```

@@ -177,22 +171,16 @@ In the script:
 * `-v /path/to/nfsdata:/bigdl/nfsdata`: mount NFS path on the host into the container as the specified path (e.g. "/bigdl/nfsdata").
 * `NOTEBOOK_PORT`: an integer that specifies the port number for the Notebook. This is not necessary if you don't use notebook.
 * `NOTEBOOK_TOKEN`: a string that specifies the token for Notebook. This is not necessary if you don't use notebook.
-* `RUNTIME_SPARK_MASTER`: a URL format that specifies the Spark master: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>.
+* `RUNTIME_SPARK_MASTER`: a URL format that specifies the Spark master: `k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>`.
 * `RUNTIME_K8S_SERVICE_ACCOUNT`: a string that specifies the service account for the driver pod.
 * `RUNTIME_K8S_SPARK_IMAGE`: the name of the BigDL K8s Docker image.
 * `RUNTIME_PERSISTENT_VOLUME_CLAIM`: a string that specifies the Kubernetes volumeName (e.g. "nfsvolumeclaim").
 * `RUNTIME_DRIVER_HOST`: a URL format that specifies the driver localhost (only required if you use k8s-client mode).
 * `RUNTIME_DRIVER_PORT`: a string that specifies the driver port (only required if you use k8s-client mode).
-* `RUNTIME_EXECUTOR_INSTANCES`: an integer that specifies the number of executors.
-* `RUNTIME_EXECUTOR_CORES`: an integer that specifies the number of cores for each executor.
-* `RUNTIME_EXECUTOR_MEMORY`: a string that specifies the memory for each executor.
-* `RUNTIME_TOTAL_EXECUTOR_CORES`: an integer that specifies the number of cores for all executors.
-* `RUNTIME_DRIVER_CORES`: an integer that specifies the number of cores for the driver node.
-* `RUNTIME_DRIVER_MEMORY`: a string that specifies the memory for the driver node.

 __Notes:__
-* The __Client Container__ contains all the required environment except K8s configurations.
-* You don't need to create Spark executor containers manually, which are scheduled by K8s at runtime.
+* The __Client Container__ already contains all the required environment configurations for Spark and BigDL Orca.
+* Spark executor containers are scheduled by K8s at runtime and you don't need to create them manually.


 ### 2.3 Launch the K8s Client Container
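The net effect of the two hunks above is that executor and driver resources are no longer baked into the container launch command; they are passed to `spark-submit` instead (see the later hunks). As a hedged sketch of the trimmed launch command, only the lines visible in the first hunk are verbatim; the NFS mount and the values in angle brackets are assumptions based on the surrounding tutorial text:

```bash
# Hedged sketch, not part of the commit: flags above the shown hunk and the
# <placeholder> values are assumptions drawn from the surrounding tutorial.
sudo docker run -itd --net=host \
    -v /path/to/nfsdata:/bigdl/nfsdata \
    -e RUNTIME_SPARK_MASTER=k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    -e RUNTIME_K8S_SERVICE_ACCOUNT=<service-account> \
    -e RUNTIME_K8S_SPARK_IMAGE=intelanalytics/bigdl-k8s:latest \
    -e RUNTIME_PERSISTENT_VOLUME_CLAIM=nfsvolumeclaim \
    -e RUNTIME_DRIVER_HOST=x.x.x.x \
    -e RUNTIME_DRIVER_PORT=54321 \
    intelanalytics/bigdl-k8s:latest bash
```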
@@ -209,7 +197,7 @@ In the launched BigDL K8s **Client Container**, please setup the environment fol

 - See [here](../Overview/install.md#install-anaconda) to install conda and prepare the Python environment.

-- See [here](../Overview/install.md#to-install-orca-for-spark3) to install BigDL Orca in the created conda environment.
+- See [here](../Overview/install.md#to-install-orca-for-spark3) to install BigDL Orca in the created conda environment. *Note that if you use [`spark-submit`](#use-spark-submit), please __skip__ this step and __DO NOT__ install BigDL Orca with pip install command in the conda environment.*

 - You should install all the other Python libraries that you need in your program in the conda environment as well. `torch` and `torchvision` are needed to run the Fashion-MNIST example we provide:
 ```bash
@@ -339,34 +327,42 @@ python /bigdl/nfsdata/train.py --cluster_mode k8s-cluster --data_dir /bigdl/nfsd

 ### 6.2 Use `spark-submit`

-Set the cluster_mode to "bigdl-submit" in `init_orca_context`.
-```python
-init_orca_context(cluster_mode="spark-submit")
-```
+If you prefer to use `spark-submit`, please follow the steps below to prepare the environment in the __Client Container__.

-Pack the current activate conda environment to an archive in the __Client Container__:
-```bash
-conda pack -o environment.tar.gz
-```
+1. Set the cluster_mode to "spark-submit" in `init_orca_context`.
+    ```python
+    sc = init_orca_context(cluster_mode="spark-submit")
+    ```

+2. Download the requirement file(s) from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs.
+    ```bash
+    pip install -r /path/to/requirements.txt
+    ```
+    Note that you are recommended **NOT** to install BigDL Orca with pip install command in the conda environment if you use spark-submit to avoid possible conflicts.
+
+3. Pack the current activate conda environment to an archive before submitting the example:
+    ```bash
+    conda pack -o environment.tar.gz
+    ```
+
 Some runtime configurations for Spark are as follows:

 * `--master`: a URL format that specifies the Spark master: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>.
 * `--name`: the name of the Spark application.
 * `--conf spark.kubernetes.container.image`: the name of the BigDL K8s Docker image.
-* `--conf spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
-* `--conf spark.executor.instances`: the number of executors.
-* `--executor-memory`: the memory for each executor.
-* `--driver-memory`: the memory for the driver node.
+* `--num-executors`: the number of executors.
 * `--executor-cores`: the number of cores for each executor.
 * `--total-executor-cores`: the total number of executor cores.
+* `--executor-memory`: the memory for each executor.
+* `--driver-cores`: the number of cores for the driver.
+* `--driver-memory`: the memory for the driver.
 * `--properties-file`: the BigDL configuration properties to be uploaded to K8s.
 * `--py-files`: the extra Python dependency files to be uploaded to K8s.
 * `--archives`: the conda archive to be uploaded to K8s.
 * `--conf spark.driver.extraClassPath`: upload and register BigDL jars files to the driver's classpath.
 * `--conf spark.executor.extraClassPath`: upload and register BigDL jars files to the executors' classpath.
 * `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName`: specify the claim name of `persistentVolumeClaim` to mount `persistentVolume` into executor pods.
-* `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as `persistentVolumeClaim` to executor pods.
+* `--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as `persistentVolumeClaim` into executor pods.


 #### 6.2.1 K8s Client
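A detail implicit in the new steps 1–3 and the option list above: the `#environment` suffix on `--archives` controls the directory name the conda archive is unpacked into on each pod, which is why the interpreter is later addressed as `./environment/bin/python`. A minimal sketch of the convention (paths are illustrative):

```bash
# Pack the active conda environment, as in the new step 3 above.
conda pack -o environment.tar.gz

# Passing "--archives /path/to/environment.tar.gz#environment" to spark-submit
# unpacks the archive into a working-directory folder named "environment" on
# each pod, so the executor interpreter resolves to ./environment/bin/python.
```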
@@ -378,19 +374,17 @@ ${SPARK_HOME}/bin/spark-submit \
     --name orca-k8s-client-tutorial \
     --conf spark.driver.host=${RUNTIME_DRIVER_HOST} \
     --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
-    --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
-    --conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
-    --driver-cores ${RUNTIME_DRIVER_CORES} \
-    --driver-memory ${RUNTIME_DRIVER_MEMORY} \
-    --executor-cores ${RUNTIME_EXECUTOR_CORES} \
-    --executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
-    --total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
-    --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
+    --num-executors 2 \
+    --executor-cores 4 \
+    --total-executor-cores 8 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
+    --archives /path/to/environment.tar.gz#environment \
     --conf spark.pyspark.driver.python=python \
     --conf spark.pyspark.python=./environment/bin/python \
-    --archives /path/to/environment.tar.gz#environment \
     --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
-    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/path/to/train.py,/path/to/model.py \
+    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/path/to/model.py \
     --conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \
     --conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \
     --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
@@ -429,18 +423,18 @@ ${SPARK_HOME}/bin/spark-submit \
     --name orca-k8s-cluster-tutorial \
     --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
     --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
-    --conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
+    --num-executors 2 \
+    --executor-cores 4 \
+    --total-executor-cores 8 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
     --archives file:///bigdl/nfsdata/environment.tar.gz#environment \
+    --conf spark.pyspark.driver.python=environment/bin/python \
     --conf spark.pyspark.python=environment/bin/python \
-    --conf spark.executorEnv.PYTHONHOME=environment \
     --conf spark.kubernetes.file.upload.path=/bigdl/nfsdata \
-    --executor-cores ${RUNTIME_EXECUTOR_CORES} \
-    --executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
-    --total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
-    --driver-cores ${RUNTIME_DRIVER_CORES} \
-    --driver-memory ${RUNTIME_DRIVER_MEMORY} \
     --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
-    --py-files local://${BIGDL_HOME}/python/bigdl-spark_3.1.2-2.1.0-SNAPSHOT-python-api.zip,file:///bigdl/nfsdata/train.py,file:///bigdl/nfsdata/model.py \
+    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,file:///bigdl/nfsdata/train.py,file:///bigdl/nfsdata/model.py \
     --conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/* \
     --conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/* \
     --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
@@ -452,9 +446,12 @@ ${SPARK_HOME}/bin/spark-submit \

 In the `spark-submit` script:
 * `deploy-mode`: set it to `cluster` when running programs on k8s-cluster mode.
-* `spark.pyspark.python`: sset the Python location in conda archive as each executor's Python environment.
-* `spark.executorEnv.PYTHONHOME`: the search path of Python libraries on executor pods.
-* `spark.kubernetes.file.upload.path`: the path to store files at spark submit side in k8s-cluster mode.
+* `--conf spark.kubernetes.authenticate.driver.serviceAccountName`: the service account for the driver pod.
+* `--conf spark.pyspark.driver.python`: set the Python location in conda archive as the driver's Python environment.
+* `--conf spark.pyspark.python`: also set the Python location in conda archive as each executor's Python environment.
+* `--conf spark.kubernetes.file.upload.path`: the path to store files at spark submit side in k8s-cluster mode.
+* `--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName`: specify the claim name of `persistentVolumeClaim` to mount `persistentVolume` into the driver pod.
+* `--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path`: specify the path to be mounted as `persistentVolumeClaim` into the driver pod.


 ### 6.3 Use Kubernetes Deployment (with Conda Archive)
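For the two driver-side `persistentVolumeClaim` options added at the end of this hunk, the values used elsewhere in the tutorial suggest a fragment like the following (a hedged illustration, not a complete `spark-submit` command; the mount path `/bigdl/nfsdata` is borrowed from the NFS examples above):

```bash
# Fragment only: these lines would sit inside the k8s-cluster spark-submit
# command shown earlier; the mount path is an assumption based on the tutorial.
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl/nfsdata \
```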
@@ -91,7 +91,7 @@ __Note__:
 ### 2.2 Install Python Libraries
 - See [here](../Overview/install.md#install-anaconda) to install conda and prepare the Python environment on the __Client Node__.

-- See [here](../Overview/install.md#install-bigdl-orca) to install BigDL Orca in the created conda environment. Note that if you use [`spark-submit`](#use-spark-submit), please skip this step and __DO NOT__ install BigDL Orca.
+- See [here](../Overview/install.md#install-bigdl-orca) to install BigDL Orca in the created conda environment. *Note that if you use [`spark-submit`](#use-spark-submit), please __skip__ this step and __DO NOT__ install BigDL Orca with pip install command in the conda environment.*

 - You should install all the other Python libraries that you need in your program in the conda environment as well. `torch` and `torchvision` are needed to run the Fashion-MNIST example:
     ```bash
@@ -233,10 +233,12 @@ conda pack -o environment.tar.gz

 Some runtime configurations for Spark are as follows:

-* `--executor-memory`: the memory for each executor.
-* `--driver-memory`: the memory for the driver node.
-* `--executor-cores`: the number of cores for each executor.
+* `--master`: the spark master, set it to "yarn".
 * `--num_executors`: the number of executors.
+* `--executor-cores`: the number of cores for each executor.
+* `--executor-memory`: the memory for each executor.
+* `--driver-cores`: the number of cores for the driver.
+* `--driver-memory`: the memory for the driver.
 * `--py-files`: the extra Python dependency files to be uploaded to YARN.
 * `--archives`: the conda archive to be uploaded to YARN.

@@ -246,10 +248,11 @@ Submit and run the example for `yarn-client` mode following the `bigdl-submit` s
 bigdl-submit \
     --master yarn \
     --deploy-mode client \
-    --executor-memory 2g \
-    --driver-memory 2g \
-    --executor-cores 4 \
     --num-executors 2 \
+    --executor-cores 4 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
     --py-files model.py \
     --archives /path/to/environment.tar.gz#environment \
     --conf spark.pyspark.driver.python=/path/to/python \
@@ -257,7 +260,6 @@ bigdl-submit \
     train.py --cluster_mode bigdl-submit --data_dir hdfs://path/to/remote/data
 ```
 In the `bigdl-submit` script:
-* `--master`: the spark master, set it to "yarn".
 * `--deploy-mode`: set it to `client` when running programs on yarn-client mode.
 * `--conf spark.pyspark.driver.python`: set the activate Python location on __Client Node__ as the driver's Python environment. You can find it by running `which python`.
 * `--conf spark.pyspark.python`: set the Python location in conda archive as each executor's Python environment.
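Since `spark.pyspark.driver.python` expects the path of the Python interpreter in the activated conda environment on the __Client Node__, one way to fill it in is sketched below (shell usage is illustrative, not part of the commit):

```bash
# Resolve the active conda environment's interpreter on the Client Node.
driver_python=$(which python)
echo "Driver Python: ${driver_python}"
# Then pass it as: --conf spark.pyspark.driver.python=${driver_python}
```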
@@ -269,10 +271,11 @@ Submit and run the program for `yarn-cluster` mode following the `bigdl-submit`
 bigdl-submit \
     --master yarn \
     --deploy-mode cluster \
-    --executor-memory 2g \
-    --driver-memory 2g \
-    --executor-cores 4 \
     --num-executors 2 \
+    --executor-cores 4 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
     --py-files model.py \
     --archives /path/to/environment.tar.gz#environment \
     --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
@@ -280,7 +283,6 @@ bigdl-submit \
     train.py --cluster_mode bigdl-submit --data_dir hdfs://path/to/remote/data
 ```
 In the `bigdl-submit` script:
-* `--master`: the spark master, set it to "yarn".
 * `--deploy-mode`: set it to `cluster` when running programs on yarn-cluster mode.
 * `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in conda archive as the Python environment of the Application Master.
 * `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment.
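The two settings described above point both the Application Master and the executors at the Python inside the unpacked conda archive. A hedged fragment (the `spark.executorEnv.PYSPARK_PYTHON` value is inferred from the description, since it is not visible in the hunks shown here):

```bash
# Fragment only: these lines sit inside the yarn-cluster bigdl-submit command.
    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
    --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \
```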
@@ -294,11 +296,11 @@ If you prefer to use `spark-submit` instead of `bigdl-submit`, please follow the
     sc = init_orca_context(cluster_mode="spark-submit")
     ```

-2. Download the requirement file from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs.
+2. Download the requirement file(s) from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs.
     ```bash
     pip install -r /path/to/requirements.txt
     ```
-    Note that you are recommended **NOT** to install BigDL Orca in the conda environment if you use spark-submit to avoid possible conflicts.
+    Note that you are recommended **NOT** to install BigDL Orca with pip install command in the conda environment if you use spark-submit to avoid possible conflicts.

 3. Pack the current activate conda environment to an archive before submitting the example:
     ```bash
@@ -307,22 +309,24 @@ If you prefer to use `spark-submit` instead of `bigdl-submit`, please follow the

 4. Download the BigDL assembly package from [here](../Overview/install.html#download-bigdl-orca) and unzip it. Then setup the environment variables `${BIGDL_HOME}` and `${BIGDL_VERSION}`.
     ```bash
-    export BIGDL_HOME=/path/to/unzipped_BigDL  # the folder path where you extract the BigDL package
     export BIGDL_VERSION="downloaded BigDL version"
+    export BIGDL_HOME=/path/to/unzipped_BigDL  # the folder path where you extract the BigDL package
     ```

 5. Download and extract [Spark](https://archive.apache.org/dist/spark/). BigDL is currently released for [Spark 2.4](https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz) and [Spark 3.1](https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz). Make sure the version of your downloaded Spark matches the one that your downloaded BigDL is released with. Then setup the environment variables `${SPARK_HOME}` and `${SPARK_VERSION}`.
     ```bash
-    export SPARK_HOME=/path/to/uncompressed_spark  # the folder path where you extract the Spark package
     export SPARK_VERSION="downloaded Spark version"
+    export SPARK_HOME=/path/to/uncompressed_spark  # the folder path where you extract the Spark package
     ```

 Some runtime configurations for Spark are as follows:

-* `--executor-memory`: the memory for each executor.
-* `--driver-memory`: the memory for the driver node.
-* `--executor-cores`: the number of cores for each executor.
+* `--master`: the spark master, set it to "yarn".
 * `--num_executors`: the number of executors.
+* `--executor-cores`: the number of cores for each executor.
+* `--executor-memory`: the memory for each executor.
+* `--driver-cores`: the number of cores for the driver.
+* `--driver-memory`: the memory for the driver.
 * `--py-files`: the extra Python dependency files to be uploaded to YARN.
 * `--archives`: the conda archive to be uploaded to YARN.
 * `--properties-file`: the BigDL configuration properties to be uploaded to YARN.
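For steps 4 and 5 in this hunk, concrete values would look roughly like the following (a hedged illustration: the extract locations are hypothetical, and the versions must match what you actually downloaded; the tutorial links Spark 3.1.3, for instance):

```bash
# Illustrative values only; substitute your own versions and extract locations.
export BIGDL_VERSION=2.1.0                                   # hypothetical BigDL version
export BIGDL_HOME=/opt/bigdl-${BIGDL_VERSION}                # hypothetical extract location
export SPARK_VERSION=3.1.3                                   # one of the Spark versions BigDL is released for
export SPARK_HOME=/opt/spark-${SPARK_VERSION}-bin-hadoop2.7  # hypothetical extract location
```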
@@ -334,10 +338,11 @@ Submit and run the program for `yarn-client` mode following the `spark-submit` s
 ${SPARK_HOME}/bin/spark-submit \
     --master yarn \
     --deploy-mode client \
-    --executor-memory 2g \
-    --driver-memory 2g \
-    --executor-cores 4 \
     --num-executors 2 \
+    --executor-cores 4 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
     --archives /path/to/environment.tar.gz#environment \
     --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
     --conf spark.pyspark.driver.python=/path/to/python \
@@ -347,7 +352,6 @@ ${SPARK_HOME}/bin/spark-submit \
     train.py --cluster_mode spark-submit --data_dir hdfs://path/to/remote/data
 ```
 In the `spark-submit` script:
-* `--master`: the spark master, set it to "yarn".
 * `--deploy-mode`: set it to `client` when running programs on yarn-client mode.
 * `--conf spark.pyspark.driver.python`: set the activate Python location on __Client Node__ as the driver's Python environment. You can find the location by running `which python`.
 * `--conf spark.pyspark.python`: set the Python location in conda archive as each executor's Python environment.
@@ -358,10 +362,11 @@ Submit and run the program for `yarn-cluster` mode following the `spark-submit`
 ${SPARK_HOME}/bin/spark-submit \
     --master yarn \
     --deploy-mode cluster \
-    --executor-memory 2g \
-    --driver-memory 2g \
-    --executor-cores 4 \
     --num-executors 2 \
+    --executor-cores 4 \
+    --executor-memory 2g \
+    --driver-cores 2 \
+    --driver-memory 2g \
     --archives /path/to/environment.tar.gz#environment \
     --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
     --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
@@ -371,7 +376,6 @@ ${SPARK_HOME}/bin/spark-submit \
     train.py --cluster_mode spark-submit --data_dir hdfs://path/to/remote/data
 ```
 In the `spark-submit` script:
-* `--master`: the spark master, set it to "yarn".
 * `--deploy-mode`: set it to `cluster` when running programs on yarn-cluster mode.
 * `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in conda archive as the Python environment of the Application Master.
 * `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment.