Doc: fix spark-submit command for yarn client and cluster mode. (#6595)

* fix: fix yarn client and cluster spark submit command.

* fix: add properties to cluster mode spark submit command and related description.
Cengguang Zhang 2022-11-14 20:41:03 +08:00 committed by GitHub
parent c80bb4b876
commit 4169d8cb05


@@ -330,11 +330,10 @@ ${SPARK_HOME}/bin/spark-submit \
 --num-executors 2 \
 --archives /path/to/environment.tar.gz#environment \
 --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
---py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
 --conf spark.pyspark.driver.python=/path/to/python \
 --conf spark.pyspark.python=environment/bin/python \
---conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \
---conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \
+--py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
+--jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
 train.py --cluster_mode spark-submit --remote_dir hdfs://path/to/remote/data
 ```
 In the `spark-submit` script:
@@ -343,8 +342,7 @@ In the `spark-submit` script:
 * `--properties-file`: the BigDL configuration properties to be uploaded to YARN.
 * `--conf spark.pyspark.driver.python`: set the activate Python location on __Client Node__ as the driver's Python environment. You can find the location by running `which python`.
 * `--conf spark.pyspark.python`: set the Python location in conda archive as each executor's Python environment.
-* `--conf spark.driver.extraClassPath`: upload and register the BigDL jars to the driver's classpath.
-* `--conf spark.executor.extraClassPath`: upload and register the BigDL jars to the executor's classpath.
+* `--jars`: upload and register BigDL jars to YARN.
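Taken together, the client-mode flags changed above assemble into the updated submission command. The sketch below only builds and prints the command rather than submitting a job; `SPARK_VERSION`, `BIGDL_VERSION`, and the `/opt/...` defaults are placeholder assumptions, not values from this commit.

```shell
# Sketch only: assemble the updated yarn-client spark-submit command.
# All default values below are placeholders; set them for your cluster.
SPARK_HOME=${SPARK_HOME:-/opt/spark}
BIGDL_HOME=${BIGDL_HOME:-/opt/bigdl}
SPARK_VERSION=${SPARK_VERSION:-3.1.3}
BIGDL_VERSION=${BIGDL_VERSION:-2.1.0}

# The BigDL assembly jar is now passed via --jars instead of
# spark.driver/executor.extraClassPath.
CMD="${SPARK_HOME}/bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 2 \
  --archives /path/to/environment.tar.gz#environment \
  --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
  --conf spark.pyspark.driver.python=/path/to/python \
  --conf spark.pyspark.python=environment/bin/python \
  --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
  --jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
  train.py --cluster_mode spark-submit --remote_dir hdfs://path/to/remote/data"

echo "$CMD"
```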
 #### 5.3.2 Yarn Cluster
@@ -358,6 +356,7 @@ ${SPARK_HOME}/bin/spark-submit \
 --executor-cores 4 \
 --num-executors 2 \
 --archives /path/to/environment.tar.gz#environment \
+--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
 --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
 --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \
 --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
@@ -367,6 +366,7 @@ ${SPARK_HOME}/bin/spark-submit \
 In the `spark-submit` script:
 * `--master`: the spark master, set it to "yarn".
 * `--deploy-mode`: set it to "cluster" when running programs on yarn-cluster mode.
+* `--properties-file`: the BigDL configuration properties to be uploaded to YARN.
 * `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in conda archive as the Python environment of the Application Master.
 * `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment.
 * `--jars`: upload and register BigDL jars to YARN.
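Similarly, the cluster-mode flags above, with the newly added `--properties-file`, combine into the following sketch. As before, the command is only assembled and printed, and the version/path defaults are placeholder assumptions.

```shell
# Sketch only: the updated yarn-cluster command with --properties-file added.
# All default values below are placeholders; set them for your cluster.
SPARK_HOME=${SPARK_HOME:-/opt/spark}
BIGDL_HOME=${BIGDL_HOME:-/opt/bigdl}
SPARK_VERSION=${SPARK_VERSION:-3.1.3}
BIGDL_VERSION=${BIGDL_VERSION:-2.1.0}

# In cluster mode the driver runs in the Application Master, so the Python
# environment is set via spark.yarn.appMasterEnv.PYSPARK_PYTHON rather than
# spark.pyspark.driver.python.
CMD="${SPARK_HOME}/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-cores 4 \
  --num-executors 2 \
  --archives /path/to/environment.tar.gz#environment \
  --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \
  --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
  --jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
  train.py --cluster_mode spark-submit --remote_dir hdfs://path/to/remote/data"

echo "$CMD"
```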