Doc: fix spark-submit command for yarn client and cluster mode. (#6595)

* fix: fix yarn client and cluster spark submit command.

* fix: add properties to cluster mode spark submit command and related description.
Cengguang Zhang 2022-11-14 20:41:03 +08:00 committed by GitHub
parent c80bb4b876
commit 4169d8cb05


@@ -330,11 +330,10 @@ ${SPARK_HOME}/bin/spark-submit \
 --num-executors 2 \
 --archives /path/to/environment.tar.gz#environment \
 --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
---py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
 --conf spark.pyspark.driver.python=/path/to/python \
 --conf spark.pyspark.python=environment/bin/python \
---conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \
---conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \
+--py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
+--jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
 train.py --cluster_mode spark-submit --remote_dir hdfs://path/to/remote/data
 ```
 In the `spark-submit` script:
@@ -343,8 +342,7 @@ In the `spark-submit` script:
 * `--properties-file`: the BigDL configuration properties to be uploaded to YARN.
 * `--conf spark.pyspark.driver.python`: set the activate Python location on __Client Node__ as the driver's Python environment. You can find the location by running `which python`.
 * `--conf spark.pyspark.python`: set the Python location in conda archive as each executor's Python environment.
-* `--conf spark.driver.extraClassPath`: upload and register the BigDL jars to the driver's classpath.
-* `--conf spark.executor.extraClassPath`: upload and register the BigDL jars to the executor's classpath.
+* `--jars`: upload and register BigDL jars to YARN.
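Taken together, the client-mode flags changed above assemble into the updated submission command. The sketch below only builds and prints the command rather than submitting a job; `SPARK_VERSION`, `BIGDL_VERSION`, and the `/opt/...` defaults are placeholder assumptions, not values from this commit.

```shell
# Sketch only: assemble the updated yarn-client spark-submit command.
# All default values below are placeholders; set them for your cluster.
SPARK_HOME=${SPARK_HOME:-/opt/spark}
BIGDL_HOME=${BIGDL_HOME:-/opt/bigdl}
SPARK_VERSION=${SPARK_VERSION:-3.1.3}
BIGDL_VERSION=${BIGDL_VERSION:-2.1.0}

# The BigDL assembly jar is now passed via --jars instead of
# spark.driver/executor.extraClassPath.
CMD="${SPARK_HOME}/bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 2 \
  --archives /path/to/environment.tar.gz#environment \
  --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
  --conf spark.pyspark.driver.python=/path/to/python \
  --conf spark.pyspark.python=environment/bin/python \
  --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
  --jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
  train.py --cluster_mode spark-submit --remote_dir hdfs://path/to/remote/data"

echo "$CMD"
```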
 #### 5.3.2 Yarn Cluster
@@ -358,6 +356,7 @@ ${SPARK_HOME}/bin/spark-submit \
 --executor-cores 4 \
 --num-executors 2 \
 --archives /path/to/environment.tar.gz#environment \
+--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
 --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
 --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \
 --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
@@ -367,6 +366,7 @@ ${SPARK_HOME}/bin/spark-submit \
 In the `spark-submit` script:
 * `--master`: the spark master, set it to "yarn".
 * `--deploy-mode`: set it to "cluster" when running programs on yarn-cluster mode.
+* `--properties-file`: the BigDL configuration properties to be uploaded to YARN.
 * `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in conda archive as the Python environment of the Application Master.
 * `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for the Python environment.
 * `--jars`: upload and register BigDL jars to YARN.
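Similarly, the cluster-mode flags above, with the newly added `--properties-file`, combine into the following sketch. As before, the command is only assembled and printed, and the version/path defaults are placeholder assumptions.

```shell
# Sketch only: the updated yarn-cluster command with --properties-file added.
# All default values below are placeholders; set them for your cluster.
SPARK_HOME=${SPARK_HOME:-/opt/spark}
BIGDL_HOME=${BIGDL_HOME:-/opt/bigdl}
SPARK_VERSION=${SPARK_VERSION:-3.1.3}
BIGDL_VERSION=${BIGDL_VERSION:-2.1.0}

# In cluster mode the driver runs in the Application Master, so the Python
# environment is set via spark.yarn.appMasterEnv.PYSPARK_PYTHON rather than
# spark.pyspark.driver.python.
CMD="${SPARK_HOME}/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-cores 4 \
  --num-executors 2 \
  --archives /path/to/environment.tar.gz#environment \
  --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \
  --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
  --jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
  train.py --cluster_mode spark-submit --remote_dir hdfs://path/to/remote/data"

echo "$CMD"
```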