Doc: fix spark-submit command for yarn client and cluster mode. (#6595)
* fix: fix yarn client and cluster spark submit command.
* fix: add properties to cluster mode spark submit command and related description.

Parent: c80bb4b876
Commit: 4169d8cb05
1 changed file with 5 additions and 5 deletions
@@ -330,11 +330,10 @@ ${SPARK_HOME}/bin/spark-submit \
    --num-executors 2 \
    --archives /path/to/environment.tar.gz#environment \
    --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
    --conf spark.pyspark.driver.python=/path/to/python \
    --conf spark.pyspark.python=environment/bin/python \
    --conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \
    --conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \
    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
    --jars ${BIGDL_HOME}/jars/bigdl-assembly-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
    train.py --cluster_mode spark-submit --remote_dir hdfs://path/to/remote/data
```

In the `spark-submit` script:
@@ -343,8 +342,7 @@ In the `spark-submit` script:
* `--properties-file`: the BigDL configuration properties to be uploaded to YARN.
* `--conf spark.pyspark.driver.python`: set the active Python location on the __Client Node__ as the driver's Python environment. You can find the location by running `which python`.
* `--conf spark.pyspark.python`: set the Python location in the conda archive as each executor's Python environment.
* `--conf spark.driver.extraClassPath`: add the BigDL jars to the driver's classpath.
* `--conf spark.executor.extraClassPath`: add the BigDL jars to the executor's classpath.
* `--jars`: upload and register the BigDL jars to YARN.
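Both Python settings above depend on how the conda archive was built and on the local Python of the Client Node. A minimal sketch of those preparation steps follows; it assumes the archive is produced with `conda-pack` and uses the placeholder environment name `bigdl-env` (neither is prescribed by this doc):

```bash
# Hypothetical preparation steps; names and paths are placeholders.

# Pack the conda environment into the archive passed to --archives.
conda pack -n bigdl-env -o /path/to/environment.tar.gz

# In yarn-client mode the driver runs on the Client Node, so point
# spark.pyspark.driver.python at the local interpreter:
which python    # use this path for --conf spark.pyspark.driver.python

# Executors see the archive extracted under the alias "environment"
# (from the #environment suffix), hence spark.pyspark.python=environment/bin/python.
```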
#### 5.3.2 Yarn Cluster
@@ -358,6 +356,7 @@ ${SPARK_HOME}/bin/spark-submit \
    --executor-cores 4 \
    --num-executors 2 \
    --archives /path/to/environment.tar.gz#environment \
    --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
    --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \
    --py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,model.py \
@@ -367,6 +366,7 @@ ${SPARK_HOME}/bin/spark-submit \
In the `spark-submit` script:
* `--master`: the Spark master; set it to "yarn".
* `--deploy-mode`: set it to "cluster" when running programs in yarn-cluster mode.
* `--properties-file`: the BigDL configuration properties to be uploaded to YARN.
* `--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON`: set the Python location in the conda archive as the Python environment of the Application Master.
* `--conf spark.executorEnv.PYSPARK_PYTHON`: also set the Python location in the conda archive as each executor's Python environment. The Application Master and the executors will all use the archive for their Python environment.
* `--jars`: upload and register the BigDL jars to YARN.
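Since both `spark.yarn.appMasterEnv.PYSPARK_PYTHON` and `spark.executorEnv.PYSPARK_PYTHON` point to `environment/bin/python`, a quick local sanity check is to confirm that the packed archive actually contains a `bin/python`. The sketch below is only an illustrative check with placeholder paths, not part of the documented command:

```bash
# Hypothetical check; /tmp/env-check is a placeholder scratch directory.
mkdir -p /tmp/env-check
tar -xzf /path/to/environment.tar.gz -C /tmp/env-check

# After YARN extracts the archive under the "environment" alias,
# this file is what environment/bin/python resolves to on the
# Application Master and the executors.
ls -l /tmp/env-check/bin/python
```

If the file is missing, the archive was probably packed from the wrong environment, and the cluster-mode job will fail to find its Python interpreter.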