Update spark-submit section in yarn tutorial (#7209)

* update

* update
Kai Huang 2023-01-10 09:55:30 +08:00 committed by GitHub
parent c4874f35c8
commit af9cdc6edd

@@ -295,26 +295,27 @@ If you prefer to use `spark-submit` instead of `bigdl-submit`, please follow the
 sc = init_orca_context(cluster_mode="spark-submit")
 ```
-2. Download requirement file [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install required Python libraries of BigDL Orca according to your needs.
+2. Download the requirement file from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs.
 ```bash
 pip install -r /path/to/requirements.txt
 ```
+Note that you are recommended **NOT** to install BigDL Orca in the conda environment if you use spark-submit to avoid possible conflicts.
 3. Pack the current activate conda environment to an archive before submitting the example:
 ```bash
 conda pack -o environment.tar.gz
 ```
-4. Download and extract [Spark](https://archive.apache.org/dist/spark/). Then setup the environment variables `${SPARK_HOME}` and `${SPARK_VERSION}`.
+4. Download the BigDL assembly package from [here](../Overview/install.html#download-bigdl-orca) and unzip it. Then setup the environment variables `${BIGDL_HOME}` and `${BIGDL_VERSION}`.
 ```bash
-export SPARK_HOME=/path/to/spark # the folder path where you extract the Spark package
-export SPARK_VERSION="downloaded spark version"
+export BIGDL_HOME=/path/to/unzipped_BigDL # the folder path where you extract the BigDL package
+export BIGDL_VERSION="downloaded BigDL version"
 ```
-5. Refer to [here](../Overview/install.html#download-bigdl-orca) to download and unzip a BigDL assembly package. Make sure the Spark version of your downloaded BigDL matches your downloaded Spark. Then setup the environment variables `${BIGDL_HOME}` and `${BIGDL_VERSION}`.
+5. Download and extract [Spark](https://archive.apache.org/dist/spark/). BigDL is currently released for [Spark 2.4](https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz) and [Spark 3.1](https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz). Make sure the version of your downloaded Spark matches the one that your downloaded BigDL is released with. Then setup the environment variables `${SPARK_HOME}` and `${SPARK_VERSION}`.
 ```bash
-export BIGDL_HOME=/path/to/unzipped_BigDL
-export BIGDL_VERSION="downloaded BigDL version"
+export SPARK_HOME=/path/to/uncompressed_spark # the folder path where you extract the Spark package
+export SPARK_VERSION="downloaded Spark version"
 ```
 Some runtime configurations for Spark are as follows: