Orca: update yarn tutorial (#6857)

* doc: 1. update yarn tutorial
2. upload deployment diagram.

* fix: code style.

* fix: remove unnecessary changes.

* fix: code style.

* fix: fix wording.

* fix: fix wording.
Cengguang Zhang 2022-12-02 17:09:53 +08:00 committed by GitHub
parent b82975a8d1
commit b96d39db4a


@@ -77,7 +77,7 @@ def train_data_creator(config, batch_size):
Before running BigDL Orca programs on YARN, you need to properly set up the environment following the steps in this section.
__Note__:
* When using the [`python` command](#use-python-command) or [`bigdl-submit`](#use-bigdl-submit), the `pyspark` shipped as a dependency of BigDL Orca is used directly for the Spark environment. Thus, to avoid possible conflicts, you *DON'T* need to download Spark yourself or set the environment variable `SPARK_HOME` unless you use [`spark-submit`](#use-spark-submit).
### 2.1 Setup JAVA & Hadoop Environment
@@ -91,7 +91,7 @@ export HADOOP_CONF_DIR=/path/to/hadoop/conf
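The body of section 2.1 is largely elided in this hunk; it amounts to pointing the shell at a JDK and at the Hadoop client configuration. A minimal sketch, with illustrative placeholder paths:

```bash
# Illustrative paths only; point these at your actual JDK and Hadoop conf directory
export JAVA_HOME=/path/to/jdk
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CONF_DIR=/path/to/hadoop/conf
```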
### 2.2 Install Python Libraries
- See [here](../Overview/install.md#install-anaconda) to install conda and prepare the Python environment on the __Client Node__.
- See [here](../Overview/install.md#install-bigdl-orca) to install BigDL Orca in the created conda environment. Note that if you use [`spark-submit`](#use-spark-submit), please skip this step and __DO NOT__ install BigDL Orca.
- You should also install, in the same conda environment, any other Python libraries that your program needs. `torch` and `torchvision` are needed to run the Fashion-MNIST example:
```bash
pip install torch torchvision
```
@@ -288,33 +288,30 @@ In the `bigdl-submit` script:
### 5.3 Use `spark-submit`
If you prefer to use `spark-submit` instead of `bigdl-submit`, please follow the steps below to prepare the environment on the __Client Node__.
1. Set the `cluster_mode` to `"spark-submit"` in `init_orca_context`.
```python
sc = init_orca_context(cluster_mode="spark-submit")
```
2. Download the requirements file [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the required Python libraries of BigDL Orca according to your needs.
```bash
pip install -r /path/to/requirements.txt
```
3. Pack the currently activated conda environment into an archive before submitting the example:
```bash
conda pack -o environment.tar.gz
```
4. Download and extract [Spark](https://archive.apache.org/dist/spark/). Then set up the environment variables `${SPARK_HOME}` and `${SPARK_VERSION}`.
```bash
export SPARK_HOME=/path/to/spark  # the folder path where you extracted the Spark package
export SPARK_VERSION="downloaded Spark version"
```
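As a concrete illustration (the path and version below are examples only, not requirements), the two variables might look like:

```bash
# Example only: Spark 3.1.3 extracted under /opt
export SPARK_HOME=/opt/spark-3.1.3-bin-hadoop3.2
export SPARK_VERSION="3.1.3"
```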
5. Refer to [here](../Overview/install.html#download-bigdl-orca) to download and unzip a BigDL assembly package. Make sure the Spark version of your downloaded BigDL matches your downloaded Spark. Then set up the environment variables `${BIGDL_HOME}` and `${BIGDL_VERSION}`.
```bash
export BIGDL_HOME=/path/to/unzipped_BigDL
export BIGDL_VERSION="downloaded BigDL version"
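# For example (illustrative values only; use the version you actually downloaded):
#   export BIGDL_HOME=/opt/bigdl-2.1.0
#   export BIGDL_VERSION="2.1.0"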