Orca: update yarn tutorial (#6857)
* doc: 1. update yarn tutorial 2. upload deployment diagram.
* fix: code style.
* fix: remove unnecessary changes.
* fix: code style.
* fix: fix wording.
* fix: fix wording.
This commit is contained in:
parent b82975a8d1
commit b96d39db4a
1 changed file with 53 additions and 56 deletions
@@ -77,7 +77,7 @@ def train_data_creator(config, batch_size):

Before running BigDL Orca programs on YARN, you need to properly set up the environment by following the steps in this section.

__Note__:

* When using the [`python` command](#use-python-command) or [`bigdl-submit`](#use-bigdl-submit), the `pyspark` installed as a dependency of BigDL Orca directly provides the Spark environment. Thus, to avoid possible conflicts, you *DON'T* need to download Spark yourself or set the environment variable `SPARK_HOME` unless you use [`spark-submit`](#use-spark-submit).

### 2.1 Setup JAVA & Hadoop Environment

@@ -91,7 +91,7 @@ export HADOOP_CONF_DIR=/path/to/hadoop/conf

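The setup in this section generally amounts to pointing at a local JDK and at the Hadoop configuration directory of the YARN cluster. A minimal sketch, assuming a typical installation (all paths below are placeholders):

```bash
# Placeholder path to a local JDK installation
export JAVA_HOME=/path/to/jdk
export PATH=${JAVA_HOME}/bin:${PATH}

# Directory holding the YARN cluster's configuration files
# (core-site.xml, yarn-site.xml, etc.)
export HADOOP_CONF_DIR=/path/to/hadoop/conf
```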
### 2.2 Install Python Libraries

- See [here](../Overview/install.md#install-anaconda) to install conda and prepare the Python environment on the __Client Node__.

- See [here](../Overview/install.md#install-bigdl-orca) to install BigDL Orca in the created conda environment. Note that if you use [`spark-submit`](#use-spark-submit), please skip this step and __DO NOT__ install BigDL Orca.

- Install any other Python libraries your program needs in the same conda environment as well. Running the Fashion-MNIST example requires `torch` and `torchvision`:

```bash
pip install torch torchvision
```

@@ -288,33 +288,30 @@ In the `bigdl-submit` script:

### 5.3 Use `spark-submit`

If you prefer to use `spark-submit` instead of `bigdl-submit`, please follow the steps below to prepare the environment on the __Client Node__.

1. Set the `cluster_mode` argument to `"spark-submit"` in `init_orca_context`:

```python
sc = init_orca_context(cluster_mode="spark-submit")
```

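For context, a sketch of where this call might sit in a training script. The layout and the `main` name here are hypothetical illustrations, not prescribed by the tutorial:

```python
# Hypothetical sketch of a training script entry point for spark-submit mode.
def main():
    # bigdl-orca is assumed to be available in the packed conda environment
    # shipped to the cluster (see the following steps).
    from bigdl.orca import init_orca_context, stop_orca_context

    # In spark-submit mode, resources (executors, cores, memory) are set on
    # the spark-submit command line rather than via init_orca_context arguments.
    sc = init_orca_context(cluster_mode="spark-submit")
    try:
        pass  # training code goes here
    finally:
        stop_orca_context()

# When the script is run by spark-submit, it would end with:
# if __name__ == "__main__":
#     main()
```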
2. Download the requirement file from [here](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) and install the Python libraries required by BigDL Orca according to your needs:

```bash
pip install -r /path/to/requirements.txt
```

3. Pack the currently active conda environment into an archive before submitting the example:

```bash
conda pack -o environment.tar.gz
```

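Note that `conda pack` is provided by the separate conda-pack tool; if the command is not available, it can be installed first. A sketch, assuming `pip` is available in the active environment:

```bash
# Install the conda-pack tool into the active environment
pip install conda-pack

# Pack the active environment, overwriting any existing archive (-f / --force)
conda pack -o environment.tar.gz -f
```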
4. Download and extract [Spark](https://archive.apache.org/dist/spark/). Then set up the environment variables `${SPARK_HOME}` and `${SPARK_VERSION}`:

```bash
export SPARK_HOME=/path/to/spark # the folder path where you extract the Spark package
export SPARK_VERSION="downloaded spark version"
```

5. Refer to [here](../Overview/install.html#download-bigdl-orca) to download and unzip a BigDL assembly package. Make sure the Spark version of your downloaded BigDL matches your downloaded Spark. Then set up the environment variables `${BIGDL_HOME}` and `${BIGDL_VERSION}`:

```bash
export BIGDL_HOME=/path/to/unzipped_BigDL
export BIGDL_VERSION="downloaded BigDL version"
```

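With the archive and environment variables in place, the eventual submission command might look roughly like the sketch below. The `--py-files` and `--conf` values needed depend on your BigDL and Spark versions, and `train.py` is a hypothetical script name; consult the full tutorial for the authoritative command:

```bash
# Sketch only: submit to YARN in client mode, shipping the packed conda
# environment so executors use its Python interpreter.
${SPARK_HOME}/bin/spark-submit \
    --master yarn \
    --deploy-mode client \
    --archives /path/to/environment.tar.gz#environment \
    --conf spark.pyspark.driver.python=python \
    --conf spark.pyspark.python=environment/bin/python \
    train.py
```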