Update Howto guides (#7361)

* update ray

* update tf
This commit is contained in:
Kai Huang 2023-01-29 15:12:27 +08:00 committed by GitHub
parent 6a4c40fc1f
commit 6c07f95d1c
3 changed files with 42 additions and 47 deletions

View file

@ -10,27 +10,28 @@
### Step 0: Prepare Environment ### Step 0: Prepare Environment
We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/install.md) for more details.
```bash ```bash
conda create -n bigdl python=3.7 # "bigdl" is conda environment name, you can use any name you like. conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like.
conda activate bigdl conda activate py37
pip install bigdl-orca[ray] pip install bigdl-orca[ray]
``` ```
### Step 1: Init Orca Context ### Step 1: Init Orca Context
We recommend using `init_orca_context` to initiate and run BigDL on the underlying cluster. The Ray cluster would be launched automatically by specifying `init_ray_on_spark=True`. The Ray cluster would be launched automatically by specifying `init_ray_on_spark=True` in `init_orca_context`.
```python ```python
from bigdl.orca import init_orca_context from bigdl.orca import init_orca_context, stop_orca_context
if cluster_mode == "local": # For local machine if cluster_mode == "local": # For local machine
sc = init_orca_context(cluster_mode="local", cores=4, memory="10g", init_ray_on_spark=True) sc = init_orca_context(cluster_mode="local", cores=4, memory="4g", init_ray_on_spark=True)
elif cluster_mode == "k8s": # For K8s cluster elif cluster_mode == "k8s": # For K8s cluster
sc = init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1, init_ray_on_spark=True) sc = init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="4g", init_ray_on_spark=True, master=..., container_image=...)
elif cluster_mode == "yarn": # For Hadoop/YARN cluster elif cluster_mode == "yarn": # For Hadoop/YARN cluster
sc = init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1, init_ray_on_spark=True) sc = init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="4g", init_ray_on_spark=True)
``` ```
This is the only place where you need to specify local or distributed mode. See [here](../Overview/ray.md#initialize) for more RayOnSpark related arguments when you `init_orca_context`. This is the only place where you need to specify local or distributed mode. See [here](../Overview/ray.md#initialize) for more RayOnSpark related arguments when you `init_orca_context`.
@ -43,10 +44,6 @@ from bigdl.orca import OrcaContext
OrcaContext.barrier_mode = False OrcaContext.barrier_mode = False
``` ```
View [Orca Context](../Overview/orca-context.md) for more details.
**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](../../UserGuide/hadoop.md) for more details.
You can retrieve the information of the Ray cluster via `OrcaContext`: You can retrieve the information of the Ray cluster via `OrcaContext`:
```python ```python
@ -57,11 +54,16 @@ address_info = ray_ctx.address_info # The dictionary information of the ray clu
redis_address = ray_ctx.redis_address # The redis address of the ray cluster. redis_address = ray_ctx.redis_address # The redis address of the ray cluster.
``` ```
View [Orca Context](../Overview/orca-context.md) for more details.
Please check the tutorials if you want to run on [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.
### Step 2: Run Ray Applications ### Step 2: Run Ray Applications
After the initialization, you can directly write Ray code inline with your Spark code, and run Ray programs on the underlying existing Big Data clusters. Ray [tasks](https://docs.ray.io/en/master/walkthrough.html#remote-functions-tasks) and [actors](https://docs.ray.io/en/master/actors.html) would be launched across the cluster. After the initialization, you can directly write Ray code inline with your Spark code, and run Ray programs on the underlying existing Big Data clusters. Ray [tasks](https://docs.ray.io/en/master/walkthrough.html#remote-functions-tasks) and [actors](https://docs.ray.io/en/master/actors.html) would be launched across the cluster.
The following example uses actor handles to implement a parameter server example for distributed asynchronous stochastic gradient descent. This is a simple Ray example for demonstration purpose. Similarly, you can write other Ray applications as you wish. The following example uses actor handles to implement a parameter server example for distributed asynchronous stochastic gradient descent. This is a simple Ray example for demonstration purpose. You can write other Ray applications as you wish in a similar way.
A parameter server is simply an object that stores the parameters (or "weights") of a machine learning model (this could be a neural network, a linear model, or something else). It exposes two methods: one for getting the parameters and one for updating the parameters. A parameter server is simply an object that stores the parameters (or "weights") of a machine learning model (this could be a neural network, a linear model, or something else). It exposes two methods: one for getting the parameters and one for updating the parameters.
@ -109,7 +111,7 @@ def worker(ps, dim, num_iters):
# Sleep a little to simulate a real workload. # Sleep a little to simulate a real workload.
time.sleep(0.5) time.sleep(0.5)
# Test that worker is implemented correctly. You do not need to change this line. # Test that worker is implemented correctly.
ray.get(worker.remote(ps, dim, 1)) ray.get(worker.remote(ps, dim, 1))
# Start two workers. # Start two workers.
@ -122,10 +124,6 @@ As the worker tasks are executing, you can query the parameter server from the d
print(ray.get(ps.get_parameters.remote())) print(ray.get(ps.get_parameters.remote()))
``` ```
**Note:** You should call `stop_orca_context()` when your program finishes: **Note:** You should call `stop_orca_context()` when your program finishes.
```python That's it, the same code can run seamlessly on your local laptop and scale to [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.
from bigdl.orca import stop_orca_context
stop_orca_context()
```

View file

@ -10,11 +10,12 @@
### Step 0: Prepare Environment ### Step 0: Prepare Environment
We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/install.md) for more details.
```bash ```bash
conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like. conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like.
conda activate py37 conda activate py37
pip install bigdl-orca pip install bigdl-orca
pip install tensorflow==1.15 pip install tensorflow==1.15
pip install tensorflow-datasets==2.0 pip install tensorflow-datasets==2.0
@ -26,18 +27,16 @@ pip install psutil
from bigdl.orca import init_orca_context, stop_orca_context from bigdl.orca import init_orca_context, stop_orca_context
if cluster_mode == "local": # For local machine if cluster_mode == "local": # For local machine
init_orca_context(cluster_mode="local", cores=4, memory="10g") init_orca_context(cluster_mode="local", cores=4, memory="4g")
dataset_dir = "~/tensorflow_datasets"
elif cluster_mode == "k8s": # For K8s cluster elif cluster_mode == "k8s": # For K8s cluster
init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1) init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="4g", master=..., container_image=...)
elif cluster_mode == "yarn": # For Hadoop/YARN cluster elif cluster_mode == "yarn": # For Hadoop/YARN cluster
init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1) init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="4g")
dataset_dir = "hdfs:///tensorflow_datasets"
``` ```
This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details. This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details.
**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](../../UserGuide/hadoop.md) for more details. To use tensorflow_datasets on HDFS, you should correctly set HADOOP_HOME, HADOOP_HDFS_HOME, LD_LIBRARY_PATH, etc. For more details, please refer to TensorFlow documentation [link](https://github.com/tensorflow/docs/blob/r1.11/site/en/deploy/hadoop.md). Please check the tutorials if you want to run on [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.
### Step 2: Define the Model ### Step 2: Define the Model
@ -73,8 +72,7 @@ acc = accuracy(logits, labels)
``` ```
### Step 3: Define Train Dataset ### Step 3: Define Train Dataset
You can define the dataset using standard [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). Orca also supports [Spark DataFrame](https://spark.apache.org/docs/latest/sql-programming-guide.html) and [Orca XShards](../Overview/data-parallel-processing.md). You can define the dataset using standard [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). Orca also supports [Spark DataFrame](./spark-dataframe.md) and [Orca XShards](./xshards-pandas.md).
```python ```python
import tensorflow_datasets as tfds import tensorflow_datasets as tfds
@ -84,8 +82,8 @@ def preprocess(data):
return data['image'], data['label'] return data['image'], data['label']
# get DataSet # get DataSet
mnist_train = tfds.load(name="mnist", split="train", data_dir=dataset_dir) mnist_train = tfds.load(name="mnist", split="train", data_dir=...)
mnist_test = tfds.load(name="mnist", split="test", data_dir=dataset_dir) mnist_test = tfds.load(name="mnist", split="test", data_dir=...)
mnist_train = mnist_train.map(preprocess) mnist_train = mnist_train.map(preprocess)
mnist_test = mnist_test.map(preprocess) mnist_test = mnist_test.map(preprocess)
@ -93,7 +91,7 @@ mnist_test = mnist_test.map(preprocess)
### Step 4: Fit with Orca Estimator ### Step 4: Fit with Orca Estimator
First, create an Estimator. First, create an Orca Estimator for TensorFlow.
```python ```python
from bigdl.orca.learn.tf.estimator import Estimator from bigdl.orca.learn.tf.estimator import Estimator
@ -117,6 +115,6 @@ result = est.evaluate(mnist_test)
print(result) print(result)
``` ```
That's it, the same code can run seamlessly in your local laptop and the distribute K8s or Hadoop cluster.
**Note:** You should call `stop_orca_context()` when your program finishes. **Note:** You should call `stop_orca_context()` when your program finishes.
That's it, the same code can run seamlessly on your local laptop and scale to [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.

View file

@ -11,11 +11,12 @@
### Step 0: Prepare Environment ### Step 0: Prepare Environment
We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details. We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/install.md) for more details.
```bash ```bash
conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like. conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like.
conda activate py37 conda activate py37
pip install bigdl-orca pip install bigdl-orca
pip install tensorflow==1.15.0 pip install tensorflow==1.15.0
pip install tensorflow-datasets==2.1.0 pip install tensorflow-datasets==2.1.0
@ -29,18 +30,16 @@ pip install scikit-learn
from bigdl.orca import init_orca_context, stop_orca_context from bigdl.orca import init_orca_context, stop_orca_context
if cluster_mode == "local": # For local machine if cluster_mode == "local": # For local machine
init_orca_context(cluster_mode="local", cores=4, memory="10g") init_orca_context(cluster_mode="local", cores=4, memory="4g")
dataset_dir = "~/tensorflow_datasets"
elif cluster_mode == "k8s": # For K8s cluster elif cluster_mode == "k8s": # For K8s cluster
init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1) init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="4g", master=..., container_image=...)
elif cluster_mode == "yarn": # For Hadoop/YARN cluster elif cluster_mode == "yarn": # For Hadoop/YARN cluster
init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1) init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="4g")
dataset_dir = "hdfs:///tensorflow_datasets"
``` ```
This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details. This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details.
**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](../../UserGuide/hadoop.md) for more details. To use tensorflow_datasets on HDFS, you should correctly set HADOOP_HOME, HADOOP_HDFS_HOME, LD_LIBRARY_PATH, etc. For more details, please refer to TensorFlow documentation [link](https://github.com/tensorflow/docs/blob/r1.11/site/en/deploy/hadoop.md). Please check the tutorials if you want to run on [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.
### Step 2: Define the Model ### Step 2: Define the Model
@ -68,7 +67,7 @@ model.compile(optimizer=keras.optimizers.RMSprop(),
``` ```
### Step 3: Define Train Dataset ### Step 3: Define Train Dataset
You can define the dataset using standard [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). Orca also supports [Spark DataFrame](https://spark.apache.org/docs/latest/sql-programming-guide.html) and [Orca XShards](../Overview/data-parallel-processing.md). You can define the dataset using standard [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). Orca also supports [Spark DataFrame](./spark-dataframe.md) and [Orca XShards](./xshards-pandas.md).
```python ```python
import tensorflow as tf import tensorflow as tf
@ -79,8 +78,8 @@ def preprocess(data):
return data['image'], data['label'] return data['image'], data['label']
# get DataSet # get DataSet
mnist_train = tfds.load(name="mnist", split="train", data_dir=dataset_dir) mnist_train = tfds.load(name="mnist", split="train", data_dir=...)
mnist_test = tfds.load(name="mnist", split="test", data_dir=dataset_dir) mnist_test = tfds.load(name="mnist", split="test", data_dir=...)
mnist_train = mnist_train.map(preprocess) mnist_train = mnist_train.map(preprocess)
mnist_test = mnist_test.map(preprocess) mnist_test = mnist_test.map(preprocess)
@ -88,7 +87,7 @@ mnist_test = mnist_test.map(preprocess)
### Step 4: Fit with Orca Estimator ### Step 4: Fit with Orca Estimator
First, create an Estimator. First, create an Orca Estimator for TensorFlow.
```python ```python
from bigdl.orca.learn.tf.estimator import Estimator from bigdl.orca.learn.tf.estimator import Estimator
@ -107,6 +106,6 @@ result = est.evaluate(mnist_test)
print(result) print(result)
``` ```
That's it, the same code can run seamlessly in your local laptop and the distribute K8s or Hadoop cluster.
**Note:** You should call `stop_orca_context()` when your program finishes. **Note:** You should call `stop_orca_context()` when your program finishes.
That's it, the same code can run seamlessly on your local laptop and scale to [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.