parent 6a4c40fc1f
commit 6c07f95d1c
3 changed files with 42 additions and 47 deletions
@@ -10,27 +10,28 @@

### Step 0: Prepare Environment

-We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details.
+We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/install.md) for more details.

```bash
-conda create -n bigdl python=3.7 # "bigdl" is conda environment name, you can use any name you like.
-conda activate bigdl
+conda create -n py37 python=3.7 # "py37" is the conda environment name; you can use any name you like.
+conda activate py37

pip install bigdl-orca[ray]
```

### Step 1: Init Orca Context

-We recommend using `init_orca_context` to initiate and run BigDL on the underlying cluster. The Ray cluster would be launched automatically by specifying `init_ray_on_spark=True`.
+The Ray cluster will be launched automatically by specifying `init_ray_on_spark=True` in `init_orca_context`.

```python
-from bigdl.orca import init_orca_context
+from bigdl.orca import init_orca_context, stop_orca_context

if cluster_mode == "local": # For local machine
-    sc = init_orca_context(cluster_mode="local", cores=4, memory="10g", init_ray_on_spark=True)
+    sc = init_orca_context(cluster_mode="local", cores=4, memory="4g", init_ray_on_spark=True)
elif cluster_mode == "k8s": # For K8s cluster
-    sc = init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1, init_ray_on_spark=True)
+    sc = init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="4g", init_ray_on_spark=True, master=..., container_image=...)
elif cluster_mode == "yarn": # For Hadoop/YARN cluster
-    sc = init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1, init_ray_on_spark=True)
+    sc = init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="4g", init_ray_on_spark=True)
```

This is the only place where you need to specify local or distributed mode. See [here](../Overview/ray.md#initialize) for more RayOnSpark-related arguments for `init_orca_context`.
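`cluster_mode` above is assumed to be defined by the surrounding program; the quickstart does not fix how. A hypothetical end-to-end skeleton (the argument parsing is illustrative, not part of the document) showing how this snippet pairs with the teardown discussed later:

```python
# A hypothetical skeleton; the argument parsing is illustrative and not
# part of the quickstart itself.
from argparse import ArgumentParser

from bigdl.orca import init_orca_context, stop_orca_context

parser = ArgumentParser()
parser.add_argument("--cluster_mode", default="local",
                    choices=["local", "k8s", "yarn"])
args = parser.parse_args()
cluster_mode = args.cluster_mode

# The k8s branch would additionally need master and container_image, as above.
sc = init_orca_context(cluster_mode=cluster_mode, cores=4, memory="4g",
                       init_ray_on_spark=True)
try:
    pass  # The Ray application code from Step 2 goes here.
finally:
    stop_orca_context()  # Always release cluster resources at the end.
```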
@@ -43,10 +44,6 @@ from bigdl.orca import OrcaContext
OrcaContext.barrier_mode = False
```

-View [Orca Context](../Overview/orca-context.md) for more details.
-
-**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](../../UserGuide/hadoop.md) for more details.
-
You can retrieve the information of the Ray cluster via `OrcaContext`:

```python
@@ -57,11 +54,16 @@ address_info = ray_ctx.address_info # The dictionary information of the ray clu
redis_address = ray_ctx.redis_address # The redis address of the ray cluster.
```

+View [Orca Context](../Overview/orca-context.md) for more details.
+
+Please check the tutorials if you want to run on [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.
+

### Step 2: Run Ray Applications

After the initialization, you can directly write Ray code inline with your Spark code and run Ray programs on the underlying Big Data clusters. Ray [tasks](https://docs.ray.io/en/master/walkthrough.html#remote-functions-tasks) and [actors](https://docs.ray.io/en/master/actors.html) will be launched across the cluster.

-The following example uses actor handles to implement a parameter server example for distributed asynchronous stochastic gradient descent. This is a simple Ray example for demonstration purpose. Similarly, you can write other Ray applications as you wish.
+The following example uses actor handles to implement a parameter server for distributed asynchronous stochastic gradient descent. This is a simple Ray example for demonstration purposes; you can write other Ray applications in a similar way.

A parameter server is simply an object that stores the parameters (or "weights") of a machine learning model (this could be a neural network, a linear model, or something else). It exposes two methods: one for getting the parameters and one for updating the parameters.
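The actor's definition sits outside the hunks captured in this diff. Based on the description above and the `ps.get_parameters.remote()` call shown below, a minimal sketch of such an actor (the `update_parameters` name and the zero initialization are assumptions, not lines from the file):

```python
import numpy as np
import ray

@ray.remote
class ParameterServer(object):
    def __init__(self, dim):
        # Parameters of the model, initialized to zero.
        self.parameters = np.zeros(dim)

    def get_parameters(self):
        return self.parameters

    def update_parameters(self, update):
        # Apply an asynchronous gradient update in place.
        self.parameters += update

# Create one parameter server actor for a 10-dimensional model.
dim = 10
ps = ParameterServer.remote(dim)
```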
@@ -109,7 +111,7 @@ def worker(ps, dim, num_iters):
        # Sleep a little to simulate a real workload.
        time.sleep(0.5)

-# Test that worker is implemented correctly. You do not need to change this line.
+# Test that the worker is implemented correctly.
ray.get(worker.remote(ps, dim, 1))

# Start two workers.
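Only the tail of `worker` appears in this hunk. A plausible reconstruction of the whole remote function, consistent with the signature in the hunk header and the sleep shown above (the fake-gradient line is an illustrative stand-in for real training code):

```python
import time

import numpy as np
import ray

@ray.remote
def worker(ps, dim, num_iters):
    for _ in range(num_iters):
        # Pull the latest parameters from the parameter server.
        parameters = ray.get(ps.get_parameters.remote())
        # Compute a fake gradient; a real worker would use training data here.
        update = 1e-3 * parameters + np.ones(dim)
        # Push the update back without waiting for it to be applied.
        ps.update_parameters.remote(update)
        # Sleep a little to simulate a real workload.
        time.sleep(0.5)
```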
@@ -122,10 +124,6 @@ As the worker tasks are executing, you can query the parameter server from the d
print(ray.get(ps.get_parameters.remote()))
```

-**Note:** You should call `stop_orca_context()` when your program finishes:
+**Note:** You should call `stop_orca_context()` when your program finishes.

-```python
-from bigdl.orca import stop_orca_context
-
-stop_orca_context()
-```
+That's it, the same code can run seamlessly on your local laptop and scale to [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.
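The teardown the note refers to uses the import added in Step 1, so a program ends with:

```python
from bigdl.orca import stop_orca_context

# Release the Ray cluster and the underlying Spark resources.
stop_orca_context()
```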
@@ -10,11 +10,12 @@

### Step 0: Prepare Environment

-We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details.
+We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/install.md) for more details.

```bash
conda create -n py37 python=3.7 # "py37" is the conda environment name; you can use any name you like.
conda activate py37

pip install bigdl-orca
pip install tensorflow==1.15
pip install tensorflow-datasets==2.0
@@ -26,18 +27,16 @@ pip install psutil
from bigdl.orca import init_orca_context, stop_orca_context

if cluster_mode == "local": # For local machine
-    init_orca_context(cluster_mode="local", cores=4, memory="10g")
-    dataset_dir = "~/tensorflow_datasets"
+    init_orca_context(cluster_mode="local", cores=4, memory="4g")
elif cluster_mode == "k8s": # For K8s cluster
-    init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1)
+    init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="4g", master=..., container_image=...)
elif cluster_mode == "yarn": # For Hadoop/YARN cluster
-    init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1)
-    dataset_dir = "hdfs:///tensorflow_datasets"
+    init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="4g")
```

This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details.

-**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](../../UserGuide/hadoop.md) for more details. To use tensorflow_datasets on HDFS, you should correctly set HADOOP_HOME, HADOOP_HDFS_HOME, LD_LIBRARY_PATH, etc. For more details, please refer to TensorFlow documentation [link](https://github.com/tensorflow/docs/blob/r1.11/site/en/deploy/hadoop.md).
+Please check the tutorials if you want to run on [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.

### Step 2: Define the Model
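The model definition in Step 2 falls between the hunks of this diff. Judging from `acc = accuracy(logits, labels)` in the next hunk header, it is a TF 1.x graph-mode model along these lines (the layer shapes and the `accuracy` helper are illustrative guesses, not the file's exact code):

```python
import tensorflow as tf

def accuracy(logits, labels):
    # Fraction of predictions matching the integer labels.
    predictions = tf.argmax(logits, axis=1, output_type=labels.dtype)
    return tf.reduce_mean(tf.cast(tf.equal(predictions, labels), tf.float32))

# Placeholders for MNIST images and labels.
images = tf.placeholder(dtype=tf.float32, shape=(None, 28, 28, 1))
labels = tf.placeholder(dtype=tf.int32, shape=(None,))

# A small convolutional network producing 10-class logits.
net = tf.layers.conv2d(images, 32, (5, 5), activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, (2, 2), strides=2)
net = tf.layers.flatten(net)
logits = tf.layers.dense(net, 10)

loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
acc = accuracy(logits, labels)
```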
@@ -73,8 +72,7 @@ acc = accuracy(logits, labels)
```

### Step 3: Define Train Dataset

-You can define the dataset using standard [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). Orca also supports [Spark DataFrame](https://spark.apache.org/docs/latest/sql-programming-guide.html) and [Orca XShards](../Overview/data-parallel-processing.md).
+You can define the dataset using standard [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). Orca also supports [Spark DataFrame](./spark-dataframe.md) and [Orca XShards](./xshards-pandas.md).

```python
import tensorflow_datasets as tfds
@@ -84,8 +82,8 @@ def preprocess(data):
    return data['image'], data['label']

# Get the dataset.
-mnist_train = tfds.load(name="mnist", split="train", data_dir=dataset_dir)
-mnist_test = tfds.load(name="mnist", split="test", data_dir=dataset_dir)
+mnist_train = tfds.load(name="mnist", split="train", data_dir=...)
+mnist_test = tfds.load(name="mnist", split="test", data_dir=...)

mnist_train = mnist_train.map(preprocess)
mnist_test = mnist_test.map(preprocess)
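`preprocess` as diffed only unpacks the `(image, label)` pair. Since tensorflow_datasets yields `uint8` images, a common refinement (optional, and not part of this change) is to normalize while unpacking:

```python
import tensorflow as tf

def preprocess(data):
    # Scale uint8 pixel values to float32 in [0, 1].
    image = tf.cast(data['image'], tf.float32) / 255.0
    return image, data['label']
```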
@@ -93,7 +91,7 @@ mnist_test = mnist_test.map(preprocess)

### Step 4: Fit with Orca Estimator

-First, create an Estimator.
+First, create an Orca Estimator for TensorFlow.

```python
from bigdl.orca.learn.tf.estimator import Estimator
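The construction and training calls fall between this hunk and the final one (which shows `result = est.evaluate(mnist_test)`). Given the names defined in Steps 2 and 3, they presumably look roughly like this; the batch size, epoch count, and optimizer are placeholders:

```python
import tensorflow as tf

from bigdl.orca.learn.tf.estimator import Estimator

# Build the Estimator from the graph-mode tensors defined in Step 2.
est = Estimator.from_graph(inputs=images,
                           outputs=logits,
                           labels=labels,
                           loss=loss,
                           optimizer=tf.train.AdamOptimizer(learning_rate=0.01),
                           metrics={"acc": acc})

# Distributed training on the tf.data.Dataset from Step 3.
est.fit(data=mnist_train,
        batch_size=320,
        epochs=5,
        validation_data=mnist_test)
```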
@@ -117,6 +115,6 @@ result = est.evaluate(mnist_test)
print(result)
```

-That's it, the same code can run seamlessly in your local laptop and the distribute K8s or Hadoop cluster.
-
**Note:** You should call `stop_orca_context()` when your program finishes.

+That's it, the same code can run seamlessly on your local laptop and scale to [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.
@@ -11,11 +11,12 @@

### Step 0: Prepare Environment

-We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details.
+We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/install.md) for more details.

```bash
conda create -n py37 python=3.7 # "py37" is the conda environment name; you can use any name you like.
conda activate py37

pip install bigdl-orca
pip install tensorflow==1.15.0
pip install tensorflow-datasets==2.1.0
@@ -29,18 +30,16 @@ pip install scikit-learn
from bigdl.orca import init_orca_context, stop_orca_context

if cluster_mode == "local": # For local machine
-    init_orca_context(cluster_mode="local", cores=4, memory="10g")
-    dataset_dir = "~/tensorflow_datasets"
+    init_orca_context(cluster_mode="local", cores=4, memory="4g")
elif cluster_mode == "k8s": # For K8s cluster
-    init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1)
+    init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="4g", master=..., container_image=...)
elif cluster_mode == "yarn": # For Hadoop/YARN cluster
-    init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1)
-    dataset_dir = "hdfs:///tensorflow_datasets"
+    init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="4g")
```

This is the only place where you need to specify local or distributed mode. View [Orca Context](../Overview/orca-context.md) for more details.

-**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](../../UserGuide/hadoop.md) for more details. To use tensorflow_datasets on HDFS, you should correctly set HADOOP_HOME, HADOOP_HDFS_HOME, LD_LIBRARY_PATH, etc. For more details, please refer to TensorFlow documentation [link](https://github.com/tensorflow/docs/blob/r1.11/site/en/deploy/hadoop.md).
+Please check the tutorials if you want to run on [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.

### Step 2: Define the Model
@@ -68,7 +67,7 @@ model.compile(optimizer=keras.optimizers.RMSprop(),
```

### Step 3: Define Train Dataset

-You can define the dataset using standard [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). Orca also supports [Spark DataFrame](https://spark.apache.org/docs/latest/sql-programming-guide.html) and [Orca XShards](../Overview/data-parallel-processing.md).
+You can define the dataset using standard [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset). Orca also supports [Spark DataFrame](./spark-dataframe.md) and [Orca XShards](./xshards-pandas.md).

```python
import tensorflow as tf
@@ -79,8 +78,8 @@ def preprocess(data):
    return data['image'], data['label']

# Get the dataset.
-mnist_train = tfds.load(name="mnist", split="train", data_dir=dataset_dir)
-mnist_test = tfds.load(name="mnist", split="test", data_dir=dataset_dir)
+mnist_train = tfds.load(name="mnist", split="train", data_dir=...)
+mnist_test = tfds.load(name="mnist", split="test", data_dir=...)

mnist_train = mnist_train.map(preprocess)
mnist_test = mnist_test.map(preprocess)
@@ -88,7 +87,7 @@ mnist_test = mnist_test.map(preprocess)

### Step 4: Fit with Orca Estimator

-First, create an Estimator.
+First, create an Orca Estimator for TensorFlow.

```python
from bigdl.orca.learn.tf.estimator import Estimator
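As in the TensorFlow quickstart above, the Estimator construction and training are elided between hunks. For the compiled Keras `model` from Step 2, they presumably look roughly like this; the batch size and epoch count are placeholders:

```python
from bigdl.orca.learn.tf.estimator import Estimator

# Build the Estimator directly from the compiled Keras model.
est = Estimator.from_keras(keras_model=model)

# Distributed training on the tf.data.Dataset from Step 3.
est.fit(data=mnist_train,
        batch_size=320,
        epochs=5,
        validation_data=mnist_test)
```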
@@ -107,6 +106,6 @@ result = est.evaluate(mnist_test)
print(result)
```

-That's it, the same code can run seamlessly in your local laptop and the distribute K8s or Hadoop cluster.
-
**Note:** You should call `stop_orca_context()` when your program finishes.

+That's it, the same code can run seamlessly on your local laptop and scale to [Kubernetes](../Tutorial/k8s.md) or [Hadoop/YARN](../Tutorial/yarn.md) clusters.