# RayOnSpark Quickstart
---

[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb) [View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb)

---

**In this guide, we will describe how to use RayOnSpark to directly run Ray programs on Big Data clusters in 2 simple steps.**
### **Step 0: Prepare Environment**

We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details.

```bash
conda create -n bigdl python=3.7  # "bigdl" is the conda environment name; you can use any name you like.
conda activate bigdl
pip install bigdl-orca[ray]
```
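As a quick sanity check (not part of the original quickstart), you can verify the installation by importing the packages inside the activated environment:

```python
# Ray is pulled in by the bigdl-orca[ray] extra; both imports should succeed.
import ray
import bigdl.orca

print(ray.__version__)
```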
### **Step 1: Initialize**

We recommend using `init_orca_context` to initialize and run BigDL on the underlying cluster. The Ray cluster will be launched automatically when you specify `init_ray_on_spark=True`.

```python
from bigdl.orca import init_orca_context

cluster_mode = "local"  # Change to "k8s" or "yarn" to run on a K8s or Hadoop/YARN cluster.

if cluster_mode == "local":  # For local machine
    sc = init_orca_context(cluster_mode="local", cores=4, memory="10g", init_ray_on_spark=True)
elif cluster_mode == "k8s":  # For K8s cluster
    sc = init_orca_context(cluster_mode="k8s", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1, init_ray_on_spark=True)
elif cluster_mode == "yarn":  # For Hadoop/YARN cluster
    sc = init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1, init_ray_on_spark=True)
```
This is the only place where you need to specify local or distributed mode. See [here](./../Overview/ray.md#initialize) for more RayOnSpark-related arguments when calling `init_orca_context`.

By default, the Ray cluster is launched using Spark barrier execution mode; you can turn it off via the configurations of `OrcaContext`:

```python
from bigdl.orca import OrcaContext

OrcaContext.barrier_mode = False
```

View [Orca Context](./../Overview/orca-context.md) for more details.

**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on a Hadoop/YARN cluster. View the [Hadoop User Guide](./../../UserGuide/hadoop.md) for more details.

You can retrieve the information of the Ray cluster via `OrcaContext`:
```python
from bigdl.orca import OrcaContext

ray_ctx = OrcaContext.get_ray_context()
address_info = ray_ctx.address_info  # A dictionary of Ray cluster information, including node_ip_address, object_store_address, webui_url, etc.
redis_address = ray_ctx.redis_address  # The Redis address of the Ray cluster.
```
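For example (a simple usage sketch based on the fields described above), you can print this information to see where the cluster can be reached:

```python
# Inspect the cluster information retrieved above.
print(address_info)   # Includes node_ip_address, object_store_address, webui_url, etc.
print(redis_address)  # The address of the Ray head node, in "ip:port" form.
```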
### **Step 2: Run Ray Applications**

After the initialization, you can directly write Ray code inline with your Spark code, and run Ray programs on the underlying existing Big Data clusters. Ray [tasks](https://docs.ray.io/en/master/walkthrough.html#remote-functions-tasks) and [actors](https://docs.ray.io/en/master/actors.html) will be launched across the cluster.

The following example uses actor handles to implement a parameter server for distributed asynchronous stochastic gradient descent. It is a simple Ray example for demonstration purposes; you can write other Ray applications in the same fashion.

A parameter server is simply an object that stores the parameters (or "weights") of a machine learning model (this could be a neural network, a linear model, or something else). It exposes two methods: one for getting the parameters and one for updating the parameters.

By adding the `@ray.remote` decorator, the `ParameterServer` class becomes a Ray actor.
```python
import ray
import numpy as np

dim = 10

@ray.remote
class ParameterServer(object):
    def __init__(self, dim):
        self.parameters = np.zeros(dim)

    def get_parameters(self):
        return self.parameters

    def update_parameters(self, update):
        self.parameters += update

ps = ParameterServer.remote(dim)
```
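You can already interact with this actor from the driver. For instance (a quick check, not part of the original example), the parameters start out as all zeros:

```python
# Remote method calls return an object ref; ray.get blocks until the result is ready.
print(ray.get(ps.get_parameters.remote()))  # -> array of 10 zeros
```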
In a typical machine learning training application, worker processes will run in an infinite loop that does the following:

1. Get the latest parameters from the parameter server.
2. Compute an update to the parameters (using the current parameters and some data).
3. Send the update to the parameter server.

By adding the `@ray.remote` decorator, the `worker` function becomes a Ray remote function.
```python
import time

@ray.remote
def worker(ps, dim, num_iters):
    for _ in range(num_iters):
        # Get the latest parameters.
        parameters = ray.get(ps.get_parameters.remote())
        # Compute an update.
        update = 1e-3 * parameters + np.ones(dim)
        # Update the parameters.
        ps.update_parameters.remote(update)
        # Sleep a little to simulate a real workload.
        time.sleep(0.5)

# Test that worker is implemented correctly. You do not need to change this line.
ray.get(worker.remote(ps, dim, 1))

# Start two workers.
worker_results = [worker.remote(ps, dim, 100) for _ in range(2)]
```
As the worker tasks are executing, you can query the parameter server from the driver and see the parameters changing in the background.

```python
print(ray.get(ps.get_parameters.remote()))
```
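For instance, here is a small sketch (not part of the original example) that polls the parameters a few times while the workers are running and then blocks until both of them finish:

```python
import time

# Watch the parameters change while the two workers are still running.
for _ in range(5):
    print(ray.get(ps.get_parameters.remote()))
    time.sleep(1)

# Block until both workers started above have completed all of their iterations.
ray.get(worker_results)
```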
**Note:** You should call `stop_orca_context()` when your program finishes:

```python
from bigdl.orca import stop_orca_context

stop_orca_context()
```