update orca data quickstart and ray md (#3498)

Le-Zheng 2021-11-16 17:41:48 +08:00 committed by GitHub
parent 238a434950
commit c75ad40e67
3 changed files with 13 additions and 13 deletions

View file

@@ -13,9 +13,9 @@
We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../../UserGuide/python.md) for more details.
```bash
-conda create -n zoo python=3.7 # "zoo" is conda environment name, you can use any name you like.
-conda activate zoo
-pip install analytics-zoo[ray]
+conda create -n bigdl python=3.7 # "bigdl" is conda environment name, you can use any name you like.
+conda activate bigdl
+pip install bigdl-orca[ray]
```
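As an optional sanity check (a minimal sketch, not part of the original guide), you can confirm that the package imports cleanly in the new environment:
```python
# Minimal install check: the import should succeed without errors
# if `pip install bigdl-orca[ray]` completed.
import bigdl.orca
print(bigdl.orca.__name__)  # prints "bigdl.orca"
```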
### **Step 1: Initialize**
@@ -23,7 +23,7 @@ pip install analytics-zoo[ray]
We recommend using `init_orca_context` to initialize and run Analytics Zoo on the underlying cluster. The Ray cluster will be launched automatically by specifying `init_ray_on_spark=True`.
```python
-from zoo.orca import init_orca_context
+from bigdl.orca import init_orca_context
if cluster_mode == "local": # For local machine
    sc = init_orca_context(cluster_mode="local", cores=4, memory="10g", init_ray_on_spark=True)
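# A hedged sketch of the distributed branch elided from this hunk; values
# such as num_nodes are assumptions about your cluster, not from this commit:
elif cluster_mode == "yarn":  # For a Hadoop/YARN cluster
    sc = init_orca_context(cluster_mode="yarn-client", cores=4, memory="10g",
                           num_nodes=2, init_ray_on_spark=True)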
@@ -38,7 +38,7 @@ This is the only place where you need to specify local or distributed mode.
By default, the Ray cluster is launched using Spark barrier execution mode; you can turn it off via the `OrcaContext` configuration:
```python
-from zoo.orca import OrcaContext
+from bigdl.orca import OrcaContext
OrcaContext.barrier_mode = False
```
@@ -50,7 +50,7 @@ View [Orca Context](./../../Orca/Overview/orca-context.md) for more details.
You can retrieve the information of the Ray cluster via `OrcaContext`:
```python
-from zoo.orca import OrcaContext
+from bigdl.orca import OrcaContext
ray_ctx = OrcaContext.get_ray_context()
address_info = ray_ctx.address_info # The dictionary information of the ray cluster, including node_ip_address, object_store_address, webui_url, etc.
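# Illustrative follow-up (an assumption, not in the original snippet):
# since address_info is a plain dict, the fields listed above can be read
# directly, e.g.
print(address_info["node_ip_address"])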
@@ -125,7 +125,7 @@ print(ray.get(ps.get_parameters.remote()))
**Note:** You should call `stop_orca_context()` when your program finishes:
```python
-from zoo.orca import stop_orca_context
+from bigdl.orca import stop_orca_context
stop_orca_context()
```
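To guarantee this cleanup even when the job fails partway, one common pattern (a sketch under the same imports, not prescribed by the original guide) is to wrap the program body in `try`/`finally`:
```python
from bigdl.orca import init_orca_context, stop_orca_context

sc = init_orca_context(cluster_mode="local", cores=4, init_ray_on_spark=True)
try:
    pass  # training / inference code goes here
finally:
    stop_orca_context()  # release the Ray and Spark resources in all cases
```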

View file

@@ -2,7 +2,7 @@
---
-![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/analytics-zoo/blob/master/docs/docs/colab-notebook/orca/quickstart/ncf_dataframe.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/analytics-zoo/blob/master/docs/docs/colab-notebook/orca/quickstart/ncf_dataframe.ipynb)
+![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ncf_dataframe.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ncf_dataframe.ipynb)
---
@@ -15,7 +15,7 @@ The dataset used in this guide is [movielens-1M](https://grouplens.org/datasets/
First, read the input data into Spark DataFrames.
```python
-from zoo.orca import OrcaContext
+from bigdl.orca import OrcaContext
spark = OrcaContext.get_spark_session()
# read csv with specifying column names
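# A hedged illustration of such a read (the actual call is elided in this
# hunk; the path, separator and column names below are assumptions):
df = spark.read.csv("ratings.csv", sep=",", header=False, inferSchema=True) \
    .toDF("user", "item", "label")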
@@ -82,7 +82,7 @@ def model_creator(config):
Finally, run distributed model training/inference on the Spark DataFrames directly.
```python
-from zoo.orca.learn.tf2 import Estimator
+from bigdl.orca.learn.tf2 import Estimator
# create an Estimator
est = Estimator.from_keras(model_creator=model_creator) # the model accepts two inputs and one label
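# A hedged sketch of the training call that typically follows; train_df and
# the numeric values are assumptions, and the argument names follow the Orca
# tf2 Estimator API:
est.fit(data=train_df,
        batch_size=256,
        epochs=2,
        feature_cols=['user', 'item'],
        label_cols=['label'])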

View file

@@ -2,7 +2,7 @@
---
-![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/analytics-zoo/blob/master/docs/docs/colab-notebook/orca/quickstart/ncf_xshards_pandas.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/analytics-zoo/blob/master/docs/docs/colab-notebook/orca/quickstart/ncf_xshards_pandas.ipynb)
+![](../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ncf_xshards_pandas.ipynb)  ![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ncf_xshards_pandas.ipynb)
---
@@ -13,7 +13,7 @@
First, read CSV, JSON or Parquet files into an `XShards` of Pandas DataFrames (i.e., a distributed and sharded dataset where each partition contains a Pandas DataFrame), as shown below:
```python
-from zoo.orca.data.pandas import read_csv
+from bigdl.orca.data.pandas import read_csv
full_data = read_csv(new_rating_files, sep=':', header=None,
                     names=['user', 'item', 'label'], usecols=[0, 1, 2],
                     dtype={0: np.int32, 1: np.int32, 2: np.int32})
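# A hedged follow-on sketch: each partition holds a Pandas DataFrame, so
# per-partition preprocessing is a plain function applied via the XShards
# transform_shard API (the transformation itself is illustrative):
def preprocess(df):
    df['user'] = df['user'] + 1  # e.g. shift ids to start from 1
    return df

full_data = full_data.transform_shard(preprocess)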
@@ -99,7 +99,7 @@ model = NCF(embedding_size, max_user_id, max_item_id)
Finally, directly run distributed model training/inference on the XShards of Pandas DataFrames.
```python
-from zoo.orca.learn.tf.estimator import Estimator
+from bigdl.orca.learn.tf.estimator import Estimator
# create an Estimator.
estimator = Estimator.from_graph(