diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml
index 6002adbd..bbfeaa36 100644
--- a/docs/readthedocs/source/_toc.yml
+++ b/docs/readthedocs/source/_toc.yml
@@ -42,11 +42,11 @@ subtrees:
           title: "Quickstarts"
           subtrees:
             - entries:
-              - file: doc/Orca/Quickstart/orca-tf-quickstart
-              - file: doc/Orca/Quickstart/orca-tf2keras-quickstart
-              - file: doc/Orca/Quickstart/orca-keras-quickstart
-              - file: doc/Orca/Quickstart/orca-pytorch-quickstart
-              - file: doc/Orca/Quickstart/ray-quickstart
+              - file: doc/Orca/QuickStart/orca-tf-quickstart
+              - file: doc/Orca/QuickStart/orca-tf2keras-quickstart
+              - file: doc/Orca/QuickStart/orca-keras-quickstart
+              - file: doc/Orca/QuickStart/orca-pytorch-quickstart
+              - file: doc/Orca/QuickStart/ray-quickstart
         - file: doc/Orca/Howto/index
           title: "How-to Guides"
           subtrees:
diff --git a/docs/readthedocs/source/doc/Orca/Overview/orca.md b/docs/readthedocs/source/doc/Orca/Overview/orca.md
index 226fd26e..5f7de428 100644
--- a/docs/readthedocs/source/doc/Orca/Overview/orca.md
+++ b/docs/readthedocs/source/doc/Orca/Overview/orca.md
@@ -10,50 +10,75 @@ Most AI projects start with a Python notebook running on a single laptop; howeve
 
 First of all, follow the steps [here](install.md#to-use-basic-orca-features) to install Orca in your environment.
 
-This section uses TensorFlow 1.15, and you should also install TensorFlow before running this example:
+This section uses **TensorFlow 2.x**, and you should also install TensorFlow before running this example:
 ```bash
-pip install tensorflow==1.15
+pip install tensorflow
 ```
 
 First, initialize [Orca Context](orca-context.md):
 
 ```python
-from bigdl.orca import init_orca_context, OrcaContext
+from bigdl.orca import init_orca_context, stop_orca_context, OrcaContext
 
 # cluster_mode can be "local", "k8s" or "yarn"
 sc = init_orca_context(cluster_mode="local", cores=4, memory="10g", num_nodes=1)
 ```
 
-Next, perform [data-parallel processing in Orca](data-parallel-processing.md) (supporting standard Spark Dataframes, TensorFlow Dataset, PyTorch DataLoader, Pandas, etc.):
+Next, perform [data-parallel processing in Orca](data-parallel-processing.md) (supporting standard Spark Dataframes, TensorFlow Dataset, PyTorch DataLoader, Pandas, etc.). Here to make things simple, we just generate some random data with Spark DataFrame:
 
 ```python
+import random
 from pyspark.sql.functions import array
+from pyspark.sql.types import StructType, StructField, IntegerType
+from bigdl.orca import OrcaContext
 
 spark = OrcaContext.get_spark_session()
-df = spark.read.parquet(file_path)
-df = df.withColumn('user', array('user')) \
-       .withColumn('item', array('item'))
+
+num_users, num_items = 200, 100
+rdd = sc.range(0, 512).map(
+    lambda x: [random.randint(0, num_users-1), random.randint(0, num_items-1), random.randint(0, 1)])
+schema = StructType([StructField("user", IntegerType(), False),
+                     StructField("item", IntegerType(), False),
+                     StructField("label", IntegerType(), False)])
+df = spark.createDataFrame(rdd, schema)
+train, test = df.randomSplit([0.8, 0.2], seed=1)
 ```
 
 Finally, use [sklearn-style Estimator APIs in Orca](distributed-training-inference.md) to perform distributed _TensorFlow_, _PyTorch_, _Keras_ and _BigDL_ training and inference:
 
 ```python
 from tensorflow import keras
-from bigdl.orca.learn.tf.estimator import Estimator
+from bigdl.orca.learn.tf2.estimator import Estimator
 
-user = keras.layers.Input(shape=[1])
-item = keras.layers.Input(shape=[1])
-feat = keras.layers.concatenate([user, item], axis=1)
-predictions = keras.layers.Dense(2, activation='softmax')(feat)
-model = keras.models.Model(inputs=[user, item], outputs=predictions)
-model.compile(optimizer='rmsprop',
-              loss='sparse_categorical_crossentropy',
-              metrics=['accuracy'])
+def model_creator(config):
+    user_input = keras.layers.Input(shape=(1,), dtype="int32", name="use_input")
+    item_input = keras.layers.Input(shape=(1,), dtype="int32", name="item_input")
 
-est = Estimator.from_keras(keras_model=model)
-est.fit(data=df,
+    mlp_embed_user = keras.layers.Embedding(input_dim=num_users, output_dim=config["embed_dim"],
+                                            input_length=1)(user_input)
+    mlp_embed_item = keras.layers.Embedding(input_dim=num_items, output_dim=config["embed_dim"],
+                                            input_length=1)(item_input)
+
+    user_latent = keras.layers.Flatten()(mlp_embed_user)
+    item_latent = keras.layers.Flatten()(mlp_embed_item)
+
+    mlp_latent = keras.layers.concatenate([user_latent, item_latent], axis=1)
+    predictions = keras.layers.Dense(2, activation="sigmoid")(mlp_latent)
+    model = keras.models.Model(inputs=[user_input, item_input], outputs=predictions)
+    model.compile(optimizer='adam',
+                  loss='sparse_categorical_crossentropy',
+                  metrics=['accuracy'])
+    return model
+
+est = Estimator.from_keras(model_creator=model_creator, backend="spark", config={"embed_dim": 8})
+est.fit(data=train,
         batch_size=64,
         epochs=4,
         feature_cols=['user', 'item'],
-        label_cols=['label'])
+        label_cols=['label'],
+        steps_per_epoch=int(train.count()/64),
+        validation_data=test,
+        validation_steps=int(test.count()/64))
+
+stop_orca_context()
 ```
diff --git a/docs/readthedocs/source/doc/Orca/QuickStart/index.md b/docs/readthedocs/source/doc/Orca/QuickStart/index.md
index 5dd79202..3e71cbbb 100644
--- a/docs/readthedocs/source/doc/Orca/QuickStart/index.md
+++ b/docs/readthedocs/source/doc/Orca/QuickStart/index.md
@@ -36,7 +36,7 @@
 
 - [**Orca RayOnSpark Quickstart**](./ray-quickstart.html)
 
-    > ![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb)
+    > ![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb)  ![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb)
 
     In this guide, we will describe how to use RayOnSpark to directly run Ray programs on Big Data clusters in 2 simple steps.
 