Orca: update orca-in-5-minutes to tf2 estimator. (#6404)
* feat: update orca-in-5-minutes to tf2 estimator.
* fix: fix code and tensorflow version.
* fix: fix typo.
* fix: add config in estimator construction.
* feat: add random split to dataframe.
* fix: fix typo.
* feat: add test data in fit.
* fix: update link branch to main.
* fix: fix typo.
This commit is contained in:

    parent 7cb86aeddd
    commit 5346ef45d8

3 changed files with 50 additions and 25 deletions
````diff
@@ -42,11 +42,11 @@ subtrees:
             title: "Quickstarts"
             subtrees:
               - entries:
-                - file: doc/Orca/Quickstart/orca-tf-quickstart
-                - file: doc/Orca/Quickstart/orca-tf2keras-quickstart
-                - file: doc/Orca/Quickstart/orca-keras-quickstart
-                - file: doc/Orca/Quickstart/orca-pytorch-quickstart
-                - file: doc/Orca/Quickstart/ray-quickstart
+                - file: doc/Orca/QuickStart/orca-tf-quickstart
+                - file: doc/Orca/QuickStart/orca-tf2keras-quickstart
+                - file: doc/Orca/QuickStart/orca-keras-quickstart
+                - file: doc/Orca/QuickStart/orca-pytorch-quickstart
+                - file: doc/Orca/QuickStart/ray-quickstart
           - file: doc/Orca/Howto/index
             title: "How-to Guides"
             subtrees:
````
````diff
@@ -10,50 +10,75 @@ Most AI projects start with a Python notebook running on a single laptop; howeve
 
 First of all, follow the steps [here](install.md#to-use-basic-orca-features) to install Orca in your environment.
 
-This section uses TensorFlow 1.15, and you should also install TensorFlow before running this example:
+This section uses **TensorFlow 2.x**, and you should also install TensorFlow before running this example:
 ```bash
-pip install tensorflow==1.15
+pip install tensorflow
 ```
 
 First, initialize [Orca Context](orca-context.md):
 
 ```python
-from bigdl.orca import init_orca_context, OrcaContext
+from bigdl.orca import init_orca_context, stop_orca_context, OrcaContext
 
 # cluster_mode can be "local", "k8s" or "yarn"
 sc = init_orca_context(cluster_mode="local", cores=4, memory="10g", num_nodes=1)
 ```
 
-Next, perform [data-parallel processing in Orca](data-parallel-processing.md) (supporting standard Spark Dataframes, TensorFlow Dataset, PyTorch DataLoader, Pandas, etc.):
+Next, perform [data-parallel processing in Orca](data-parallel-processing.md) (supporting standard Spark Dataframes, TensorFlow Dataset, PyTorch DataLoader, Pandas, etc.). To keep things simple, here we just generate some random data in a Spark DataFrame:
 
 ```python
+import random
 from pyspark.sql.functions import array
+from pyspark.sql.types import StructType, StructField, IntegerType
+from bigdl.orca import OrcaContext
 
 spark = OrcaContext.get_spark_session()
-df = spark.read.parquet(file_path)
-df = df.withColumn('user', array('user')) \
-       .withColumn('item', array('item'))
+
+num_users, num_items = 200, 100
+rdd = sc.range(0, 512).map(
+    lambda x: [random.randint(0, num_users-1), random.randint(0, num_items-1), random.randint(0, 1)])
+schema = StructType([StructField("user", IntegerType(), False),
+                     StructField("item", IntegerType(), False),
+                     StructField("label", IntegerType(), False)])
+df = spark.createDataFrame(rdd, schema)
+train, test = df.randomSplit([0.8, 0.2], seed=1)
 ```
 
 Finally, use [sklearn-style Estimator APIs in Orca](distributed-training-inference.md) to perform distributed _TensorFlow_, _PyTorch_, _Keras_ and _BigDL_ training and inference:
 
 ```python
 from tensorflow import keras
-from bigdl.orca.learn.tf.estimator import Estimator
+from bigdl.orca.learn.tf2.estimator import Estimator
 
-user = keras.layers.Input(shape=[1])
-item = keras.layers.Input(shape=[1])
-feat = keras.layers.concatenate([user, item], axis=1)
-predictions = keras.layers.Dense(2, activation='softmax')(feat)
-model = keras.models.Model(inputs=[user, item], outputs=predictions)
-model.compile(optimizer='rmsprop',
+def model_creator(config):
+  user_input = keras.layers.Input(shape=(1,), dtype="int32", name="user_input")
+  item_input = keras.layers.Input(shape=(1,), dtype="int32", name="item_input")
+
+  mlp_embed_user = keras.layers.Embedding(input_dim=num_users, output_dim=config["embed_dim"],
+                                          input_length=1)(user_input)
+  mlp_embed_item = keras.layers.Embedding(input_dim=num_items, output_dim=config["embed_dim"],
+                                          input_length=1)(item_input)
+
+  user_latent = keras.layers.Flatten()(mlp_embed_user)
+  item_latent = keras.layers.Flatten()(mlp_embed_item)
+
+  mlp_latent = keras.layers.concatenate([user_latent, item_latent], axis=1)
+  predictions = keras.layers.Dense(2, activation="softmax")(mlp_latent)
+  model = keras.models.Model(inputs=[user_input, item_input], outputs=predictions)
+  model.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
+  return model
 
-est = Estimator.from_keras(keras_model=model)
-est.fit(data=df,
+est = Estimator.from_keras(model_creator=model_creator, backend="spark", config={"embed_dim": 8})
+est.fit(data=train,
         batch_size=64,
         epochs=4,
         feature_cols=['user', 'item'],
-        label_cols=['label'])
+        label_cols=['label'],
+        steps_per_epoch=int(train.count()/64),
+        validation_data=test,
+        validation_steps=int(test.count()/64))
+
+stop_orca_context()
 ```
````
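For reference, here is the updated example from this diff assembled into one runnable script. This is a sketch, not the canonical quickstart: it assumes `bigdl-orca` (with the Spark backend) and TensorFlow 2.x are installed, and it reuses the `embed_dim=8` config from the diff.

```python
# Sketch assembled from the "+" lines of this diff; assumes bigdl-orca and
# TensorFlow 2.x are installed (e.g. pip install bigdl-orca tensorflow).
import random

from pyspark.sql.types import StructType, StructField, IntegerType
from tensorflow import keras

from bigdl.orca import init_orca_context, stop_orca_context, OrcaContext
from bigdl.orca.learn.tf2.estimator import Estimator

# cluster_mode can be "local", "k8s" or "yarn"
sc = init_orca_context(cluster_mode="local", cores=4, memory="10g", num_nodes=1)

# Generate a random (user, item, label) DataFrame and split it 80/20.
spark = OrcaContext.get_spark_session()
num_users, num_items = 200, 100
rdd = sc.range(0, 512).map(
    lambda x: [random.randint(0, num_users - 1),
               random.randint(0, num_items - 1),
               random.randint(0, 1)])
schema = StructType([StructField("user", IntegerType(), False),
                     StructField("item", IntegerType(), False),
                     StructField("label", IntegerType(), False)])
df = spark.createDataFrame(rdd, schema)
train, test = df.randomSplit([0.8, 0.2], seed=1)

def model_creator(config):
    # Two embedding towers concatenated into an MLP head; config is the
    # dict passed to Estimator.from_keras below.
    user_input = keras.layers.Input(shape=(1,), dtype="int32", name="user_input")
    item_input = keras.layers.Input(shape=(1,), dtype="int32", name="item_input")
    user_embed = keras.layers.Embedding(num_users, config["embed_dim"], input_length=1)(user_input)
    item_embed = keras.layers.Embedding(num_items, config["embed_dim"], input_length=1)(item_input)
    latent = keras.layers.concatenate(
        [keras.layers.Flatten()(user_embed), keras.layers.Flatten()(item_embed)], axis=1)
    predictions = keras.layers.Dense(2, activation="softmax")(latent)
    model = keras.models.Model(inputs=[user_input, item_input], outputs=predictions)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

est = Estimator.from_keras(model_creator=model_creator, backend="spark", config={"embed_dim": 8})
est.fit(data=train,
        batch_size=64,
        epochs=4,
        feature_cols=["user", "item"],
        label_cols=["label"],
        steps_per_epoch=int(train.count() / 64),
        validation_data=test,
        validation_steps=int(test.count() / 64))

stop_orca_context()
```

Passing `embed_dim` through `config` instead of hard-coding it matters because the tf2 Estimator calls `model_creator(config)` on each worker to rebuild the model; anything the model needs must travel through `config` or be serializable along with the function.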
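The quickstart stops at `fit`, but the same Estimator also supports distributed evaluation and inference on the held-out split. The sketch below is hedged: the argument names follow the `bigdl.orca.learn.tf2` Estimator API as commonly documented and should be checked against the installed bigdl-orca version.

```python
# Hedged follow-up to the training above; these calls are assumptions drawn
# from the tf2 Estimator API, not part of the diff -- verify the signatures.
stats = est.evaluate(data=test,
                     batch_size=64,
                     num_steps=int(test.count() / 64),
                     feature_cols=["user", "item"],
                     label_cols=["label"])
print(stats)  # validation loss/metrics

# Predicting on a Spark DataFrame returns a DataFrame with a prediction column.
pred_df = est.predict(data=test, feature_cols=["user", "item"])
pred_df.show(5)
```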
````diff
@@ -36,7 +36,7 @@
 
 - [**Orca RayOnSpark Quickstart**](./ray-quickstart.html)
 
-    > [Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb)  [View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb)
+    > [Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb)  [View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb)
 
     In this guide, we will describe how to use RayOnSpark to directly run Ray programs on Big Data clusters in 2 simple steps.
 
````