# Orca in 5 minutes

### Overview

Most AI projects start as a Python notebook running on a single laptop; however, scaling them up to handle larger data sets in a distributed fashion is usually a painful process. The _**Orca**_ library seamlessly scales out your single-node Python notebook across large clusters (so as to process distributed Big Data).

---

### **TensorFlow Bite-sized Example**

This section uses TensorFlow 1.15; install it before running the example:

```bash
pip install tensorflow==1.15
```

First, initialize [Orca Context](orca-context.md):

```python
from bigdl.orca import init_orca_context, OrcaContext

# cluster_mode can be "local", "k8s" or "yarn"
sc = init_orca_context(cluster_mode="local", cores=4, memory="10g", num_nodes=1)
```

Next, perform [data-parallel processing in Orca](data-parallel-processing.md) (supporting standard Spark DataFrames, TensorFlow Dataset, PyTorch DataLoader, Pandas, etc.):

```python
from pyspark.sql.functions import array

spark = OrcaContext.get_spark_session()

# file_path points to the input data in Parquet format
df = spark.read.parquet(file_path)
# wrap the scalar columns in arrays so they match the Keras Input shapes below
df = df.withColumn('user', array('user')) \
       .withColumn('item', array('item'))
```

Finally, use [sklearn-style Estimator APIs in Orca](distributed-training-inference.md) to perform distributed _TensorFlow_, _PyTorch_, _Keras_ and _BigDL_ training and inference:

```python
from tensorflow import keras
from bigdl.orca.learn.tf.estimator import Estimator

# a simple two-input Keras model over the user and item columns
user = keras.layers.Input(shape=[1])
item = keras.layers.Input(shape=[1])
feat = keras.layers.concatenate([user, item], axis=1)
predictions = keras.layers.Dense(2, activation='softmax')(feat)
model = keras.models.Model(inputs=[user, item], outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train the model on the Spark DataFrame in a data-parallel fashion
est = Estimator.from_keras(keras_model=model)
est.fit(data=df,
        batch_size=64,
        epochs=4,
        feature_cols=['user', 'item'],
        label_cols=['label'])
```
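After training, the same Estimator can also run distributed inference on a Spark DataFrame. A minimal sketch, assuming the Estimator's `predict` method accepts the same `feature_cols` as `fit` and returns a DataFrame with an added prediction column (check the Estimator API docs for the exact signature and return schema in your Orca version):

```python
# distributed inference on the same Spark DataFrame (a sketch; the
# prediction column name/schema may differ across Orca versions)
prediction_df = est.predict(data=df,
                            batch_size=64,
                            feature_cols=['user', 'item'])
prediction_df.show(5)
```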
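Once you are done, stop the Orca Context to release the underlying cluster resources:

```python
from bigdl.orca import stop_orca_context

# shut down the Spark (and Ray, if used) resources allocated by init_orca_context
stop_orca_context()
```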