diff --git a/README.md b/README.md
index e5029382..165fa31d 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@
 ---
 BigDL makes it easy for data scientists and data engineers to build end-to-end, distributed AI applications. The **BigDL 2.0** release combines the original [BigDL](https://github.com/intel-analytics/BigDL/tree/branch-0.14) and [Analytics Zoo](https://github.com/intel-analytics/analytics-zoo) projects, providing the following features:
- * [DLlib](#getting-started-with-bigdl-extensions): distributed deep learning library for Apache Spark *(i.e., the original BigDL framework with Keras-style API and Spark ML pipeline support)*
+ * [DLlib](#getting-started-with-dllib): distributed deep learning library for Apache Spark *(i.e., the original BigDL framework with Keras-style API and Spark ML pipeline support)*
  * [Orca](#getting-started-with-orca): seamlessly scale out TensorFlow and PyTorch pipelines for distributed Big Data
@@ -19,12 +19,12 @@ BigDL makes it easy for data scientists and data engineers to build end-to-end,
  * [PPML](#ppml-privacy-preserving-machine-learning): privacy preserving big data analysis and machine learning (*experimental*)
-For more information, you may [read the docs](https://analytics-zoo.readthedocs.io/).
+For more information, you may [read the docs](https://bigdl.readthedocs.io/).
 ---
 ## Installing
-You can use BigDL on [Google Colab](https://analytics-zoo.readthedocs.io/en/latest/doc/UserGuide/colab.html) without any installation. BigDL also includes a set of [notebooks](https://analytics-zoo.readthedocs.io/en/latest/doc/UserGuide/notebooks.html) that you can directly open and run in Colab.
+You can use BigDL on [Google Colab](https://bigdl.readthedocs.io/en/latest/doc/UserGuide/colab.html) without any installation. BigDL also includes a set of [notebooks](https://bigdl.readthedocs.io/en/latest/doc/UserGuide/notebooks.html) that you can directly open and run in Colab.
 To install BigDL, we recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) environments.
@@ -34,7 +34,7 @@ conda activate my_env
 pip install bigdl
 ```
-To install latest nightly build, use ```pip install --pre --upgrade bigdl```; see [Python](https://analytics-zoo.readthedocs.io/en/latest/doc/UserGuide/python.html) and [Scala](https://analytics-zoo.readthedocs.io/en/latest/doc/UserGuide/scala.html) user guide for more details.
+To install latest nightly build, use ```pip install --pre --upgrade bigdl```; see [Python](https://bigdl.readthedocs.io/en/latest/doc/UserGuide/python.html) and [Scala](https://bigdl.readthedocs.io/en/latest/doc/UserGuide/scala.html) user guide for more details.
 ## Getting Started with DLlib
 **DLlib** is a distributed deep learning library for Apache Spark; with DLlib, users can write distributed deep learning applications as standard Spark programs (using either Scala or Python APIs).
@@ -68,13 +68,13 @@ val pipeline = new Pipeline().setStages(Array(scaler, estimator))
 val pipelineModel = pipeline.fit(trainingDF)
 val predictions = pipelineModel.transform(validationDF)
 ```
-See the [NNframes](https://analytics-zoo.readthedocs.io/en/latest/doc/UseCase/nnframes.html) and [Keras API](https://analytics-zoo.readthedocs.io/en/latest/doc/UseCase/keras-api.html) user guides for more details.
+See the [NNframes](https://bigdl.readthedocs.io/en/latest/doc/DLlib/Overview/nnframes.html) and [Keras API](https://bigdl.readthedocs.io/en/latest/doc/DLlib/Overview/keras-api.html) user guides for more details.
 ## Getting Started with Orca
 Most AI projects start with a Python notebook running on a single laptop; however, one usually needs to go through a mountain of pains to scale it to handle larger data set in a distributed fashion. The _**Orca**_ library seamlessly scales out your single node TensorFlow or PyTorch notebook across large clusters (so as to process distributed Big Data).
-First, initialize [Orca Context](https://analytics-zoo.readthedocs.io/en/latest/doc/Orca/Overview/orca-context.html):
+First, initialize [Orca Context](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/orca-context.html):
 ```python
 from bigdl.orca import init_orca_context
@@ -83,7 +83,7 @@ from bigdl.orca import init_orca_context
 sc = init_orca_context(cluster_mode="yarn", cores=4, memory="10g", num_nodes=2)
 ```
-Next, perform [data-parallel processing in Orca](https://analytics-zoo.readthedocs.io/en/latest/doc/Orca/Overview/data-parallel-processing.html) (supporting standard Spark Dataframes, TensorFlow Dataset, PyTorch DataLoader, Pandas, Pillow, etc.):
+Next, perform [data-parallel processing in Orca](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/data-parallel-processing.html) (supporting standard Spark Dataframes, TensorFlow Dataset, PyTorch DataLoader, Pandas, Pillow, etc.):
 ```python
 from pyspark.sql.functions import array
@@ -93,7 +93,7 @@ df = df.withColumn('user', array('user')) \
        .withColumn('item', array('item'))
 ```
-Finally, use [sklearn-style Estimator APIs in Orca](https://analytics-zoo.readthedocs.io/en/latest/doc/Orca/Overview/distributed-training-inference.html) to perform distributed _TensorFlow_, _PyTorch_ or _Keras_ training and inference:
+Finally, use [sklearn-style Estimator APIs in Orca](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/distributed-training-inference.html) to perform distributed _TensorFlow_, _PyTorch_ or _Keras_ training and inference:
 ```python
 from tensorflow import keras
@@ -116,7 +116,7 @@ est.fit(data=df,
         label_cols=['label'])
 ```
-See [TensorFlow](https://analytics-zoo.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-tf-quickstart.html) and [PyTorch](https://analytics-zoo.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-pytorch-quickstart.html) quickstart, as well as the [document website](https://analytics-zoo.readthedocs.io/), for more details.
+See [TensorFlow](https://bigdl.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-tf-quickstart.html) and [PyTorch](https://bigdl.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-pytorch-quickstart.html) quickstart, as well as the [document website](https://bigdl.readthedocs.io/), for more details.
 ## Getting Started with RayOnSpark
@@ -143,13 +143,13 @@ counters = [Counter.remote() for i in range(5)]
 print(ray.get([c.increment.remote() for c in counters]))
 ```
-See the RayOnSpark [user guide](https://analytics-zoo.readthedocs.io/en/latest/doc/Ray/Overview/ray.html) and [quickstart](https://analytics-zoo.readthedocs.io/en/latest/doc/Ray/QuickStart/ray-quickstart.html) for more details.
+See the RayOnSpark [user guide](https://bigdl.readthedocs.io/en/latest/doc/Ray/Overview/ray.html) and [quickstart](https://bigdl.readthedocs.io/en/latest/doc/Ray/QuickStart/ray-quickstart.html) for more details.
 ## Getting Started with Chronos
 Time series prediction takes observations from previous time steps as input and predicts the values at future time steps. The _**Chronos**_ library makes it easy to build end-to-end time series analysis by applying AutoML to extremely large-scale time series prediction.
-To train a time series model with AutoML, first initialize [Orca Context](https://analytics-zoo.readthedocs.io/en/latest/doc/Orca/Overview/orca-context.html):
+To train a time series model with AutoML, first initialize [Orca Context](https://bigdl.readthedocs.io/en/latest/doc/Orca/Overview/orca-context.html):
 ```python
 from bigdl.orca import init_orca_context
@@ -176,21 +176,21 @@ ts_pipeline = trainer.fit(train_df, validation_df)
 ts_pipeline.predict(test_df)
 ```
-See the Chronos [user guide](https://analytics-zoo.readthedocs.io/en/latest/doc/Chronos/Overview/chronos.html) and [example](https://analytics-zoo.readthedocs.io/en/latest/doc/Chronos/QuickStart/chronos-autots-quickstart.html) for more details.
+See the Chronos [user guide](https://bigdl.readthedocs.io/en/latest/doc/Chronos/Overview/chronos.html) and [example](https://bigdl.readthedocs.io/en/latest/doc/Chronos/QuickStart/chronos-autotsest-quickstart.html) for more details.
 ## PPML (Privacy Preserving Machine Learning)
 ***BigDL PPML*** provides a *Trusted Cluster Environment* for protecting the end-to-end Big Data AI pipeline. It combines various low level hardware and software security technologies (e.g., Intel SGX, LibOS such as Graphene and Occlum, Federated Learning, etc.), and allows users to run unmodified Big Data analysis and ML/DL programs (such as Apache Spark, Apache Flink, Tensorflow, PyTorch, etc.) in a secure fashion on (private or public) cloud.
-See the [PPML user guide](https://analytics-zoo.readthedocs.io/en/latest/doc/PPML/Overview/ppml.html) for more details.
+See the [PPML user guide](https://bigdl.readthedocs.io/en/latest/doc/PPML/Overview/ppml.html) for more details.
 ## More information
-- [Document Website](https://analytics-zoo.readthedocs.io/)
+- [Document Website](https://bigdl.readthedocs.io/)
 - [Mail List](mailto:bigdl-user-group+subscribe@googlegroups.com)
 - [User Group](https://groups.google.com/forum/#!forum/bigdl-user-group)
-- [Powered-By](https://analytics-zoo.readthedocs.io/en/latest/doc/Application/powered-by.html)
-- [Presentations](https://analytics-zoo.readthedocs.io/en/latest/doc/Application/presentations.html)
+- [Powered-By](https://bigdl.readthedocs.io/en/latest/doc/Application/powered-by.html)
+- [Presentations](https://bigdl.readthedocs.io/en/latest/doc/Application/presentations.html)
 ## Citing BigDL
 If you've found BigDL useful for your project, you may cite the [paper](https://arxiv.org/abs/1804.05839) as follows:
diff --git a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-anomaly-detector.md b/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-anomaly-detector.md
index d3b3078f..c76f57ac 100644
--- a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-anomaly-detector.md
+++ b/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-anomaly-detector.md
@@ -1,4 +1,4 @@
-# Use Anomaly Detector for Unsupervised Anomaly Detection
+# Anomaly Detector
 ---
diff --git a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-autotsest-quickstart.md b/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-autotsest-quickstart.md
index 1d8e2052..9fafba11 100644
--- a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-autotsest-quickstart.md
+++ b/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-autotsest-quickstart.md
@@ -1,4 +1,4 @@
-# Use AutoTSEstimator for Time-Series Forecasting
+# AutoTSEstimator
 ---
diff --git a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart.md b/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart.md
index f52db491..696655a5 100644
--- a/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart.md
+++ b/docs/readthedocs/source/doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart.md
@@ -1,4 +1,4 @@
-# Use TSDataset and Forecaster for Time-Series Forecasting
+# TSDataset and Forecaster
 ---
diff --git a/docs/readthedocs/source/doc/DLlib/Overview/dllib.md b/docs/readthedocs/source/doc/DLlib/Overview/dllib.md
new file mode 100644
index 00000000..a43c9eb2
--- /dev/null
+++ b/docs/readthedocs/source/doc/DLlib/Overview/dllib.md
@@ -0,0 +1,8 @@
+# DLlib User Guide
+
+DLlib is a distributed deep learning library for Apache Spark; with DLlib, users can write their deep learning applications as standard Spark programs (using either Scala or Python APIs).
+
+It includes the functionalities of the original [BigDL](https://github.com/intel-analytics/BigDL/tree/branch-0.14) project, and provides following high-level APIs for distributed deep learning on Spark:
+
+* [Keras-like API](keras-api.md)
+* [Spark ML pipeline support](nnframes.md)
diff --git a/docs/readthedocs/source/doc/UseCase/keras-api.md b/docs/readthedocs/source/doc/DLlib/Overview/keras-api.md
similarity index 99%
rename from docs/readthedocs/source/doc/UseCase/keras-api.md
rename to docs/readthedocs/source/doc/DLlib/Overview/keras-api.md
index b7ec6a1e..3584e85a 100644
--- a/docs/readthedocs/source/doc/UseCase/keras-api.md
+++ b/docs/readthedocs/source/doc/DLlib/Overview/keras-api.md
@@ -1,9 +1,9 @@
-# Use Keras-Like API for BigDL
+# Keras-Like API
 ## 1. Introduction
-Analytics Zoo provides __Keras-like API__ based on [__Keras 1.2.2__](https://faroit.github.io/keras-docs/1.2.2/) for BigDL. Users, especially those familiar with Keras, can easily use the Keras-like API to create a BigDL model and train, evaluate or tune it in a distributed fashion.
+[DLlib](dllib.md) provides __Keras-like API__ based on [__Keras 1.2.2__](https://faroit.github.io/keras-docs/1.2.2/) for distributed deep learning on Apache Spark. Users can easily use the Keras-like API to create a neural network model, and train, evaluate or tune it in a distributed fashion on Spark.
-To define a model in Scala using the Keras-like API, now one just need to import the following packages:
+To define a model in Scala using the Keras-like API, one just needs to import the following packages:
 ```scala
 import com.intel.analytics.zoo.pipeline.api.keras.layers._
diff --git a/docs/readthedocs/source/doc/UseCase/nnframes.md b/docs/readthedocs/source/doc/DLlib/Overview/nnframes.md
similarity index 95%
rename from docs/readthedocs/source/doc/UseCase/nnframes.md
rename to docs/readthedocs/source/doc/DLlib/Overview/nnframes.md
index b1a509b8..23cddba0 100644
--- a/docs/readthedocs/source/doc/UseCase/nnframes.md
+++ b/docs/readthedocs/source/doc/DLlib/Overview/nnframes.md
@@ -1,24 +1,12 @@
-# Use Spark ML Pipeline for BigDL
+# Spark ML Pipeline Support
 ## 1. NNFrames Overview
-`NNFrames` in Analytics Zoo provides to provide Spark DataFrame and and ML Pipeline support for [BigDL](https://github.com/intel-analytics/bigdl). It provides both Python and Scala interfaces, and is compatible with both Spark 2.x and Spark 3.x.
-
-
-**Highlights**
-
-- Easy-to-use DataFrame(DataSet)-based API for training, prediction and evaluation with deep learning models.
-
-- Effortless integration with Spark ML pipeline and compatibility with other feature transformers and algorithms in Spark ML.
-
-- In a few lines, run large scale inference or transfer learning from pre-trained models of Keras, Tensorflow, PyTorch or BigDL.
-
-- Rich toolset for feature extraction and processing, including image, audio and texts.
-
+`NNFrames` in [DLlib](dllib.md) provides Spark DataFrame and ML Pipeline support of distributed deep learning on Apache Spark. It includes both Python and Scala interfaces, and is compatible with both Spark 2.x and Spark 3.x.
 **Examples**
-The examples are included in the Analytics Zoo source code.
+The examples are included in the DLlib source code.
 - image classification: model inference using pre-trained Inception v1 model. (See [Scala version](https://github.com/intel-analytics/analytics-zoo/tree/master/zoo/src/main/scala/com/intel/analytics/zoo/examples/nnframes/imageInference) and [Python version](https://github.com/intel-analytics/analytics-zoo/tree/master/pyzoo/zoo/examples/nnframes/imageInference))
 - image classification: transfer learning from pre-trained Inception v1 model. (See [Scala version](https://github.com/intel-analytics/analytics-zoo/tree/master/zoo/src/main/scala/com/intel/analytics/zoo/examples/nnframes/imageTransferLearning) and [Python version](https://github.com/intel-analytics/analytics-zoo/tree/master/pyzoo/zoo/examples/nnframes/imageTransferLearning))
diff --git a/docs/readthedocs/source/doc/Orca/Overview/data-parallel-processing.md b/docs/readthedocs/source/doc/Orca/Overview/data-parallel-processing.md
index 988e2d06..6667ec6c 100644
--- a/docs/readthedocs/source/doc/Orca/Overview/data-parallel-processing.md
+++ b/docs/readthedocs/source/doc/Orca/Overview/data-parallel-processing.md
@@ -1,4 +1,4 @@
-# Distributed Data-Parallel Processing
+# Distributed Data Processing
 ---
diff --git a/docs/readthedocs/source/doc/Orca/Overview/distributed-tuning.md b/docs/readthedocs/source/doc/Orca/Overview/distributed-tuning.md
index 8cfbe457..e3a38070 100644
--- a/docs/readthedocs/source/doc/Orca/Overview/distributed-tuning.md
+++ b/docs/readthedocs/source/doc/Orca/Overview/distributed-tuning.md
@@ -1,4 +1,4 @@
-# Distributed Hyper-parameter Tuning
+# Distributed Hyper-Parameter Tuning
 ---
diff --git a/docs/readthedocs/source/doc/Ray/Overview/ray.md b/docs/readthedocs/source/doc/Ray/Overview/ray.md
index bd1303a4..202776a0 100644
--- a/docs/readthedocs/source/doc/Ray/Overview/ray.md
+++ b/docs/readthedocs/source/doc/Ray/Overview/ray.md
@@ -1,4 +1,4 @@
-# RayOnSpark User Guide
+# RayOnSpark
 ---
diff --git a/docs/readthedocs/source/index.rst b/docs/readthedocs/source/index.rst
index 40014916..e9a402b6 100644
--- a/docs/readthedocs/source/index.rst
+++ b/docs/readthedocs/source/index.rst
@@ -5,7 +5,7 @@ BigDL Documentation
 `BigDL `_ makes it easy for data scientists and data engineers to build end-to-end, distributed AI applications. The **BigDL 2.0** release combines the original `BigDL `_ and `Analytics Zoo `_ projects, providing the following features:
-* **DLlib**: distributed deep learning library for Apache Spark (including support for `Spark ML pipeline `_ and `Keras-like `_ APIs)
+* `DLlib `_: distributed deep learning library for Apache Spark
 * `Orca `_: seamlessly scale out TensorFlow and PyTorch pipelines for distributed Big Data
 * `RayOnSpark `_: run Ray programs directly on Big Data clusters
 * `Chronos `_: scalable time series analysis using AutoML
@@ -39,26 +39,16 @@ BigDL Documentation
    doc/UserGuide/hadoop.md
    doc/UserGuide/k8s.md
    doc/UserGuide/databricks.md
-   doc/Ray/Overview/ray.md
-   doc/Chronos/Overview/chronos.md
-   doc/PPML/Overview/ppml.md
    doc/UserGuide/develop.md
 .. toctree::
    :maxdepth: 1
-   :caption: Common Use Case
-
-   doc/Orca/QuickStart/orca-pytorch-distributed-quickstart.md
-   doc/UseCase/spark-dataframe.md
-   doc/UseCase/xshards-pandas.md
-   doc/Chronos/QuickStart/chronos-autotsest-quickstart.md
-   doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart.md
-   doc/Chronos/QuickStart/chronos-anomaly-detector.md
-   doc/UseCase/keras-api.md
-   doc/UseCase/nnframes.md
-   doc/Orca/QuickStart/orca-autoestimator-pytorch-quickstart.md
-   doc/Orca/QuickStart/orca-autoxgboost-quickstart.md
-
+   :caption: DLlib Overview
+
+   doc/DLlib/Overview/dllib.md
+   doc/DLlib/Overview/keras-api.md
+   doc/DLlib/Overview/nnframes.md
+
 .. toctree::
    :maxdepth: 1
    :caption: Orca Overview
@@ -68,7 +58,33 @@ BigDL Documentation
    doc/Orca/Overview/data-parallel-processing.md
    doc/Orca/Overview/distributed-training-inference.md
    doc/Orca/Overview/distributed-tuning.md
+   doc/Ray/Overview/ray.md
+.. toctree::
+   :maxdepth: 1
+   :caption: Chronos Overview
+
+   doc/Chronos/Overview/chronos.md
+   doc/Chronos/QuickStart/chronos-autotsest-quickstart.md
+   doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart.md
+   doc/Chronos/QuickStart/chronos-anomaly-detector.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: PPML Overview
+
+   doc/PPML/Overview/ppml.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Common Use Case
+
+   doc/Orca/QuickStart/orca-pytorch-distributed-quickstart.md
+   doc/UseCase/spark-dataframe.md
+   doc/UseCase/xshards-pandas.md
+   doc/Orca/QuickStart/orca-autoestimator-pytorch-quickstart.md
+   doc/Orca/QuickStart/orca-autoxgboost-quickstart.md
+
 .. toctree::
    :maxdepth: 1
    :caption: Python API