diff --git a/docs/readthedocs/source/doc/UserGuide/databricks.md b/docs/readthedocs/source/doc/UserGuide/databricks.md
index 3a222144..32c13e8d 100644
--- a/docs/readthedocs/source/doc/UserGuide/databricks.md
+++ b/docs/readthedocs/source/doc/UserGuide/databricks.md
@@ -2,37 +2,53 @@

 ---

-You can run Analytics Zoo program on the [Databricks](https://databricks.com/) cluster as follows.
+You can run BigDL programs on a [Databricks](https://databricks.com/) cluster as follows.

 ### **1. Create a Databricks Cluster**

 - Create either [AWS Databricks](https://docs.databricks.com/getting-started/try-databricks.html) workspace or [Azure Databricks](https://docs.microsoft.com/en-us/azure/azure-databricks/) workspace.
-- Create a Databricks [clusters](https://docs.databricks.com/clusters/create.html) using the UI. Choose Databricks runtime version. This guide is tested on Runtime 7.5 (includes Apache Spark 3.0.1, Scala 2.12).
+- Create a Databricks [cluster](https://docs.databricks.com/clusters/create.html) using the UI. Choose a Databricks runtime version. This guide was tested on Runtime 7.3 (includes Apache Spark 3.0.1, Scala 2.12).

-### **2. Installing Analytics Zoo libraries**
+### **2. Installing BigDL Python libraries**

 In the left pane, click **Clusters** and select your cluster.

-![](images/Databricks1.PNG)
+![](images/cluster.png)

-Install Analytics Zoo python environment using prebuilt release Wheel package. Click **Libraries > Install New > Upload > Python Whl**. Download Analytics Zoo prebuilt Wheel [here](https://sourceforge.net/projects/analytics-zoo/files/zoo-py). Choose a wheel with timestamp for the same Spark version and platform as Databricks runtime. Download and drop it on Databricks.
+Install the BigDL DLLib Python environment using the prebuilt release wheel package. Click **Libraries > Install New > Upload > Python Whl**. Download the BigDL DLLib prebuilt wheel [here](https://sourceforge.net/projects/analytics-zoo/files/dllib-py). Choose a timestamped wheel built for the same Spark version and platform as your Databricks runtime, then download it and drop it on Databricks.

-![](images/Databricks2.PNG)
+![](images/dllib-whl.png)

-Install Analytics Zoo prebuilt jar package. Click **Libraries > Install New > Upload > Jar**. Download Analytics Zoo prebuilt package from [Release Page](../release.md). Please note that you should choose the same spark version of package as your Databricks runtime version. Find jar named "analytics-zoo-bigdl_*-spark_*-jar-with-dependencies.jar" in the lib directory. Drop the jar on Databricks.
+Install the BigDL Orca Python environment using the prebuilt release wheel package. Click **Libraries > Install New > Upload > Python Whl**. Download the BigDL Orca prebuilt wheel [here](https://sourceforge.net/projects/analytics-zoo/files/). Choose a timestamped wheel built for the same Spark version and platform as your Databricks runtime, then download it and drop it on Databricks.

-![](images/Databricks3.PNG)
+![](images/orca-whl.png)

-Make sure the jar file and analytics-zoo (whl) are installed on all clusters. In **Libraries** tab of your cluster, check installed libraries and click “Install automatically on all clusters” option in **Admin Settings**.
+If you want to use other BigDL libraries (Friesian, Chronos, Nano, Serving, etc.), download their prebuilt release wheel packages from [here](https://sourceforge.net/projects/analytics-zoo/files/) and install them on the cluster in the same way. A quick sanity check of the installed wheels is sketched below.

-![](images/Databricks4.PNG)
-### **3. Setting Spark configuration**
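+For a quick sanity check that the wheels installed above are importable on the cluster, you can run a notebook cell like the following (a minimal sketch; it assumes the bigdl-dllib and bigdl-orca wheels from the previous steps are installed):
+
+```python
+# Illustrative check: confirm the BigDL Python packages are visible to the driver
+import bigdl.dllib
+import bigdl.orca
+print("BigDL DLLib and Orca imported successfully")
+```
+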
+### **3. Installing BigDL Java libraries**
+
+Install the BigDL DLLib prebuilt jar package. Click **Libraries > Install New > Upload > Jar**. Download the BigDL DLLib prebuilt package from [Release Page](../release.md). Please note that you should choose the package built for the same Spark version as your Databricks runtime. Find the jar named "bigdl-dllib-spark_*-jar-with-dependencies.jar" in the lib directory and drop it on Databricks.
+
+![](images/dllib-jar.png)
+
+Install the BigDL Orca prebuilt jar package. Click **Libraries > Install New > Upload > Jar**. Download the BigDL Orca prebuilt package from [Release Page](../release.md). Please note that you should choose the package built for the same Spark version as your Databricks runtime. Find the jar named "bigdl-orca-spark_*-jar-with-dependencies.jar" in the lib directory and drop it on Databricks.
+
+![](images/orca-jar.png)
+
+If you want to use other BigDL libraries (Friesian, Chronos, Nano, Serving, etc.), download their prebuilt jar packages from [Release Page](../release.md) and install them on the cluster in the same way.
+
+
+Make sure the jar and wheel files are installed on all clusters. In the **Libraries** tab of your cluster, check the installed libraries, and enable the “Install automatically on all clusters” option in **Admin Settings**.
+
+![](images/apply-all.png)
+
+### **4. Setting Spark configuration**

 On the cluster configuration page, click the **Advanced Options** toggle. Click the **Spark** tab. You can provide custom [Spark configuration properties](https://spark.apache.org/docs/latest/configuration.html) in a cluster configuration. Please set it according to your cluster resource and program needs.

 ![](images/Databricks5.PNG)

-See below for an example of Spark config setting needed by Analytics Zoo. Here it sets 2 core per executor. Note that "spark.cores.max" needs to be properly set below.
+See below for an example of the Spark config settings needed by BigDL. Here it sets 2 cores per executor. Note that "spark.cores.max" in the config below needs to be set properly.

 ```
 spark.shuffle.reduceLocality.enabled false
@@ -42,18 +58,24 @@ spark.databricks.delta.preview.enabled true
 spark.executor.cores 2
 spark.speculation false
 spark.scheduler.minRegisteredResourcesRatio 1.0
+spark.scheduler.maxRegisteredResourcesWaitingTime 3600s
 spark.cores.max 4
 ```

-### **4. Running Analytics Zoo on Databricks**
+### **5. Running BigDL on Databricks**

 Open a new notebook, and call `init_orca_context` at the beginning of your code (with `cluster_mode` set to "spark-submit").

 ```python
-from zoo.orca import init_orca_context, stop_orca_context
+from bigdl.orca import init_orca_context, stop_orca_context
 init_orca_context(cluster_mode="spark-submit")
 ```

 Output on Databricks:

-![](images/Databricks6.PNG)
+![](images/spark-context.png)
+
+
+### **6. Install other third-party libraries on Databricks if necessary**
+
+If you want to use other third-party libraries, check the related Databricks documentation on [libraries for AWS Databricks](https://docs.databricks.com/libraries/index.html) and [libraries for Azure Databricks](https://docs.microsoft.com/en-us/azure/databricks/libraries/). An end-to-end check of the full setup is sketched below.
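+
+Once everything is installed, you can confirm the whole setup end to end with a minimal notebook cell like the hypothetical sketch below (it assumes, as in BigDL Orca, that `init_orca_context` returns the underlying SparkContext in this mode):
+
+```python
+from bigdl.orca import init_orca_context, stop_orca_context
+
+# assumption: in "spark-submit" mode init_orca_context returns the SparkContext
+sc = init_orca_context(cluster_mode="spark-submit")
+
+# run a trivial Spark job to verify that the executors are reachable
+print(sc.range(0, 1000).map(lambda x: x * 2).count())  # expected output: 1000
+
+stop_orca_context()
+```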
diff --git a/docs/readthedocs/source/doc/UserGuide/images/Databricks1.PNG b/docs/readthedocs/source/doc/UserGuide/images/Databricks1.PNG
deleted file mode 100644
index 7f4e2e53..00000000
Binary files a/docs/readthedocs/source/doc/UserGuide/images/Databricks1.PNG and /dev/null differ
diff --git a/docs/readthedocs/source/doc/UserGuide/images/Databricks2.PNG b/docs/readthedocs/source/doc/UserGuide/images/Databricks2.PNG
deleted file mode 100644
index 5657d918..00000000
Binary files a/docs/readthedocs/source/doc/UserGuide/images/Databricks2.PNG and /dev/null differ
diff --git a/docs/readthedocs/source/doc/UserGuide/images/Databricks3.PNG b/docs/readthedocs/source/doc/UserGuide/images/Databricks3.PNG
deleted file mode 100644
index 8995edce..00000000
Binary files a/docs/readthedocs/source/doc/UserGuide/images/Databricks3.PNG and /dev/null differ
diff --git a/docs/readthedocs/source/doc/UserGuide/images/Databricks4.PNG b/docs/readthedocs/source/doc/UserGuide/images/Databricks4.PNG
deleted file mode 100644
index 6d1cbd94..00000000
Binary files a/docs/readthedocs/source/doc/UserGuide/images/Databricks4.PNG and /dev/null differ
diff --git a/docs/readthedocs/source/doc/UserGuide/images/Databricks6.PNG b/docs/readthedocs/source/doc/UserGuide/images/Databricks6.PNG
deleted file mode 100644
index d7aa86dc..00000000
Binary files a/docs/readthedocs/source/doc/UserGuide/images/Databricks6.PNG and /dev/null differ
diff --git a/docs/readthedocs/source/doc/UserGuide/images/apply-all.png b/docs/readthedocs/source/doc/UserGuide/images/apply-all.png
new file mode 100644
index 00000000..b2fc0182
Binary files /dev/null and b/docs/readthedocs/source/doc/UserGuide/images/apply-all.png differ
diff --git a/docs/readthedocs/source/doc/UserGuide/images/cluster.png b/docs/readthedocs/source/doc/UserGuide/images/cluster.png
new file mode 100644
index 00000000..848b86e5
Binary files /dev/null and b/docs/readthedocs/source/doc/UserGuide/images/cluster.png differ
diff --git a/docs/readthedocs/source/doc/UserGuide/images/dllib-jar.png b/docs/readthedocs/source/doc/UserGuide/images/dllib-jar.png
new file mode 100644
index 00000000..5e4f1718
Binary files /dev/null and b/docs/readthedocs/source/doc/UserGuide/images/dllib-jar.png differ
diff --git a/docs/readthedocs/source/doc/UserGuide/images/dllib-whl.png b/docs/readthedocs/source/doc/UserGuide/images/dllib-whl.png
new file mode 100644
index 00000000..18efdacc
Binary files /dev/null and b/docs/readthedocs/source/doc/UserGuide/images/dllib-whl.png differ
diff --git a/docs/readthedocs/source/doc/UserGuide/images/orca-jar.png b/docs/readthedocs/source/doc/UserGuide/images/orca-jar.png
new file mode 100644
index 00000000..6aace893
Binary files /dev/null and b/docs/readthedocs/source/doc/UserGuide/images/orca-jar.png differ
diff --git a/docs/readthedocs/source/doc/UserGuide/images/orca-whl.png b/docs/readthedocs/source/doc/UserGuide/images/orca-whl.png
new file mode 100644
index 00000000..c643643f
Binary files /dev/null and b/docs/readthedocs/source/doc/UserGuide/images/orca-whl.png differ
diff --git a/docs/readthedocs/source/doc/UserGuide/images/spark-context.png b/docs/readthedocs/source/doc/UserGuide/images/spark-context.png
new file mode 100644
index 00000000..89fedc90
Binary files /dev/null and b/docs/readthedocs/source/doc/UserGuide/images/spark-context.png differ