From a3dc9a0158baae81cdc0986da06d35d3fcdfeafd Mon Sep 17 00:00:00 2001
From: Xin Qiu
Date: Tue, 26 Oct 2021 09:16:22 +0800
Subject: [PATCH] Update scala docs (#3248)

* Update scala.md

* Update scala.md

* Update scala.md

* Update scala.md

* Update scala.md

* Update scala.md
---
 .../readthedocs/source/doc/UserGuide/scala.md | 123 +++++++++---------
 1 file changed, 64 insertions(+), 59 deletions(-)

diff --git a/docs/readthedocs/source/doc/UserGuide/scala.md b/docs/readthedocs/source/doc/UserGuide/scala.md
index a408f8dd..66137add 100644
--- a/docs/readthedocs/source/doc/UserGuide/scala.md
+++ b/docs/readthedocs/source/doc/UserGuide/scala.md
@@ -2,22 +2,22 @@
 
 ---
 
-### **1. Try Analytics Zoo Examples**
-This section will show you how to download Analytics Zoo prebuild packages and run the build-in examples.
+### **1. Try BigDL Examples**
+This section shows you how to download the BigDL prebuilt packages and run the built-in examples.
 
 #### **1.1 Download and config**
-You can download the Analytics Zoo official releases and nightly build from the [Release Page](../release.md). After extracting the prebuild package, you need to set environment variables **ANALYTICS_ZOO_HOME** and **SPARK_HOME** as follows:
+You can download the BigDL official releases and nightly builds from the [Release Page](../release.md). After extracting the prebuilt package, set the environment variables **BIGDL_HOME** and **SPARK_HOME** as follows:
 
 ```bash
 export SPARK_HOME=folder path where you extract the Spark package
-export ANALYTICS_ZOO_HOME=folder path where you extract the Analytics Zoo package
+export BIGDL_HOME=folder path where you extract the BigDL package
 ```
 
 #### **1.2 Use Spark interactive shell**
-You can try Analytics Zoo using the Spark interactive shell as follows:
+You can try BigDL using the Spark interactive shell as follows:
 
 ```bash
-${ANALYTICS_ZOO_HOME}/bin/spark-shell-with-zoo.sh --master local[2]
+${BIGDL_HOME}/bin/spark-shell-with-bigdl.sh --master local[2]
 ```
 
 You will then see a welcome message like below:
@@ -27,7 +27,7 @@ Welcome to
       ____              __
      / __/__  ___ _____/ /__
     _\ \/ _ \/ _ `/ __/  '_/
-   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
+   /___/ .__/\_,_/_/ /_/\_\   version 2.4.6
       /_/
 
 Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
@@ -35,11 +35,11 @@ Type in expressions to have them evaluated.
 Type :help for more information.
 ```
 
-Before you try Analytics Zoo APIs, you should use `initNNcontext` to verify your environment:
+Before you try the BigDL APIs, use `initNNContext` to verify your environment:
 
 ```scala
-scala> import com.intel.analytics.zoo.common.NNContext
-import com.intel.analytics.zoo.common.NNContext
+scala> import com.intel.analytics.bigdl.dllib.NNContext
+import com.intel.analytics.bigdl.dllib.NNContext
 
 scala> val sc = NNContext.initNNContext("Run Example")
 2021-01-26 10:19:52 WARN  SparkContext:66 - Using an existing SparkContext; some configuration may not take effect.
@@ -47,95 +47,100 @@ scala> val sc = NNContext.initNNContext("Run Example")
 sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@487f025
 ```
 
-#### **1.3 Run Analytics Zoo examples**
+#### **1.3 Run BigDL examples**
 
-You can run an Analytics Zoo example, e.g., the [Wide & Deep Recommendation](https://github.com/intel-analytics/analytics-zoo/tree/master/zoo/src/main/scala/com/intel/analytics/zoo/examples/recommendation), as a standard Spark program (running in either local mode or cluster mode) as follows:
+You can run a BigDL example, e.g., [LeNet](https://github.com/intel-analytics/BigDL/tree/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/models/lenet), as a standard Spark program (running in either local mode or cluster mode) as follows:
 
-1. Download Census Income Dataset to `./data/census` from [here](https://archive.ics.uci.edu/ml/datasets/Census+Income).
+1. Download the MNIST data from [here](http://yann.lecun.com/exdb/mnist/). Unzip all the
+files and put them in one folder (e.g. mnist).
+
+There are four files. **train-images-idx3-ubyte** contains the training images,
+**train-labels-idx1-ubyte** contains the training labels, **t10k-images-idx3-ubyte** contains the validation images,
+ and **t10k-labels-idx1-ubyte** contains the validation labels. For more details, please refer to the
+ download page.
+
+After you decompress the gzip files, some decompression tools may rename them, e.g. **train-images-idx3-ubyte** becomes
+**train-images.idx3-ubyte**. Please change the names back before you run the example.
 
 2. Run the following command:
 
 ```bash
 # Spark local mode
-${ANALYTICS_ZOO_HOME}/bin/spark-submit-scala-with-zoo.sh \
+${BIGDL_HOME}/bin/spark-submit-scala-with-bigdl.sh \
     --master local[2] \
-    --class com.intel.analytics.zoo.examples.recommendation.WideAndDeepExample \
-    dist/lib/analytics-zoo-bigdl_0.12.1-spark_2.4.3-0.9.0-jar-with-dependencies.jar \ #change to your jar file if your download is not spark_2.4.3-0.9.0
-    --inputDir ./data/census \
-    --batchSize 320 \
-    --maxEpoch 20 \
-    --dataset census
+    --class com.intel.analytics.bigdl.dllib.models.lenet.Train \
+    ${BIGDL_HOME}/jars/bigdl-dllib-spark_2.4.6-0.14.0-SNAPSHOT-jar-with-dependencies.jar \ #change to your jar file if your download is not the same version
+    -f ./data/mnist \
+    -b 320 \
+    -e 20
 
 # Spark standalone mode
-${ANALYTICS_ZOO_HOME}/bin/spark-submit-scala-with-zoo.sh \
+${BIGDL_HOME}/bin/spark-submit-scala-with-bigdl.sh \
     --master spark://... \ #add your spark master address
-    --executor-cores cores_per_executor \
-    --total-executor-cores total_cores_for_the_job \
-    --class com.intel.analytics.zoo.examples.recommendation.WideAndDeepExample \
-    dist/lib/analytics-zoo-bigdl_0.12.1-spark_2.4.3-0.9.0-jar-with-dependencies.jar \ #change to your jar file if your download is not spark_2.4.3-0.9.0
-    --inputDir ./data/census \
-    --batchSize 320 \
-    --maxEpoch 20 \
-    --dataset census
+    --executor-cores 2 \
+    --total-executor-cores 4 \
+    --class com.intel.analytics.bigdl.dllib.models.lenet.Train \
+    ${BIGDL_HOME}/jars/bigdl-dllib-spark_2.4.6-0.14.0-SNAPSHOT-jar-with-dependencies.jar \ #change to your jar file if your download is not the same version
+    -f ./data/mnist \
+    -b 320 \
+    -e 20
 
 # Spark yarn client mode, please make sure the right HADOOP_CONF_DIR is set
-${ANALYTICS_ZOO_HOME}/bin/spark-submit-scala-with-zoo.sh \
+${BIGDL_HOME}/bin/spark-submit-scala-with-bigdl.sh \
     --master yarn \
     --deploy-mode client \
-    --executor-cores cores_per_executor \
-    --num-executors executors_number \
-    --class com.intel.analytics.zoo.examples.recommendation.WideAndDeepExample \
-    dist/lib/analytics-zoo-bigdl_0.12.1-spark_2.4.3-0.9.0-jar-with-dependencies.jar \ #change to your jar file if your download is not spark_2.4.3-0.9.0
-    --inputDir ./data/census \
-    --batchSize 320 \
-    --maxEpoch 20 \
-    --dataset census
+    --executor-cores 2 \
+    --num-executors 2 \
+    --class com.intel.analytics.bigdl.dllib.models.lenet.Train \
+    ${BIGDL_HOME}/jars/bigdl-dllib-spark_2.4.6-0.14.0-SNAPSHOT-jar-with-dependencies.jar \ #change to your jar file if your download is not the same version
+    -f ./data/mnist \
+    -b 320 \
+    -e 20
 
 # Spark yarn cluster mode, please make sure the right HADOOP_CONF_DIR is set
-${ANALYTICS_ZOO_HOME}/bin/spark-submit-scala-with-zoo.sh \
+${BIGDL_HOME}/bin/spark-submit-scala-with-bigdl.sh \
     --master yarn \
     --deploy-mode cluster \
-    --executor-cores cores_per_executor \
-    --num-executors executors_number \
-    --class com.intel.analytics.zoo.examples.recommendation.WideAndDeepExample \
-    dist/lib/analytics-zoo-bigdl_0.12.1-spark_2.4.3-0.9.0-jar-with-dependencies.jar \ #change to your jar file if your download is not spark_2.4.3-0.9.0
-    --inputDir ./data/census \
-    --batchSize 320 \
-    --maxEpoch 20 \
-    --dataset census
+    --executor-cores 2 \
+    --num-executors 2 \
+    --class com.intel.analytics.bigdl.dllib.models.lenet.Train \
+    ${BIGDL_HOME}/jars/bigdl-dllib-spark_2.4.6-0.14.0-SNAPSHOT-jar-with-dependencies.jar \ #change to your jar file if your download is not the same version
+    -f ./data/mnist \
+    -b 320 \
+    -e 20
 ```
 
 ---
 
-### **2. Build Analytics Zoo Applications**
+### **2. Build BigDL Applications**
 
-This section will show you how to build your own deep learning project with Analytics Zoo.
+This section shows you how to build your own deep learning project with BigDL.
-#### **2.1 Add Analytics Zoo dependency**
+#### **2.1 Add BigDL dependency**
 ##### **2.1.1 official Release**
-Currently, Analytics Zoo releases are hosted on maven central; below is an example to add the Analytics Zoo dependency to your own project:
+Currently, BigDL releases are hosted on Maven Central; below is an example of adding the BigDL dllib dependency to your own project:
 
 ```xml
 <dependency>
-    <groupId>com.intel.analytics.zoo</groupId>
-    <artifactId>analytics-zoo-bigdl_0.12.1-spark_2.4.3</artifactId>
-    <version>0.9.0</version>
+    <groupId>com.intel.analytics.bigdl</groupId>
+    <artifactId>bigdl-dllib-spark_2.4.6</artifactId>
+    <version>2.0.0</version>
 </dependency>
 ```
-You can find the other SPARK version [here](https://search.maven.org/search?q=analytics-zoo-bigdl), such as `spark_2.1.1`, `spark_2.2.1`, `spark_2.3.1`, `spark_3.0.0`.
+You can find artifacts for other Spark versions, such as `spark_3.1.2`, [here](https://search.maven.org/search?q=bigdl-dllib).
 
 SBT developers can use
 ```sbt
-libraryDependencies += "com.intel.analytics.zoo" % "analytics-zoo-bigdl_0.12.1-spark_2.4.3" % "0.9.0"
+libraryDependencies += "com.intel.analytics.bigdl" % "bigdl-dllib-spark_2.4.6" % "2.0.0"
 ```
 
 ##### **2.1.2 Nightly Build**
 
-Currently, Analytics Zoo nightly build is hosted on [SonaType](https://oss.sonatype.org/content/groups/public/com/intel/analytics/zoo/).
+Currently, the BigDL nightly build is hosted on [Sonatype](https://oss.sonatype.org/content/groups/public/com/intel/analytics/bigdl/).
 
-To link your application with the latest Analytics Zoo nightly build, you should add some dependencies like [official releases](#11-official-release), but change `0.9.0` to the snapshot version (such as 0.10.0-snapshot), and add below repository to your pom.xml.
+To link your application with the latest BigDL nightly build, add the same dependencies as for the [official releases](#11-official-release), but change `2.0.0` to the snapshot version (such as 0.14.0-snapshot), and add the repository below to your pom.xml.
 ```xml
@@ -159,5 +164,5 @@ resolvers += "ossrh repository" at "https://oss.sonatype.org/content/repositorie
 
 #### **2.2 Build a Scala project**
 
-To enable Analytics Zoo in project, you should add Analytics Zoo to your project's dependencies using maven or sbt. Here is a [simple MLP example](https://github.com/intel-analytics/zoo-tutorials/tree/master/scala/SimpleMlp) to show you how to use Analytics Zoo to build your own deep learning project using maven or sbt, and how to run the simple example in IDEA and spark-submit.
+To enable BigDL in your project, add BigDL to your project's dependencies using Maven or sbt. Here is a [simple MLP example](https://github.com/intel-analytics/zoo-tutorials/tree/master/scala/SimpleMlp) that shows how to use BigDL to build your own deep learning project using Maven or sbt, and how to run the example in IDEA and with spark-submit.
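
The note in section 1.3 about MNIST files being renamed on extraction could be handled with a small shell helper. This is only a minimal sketch and not part of BigDL: the function name `fix_mnist_names` is hypothetical, and it assumes the only renaming is `-idx` turning into `.idx`, as in the `train-images.idx3-ubyte` example given there.

```bash
# Hypothetical helper (not part of BigDL): restore the four MNIST file names
# listed in section 1.3 inside the folder passed as $1.
fix_mnist_names() {
    dir="$1"
    for name in train-images-idx3-ubyte train-labels-idx1-ubyte \
                t10k-images-idx3-ubyte t10k-labels-idx1-ubyte; do
        # Assumed rename pattern: "-idx" became ".idx" during extraction.
        renamed="${name%%-idx*}.idx${name#*-idx}"
        if [ ! -f "$dir/$name" ] && [ -f "$dir/$renamed" ]; then
            mv "$dir/$renamed" "$dir/$name"
        fi
        # Report a file that is absent under both names and stop.
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            return 1
        fi
    done
}
```

Calling `fix_mnist_names ./data/mnist` before the `spark-submit-scala-with-bigdl.sh` commands would rename any `.idx` variants back and return a non-zero status if one of the four files is absent.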