Xin Qiu a3dc9a0158 Update scala docs (#3248 )

* Update scala.md

* Update scala.md

* Update scala.md

* Update scala.md

* Update scala.md

* Update scala.md

2021-10-26 09:16:22 +08:00

6.3 KiB

Raw Blame History

Scala User Guide

1. Try BigDL Examples

This section will show you how to download BigDL prebuild packages and run the build-in examples.

1.1 Download and config

You can download the BigDL official releases and nightly build from the Release Page. After extracting the prebuild package, you need to set environment variables BIGDL_HOME and SPARK_HOME as follows:

export SPARK_HOME=folder path where you extract the Spark package
export BIGDL_HOME=folder path where you extract the BigDL package

1.2 Use Spark interactive shell

You can try BigDL using the Spark interactive shell as follows:

${BIGDL_HOME}/bin/spark-shell-with-bigdl.sh --master local[2]

You will then see a welcome message like below:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.6
      /_/
         
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

Before you try BigDL APIs, you should use initNNcontext to verify your environment:

scala> import com.intel.analytics.bigdl.dllib.NNContext
import com.intel.analytics.bigdl.dllib.NNContext

scala> val sc = NNContext.initNNContext("Run Example")
2021-01-26 10:19:52 WARN  SparkContext:66 - Using an existing SparkContext; some configuration may not take effect.
2021-01-26 10:19:53 WARN  SparkContext:66 - Using an existing SparkContext; some configuration may not take effect.
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@487f025

1.3 Run BigDL examples

You can run an BigDL example, e.g., the Lenet, as a standard Spark program (running in either local mode or cluster mode) as follows:

You can download the MNIST Data from here. Unzip all the files and put them in one folder(e.g. mnist).

There're four files. train-images-idx3-ubyte contains train images, train-labels-idx1-ubyte is train label file, t10k-images-idx3-ubyte has validation images and t10k-labels-idx1-ubyte contains validation labels. For more detail, please refer to the download page.

After you uncompress the gzip files, these files may be renamed by some uncompress tools, e.g. train-images-idx3-ubyte is renamed to train-images.idx3-ubyte. Please change the name back before you run the example.

Run the following command:

# Spark local mode
${BIGDL_HOME}/bin/spark-submit-scala-with-bigdl.sh \ 
  --master local[2] \
  --class com.intel.analytics.bigdl.dllib.models.lenet.Train \
  ${BIGDL_HOME}/jars/bigdl-dllib-spark_2.4.6-0.14.0-SNAPSHOT-jar-with-dependencies.jar \   #change to your jar file if your download is not the same version
  -f ./data/mnist \
  -b 320 \
  -e 20

# Spark standalone mode
${BIGDL_HOME}/bin/spark-submit-scala-with-bigdl.sh \
  --master spark://... \         #add your spark master address
  --executor-cores 2 \
  --total-executor-cores 4 \
  --class com.intel.analytics.bigdl.dllib.models.lenet.Train \
  ${BIGDL_HOME}/jars/bigdl-dllib-spark_2.4.6-0.14.0-SNAPSHOT-jar-with-dependencies.jar \   #change to your jar file if your download is not the same version
   -f ./data/mnist \
  -b 320 \
  -e 20

# Spark yarn client mode, please make sure the right HADOOP_CONF_DIR is set
${BIGDL_HOME}/bin/spark-submit-scala-with-bigdl.sh \
  --master yarn \
  --deploy-mode client \
  --executor-cores 2 \
  --num-executors 2 \
  --class com.intel.analytics.bigdl.dllib.models.lenet.Train \
  ${BIGDL_HOME}/jars/bigdl-dllib-spark_2.4.6-0.14.0-SNAPSHOT-jar-with-dependencies.jar \   #change to your jar file if your download is not the same version
  -f ./data/mnist \
  -b 320 \
  -e 20

# Spark yarn cluster mode, please make sure the right HADOOP_CONF_DIR is set
${BIGDL_HOME}/bin/spark-submit-scala-with-bigdl.sh \
  --master yarn \
  --deploy-mode cluster \
  --executor-cores 2 \
  --num-executors 2 \
  --class com.intel.analytics.bigdl.dllib.models.lenet.Train \
  ${BIGDL_HOME}/jars/bigdl-dllib-spark_2.4.6-0.14.0-SNAPSHOT-jar-with-dependencies.jar \   #change to your jar file if your download is not the same version
  -f ./data/mnist \
  -b 320 \
  -e 20

2. Build BigDL Applications

This section will show you how to build your own deep learning project with BigDL.

2.1 Add BigDL dependency

2.1.1 official Release

Currently, BigDL releases are hosted on maven central; below is an example to add the BigDL dllib dependency to your own project:

<dependency>
    <groupId>com.intel.analytics.bigdl</groupId>
    <artifactId>bigdl-dllib-spark_2.4.6</artifactId>
    <version>2.0.0</version>
</dependency>

You can find the other SPARK version here, such as spark_3.1.2.

SBT developers can use

libraryDependencies += "com.intel.analytics.bigdl" % "bigdl-dllib-spark_2.4.6" % "2.0.0"

2.1.2 Nightly Build

Currently, BigDL nightly build is hosted on SonaType.

To link your application with the latest BigDL nightly build, you should add some dependencies like official releases, but change 2.0.0 to the snapshot version (such as 0.14.0-snapshot), and add below repository to your pom.xml.

<repository>
    <id>sonatype</id>
    <name>sonatype repository</name>
    <url>https://oss.sonatype.org/content/groups/public/</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>true</enabled>
    </snapshots>
</repository>