ipex-llm/docs/readthedocs/source/doc/DLlib/Overview/dllib.md
Shengsheng Huang f2e4c40cee change the readthedocs theme and reorg the sections (#6056)
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
Co-authored-by: Juntao Luo <1072087358@qq.com>
2022-10-18 15:35:31 +08:00


# DLlib in 5 minutes
## Overview
DLlib is a distributed deep learning library for Apache Spark; with DLlib, users can write their deep learning applications as standard Spark programs (using either Scala or Python APIs).
It includes the functionality of the [original BigDL](https://github.com/intel-analytics/BigDL/tree/branch-0.14) project, and provides the following high-level APIs for distributed deep learning on Spark:
* [Keras-like API](keras-api.md)
* [Spark ML pipeline support](nnframes.md)
---
## Scala Example
This section shows a single example of how to use dllib to build a deep learning application on Spark, using the Keras-style API.
#### **LeNet Model on MNIST using Keras-Style API**
This tutorial explains what happens in the [lenet](https://github.com/intel-analytics/BigDL/tree/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/keras) example.
A bigdl-dllib program starts with initialization, as follows:
````scala
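import com.intel.analytics.bigdl.dllib.utils.Engine
import org.apache.spark.SparkContext

// Build a SparkConf that already carries the settings dllib needs.
val conf = Engine.createSparkConf()
  .setAppName("Train Lenet on MNIST")
  .set("spark.task.maxFailures", "1")
val sc = new SparkContext(conf)
// Initialize the dllib engine before creating any dataset or model.
Engine.init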
````
After the initialization, we need to:
1. Load the training and validation data by _**creating the [```DataSet```](https://github.com/intel-analytics/BigDL/blob/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/feature/dataset/DataSet.scala)**_ and chaining transformers onto it (e.g., ````BytesToGreyImg````, ````GreyImgNormalizer```` and ````GreyImgToBatch````):
````scala
// `sc` is the SparkContext created above, so both datasets are distributed.
val trainSet = DataSet.array(load(trainData, trainLabel), sc) ->
  BytesToGreyImg(28, 28) -> GreyImgNormalizer(trainMean, trainStd) ->
  GreyImgToBatch(param.batchSize)
val validationSet = DataSet.array(load(validationData, validationLabel), sc) ->
  BytesToGreyImg(28, 28) -> GreyImgNormalizer(testMean, testStd) ->
  GreyImgToBatch(param.batchSize)
````
2. Define the LeNet model using the Keras-style API:
````scala
// `classNum` is the number of target classes (10 for the MNIST digits).
val input = Input(inputShape = Shape(28, 28, 1))
// Move the channel axis first before the convolution layers.
val reshape = Reshape(Array(1, 28, 28)).inputs(input)
val conv1 = Convolution2D(6, 5, 5, activation = "tanh").setName("conv1_5x5").inputs(reshape)
val pool1 = MaxPooling2D().inputs(conv1)
val conv2 = Convolution2D(12, 5, 5, activation = "tanh").setName("conv2_5x5").inputs(pool1)
val pool2 = MaxPooling2D().inputs(conv2)
val flatten = Flatten().inputs(pool2)
val fc1 = Dense(100, activation = "tanh").setName("fc1").inputs(flatten)
val fc2 = Dense(classNum, activation = "softmax").setName("fc2").inputs(fc1)
// Keep a reference to the model so it can be compiled and trained below.
val model = Model(input, fc2)
````
3. Configure the learning process by setting the optimization method and the ````Criterion```` (which, given the input and target, computes the gradient for the chosen loss function):
````scala
// logProbAsInput = false: the model outputs probabilities (softmax), not log-probabilities.
model.compile(optimizer = optimMethod,
  loss = ClassNLLCriterion[Float](logProbAsInput = false),
  metrics = Array(new Top1Accuracy[Float](), new Top5Accuracy[Float](), new Loss[Float]))
````
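To make the loss setting above concrete: with `logProbAsInput = false` the criterion receives probabilities (here, the softmax output of `fc2`) and takes the logarithm itself. The plain-Python sketch below only illustrates that arithmetic. `nll_from_probs` is a hypothetical helper, not part of dllib, and it assumes 0-based class indices and averaging over the batch.

```python
import math

def nll_from_probs(probs, targets):
    """Mean negative log-likelihood of the target classes.

    probs   -- one list of class probabilities per sample
    targets -- one 0-based class index per sample (an assumption of this sketch)
    """
    losses = [-math.log(p[t]) for p, t in zip(probs, targets)]
    return sum(losses) / len(losses)

# Two samples, three classes; the targets are class 0 and class 1.
batch_probs = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1]]
batch_targets = [0, 1]
loss = nll_from_probs(batch_probs, batch_targets)
```

The loss shrinks toward zero as the probability assigned to the correct class approaches 1.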
Finally, we _**train the model**_ by calling ````model.fit````:
````scala
model.fit(trainSet, nbEpoch = param.maxEpoch, validationData = validationSet)
````
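As a check on the LeNet definition in step 2, the feature-map sizes can be worked out by hand. The sketch below is plain Python rather than dllib code, and it assumes unpadded ("valid") convolutions with stride 1, 2x2 non-overlapping max pooling, and channel-first shapes. None of these are stated in this document, so treat them as assumptions. `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(size, kernel, stride=1):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Output length of non-overlapping pooling along one axis."""
    return size // pool

side = 28                      # MNIST images are 28x28
side = conv_out(side, 5)       # conv1_5x5, 6 feature maps
after_conv1 = (6, side, side)
side = pool_out(side)          # first max pooling
after_pool1 = (6, side, side)
side = conv_out(side, 5)       # conv2_5x5, 12 feature maps
after_conv2 = (12, side, side)
side = pool_out(side)          # second max pooling
after_pool2 = (12, side, side)
flattened = 12 * side * side   # input width of fc1
```

Under these assumptions, `fc1` maps the flattened features to 100 units and `fc2` maps those to `classNum` outputs.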
---
## Python Example
#### **Initialize NN Context**
`NNContext` is the main entry point for provisioning the dllib program on the underlying cluster (such as a K8s or Hadoop cluster), or just on a single laptop.
A dllib program usually starts with the initialization of `NNContext` as follows:
```python
from bigdl.dllib.nncontext import *
init_nncontext()
```
In `init_nncontext`, the user may specify the cluster mode for the dllib program:
- *Cluster mode*: one of "local", "yarn-client", "yarn-cluster", "k8s-client", "standalone" and "spark-submit". Defaults to "local".
The dllib program simply runs `init_nncontext` on the local machine, which will automatically provision the runtime Python environment and distributed execution engine on the underlying computing environment (such as a single laptop, a large K8s or Hadoop cluster, etc.).
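One way to guard against a mistyped mode is to validate it before initialization. This is a minimal sketch, not dllib code: `resolve_cluster_mode` and `SUPPORTED_CLUSTER_MODES` are hypothetical names, the fallback mirrors the default described above, and the keyword `cluster_mode` in the trailing comment is an assumption about the `init_nncontext` signature.

```python
SUPPORTED_CLUSTER_MODES = ("local", "yarn-client", "yarn-cluster",
                           "k8s-client", "standalone", "spark-submit")

def resolve_cluster_mode(requested=None):
    """Return a validated cluster mode, falling back to "local"."""
    if requested is None:
        return "local"
    mode = requested.strip().lower()
    if mode not in SUPPORTED_CLUSTER_MODES:
        raise ValueError(
            f"unsupported cluster mode {requested!r}; "
            f"expected one of {SUPPORTED_CLUSTER_MODES}"
        )
    return mode

mode = resolve_cluster_mode("yarn-client")
# The validated value would then be handed to dllib, for example
# (assuming the keyword is named cluster_mode):
#   sc = init_nncontext(cluster_mode=mode)
```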
#### **Autograd Examples using bigdl-dllib Keras Python API**
This tutorial describes the [Autograd](https://github.com/intel-analytics/BigDL/tree/branch-2.0/python/dllib/examples/autograd) example.
The example first performs the initialization using `init_nncontext()`:
```python
sc = init_nncontext()
```
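It then generates the input data `X_` and `Y_`:
```python
import numpy as np

data_len = 1000
# Two uniform features per sample; the target is 2*x1 + 2*x2 + 0.4.
X_ = np.random.uniform(0, 1, (data_len, 2))
Y_ = ((2 * X_).sum(1) + 0.4).reshape([data_len, 1])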
```
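The targets above follow a simple linear rule, which is easier to see when spelled out row by row. The snippet below is a plain-Python mirror of the NumPy code (standard library only, with a tiny `data_len` and a fixed seed chosen for illustration). The names `X` and `Y` are stand-ins for `X_` and `Y_`.

```python
import random

random.seed(0)   # fixed seed so the sketch is repeatable
data_len = 5     # small on purpose; the example uses 1000

# Each input row holds two features drawn uniformly from [0, 1].
X = [[random.uniform(0, 1), random.uniform(0, 1)] for _ in range(data_len)]
# Same rule as ((2 * X_).sum(1) + 0.4): double each feature, sum, add 0.4.
# Each target is wrapped in a list to match the (data_len, 1) shape.
Y = [[2 * x1 + 2 * x2 + 0.4] for x1, x2 in X]
```

Every target therefore lies between 0.4 and 4.4, and a single linear layer with bias is enough to fit the data.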
It then defines the custom loss:
```python
# Here `mean` and `abs` are dllib autograd operations, not the Python built-ins.
def mean_absolute_error(y_true, y_pred):
    result = mean(abs(y_true - y_pred), axis=1)
    return result
```
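For intuition about what this loss evaluates to, the following plain-Python sketch computes the same quantity on nested lists: the mean of the absolute differences along axis 1, giving one value per sample. It does not use dllib, and `mean_absolute_error_rows` is a hypothetical name chosen to avoid clashing with the function above.

```python
def mean_absolute_error_rows(y_true, y_pred):
    """Per-sample mean of |y_true - y_pred| over the feature axis (axis=1)."""
    return [
        sum(abs(t - p) for t, p in zip(t_row, p_row)) / len(t_row)
        for t_row, p_row in zip(y_true, y_pred)
    ]

# Three samples with one output each, matching the (data_len, 1) target shape.
y_true = [[1.0], [2.0], [4.0]]
y_pred = [[1.5], [2.0], [3.0]]
per_sample = mean_absolute_error_rows(y_true, y_pred)
```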
After that, the example creates the model as follows and sets the criterion to the custom loss:
```python
a = Input(shape=(2,))
b = Dense(1)(a)
# `add_one_func` is a helper defined in the linked example.
c = Lambda(function=add_one_func)(b)
model = Model(input=a, output=c)
model.compile(optimizer=SGD(learningrate=1e-2),
              loss=mean_absolute_error)
```
Finally, the example trains the model by calling `model.fit`:
```python
# `options` comes from the example's command-line parsing.
model.fit(x=X_,
          y=Y_,
          batch_size=32,
          nb_epoch=int(options.nb_epoch),
          distributed=False)
```
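To show roughly what this training step optimizes, here is a small standard-library sketch of mini-batch SGD on the mean absolute error for the same kind of data. It is not how dllib implements `fit`. It assumes that `add_one_func` adds 1 to its input (its definition is not shown in this document), starts from zero weights, and does not shuffle. All function names are hypothetical.

```python
import random

def predict(w, b, x):
    # Dense(1) followed by a Lambda that is assumed to add one.
    return w[0] * x[0] + w[1] * x[1] + b + 1.0

def mae(w, b, xs, ys):
    """Mean absolute error of the model over a dataset."""
    return sum(abs(y - predict(w, b, x)) for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr=1e-2, batch_size=32, epochs=60):
    """Plain mini-batch SGD using the subgradient of the absolute error."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for start in range(0, len(xs), batch_size):
            xb = xs[start:start + batch_size]
            yb = ys[start:start + batch_size]
            # d|y - pred| / d pred is +1 when pred > y and -1 otherwise.
            signs = [1.0 if predict(w, b, x) > y else -1.0
                     for x, y in zip(xb, yb)]
            n = len(xb)
            grad_w = [sum(s * x[i] for s, x in zip(signs, xb)) / n
                      for i in range(2)]
            grad_b = sum(signs) / n
            w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
            b -= lr * grad_b
    return w, b

random.seed(0)
X = [[random.random(), random.random()] for _ in range(1000)]
Y = [2 * x1 + 2 * x2 + 0.4 for x1, x2 in X]

initial_loss = mae([0.0, 0.0], 0.0, X, Y)
w, b = train(X, Y)
final_loss = mae(w, b, X, Y)
```

Because the data is an exact linear function of the inputs, the loss can in principle be driven close to zero. With a fixed learning rate, the parameters end up oscillating in a small neighborhood of the exact fit.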