ipex-llm/docs/readthedocs/source/doc/PythonAPI/DLlib/regularizers.md
Shengsheng Huang f2e4c40cee change the readthedocs theme and reorg the sections (#6056)

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
Co-authored-by: Juntao Luo <1072087358@qq.com>
2022-10-18 15:35:31 +08:00


# Regularizer
--------
## L1 Regularizer ##
**Scala:**
```scala
val l1Regularizer = L1Regularizer(rate)
```
**Python:**
```python
regularizerl1 = L1Regularizer(rate)
```
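The L1 regularizer penalizes the absolute values of the weights to reduce overfitting. The penalty added to the loss is `alpha * abs(weight)`, so during the backward pass its gradient is accumulated into gradWeight: `gradWeight = gradWeight + alpha * sign(weight)`.

For more details, please refer to the [Wikipedia article on regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)).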
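
The L1 update can be sketched in plain Python, independent of BigDL. This is a minimal illustration, assuming the gradient contribution of the L1 penalty is `alpha * sign(weight)`; the helper names `sign` and `acc_l1` and the sample values are hypothetical and are not part of the DLlib API.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def acc_l1(weight, grad_weight, alpha):
    """Return grad_weight + alpha * sign(weight), element by element."""
    return [g + alpha * sign(w) for w, g in zip(weight, grad_weight)]


# Hypothetical flattened weights and their unregularized gradients
weight = [0.5, -0.25, 0.0]
grad_weight = [0.1, 0.1, 0.1]

# Each gradient is shifted by alpha in the direction of its weight's sign;
# a weight of exactly zero receives no shift.
l1_grad = acc_l1(weight, grad_weight, alpha=0.2)
print(l1_grad)
```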
**Scala example:**
```scala
import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
import com.intel.analytics.bigdl.dllib.tensor._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.dllib.nn._
RNG.setSeed(100)
val input = Tensor(3, 5).rand
val gradOutput = Tensor(3, 5).rand
val linear = Linear(5, 5, wRegularizer = L1Regularizer(0.2), bRegularizer = L1Regularizer(0.2))
val output = linear.forward(input)
val gradInput = linear.backward(input, gradOutput)
scala> input
input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.54340494 0.67115563 0.2783694 0.4120464 0.4245176
0.52638245 0.84477615 0.14860484 0.004718862 0.15671109
0.12156912 0.18646719 0.67074907 0.21010774 0.82585275
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
scala> gradOutput
gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.4527399 0.13670659 0.87014264 0.5750933 0.063681036
0.89132196 0.62431186 0.20920213 0.52334774 0.18532822
0.5622963 0.10837689 0.0058171963 0.21969749 0.3074232
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
scala> linear.gradWeight
res2: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.9835552 1.3616763 0.83564335 0.108898684 0.59625006
0.21608911 0.8393639 0.0035243928 -0.11795368 0.4453743
0.38366735 0.9618148 0.47721142 0.5607486 0.6069793
0.81469804 0.6690552 0.18522228 0.08559488 0.7075894
-0.030468717 0.056625083 0.051471338 0.2917061 0.109963015
[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]
```
**Python example:**
```python
import numpy as np
from bigdl.dllib.nn.layer import *
from bigdl.dllib.nn.criterion import *
from bigdl.dllib.optim.optimizer import *
from bigdl.dllib.util.common import *
# Random input and upstream gradient for a batch of 3 samples with 5 features
input = np.random.uniform(0, 1, (3, 5)).astype("float32")
gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
# Linear layer with L1 regularization on both weight and bias
linear = Linear(5, 5, wRegularizer=L1Regularizer(0.2), bRegularizer=L1Regularizer(0.2))
output = linear.forward(input)
gradInput = linear.backward(input, gradOutput)
> linear.parameters()
{u'Linear@596d857b': {u'bias': array([ 0.3185505 , -0.02004393, 0.34620118, -0.09206461, 0.40776938], dtype=float32),
u'gradBias': array([ 2.14087653, 1.82181644, 1.90674937, 1.37307787, 0.81534696], dtype=float32),
u'gradWeight': array([[ 0.34909648, 0.85083449, 1.44904375, 0.90150446, 0.57136625],
[ 0.3745544 , 0.42218602, 1.53656614, 1.1836741 , 1.00702667],
[ 0.30529332, 0.26813674, 0.85559171, 0.61224306, 0.34721529],
[ 0.22859855, 0.8535381 , 1.19809723, 1.37248564, 0.50041491],
[ 0.36197871, 0.03069445, 0.64837945, 0.12765063, 0.12872688]], dtype=float32),
u'weight': array([[-0.12423037, 0.35694697, 0.39038274, -0.34970999, -0.08283543],
[-0.4186025 , -0.33235055, 0.34948507, 0.39953214, 0.16294235],
[-0.25171402, -0.28955361, -0.32243955, -0.19771226, -0.29320192],
[-0.39263198, 0.37766701, 0.14673658, 0.24882999, -0.0779015 ],
[ 0.0323218 , -0.31266898, 0.31543773, -0.0898933 , -0.33485892]], dtype=float32)}}
```
## L2 Regularizer ##
**Scala:**
```scala
val l2Regularizer = L2Regularizer(rate)
```
**Python:**
```python
regularizerl2 = L2Regularizer(rate)
```
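The L2 regularizer penalizes the squared values of the weights to reduce overfitting. The penalty added to the loss is proportional to `alpha * weight * weight`, so during the backward pass its gradient is accumulated into gradWeight: `gradWeight = gradWeight + alpha * weight`.

For more details, please refer to the [Wikipedia article on regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)).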
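
The L2 update can be sketched in plain Python, independent of BigDL. This is a minimal illustration, assuming the gradient contribution of the L2 penalty is `alpha * weight`; the helper name `acc_l2` and the sample values are hypothetical and are not part of the DLlib API.

```python
def acc_l2(weight, grad_weight, alpha):
    """Return grad_weight + alpha * weight, element by element."""
    return [g + alpha * w for w, g in zip(weight, grad_weight)]


# Hypothetical flattened weights and their unregularized gradients
weight = [0.5, -0.25, 0.0]
grad_weight = [0.1, 0.1, 0.1]

# Each gradient is shifted in proportion to its weight, so larger weights
# are pulled toward zero more strongly than smaller ones.
l2_grad = acc_l2(weight, grad_weight, alpha=0.2)
print(l2_grad)
```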
**Scala example:**
```scala
import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
import com.intel.analytics.bigdl.dllib.tensor._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.dllib.nn._
RNG.setSeed(100)
val input = Tensor(3, 5).rand
val gradOutput = Tensor(3, 5).rand
val linear = Linear(5, 5, wRegularizer = L2Regularizer(0.2), bRegularizer = L2Regularizer(0.2))
val output = linear.forward(input)
val gradInput = linear.backward(input, gradOutput)
scala> input
input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.54340494 0.67115563 0.2783694 0.4120464 0.4245176
0.52638245 0.84477615 0.14860484 0.004718862 0.15671109
0.12156912 0.18646719 0.67074907 0.21010774 0.82585275
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
scala> gradOutput
gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.4527399 0.13670659 0.87014264 0.5750933 0.063681036
0.89132196 0.62431186 0.20920213 0.52334774 0.18532822
0.5622963 0.10837689 0.0058171963 0.21969749 0.3074232
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
scala> linear.gradWeight
res0: com.intel.analytics.bigdl.tensor.Tensor[Float] =
1.0329735 0.047239657 0.8979603 0.53614384 1.2781229
0.5621818 0.29772854 0.69706535 0.30559152 0.8352279
1.3044653 0.43065858 0.9896795 0.7435816 1.6003494
0.94218314 0.6793372 0.97101355 0.62892824 1.3458569
0.73134506 0.5975239 0.9109101 0.59374434 1.1656629
[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]
```
**Python example:**
```python
import numpy as np
from bigdl.dllib.nn.layer import *
from bigdl.dllib.nn.criterion import *
from bigdl.dllib.optim.optimizer import *
from bigdl.dllib.util.common import *
# Random input and upstream gradient for a batch of 3 samples with 5 features
input = np.random.uniform(0, 1, (3, 5)).astype("float32")
gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
# Linear layer with L2 regularization on both weight and bias
linear = Linear(5, 5, wRegularizer=L2Regularizer(0.2), bRegularizer=L2Regularizer(0.2))
output = linear.forward(input)
gradInput = linear.backward(input, gradOutput)
> linear.parameters()
{u'Linear@787aab5e': {u'bias': array([-0.43960261, -0.12444571, 0.22857292, -0.43216187, 0.27770036], dtype=float32),
u'gradBias': array([ 0.51726723, 1.32883406, 0.57567948, 1.7791357 , 1.2887038 ], dtype=float32),
u'gradWeight': array([[ 0.45477036, 0.22262168, 0.21923628, 0.26152173, 0.19836383],
[ 1.12261093, 0.72921795, 0.08405925, 0.78192139, 0.48798928],
[ 0.34581488, 0.21195598, 0.26357424, 0.18987852, 0.2465664 ],
[ 1.18659711, 1.11271608, 0.72589797, 1.19098675, 0.33769298],
[ 0.82314551, 0.71177536, 0.4428404 , 0.764337 , 0.3500182 ]], dtype=float32),
u'weight': array([[ 0.03727285, -0.39697152, 0.42733836, -0.34291714, -0.13833708],
[ 0.09232076, -0.09720675, -0.33625153, 0.06477787, -0.34739712],
[ 0.17145753, 0.10128133, 0.16679128, -0.33541158, 0.40437087],
[-0.03005157, -0.36412898, 0.0629965 , 0.13443278, -0.38414535],
[-0.16630849, 0.06934392, 0.40328237, 0.22299488, -0.1178569 ]], dtype=float32)}}
```
## L1L2 Regularizer ##
**Scala:**
```scala
val l1l2Regularizer = L1L2Regularizer(l1rate, l2rate)
```
**Python:**
```python
regularizerl1l2 = L1L2Regularizer(l1rate, l2rate)
```
The L1L2 regularizer combines the L1 and L2 penalties to reduce overfitting. The implementation applies the L1 regularizer (with rate `l1rate`) and then the L2 regularizer (with rate `l2rate`) to gradWeight, one after the other.

For more details, please refer to the [Wikipedia article on regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)).
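
Applying the two penalties one after the other can be sketched in plain Python, independent of BigDL. This is a minimal illustration, assuming the L1 step adds `l1rate * sign(weight)` and the L2 step adds `l2rate * weight` to the gradient; the helper names `sign` and `acc_l1l2` and the sample values are hypothetical and are not part of the DLlib API.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def acc_l1l2(weight, grad_weight, l1rate, l2rate):
    """Apply the L1 step, then the L2 step, to a list of gradients."""
    # Step 1: L1 contribution, a constant-size shift along sign(weight)
    after_l1 = [g + l1rate * sign(w) for w, g in zip(weight, grad_weight)]
    # Step 2: L2 contribution, a shift proportional to the weight itself
    return [g + l2rate * w for w, g in zip(weight, after_l1)]


# Hypothetical flattened weights and their unregularized gradients
weight = [0.5, -0.25, 0.0]
grad_weight = [0.1, 0.1, 0.1]

l1l2_grad = acc_l1l2(weight, grad_weight, l1rate=0.2, l2rate=0.2)
print(l1l2_grad)
```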
**Scala example:**
```scala
import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
import com.intel.analytics.bigdl.dllib.tensor._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.dllib.nn._
RNG.setSeed(100)
val input = Tensor(3, 5).rand
val gradOutput = Tensor(3, 5).rand
val linear = Linear(5, 5, wRegularizer = L1L2Regularizer(0.2, 0.2), bRegularizer = L1L2Regularizer(0.2, 0.2))
val output = linear.forward(input)
val gradInput = linear.backward(input, gradOutput)
scala> input
input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.54340494 0.67115563 0.2783694 0.4120464 0.4245176
0.52638245 0.84477615 0.14860484 0.004718862 0.15671109
0.12156912 0.18646719 0.67074907 0.21010774 0.82585275
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
scala> gradOutput
gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.4527399 0.13670659 0.87014264 0.5750933 0.063681036
0.89132196 0.62431186 0.20920213 0.52334774 0.18532822
0.5622963 0.10837689 0.0058171963 0.21969749 0.3074232
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]
scala> linear.gradWeight
res1: com.intel.analytics.bigdl.tensor.Tensor[Float] =
1.069174 1.4422078 0.8913989 0.042112567 0.53756505
0.14077617 0.8959319 -0.030221784 -0.1583686 0.4690558
0.37145022 0.99747723 0.5559263 0.58614403 0.66380215
0.88983417 0.639738 0.14924419 0.027530536 0.71988696
-0.053217214 -8.643427E-4 -0.036953792 0.29753304 0.06567569
[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]
```
**Python example:**
```python
import numpy as np
from bigdl.dllib.nn.layer import *
from bigdl.dllib.nn.criterion import *
from bigdl.dllib.optim.optimizer import *
from bigdl.dllib.util.common import *
# Random input and upstream gradient for a batch of 3 samples with 5 features
input = np.random.uniform(0, 1, (3, 5)).astype("float32")
gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
# Linear layer with combined L1 and L2 regularization on both weight and bias
linear = Linear(5, 5, wRegularizer=L1L2Regularizer(0.2, 0.2), bRegularizer=L1L2Regularizer(0.2, 0.2))
output = linear.forward(input)
gradInput = linear.backward(input, gradOutput)
> linear.parameters()
{u'Linear@1356aa91': {u'bias': array([-0.05799473, -0.0548001 , 0.00408955, -0.22004321, -0.07143869], dtype=float32),
u'gradBias': array([ 0.89119786, 1.09953558, 1.03394508, 1.19511735, 2.02241182], dtype=float32),
u'gradWeight': array([[ 0.89061081, 0.58810186, -0.10087357, 0.19108151, 0.60029608],
[ 0.95275503, 0.2333075 , 0.46897018, 0.74429053, 1.16038764],
[ 0.22894514, 0.60031962, 0.3836292 , 0.15895618, 0.83136207],
[ 0.49079862, 0.80913013, 0.55491877, 0.69608945, 0.80458677],
[ 0.98890561, 0.49226439, 0.14861123, 1.37666655, 1.47615671]], dtype=float32),
u'weight': array([[ 0.44654208, 0.16320795, -0.36029238, -0.25365737, -0.41974261],
[ 0.18809238, -0.28065765, 0.27677274, -0.29904234, 0.41338971],
[-0.03731538, 0.22493915, 0.10021331, -0.19495697, 0.25470355],
[-0.30836752, 0.12083009, 0.3773002 , 0.24059358, -0.40325543],
[-0.13601269, -0.39310011, -0.05292636, 0.20001481, -0.08444868]], dtype=float32)}}
```