
# Regularizer

--------

## L1 Regularizer ##

**Scala:**
```scala
val l1Regularizer = L1Regularizer(rate)
```
**Python:**
```python
regularizerl1 = L1Regularizer(rate)
```

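The L1 regularizer adds a penalty term to gradWeight during the backward pass to reduce overfitting.

In our code implementation, gradWeight = gradWeight + alpha * sign(weight), which is the (sub)gradient of the penalty alpha * abs(weight). Here alpha is the `rate` passed to the constructor. This is why gradWeight in the examples below can contain negative entries even though `input` and `gradOutput` are non-negative.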

For more details, please refer to [wiki](https://en.wikipedia.org/wiki/Regularization_(mathematics)).
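
As a rough illustration of the update described above, the following pure-Python sketch accumulates the L1 term into a flat list of gradients. It assumes the term added to the gradient is the subgradient of `alpha * abs(weight)`, namely `alpha * sign(weight)`. The helper names `sign` and `l1_accumulate` are hypothetical and not part of the DLlib API.

```python
def sign(x):
    """Return 1.0 for positive x, -1.0 for negative x, and 0.0 for zero."""
    return float((x > 0) - (x < 0))


def l1_accumulate(grad_weight, weight, alpha):
    """Return grad_weight with the L1 term alpha * sign(weight) added element-wise."""
    return [g + alpha * sign(w) for g, w in zip(grad_weight, weight)]


# Illustrative values only, not taken from a real layer
weight = [0.5, -0.25, 0.0]
grad_weight = [0.1, 0.1, 0.1]
regularized = l1_accumulate(grad_weight, weight, alpha=0.2)
print(regularized)
```

In this sketch the size of the shift depends only on the sign of each weight, not on its magnitude, so every non-zero weight is pushed toward zero by the same amount.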

**Scala example:**
```scala

import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
import com.intel.analytics.bigdl.dllib.tensor._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.dllib.nn._

RNG.setSeed(100)

val input = Tensor(3, 5).rand
val gradOutput = Tensor(3, 5).rand
val linear = Linear(5, 5, wRegularizer = L1Regularizer(0.2), bRegularizer = L1Regularizer(0.2))

val output = linear.forward(input)
val gradInput = linear.backward(input, gradOutput)

scala> input
input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.54340494      0.67115563      0.2783694       0.4120464       0.4245176
0.52638245      0.84477615      0.14860484      0.004718862     0.15671109
0.12156912      0.18646719      0.67074907      0.21010774      0.82585275
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> gradOutput
gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.4527399       0.13670659      0.87014264      0.5750933       0.063681036
0.89132196      0.62431186      0.20920213      0.52334774      0.18532822
0.5622963       0.10837689      0.0058171963    0.21969749      0.3074232
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> linear.gradWeight
res2: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.9835552       1.3616763       0.83564335      0.108898684     0.59625006
0.21608911      0.8393639       0.0035243928    -0.11795368     0.4453743
0.38366735      0.9618148       0.47721142      0.5607486       0.6069793
0.81469804      0.6690552       0.18522228      0.08559488      0.7075894
-0.030468717    0.056625083     0.051471338     0.2917061       0.109963015
[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]

```

**Python example:**
```python

import numpy as np

from bigdl.dllib.nn.layer import *
from bigdl.dllib.nn.criterion import *
from bigdl.dllib.optim.optimizer import *
from bigdl.dllib.util.common import *

# Random input and upstream gradient: 3 samples with 5 features each
input = np.random.uniform(0, 1, (3, 5)).astype("float32")
gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
# Apply an L1 penalty with rate 0.2 to both the weights and the bias
linear = Linear(5, 5, wRegularizer=L1Regularizer(0.2), bRegularizer=L1Regularizer(0.2))
output = linear.forward(input)
gradInput = linear.backward(input, gradOutput)

> linear.parameters()
{u'Linear@596d857b': {u'bias': array([ 0.3185505 , -0.02004393,  0.34620118, -0.09206461,  0.40776938], dtype=float32),
  u'gradBias': array([ 2.14087653,  1.82181644,  1.90674937,  1.37307787,  0.81534696], dtype=float32),
  u'gradWeight': array([[ 0.34909648,  0.85083449,  1.44904375,  0.90150446,  0.57136625],
         [ 0.3745544 ,  0.42218602,  1.53656614,  1.1836741 ,  1.00702667],
         [ 0.30529332,  0.26813674,  0.85559171,  0.61224306,  0.34721529],
         [ 0.22859855,  0.8535381 ,  1.19809723,  1.37248564,  0.50041491],
         [ 0.36197871,  0.03069445,  0.64837945,  0.12765063,  0.12872688]], dtype=float32),
  u'weight': array([[-0.12423037,  0.35694697,  0.39038274, -0.34970999, -0.08283543],
         [-0.4186025 , -0.33235055,  0.34948507,  0.39953214,  0.16294235],
         [-0.25171402, -0.28955361, -0.32243955, -0.19771226, -0.29320192],
         [-0.39263198,  0.37766701,  0.14673658,  0.24882999, -0.0779015 ],
         [ 0.0323218 , -0.31266898,  0.31543773, -0.0898933 , -0.33485892]], dtype=float32)}}
```


## L2 Regularizer ##

**Scala:**
```scala
val l2Regularizer = L2Regularizer(rate)
```
**Python:**
```python
regularizerl2 = L2Regularizer(rate)
```

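The L2 regularizer adds a penalty term to gradWeight during the backward pass to reduce overfitting.

In our code implementation, gradWeight = gradWeight + alpha * weight, which is the gradient of the penalty 0.5 * alpha * weight * weight. Here alpha is the `rate` passed to the constructor.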

For more details, please refer to [wiki](https://en.wikipedia.org/wiki/Regularization_(mathematics)).
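
As a rough illustration of the update described above, the following pure-Python sketch accumulates the L2 term into a flat list of gradients. It assumes the penalty is `0.5 * alpha * weight * weight`, whose gradient with respect to `weight` is `alpha * weight`. The helper name `l2_accumulate` is hypothetical and not part of the DLlib API.

```python
def l2_accumulate(grad_weight, weight, alpha):
    """Return grad_weight with the L2 term alpha * weight added element-wise."""
    return [g + alpha * w for g, w in zip(grad_weight, weight)]


# Illustrative values only, not taken from a real layer
weight = [0.5, -0.25, 0.0]
grad_weight = [0.1, 0.1, 0.1]
regularized = l2_accumulate(grad_weight, weight, alpha=0.2)
print(regularized)
```

Unlike the L1 sketch, the shift here is proportional to the weight itself, so large weights are penalized more strongly than small ones.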

**Scala example:**
```scala

import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
import com.intel.analytics.bigdl.dllib.tensor._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.dllib.nn._

RNG.setSeed(100)

val input = Tensor(3, 5).rand
val gradOutput = Tensor(3, 5).rand
val linear = Linear(5, 5, wRegularizer = L2Regularizer(0.2), bRegularizer = L2Regularizer(0.2))

val output = linear.forward(input)
val gradInput = linear.backward(input, gradOutput)

scala> input
input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.54340494      0.67115563      0.2783694       0.4120464       0.4245176
0.52638245      0.84477615      0.14860484      0.004718862     0.15671109
0.12156912      0.18646719      0.67074907      0.21010774      0.82585275
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> gradOutput
gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.4527399       0.13670659      0.87014264      0.5750933       0.063681036
0.89132196      0.62431186      0.20920213      0.52334774      0.18532822
0.5622963       0.10837689      0.0058171963    0.21969749      0.3074232
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> linear.gradWeight
res0: com.intel.analytics.bigdl.tensor.Tensor[Float] =
1.0329735       0.047239657     0.8979603       0.53614384      1.2781229
0.5621818       0.29772854      0.69706535      0.30559152      0.8352279
1.3044653       0.43065858      0.9896795       0.7435816       1.6003494
0.94218314      0.6793372       0.97101355      0.62892824      1.3458569
0.73134506      0.5975239       0.9109101       0.59374434      1.1656629
[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]

```

**Python example:**
```python
import numpy as np

from bigdl.dllib.nn.layer import *
from bigdl.dllib.nn.criterion import *
from bigdl.dllib.optim.optimizer import *
from bigdl.dllib.util.common import *

# Random input and upstream gradient: 3 samples with 5 features each
input = np.random.uniform(0, 1, (3, 5)).astype("float32")
gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
# Apply an L2 penalty with rate 0.2 to both the weights and the bias
linear = Linear(5, 5, wRegularizer=L2Regularizer(0.2), bRegularizer=L2Regularizer(0.2))
output = linear.forward(input)
gradInput = linear.backward(input, gradOutput)

> linear.parameters()
{u'Linear@787aab5e': {u'bias': array([-0.43960261, -0.12444571,  0.22857292, -0.43216187,  0.27770036], dtype=float32),
  u'gradBias': array([ 0.51726723,  1.32883406,  0.57567948,  1.7791357 ,  1.2887038 ], dtype=float32),
  u'gradWeight': array([[ 0.45477036,  0.22262168,  0.21923628,  0.26152173,  0.19836383],
         [ 1.12261093,  0.72921795,  0.08405925,  0.78192139,  0.48798928],
         [ 0.34581488,  0.21195598,  0.26357424,  0.18987852,  0.2465664 ],
         [ 1.18659711,  1.11271608,  0.72589797,  1.19098675,  0.33769298],
         [ 0.82314551,  0.71177536,  0.4428404 ,  0.764337  ,  0.3500182 ]], dtype=float32),
  u'weight': array([[ 0.03727285, -0.39697152,  0.42733836, -0.34291714, -0.13833708],
         [ 0.09232076, -0.09720675, -0.33625153,  0.06477787, -0.34739712],
         [ 0.17145753,  0.10128133,  0.16679128, -0.33541158,  0.40437087],
         [-0.03005157, -0.36412898,  0.0629965 ,  0.13443278, -0.38414535],
         [-0.16630849,  0.06934392,  0.40328237,  0.22299488, -0.1178569 ]], dtype=float32)}}
```

## L1L2 Regularizer ##

**Scala:**
```scala
val l1l2Regularizer = L1L2Regularizer(l1rate, l2rate)
```
**Python:**
```python
regularizerl1l2 = L1L2Regularizer(l1rate, l2rate)
```

The L1L2 regularizer adds both an L1 and an L2 penalty term to gradWeight during the backward pass to reduce overfitting.

In our code implementation, the L1 regularizer and then the L2 regularizer are applied in sequence, using `l1rate` and `l2rate` respectively.

For more details, please refer to [wiki](https://en.wikipedia.org/wiki/Regularization_(mathematics)).
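
The sequential application described above can be sketched in pure Python as follows. The sketch assumes the L1 step adds `l1rate * sign(weight)` and the L2 step adds `l2rate * weight` to the gradient. The helper names `sign` and `l1l2_accumulate` are hypothetical and not part of the DLlib API.

```python
def sign(x):
    """Return 1.0 for positive x, -1.0 for negative x, and 0.0 for zero."""
    return float((x > 0) - (x < 0))


def l1l2_accumulate(grad_weight, weight, l1rate, l2rate):
    """Apply the L1 term first, then the L2 term, element-wise."""
    # Step 1: L1 term, depends only on the sign of each weight
    after_l1 = [g + l1rate * sign(w) for g, w in zip(grad_weight, weight)]
    # Step 2: L2 term, proportional to each weight
    return [g + l2rate * w for g, w in zip(after_l1, weight)]


# Illustrative values only, not taken from a real layer
weight = [0.5, -0.25, 0.0]
grad_weight = [0.1, 0.1, 0.1]
regularized = l1l2_accumulate(grad_weight, weight, l1rate=0.2, l2rate=0.2)
print(regularized)
```

In this sketch, setting `l2rate` to 0 leaves only the L1 term, and setting `l1rate` to 0 leaves only the L2 term.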

**Scala example:**
```scala

import com.intel.analytics.bigdl.dllib.utils.RandomGenerator.RNG
import com.intel.analytics.bigdl.dllib.tensor._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.dllib.nn._

RNG.setSeed(100)

val input = Tensor(3, 5).rand
val gradOutput = Tensor(3, 5).rand
val linear = Linear(5, 5, wRegularizer = L1L2Regularizer(0.2, 0.2), bRegularizer = L1L2Regularizer(0.2, 0.2))

val output = linear.forward(input)
val gradInput = linear.backward(input, gradOutput)

scala> input
input: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.54340494      0.67115563      0.2783694       0.4120464       0.4245176
0.52638245      0.84477615      0.14860484      0.004718862     0.15671109
0.12156912      0.18646719      0.67074907      0.21010774      0.82585275
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> gradOutput
gradOutput: com.intel.analytics.bigdl.tensor.Tensor[Float] =
0.4527399       0.13670659      0.87014264      0.5750933       0.063681036
0.89132196      0.62431186      0.20920213      0.52334774      0.18532822
0.5622963       0.10837689      0.0058171963    0.21969749      0.3074232
[com.intel.analytics.bigdl.tensor.DenseTensor$mcF$sp of size 3x5]

scala> linear.gradWeight
res1: com.intel.analytics.bigdl.tensor.Tensor[Float] =
1.069174        1.4422078       0.8913989       0.042112567     0.53756505
0.14077617      0.8959319       -0.030221784    -0.1583686      0.4690558
0.37145022      0.99747723      0.5559263       0.58614403      0.66380215
0.88983417      0.639738        0.14924419      0.027530536     0.71988696
-0.053217214    -8.643427E-4    -0.036953792    0.29753304      0.06567569
[com.intel.analytics.bigdl.tensor.DenseTensor of size 5x5]
```

**Python example:**
```python
import numpy as np

from bigdl.dllib.nn.layer import *
from bigdl.dllib.nn.criterion import *
from bigdl.dllib.optim.optimizer import *
from bigdl.dllib.util.common import *

# Random input and upstream gradient: 3 samples with 5 features each
input = np.random.uniform(0, 1, (3, 5)).astype("float32")
gradOutput = np.random.uniform(0, 1, (3, 5)).astype("float32")
# Apply L1 and L2 penalties (both with rate 0.2) to the weights and the bias
linear = Linear(5, 5, wRegularizer=L1L2Regularizer(0.2, 0.2), bRegularizer=L1L2Regularizer(0.2, 0.2))
output = linear.forward(input)
gradInput = linear.backward(input, gradOutput)

> linear.parameters()
{u'Linear@1356aa91': {u'bias': array([-0.05799473, -0.0548001 ,  0.00408955, -0.22004321, -0.07143869], dtype=float32),
  u'gradBias': array([ 0.89119786,  1.09953558,  1.03394508,  1.19511735,  2.02241182], dtype=float32),
  u'gradWeight': array([[ 0.89061081,  0.58810186, -0.10087357,  0.19108151,  0.60029608],
         [ 0.95275503,  0.2333075 ,  0.46897018,  0.74429053,  1.16038764],
         [ 0.22894514,  0.60031962,  0.3836292 ,  0.15895618,  0.83136207],
         [ 0.49079862,  0.80913013,  0.55491877,  0.69608945,  0.80458677],
         [ 0.98890561,  0.49226439,  0.14861123,  1.37666655,  1.47615671]], dtype=float32),
  u'weight': array([[ 0.44654208,  0.16320795, -0.36029238, -0.25365737, -0.41974261],
         [ 0.18809238, -0.28065765,  0.27677274, -0.29904234,  0.41338971],
         [-0.03731538,  0.22493915,  0.10021331, -0.19495697,  0.25470355],
         [-0.30836752,  0.12083009,  0.3773002 ,  0.24059358, -0.40325543],
         [-0.13601269, -0.39310011, -0.05292636,  0.20001481, -0.08444868]], dtype=float32)}}
```