# Learning Rate Scheduler


## Poly

Scala:

```scala
val lrScheduler = Poly(power=0.5, maxIteration=1000)
```

Python:

```python
lr_scheduler = Poly(power=0.5, max_iteration=1000, bigdl_type="float")
```

A learning rate decay policy where the effective learning rate follows a polynomial decay, reaching zero at `maxIteration`. Calculation: `base_lr * (1 - iter/maxIteration) ^ power`

* `power`: coefficient of decay, see the calculation formula
* `maxIteration`: the iteration at which the learning rate becomes zero
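
A quick sanity check of the formula in plain Scala (no BigDL required; variable names are ours). With `power = 3`, `maxIteration = 100` and a base learning rate of 0.1, the value at iteration 1 matches the second rate printed in the example below, up to the sign convention BigDL uses for `currentRate`:

```scala
// Poly(power = 3, maxIteration = 100) with base learning rate 0.1, at iteration 1
val lr = 0.1 * math.pow(1.0 - 1.0 / 100, 3)   // ≈ 0.0970299
```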

Scala example:

```scala
import com.intel.analytics.bigdl.dllib.optim.SGD._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.dllib.tensor.{Storage, Tensor}
import com.intel.analytics.bigdl.dllib.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import com.intel.analytics.bigdl.dllib.utils.T

val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = Poly(3, 100)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.0970299
```

Python example:

```python
optim_method = SGD(learningrate=0.1, leaningrate_schedule=Poly(3, 100))
```

## Default

It is the default learning rate schedule. For each iteration, the learning rate is updated as `l_{n + 1} = l / (1 + n * learning_rate_decay)`, where `l` is the initial learning rate and `n` is the iteration count.
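
As a quick check in plain Scala (variable names are ours), using the initial rate 0.1 and `learning_rate_decay` 0.1 from the example below:

```scala
// Default schedule after n = 1 and n = 2 iterations
val lr1 = 0.1 / (1 + 1 * 0.1)   // ≈ 0.0909090909
val lr2 = 0.1 / (1 + 2 * 0.1)   // ≈ 0.0833333333
```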

Scala:

```scala
val lrScheduler = Default()
```

Python:

```python
lr_scheduler = Default()
```

Scala example:

```scala
val optimMethod = new SGD[Double](0.1, 0.1)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.09090909090909091
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.08333333333333334
```

Python example:

```python
optim_method = SGD(leaningrate_schedule=Default())
```

## NaturalExp

A learning rate schedule that rescales the learning rate by `exp(-decay_rate * iter / decay_step)`, similar to TensorFlow's `natural_exp_decay`.

* `decay_step`: how often to apply the decay
* `gamma`: the decay rate, e.g. 0.96
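
A quick check in plain Scala (variable names are ours); with `decay_step = 1`, a decay rate of 1 and a base rate of 0.1 (as in the example below), the values match the printed rates up to sign:

```scala
// NaturalExp after 1 and 2 iterations
val lr1 = 0.1 * math.exp(-1.0 * 1 / 1)   // ≈ 0.0367879441
val lr2 = 0.1 * math.exp(-1.0 * 2 / 1)   // ≈ 0.0135335283
```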

Scala:

```scala
val learningRateScheduler = NaturalExp(1, 1)
```

Scala example:

```scala
val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = NaturalExp(1, 1)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0, "evalCounter" -> 0)
optimMethod.state = state
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.036787944117144235

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.013533528323661271
```

## Exponential

A learning rate schedule that rescales the learning rate by `lr = base_lr * decayRate ^ (iter / decayStep)`

* `decayStep`: the interval for learning rate decay
* `decayRate`: the decay rate
* `stairCase`: if true, `iter / decayStep` is an integer division and the decayed learning rate follows a staircase function
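
A quick check in plain Scala (variable names are ours); with `decayStep = 10`, `decayRate = 0.96` and a base rate of 0.05 (as in the example below), the continuous value at iteration 1 matches the printed `-0.049796306069892535` up to sign:

```scala
// Exponential after 1 iteration: continuous (stairCase = false) vs. staircase
val lr1      = 0.05 * math.pow(0.96, 1.0 / 10)              // ≈ 0.0497963
val lr1Stair = 0.05 * math.pow(0.96, math.floor(1.0 / 10))  // staircase: exponent floors to 0, lr stays 0.05
```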

Scala:

```scala
val learningRateSchedule = Exponential(10, 0.96)
```

Python:

```python
exponential = Exponential(100, 0.1)
```

Scala example:

```scala
val optimMethod = new SGD[Double](0.05)
optimMethod.learningRateSchedule = Exponential(10, 0.96)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0, "evalCounter" -> 0)
optimMethod.state = state
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.05

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.049796306069892535
```

Python example:

```python
optim_method = SGD(leaningrate_schedule=Exponential(100, 0.1))
```

## Plateau

Plateau is a learning rate schedule that reduces the learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This schedule monitors a quantity, and if no improvement is seen for a `patience` number of epochs, the learning rate is reduced.

* `monitor`: quantity to be monitored, can be "Loss" or "score"
* `factor`: factor by which the learning rate will be reduced; `new_lr = lr * factor`
* `patience`: number of epochs with no improvement after which the learning rate will be reduced
* `mode`: one of {min, max}; in min mode the learning rate is reduced when the monitored quantity has stopped decreasing, in max mode when it has stopped increasing
* `epsilon`: threshold for measuring a new optimum, to only focus on significant changes
* `cooldown`: number of epochs to wait before resuming normal operation after the learning rate has been reduced
* `minLr`: lower bound on the learning rate
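
The reduction rule can be sketched in a few lines of plain Scala (illustrative only; `step`, `best` and `wait` are our own names, not BigDL API, and the cooldown logic is omitted for brevity):

```scala
// "min" mode: a smaller monitored value is better
var lr = 0.05; var best = Double.MaxValue; var wait = 0
val (factor, patience, epsilon, minLr) = (0.1, 10, 1e-4, 0.0)
def step(metric: Double): Unit = {
  if (metric < best - epsilon) { best = metric; wait = 0 }  // significant improvement: reset the counter
  else {
    wait += 1
    if (wait >= patience) { lr = math.max(lr * factor, minLr); wait = 0 }  // plateau: reduce the lr
  }
}
```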

Scala:

```scala
val learningRateSchedule = Plateau(monitor="score", factor=0.1, patience=10, mode="min", epsilon=1e-4f, cooldown=0, minLr=0)
```

Python:

```python
plateau = Plateau("score", factor=0.1, patience=10, mode="min", epsilon=1e-4, cooldown=0, minLr=0)
```

Scala example:

```scala
val optimMethod = new SGD[Double](0.05)
optimMethod.learningRateSchedule = Plateau(monitor="score", factor=0.1, patience=10, mode="min", epsilon=1e-4f, cooldown=0, minLr=0)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0, "evalCounter" -> 0)
optimMethod.state = state
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
```

Python example:

```python
optim_method = SGD(leaningrate_schedule=Plateau("score"))
```

## Warmup

A gradual warm-up policy, where the effective learning rate increases by `delta` after each iteration. Calculation: `base_lr + delta * iteration`

* `delta`: the amount added to the learning rate after each iteration
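
A quick check in plain Scala (variable name is ours); with `delta = 0.3` and a base rate of 0.1 (as in the example below), the second printed rate is `-0.4` because:

```scala
// Warmup after 1 iteration
val lr1 = 0.1 + 0.3 * 1   // 0.4
```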

Scala:

```scala
val learningRateSchedule = Warmup(delta = 0.05)
```

Python:

```python
warmup = Warmup(delta=0.05)
```

Scala example:

```scala
val lrSchedules = new SequentialSchedule(100)
lrSchedules.add(Warmup(0.3), 3).add(Poly(3, 100), 100)
val optimMethod = new SGD[Double](learningRate = 0.1, learningRateSchedule = lrSchedules)

def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.4
```

Python example:

```python
optim_method = SGD(leaningrate_schedule=Warmup(0.05))
```

## SequentialSchedule

A learning rate scheduler that stacks several learning rate schedulers: each added scheduler runs for a given number of iterations before handing over to the next, as traced in the example below.

* `iterationPerEpoch`: the number of iterations per epoch

Scala:

```scala
val learningRateSchedule = SequentialSchedule(iterationPerEpoch=100)
```

Python:

```python
sequential_schedule = SequentialSchedule(iteration_per_epoch=5)
```

Scala example:

```scala
val lrSchedules = new SequentialSchedule(100)
lrSchedules.add(Warmup(0.3), 3).add(Poly(3, 100), 100)
val optimMethod = new SGD[Double](learningRate = 0.1, learningRateSchedule = lrSchedules)

def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.4

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.7

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-1.0

optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.9702989999999999
```
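
Reading the trace above: `Warmup(0.3)` runs for its 3 iterations, then `Poly(3, 100)` appears to take over from the warmed-up rate of 1.0. A plain-Scala reconstruction of the printed values (our reading of the trace, up to BigDL's sign convention):

```scala
// Warmup stage, starting from the base rate 0.1
val w = (0 to 2).map(n => 0.1 + 0.3 * n)                    // 0.1, 0.4, 0.7
// Poly stage, starting from the warmed-up rate 1.0
val p = (0 to 1).map(n => 1.0 * math.pow(1 - n / 100.0, 3)) // 1.0, 0.9702989999999999
```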

Python example:

```python
sequential_schedule = SequentialSchedule(5)
poly = Poly(0.5, 2)
sequential_schedule.add(poly, 5)
```

## EpochDecay

Scala:

```scala
def decay(epoch: Int): Double =
  if (epoch == 1) 2.0 else if (epoch == 2) 1.0 else 0.0

val learningRateSchedule = EpochDecay(decay)
```

It is an epoch-based decay learning rate schedule. The learning rate is decayed through a user-supplied function of the number of run epochs: `lr = base_lr * 0.1 ^ decayType(epoch)`

* `decayType`: a function taking the number of run epochs as its argument
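
A quick check of the formula for the example below (plain Scala, variable names are ours; base learning rate 1000, with the `decay` function above):

```scala
// epoch 1 -> 1000 * 0.1^2, epoch 2 -> 1000 * 0.1^1, epoch 3 -> 1000 * 0.1^0
val rates = Seq(2.0, 1.0, 0.0).map(d => 1000 * math.pow(0.1, d))   // ≈ 10, 100, 1000
```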

Scala example:

```scala
def decay(epoch: Int): Double =
  if (epoch == 1) 2.0 else if (epoch == 2) 1.0 else 0.0

val optimMethod = new SGD[Double](1000)
optimMethod.learningRateSchedule = EpochDecay(decay)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0)
for(e <- 1 to 3) {
  state("epoch") = e
  optimMethod.state = state
  optimMethod.optimize(feval, x)
  if(e <= 1) {
    assert(optimMethod.learningRateSchedule.currentRate==10)
  } else if (e <= 2) {
    assert(optimMethod.learningRateSchedule.currentRate==100)
  } else {
    assert(optimMethod.learningRateSchedule.currentRate==1000)
  }
}
```

## Regime

A structure that specifies hyper parameters by start epoch and end epoch. Usually works together with EpochSchedule.

* `startEpoch`: start epoch
* `endEpoch`: end epoch
* `config`: a config table containing the hyper parameters

## EpochSchedule

A learning rate schedule that configures the learning rate according to pre-defined Regimes. If the running epoch falls within the interval [r.startEpoch, r.endEpoch] of a regime r, the learning rate takes the "learningRate" value in r.config.

* `regimes`: an array of pre-defined Regimes
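
The lookup can be pictured in plain Scala (a hypothetical illustration; `SimpleRegime` and `lrForEpoch` are our own names, not BigDL API):

```scala
case class SimpleRegime(startEpoch: Int, endEpoch: Int, learningRate: Double)
val table = Seq(SimpleRegime(1, 3, 1e-2), SimpleRegime(4, 7, 5e-3), SimpleRegime(8, 10, 1e-3))
// Pick the learning rate whose regime interval contains the running epoch
def lrForEpoch(epoch: Int): Option[Double] =
  table.find(r => epoch >= r.startEpoch && epoch <= r.endEpoch).map(_.learningRate)
// lrForEpoch(5) == Some(0.005)
```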

Scala:

```scala
val regimes: Array[Regime] = Array(
  Regime(1, 3, T("learningRate" -> 1e-2, "weightDecay" -> 2e-4)),
  Regime(4, 7, T("learningRate" -> 5e-3, "weightDecay" -> 2e-4)),
  Regime(8, 10, T("learningRate" -> 1e-3, "weightDecay" -> 0.0))
)
val learningRateScheduler = EpochSchedule(regimes)
```

Scala example:

```scala
val regimes: Array[Regime] = Array(
  Regime(1, 3, T("learningRate" -> 1e-2, "weightDecay" -> 2e-4)),
  Regime(4, 7, T("learningRate" -> 5e-3, "weightDecay" -> 2e-4)),
  Regime(8, 10, T("learningRate" -> 1e-3, "weightDecay" -> 0.0))
)

val state = T("epoch" -> 0)
val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = EpochSchedule(regimes)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
for(e <- 1 to 10) {
  state("epoch") = e
  optimMethod.state = state
  optimMethod.optimize(feval, x)
  if(e <= 3) {
    assert(optimMethod.learningRateSchedule.currentRate==-1e-2)
    assert(optimMethod.weightDecay==2e-4)
  } else if (e <= 7) {
    assert(optimMethod.learningRateSchedule.currentRate==-5e-3)
    assert(optimMethod.weightDecay==2e-4)
  } else if (e <= 10) {
    assert(optimMethod.learningRateSchedule.currentRate==-1e-3)
    assert(optimMethod.weightDecay==0.0)
  }
}
```

## EpochStep

A learning rate schedule that rescales the learning rate by `gamma` every `stepSize` epochs.

* `stepSize`: the number of epochs between learning rate updates
* `gamma`: the rescale factor
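
A quick check in plain Scala (variable name is ours); with `EpochStep(1, 0.5)` and a base rate of 0.1 (as in the example below), the rate after epoch `e` is `0.1 * 0.5^e`, matching the assertion in the example:

```scala
// EpochStep(stepSize = 1, gamma = 0.5) after epochs 1 to 3
val rates = (1 to 3).map(e => 0.1 * math.pow(0.5, e))   // 0.05, 0.025, 0.0125
```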

Scala:

```scala
val learningRateScheduler = EpochStep(1, 0.5)
```

Scala example:

```scala
val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = EpochStep(1, 0.5)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
  (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0)
for(e <- 1 to 10) {
  state("epoch") = e
  optimMethod.state = state
  optimMethod.optimize(feval, x)
  assert(optimMethod.learningRateSchedule.currentRate==(-0.1 * Math.pow(0.5, e)))
}
```