Learning Rate Schedule Algorithm

From GM-RKB

A Learning Rate Schedule Algorithm is a Neural Network Training Algorithm that changes the learning rate during training.



References

2021

  • (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Learning_rate#Learning_rate_schedule Retrieved:2021-7-4.
    • The initial rate can be left at the system default or selected using a range of techniques. A learning rate schedule changes the learning rate during learning, most often between epochs/iterations. This is mainly done with two parameters: decay and momentum. There are many different learning rate schedules, but the most common are time-based, step-based and exponential.

      Decay, which is controlled by a hyperparameter, serves to settle the learning near a minimum and avoid oscillations, which may arise when a constant learning rate that is too high makes the learning jump back and forth over a minimum.

      Momentum is analogous to a ball rolling down a hill; we want the ball to settle at the lowest point of the hill (corresponding to the lowest error). Momentum speeds up the learning (effectively increasing the learning rate) when the error cost gradient has been heading in the same direction for a long time, and it also avoids local minima by 'rolling over' small bumps. Momentum is controlled by a hyperparameter, analogous to a ball's mass, which must be chosen manually: too high and the ball will roll over the minima we wish to find, too low and it will not fulfil its purpose. The formula for factoring in the momentum is more complex than for decay but is usually built into deep learning libraries such as Keras.

      Time-based learning schedules alter the learning rate depending on the learning rate of the previous iteration. Factoring in the decay, the mathematical formula for the learning rate is:

      [math]\displaystyle{ \eta_{n+1} = \dfrac{\eta_n }{1+dn} }[/math]

      where [math]\displaystyle{ \eta }[/math] is the learning rate, [math]\displaystyle{ d }[/math] is a decay parameter and [math]\displaystyle{ n }[/math] is the iteration step.

      Step-based learning schedules change the learning rate according to predefined steps. The decay application formula is defined as:

      [math]\displaystyle{ \eta_{n} = \eta_0 d^{\left\lfloor \frac{1+n}{r} \right\rfloor} }[/math]

      where [math]\displaystyle{ \eta_{n} }[/math] is the learning rate at iteration [math]\displaystyle{ n }[/math], [math]\displaystyle{ \eta_0 }[/math] is the initial learning rate, [math]\displaystyle{ d }[/math] is how much the learning rate should change at each drop (0.5 corresponds to a halving) and [math]\displaystyle{ r }[/math] corresponds to the drop rate, or how often the rate should be dropped (10 corresponds to a drop every 10 iterations). The floor function rounds its input down to the nearest integer, so the exponent stays at 0 for all input values smaller than 1.

      Exponential learning schedules are similar to step-based schedules, but instead of steps a decreasing exponential function is used. The mathematical formula for factoring in the decay is:

      [math]\displaystyle{ \eta_{n} = \eta_0e^{-dn} }[/math]

      where [math]\displaystyle{ d }[/math] is a decay parameter.
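
The three schedules quoted above can be sketched in plain Python as follows. This is a minimal illustration that transcribes the formulas as written, not the implementation used by any particular library; the function names `time_based`, `step_based` and `exponential` and their parameter names are hypothetical and chosen only for this example.

```python
import math
from typing import List


def time_based(eta0: float, d: float, steps: int) -> List[float]:
    """Return [eta_0, ..., eta_steps] using eta_{n+1} = eta_n / (1 + d * n).

    The rule is recursive: each rate depends on the previous rate.
    """
    rates = [eta0]
    for n in range(steps):
        rates.append(rates[-1] / (1.0 + d * n))
    return rates


def step_based(eta0: float, d: float, r: int, n: int) -> float:
    """Return eta_n = eta_0 * d ** floor((1 + n) / r)."""
    return eta0 * d ** math.floor((1 + n) / r)


def exponential(eta0: float, d: float, n: int) -> float:
    """Return eta_n = eta_0 * exp(-d * n)."""
    return eta0 * math.exp(-d * n)


if __name__ == "__main__":
    # Hypothetical settings, chosen only to make the behaviour visible.
    print(time_based(0.1, 0.5, 3))
    print([step_based(0.1, 0.5, 10, n) for n in (0, 8, 9, 19)])
    print([exponential(0.1, 0.1, n) for n in (0, 1, 10)])
```

Because the step-based exponent uses 1 + n, a drop rate of r = 10 in this transcription first reduces the rate at iteration n = 9 and again at n = 19. In the time-based rule as written, the first update (n = 0) leaves the rate unchanged, since the denominator is 1.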