Root Mean Square Propagation Algorithm (RMSprop)


A Root Mean Square Propagation Algorithm (RMSprop) is a Gradient Descent-based Learning Algorithm that adapts the learning rate of each parameter using a decaying average of squared gradients, and is closely related to the Adagrad and Adadelta methods.



  • (Wikipedia, 2018) ⇒ Retrieved:2018-4-29.
    • RMSProp (for Root Mean Square Propagation) is also a method in which the learning rate is adapted for each of the parameters. The idea is to divide the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight. [1] So, first the running average is calculated in terms of the mean square,

      [math] v(w,t):=\gamma v(w,t-1)+(1-\gamma)(\nabla Q_i(w))^2 [/math]

      where [math] \gamma [/math] is the forgetting factor, and the parameters are updated as

      [math] w:=w-\frac{\eta}{\sqrt{v(w,t)}}\nabla Q_i(w) [/math]

      RMSProp has shown excellent adaptation of the learning rate in different applications. RMSProp can be seen as a generalization of Rprop and is capable of working with mini-batches as opposed to only full-batches.
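      The two formulas above can be sketched as a single update step in NumPy. This is a minimal illustration, not a reference implementation: the function name, the toy objective [math] Q(w) = w^2 [/math], and the small constant `eps` (a common addition for numerical stability that does not appear in the formula above) are assumptions.

      ```python
      import numpy as np

      def rmsprop_step(w, grad, v, lr=0.01, gamma=0.9, eps=1e-8):
          """One RMSprop update: maintain a decaying average v of squared
          gradients, then divide the step by its square root.
          eps guards against division by zero (illustrative addition)."""
          v = gamma * v + (1 - gamma) * grad**2      # v(w,t) := gamma*v(w,t-1) + (1-gamma)*grad^2
          w = w - lr * grad / (np.sqrt(v) + eps)     # w := w - eta/sqrt(v) * grad
          return w, v

      # Toy run: minimize Q(w) = w^2 starting from w = 5.0.
      w, v = np.array([5.0]), np.zeros(1)
      for _ in range(2000):
          grad = 2 * w                                # gradient of Q at w
          w, v = rmsprop_step(w, grad, v)
      ```

      Because the gradient is divided by its own recent magnitude, each step has roughly constant size (about `lr`), so the iterate walks steadily toward the minimum rather than taking steps proportional to the raw gradient.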





  • (Misra, 2015) ⇒ Ishan Misra (2015). "Optimization for Deep Networks" (PDF)
    • QUOTE: RMSProp = Rprop + SGD
      • Tieleman & Hinton, 2012 (Coursera Lecture 6, slide 29)
      • Scale updates similarly across mini-batches,
      • Scale by decaying average of squared gradient,
        • Rather than the sum of squared gradients in AdaGrad.

          [math]r_t=(1-\gamma)f'(\theta)^2+\gamma r_{t-1}[/math]
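      The contrast with AdaGrad noted above can be sketched numerically: AdaGrad's sum of squared gradients grows without bound, shrinking the effective step size toward zero, while RMSprop's decaying average stays near the recent mean of [math] f'(\theta)^2 [/math]. The synthetic gradient stream below is an assumption for illustration.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      grads = rng.standard_normal(1000)   # hypothetical stream of gradients

      # AdaGrad accumulator: the SUM of squared gradients keeps growing,
      # so eta / sqrt(r) keeps shrinking over time.
      r_adagrad = np.cumsum(grads**2)

      # RMSprop accumulator: a DECAYING AVERAGE of squared gradients,
      # r_t = (1 - gamma) * g_t^2 + gamma * r_{t-1}, stays bounded.
      gamma = 0.9
      r = 0.0
      for g in grads:
          r = (1 - gamma) * g**2 + gamma * r
      r_rmsprop = r
      ```

      After 1000 steps the AdaGrad sum is on the order of the step count, while the RMSprop average hovers near the variance of the gradients, which is why RMSprop's effective learning rate does not vanish on long runs.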




  1. Tieleman, Tijmen, and Hinton, Geoffrey (2012). "Lecture 6.5 - rmsprop: Divide the gradient by a running average of its recent magnitude." COURSERA: Neural Networks for Machine Learning.