2011 LearningRecurrentNeuralNetworks

Subject Headings: Recurrent Neural Network, Hessian-Free Optimization, Sequence Modeling Task, Martens Hessian-Free Optimization, Neural Network Sequence Model, Generalized Gauss-Newton Matrix.

Notes

Cited By

Quotes

Abstract

In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free optimization approach (Martens, 2010), together with a novel damping scheme, we successfully train RNNs on two sets of challenging problems. First, a collection of pathological synthetic datasets which are known to be impossible for standard optimization approaches (due to their extremely long-term dependencies), and second, on three natural and highly complex real-world sequence datasets where we find that our method significantly outperforms the previous state-of-the-art method for training neural sequence models: the Long Short-term Memory approach of Hochreiter and Schmidhuber (1997). Additionally, we offer a new interpretation of the generalized Gauss-Newton matrix of Schraudolph (2002) which is used within the HF approach of Martens.

References

BibTeX

@inproceedings{2011_LearningRecurrentNeuralNetworks,
  author    = {James Martens and
               Ilya Sutskever},
  editor    = {Lise Getoor and
               Tobias Scheffer},
  title     = {Learning Recurrent Neural Networks with Hessian-Free Optimization},
  booktitle = {Proceedings of the 28th International Conference on Machine Learning
               (ICML 2011)},
  pages     = {1033--1040},
  publisher = {Omnipress},
  year      = {2011},
  url       = {https://icml.cc/2011/papers/532\_icmlpaper.pdf},
}


Page: 2011 LearningRecurrentNeuralNetworks
Author: Ilya Sutskever, James Martens
Title: Learning Recurrent Neural Networks with Hessian-Free Optimization
Year: 2011