2016 A Primer on Neural Network Models for Natural Language Processing

From GM-RKB

Subject Headings: Neural Network NLP Algorithm, Neural Natural Language Processing System.

Notes

Cited By

Quotes

Abstract

Over the past few years, neural networks have re-emerged as powerful machine-learning models, yielding state-of-the-art results in fields such as image recognition and speech processing. More recently, neural network models have also begun to be applied to textual natural-language signals, again with very promising results. This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques. The tutorial covers input encoding for natural language tasks, feed-forward networks, convolutional networks, recurrent networks, and recursive networks, as well as the computation graph abstraction for automatic gradient computation.
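The computation graph abstraction mentioned above can be illustrated with a minimal sketch (not from the tutorial itself; all names are illustrative): each node records its value, its parent nodes, and the local gradient of its output with respect to each parent, so that reverse-mode automatic differentiation can propagate gradients from the loss back to every input.

```python
# Minimal illustrative sketch of a computation graph with reverse-mode
# automatic differentiation. Class and variable names are hypothetical.

class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Node(self.value * other.value,
                    [(self, other.value), (other, self.value)])

    def backward(self, grad=1.0):
        # Accumulate the incoming gradient, then push it to each parent,
        # scaled by the local gradient stored on the edge.
        self.grad += grad
        for parent, local_grad in self.parents:
            parent.backward(grad * local_grad)

x = Node(2.0)
y = Node(3.0)
z = x * y + x          # z = x*y + x = 8
z.backward()           # dz/dx = y + 1 = 4,  dz/dy = x = 2
```

Frameworks used in neural NLP generalize this same idea from scalars to tensors, which is what makes gradient-based training of the architectures surveyed here (feed-forward, convolutional, recurrent, recursive) mechanical once the forward computation is written down.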

References

  • 1. Adel, H., Vu, N. T., & Schultz, T. (2013). Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Pp. 206-211, Sofia, Bulgaria. Association for Computational Linguistics.
  • 2. Rie Kubota Ando, Tong Zhang, A High-performance Semi-supervised Learning Method for Text Chunking, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, p.1-9, June 25-30, 2005, Ann Arbor, Michigan
  • 3. Rie Kubota Ando, Tong Zhang, A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, The Journal of Machine Learning Research, 6, p.1817-1853, 12/1/2005
  • 4. Auli, M., Galley, M., Quirk, C., & Zweig, G. (2013). Joint Language and Translation Modeling with Recurrent Neural Networks. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Pp. 1044-1054, Seattle, Washington, USA. Association for Computational Linguistics.
  • 5. Auli, M., & Gao, J. (2014). Decoder Integration and Expected BLEU Training for Recurrent Neural Network Language Models. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Pp. 136-142, Baltimore, Maryland. Association for Computational Linguistics.
  • 6. Ballesteros, M., Dyer, C., & Smith, N. A. (2015). Improved Transition-based Parsing by Modeling Characters Instead of Words with LSTMs. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Pp. 349-359, Lisbon, Portugal. Association for Computational Linguistics.
  • 7. Ballesteros, M., Goldberg, Y., Dyer, C., & Smith, N. A. (2016). Training with Exploration Improves a Greedy Stack-LSTM Parser. arXiv:1603.03793 [cs].
  • 8. Bansal, M., Gimpel, K., & Livescu, K. (2014). Tailoring Continuous Word Representations for Dependency Parsing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Pp. 809-815, Baltimore, Maryland. Association for Computational Linguistics.
  • 9. Baydin, A. G., Pearlmutter, B. A., Radul, A. A., & Siskind, J. M. (2015). Automatic Differentiation in Machine Learning: A Survey. arXiv:1502.05767 [cs].
  • 10. Bengio, Y. (2012). Practical Recommendations for Gradient-based Training of Deep Architectures. arXiv:1206.5533 [cs].
  • 11. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Janvin, A Neural Probabilistic Language Model, The Journal of Machine Learning Research, 3, 3/1/2003
  • 12. Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, The MIT Press, 2016
  • 13. Bitvai, Z., & Cohn, T. (2015). Non-Linear Text Regression with a Deep Convolutional Neural Network. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Pp. 180-185, Beijing, China. Association for Computational Linguistics.
  • 14. Jan A. Botha, Phil Blunsom, Compositional Morphology for Word Representations and Language Modelling, Proceedings of the 31st International Conference on International Conference on Machine Learning, June 21-26, 2014, Beijing, China
  • 15. Bottou, L. (2012). Stochastic Gradient Descent Tricks. In Neural Networks: Tricks of the Trade, Pp. 421-436. Springer.
  • 16. Eugene Charniak, Mark Johnson, Coarse-to-fine n-best Parsing and MaxEnt Discriminative Reranking, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, p.173-180, June 25-30, 2005, Ann Arbor, Michigan
  • 17. Chen, D., & Manning, C. (2014). A Fast and Accurate Dependency Parser Using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Pp. 740-750, Doha, Qatar. Association for Computational Linguistics.
  • 18. Chen, Y., Xu, L., Liu, K., Zeng, D., & Zhao, J. (2015). Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Pp. 167-176, Beijing, China. Association for Computational Linguistics.
  • 19. Cho, K. (2015). Natural Language Understanding with Distributed Representation. arXiv:1511.07916 [cs, Stat].
  • 20. Cho, K., Van Merrienboer, B., Bahdanau, D., & Bengio, Y. (2014a). On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. In Proceedings of SSST- 8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Pp. 103-111, Doha, Qatar. Association for Computational Linguistics.
  • 21. Cho, K., Van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014b). Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Pp. 1724-1734, Doha, Qatar. Association for Computational Linguistics.
  • 22. Chrupala, G. (2014). Normalizing Tweets with Edit Scripts and Recurrent Neural Embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Pp. 680-686, Baltimore, Maryland. Association for Computational Linguistics.
  • 23. Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555 [cs].
  • 24. Michael Collins, Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, p.1-8, July 06, 2002
  • 25. Michael Collins, Terry Koo, Discriminative Reranking for Natural Language Parsing, Computational Linguistics, v.31 n.1, p.25-70, March 2005
  • 26. Ronan Collobert, Jason Weston, A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning, Proceedings of the 25th International Conference on Machine Learning, p.160-167, July 05-09, 2008, Helsinki, Finland
  • 27. Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa, Natural Language Processing (Almost) from Scratch, The Journal of Machine Learning Research, 12, p.2493-2537, 2/1/2011
  • 28. Koby Crammer, Yoram Singer, On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines, The Journal of Machine Learning Research, 2, 3/1/2002
  • 29. Mathias Creutz, Krista Lagus, Unsupervised Models for Morpheme Segmentation and Morphology Learning, ACM Transactions on Speech and Language Processing (TSLP), v.4 n.1, p.1-34, January 2007
  • 30. Cybenko, G. (1989). Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals and Systems, 2 (4), 303-314.
  • 31. Dahl, G., Sainath, T., & Hinton, G. (2013). Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Pp. 8609-8613.
  • 32. Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio, Identifying and Attacking the Saddle Point Problem in High-dimensional Non-convex Optimization, Proceedings of the 27th International Conference on Neural Information Processing Systems, p.2933-2941, December 08-13, 2014, Montreal, Canada
  • 33. de Gispert, A., Iglesias, G., & Byrne, B. (2015). Fast and Accurate Preordering for SMT Using Neural Networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Pp. 1012-1017, Denver, Colorado. Association for Computational Linguistics.
  • 34. Do, T., Arti, T., & Others (2010). Neural Conditional Random Fields. In International Conference on Artificial Intelligence and Statistics, Pp. 177-184.
  • 35. Dong, L.,Wei, F., Tan, C., Tang, D., Zhou, M., & Xu, K. (2014). Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Pp. 49-54, Baltimore, Maryland. Association for Computational Linguistics.
  • 36. Dong, L., Wei, F., Zhou, M., & Xu, K. (2015). Question Answering over Freebase with Multi-Column Convolutional Neural Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Pp. 260-269, Beijing, China. Association for Computational Linguistics.
  • 37. dos Santos, C., & Gatti, M. (2014). Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Pp. 69-78, Dublin, Ireland. Dublin City University and Association for Computational Linguistics.
  • 38. dos Santos, C., Xiang, B., & Zhou, B. (2015). Classifying Relations by Ranking with Convolutional Neural Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Pp. 626-634, Beijing, China. Association for Computational Linguistics.
  • 39. Cícero Nogueira Dos Santos, Bianca Zadrozny, Learning Character-level Representations for Part-of-speech Tagging, Proceedings of the 31st International Conference on International Conference on Machine Learning, June 21-26, 2014, Beijing, China
  • 40. John Duchi, Elad Hazan, Yoram Singer, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, The Journal of Machine Learning Research, 12, p.2121-2159, 2/1/2011
  • 41. Duh, K., Neubig, G., Sudoh, K., & Tsukada, H. (2013). Adaptation Data Selection Using Neural Language Models: Experiments in Machine Translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Pp. 678-683, Sofia, Bulgaria. Association for Computational Linguistics.
  • 42. Durrett, G., & Klein, D. (2015). Neural CRF Parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Pp. 302- 312, Beijing, China. Association for Computational Linguistics.
  • 43. Dyer, C., Ballesteros, M., Ling, W., Matthews, A., & Smith, N. A. (2015). Transition-Based Dependency Parsing with Stack Long Short-Term Memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Pp. 334-343, Beijing, China. Association for Computational Linguistics.
  • 44. Elman, J. L. (1990). Finding Structure in Time. Cognitive Science, 14 (2), 179-211.
  • 45. Faruqui, M., & Dyer, C. (2014). Improving Vector Space Word Representations Using Multilingual Correlation. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Pp. 462-471, Gothenburg, Sweden. Association for Computational Linguistics.
  • 46. Filippova, K., Alfonseca, E., Colmenares, C. A., Kaiser, L., & Vinyals, O. (2015). Sentence Compression by Deletion with LSTMs. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Pp. 360-368, Lisbon, Portugal. Association for Computational Linguistics.
  • 47. Mikel L. Forcada, Ramón P. Ñeco, Recursive Hetero-associative Memories for Translation, Proceedings of the International Work-Conference on Artificial and Natural Neural Networks: Biological and Artificial Computation: From Neuroscience to Technology, p.453-462, June 04-06, 1997
  • 48. Gao, J., Pantel, P., Gamon, M., He, X., & Deng, L. (2014). Modeling Interestingness with Deep Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Pp. 2-13, Doha, Qatar. Association for Computational Linguistics.
  • 49. Giménez, J., & Màrquez, L. (2004). SVMTool: A General POS Tagger Generator based on Support Vector Machines. In Proceedings of the 4th LREC, Lisbon, Portugal.
  • 50. Glorot, X., & Bengio, Y. (2010). Understanding the Difficulty of Training Deep Feedforward Neural Networks. In International Conference on Artificial Intelligence and Statistics, Pp. 249-256.
  • 51. Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep Sparse Rectifier Neural Networks. In International Conference on Artificial Intelligence and Statistics, Pp. 315-323.
  • 52. Yoav Goldberg, Michael Elhadad, An Efficient Algorithm for Easy-first Non-directional Dependency Parsing, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, p.742-750, June 02-04, 2010, Los Angeles, California
  • 53. Goldberg, Y., & Levy, O. (2014). Word2vec Explained: Deriving Mikolov Et Al.'s Negative-sampling Word-embedding Method. arXiv:1402.3722 [cs, Stat].
  • 54. Goldberg, Y., & Nivre, J. (2013). Training Deterministic Parsers with Non-Deterministic Oracles. Transactions of the Association for Computational Linguistics, 1 (0), 403- 414.
  • 55. Goldberg, Y., Zhao, K., & Huang, L. (2013). Efficient Implementation of Beam-Search Incremental Parsers. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Pp. 628-633, Sofia, Bulgaria. Association for Computational Linguistics.
  • 56. Goller, C., & Küchler, A. (1996). Learning Task-Dependent Distributed Representations by Backpropagation Through Structure. In In Proc. of the ICNN-96, Pp. 347-352. IEEE.
  • 57. Stephan Gouws, Yoshua Bengio, Greg Corrado, BilBOWA: Fast Bilingual Distributed Representations Without Word Alignments, Proceedings of the 32nd International Conference on International Conference on Machine Learning, July 06-11, 2015, Lille, France
  • 58. Graves, A. (2008). Supervised Sequence Labelling with Recurrent Neural Networks. Ph.D. Thesis, Technische Universität München.
  • 59. Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2015). LSTM: A Search Space Odyssey. arXiv:1503.04069 [cs].
  • 60. Hal Daumé, Iii, John Langford, Daniel Marcu, Search-based Structured Prediction, Machine Learning, v.75 n.3, p.297-325, June 2009
  • 61. Harris, Z. (1954). Distributional Structure. Word, 10 (23), 146-162.
  • 62. Hashimoto, K., Miwa, M., Tsuruoka, Y., & Chikayama, T. (2013). Simple Customization of Recursive Neural Networks for Semantic Relation Classification. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Pp. 1372-1376, Seattle, Washington, USA. Association for Computational Linguistics.
  • 63. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving Deep Into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv:1502.01852 [cs].
  • 64. Henderson, M., Thomson, B., & Young, S. (2013). Deep Neural Network Approach for the Dialog State Tracking Challenge. In Proceedings of the SIGDIAL 2013 Conference, Pp. 467-471, Metz, France. Association for Computational Linguistics.
  • 65. Hermann, K. M., & Blunsom, P. (2013). The Role of Syntax in Vector Space Models of Compositional Semantics. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Pp. 894-904, Sofia, Bulgaria. Association for Computational Linguistics.
  • 66. Hermann, K. M., & Blunsom, P. (2014). Multilingual Models for Compositional Distributed Semantics. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Pp. 58-68, Baltimore, Maryland. Association for Computational Linguistics.
  • 67. Salah El Hihi, Yoshua Bengio, Hierarchical Recurrent Neural Networks for Long-term Dependencies, Proceedings of the 8th International Conference on Neural Information Processing Systems, p.493-499, November 27-December 02, 1995, Denver, Colorado
  • 68. Hill, F., Cho, K., Jean, S., Devin, C., & Bengio, Y. (2014). Embedding Word Similarity with Neural Machine Translation. arXiv:1412.6448 [cs].
  • 69. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving Neural Networks by Preventing Co-adaptation of Feature Detectors. arXiv:1207.0580 [cs].
  • 70. Sepp Hochreiter, Jürgen Schmidhuber, Long Short-Term Memory, Neural Computation, v.9 n.8, p.1735-1780, November 15, 1997
  • 71. K. Hornik, M. Stinchcombe, H. White, Multilayer Feedforward Networks Are Universal Approximators, Neural Networks, v.2 n.5, p.359-366, 1989
  • 72. Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv:1502.03167 [cs].
  • 73. Irsoy, O., & Cardie, C. (2014). Opinion Mining with Deep Recurrent Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Pp. 720-728, Doha, Qatar. Association for Computational Linguistics.
  • 74. Iyyer, M., Boyd-Graber, J., Claudino, L., Socher, R., & Daumé III, H. (2014a). A Neural Network for Factoid Question Answering over Paragraphs. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Pp. 633-644, Doha, Qatar. Association for Computational Linguistics.
  • 75. Iyyer, M., Enns, P., Boyd-Graber, J., & Resnik, P. (2014b). Political Ideology Detection Using Recursive Neural Networks. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Pp. 1113-1122, Baltimore, Maryland. Association for Computational Linguistics.
  • 76. Iyyer, M., Manjunatha, V., Boyd-Graber, J., & Daumé III, H. (2015). Deep Unordered Composition Rivals Syntactic Methods for Text Classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Pp. 1681-1691, Beijing, China. Association for Computational Linguistics.
  • 77. Johnson, R., & Zhang, T. (2015). Effective Use of Word Order for Text Categorization with Convolutional Neural Networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Pp. 103-112, Denver, Colorado. Association for Computational Linguistics.
  • 78. Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., & Wu, Y. (2016). Exploring the Limits of Language Modeling. arXiv:1602.02410 [cs].
  • 79. Rafal Jozefowicz, Wojciech Zaremba, Ilya Sutskever, An Empirical Exploration of Recurrent Network Architectures, Proceedings of the 32nd International Conference on International Conference on Machine Learning, July 06-11, 2015, Lille, France
  • 80. Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A Convolutional Neural Network for Modelling Sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Pp. 655-665, Baltimore, Maryland. Association for Computational Linguistics.
  • 81. Karpathy, A., Johnson, J., & Li, F.-F. (2015). Visualizing and Understanding Recurrent Networks. arXiv:1506.02078 [cs].
  • 82. Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Pp. 1746-1751, Doha, Qatar. Association for Computational Linguistics.
  • 83. Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush, Character-aware Neural Language Models, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona
  • 84. Kingma, D., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs].
  • 85. Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Proceedings of the 25th International Conference on Neural Information Processing Systems, p.1097-1105, December 03-06, 2012, Lake Tahoe, Nevada
  • 86. Taku Kudo, Yuji Matsumoto, Fast Methods for Kernel-based Text Analysis, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, p.24-31, July 07-12, 2003, Sapporo, Japan
  • 87. John D. Lafferty, Andrew McCallum, Fernando C. N. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, Proceedings of the Eighteenth International Conference on Machine Learning, p.282-289, June 28-July 01, 2001
  • 88. Le, P., & Zuidema, W. (2014). The Inside-Outside Recursive Neural Network Model for Dependency Parsing. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Pp. 729-739, Doha, Qatar. Association for Computational Linguistics.
  • 89. Le, P., & Zuidema, W. (2015). The Forest Convolutional Network: Compositional Distributional Semantics with a Neural Chart and Without Binarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Pp. 1155-1164, Lisbon, Portugal. Association for Computational Linguistics.
  • 90. Le, Q. V., Jaitly, N., & Hinton, G. E. (2015). A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. arXiv:1504.00941 [cs].
  • 91. Yann LeCun, Yoshua Bengio, Convolutional Networks for Images, Speech, and Time Series, The Handbook of Brain Theory and Neural Networks, MIT Press, Cambridge, MA, 1998
  • 92. Yann LeCun, Léon Bottou, Genevieve B. Orr, Klaus-Robert Müller, Efficient BackProp, Neural Networks: Tricks of the Trade, This Book is An Outgrowth of a 1996 NIPS Workshop, p.9-50, January 1998
  • 93. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998b). Gradient Based Learning Applied to Pattern Recognition. Proceedings of the IEEE, 86 (11), 2278-2324.
  • 94. LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006). A Tutorial on Energy-based Learning. Predicting Structured Data, 1, 0.
  • 95. LeCun, Y., & Huang, F. (2005). Loss Functions for Discriminative Training of Energy-based Models. In Proceedings of AISTATS. AIStats.
  • 96. Lee, G., Flowers, M., & Dyer, M. G. (1992). Learning Distributed Representations of Conceptual Knowledge and their Application to Script-based Story Processing. In Connectionist Natural Language Processing, Pp. 215-247. Springer.
  • 97. Levy, O., & Goldberg, Y. (2014a). Dependency-Based Word Embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Pp. 302-308, Baltimore, Maryland. Association for Computational Linguistics.
  • 98. Omer Levy, Yoav Goldberg, Neural Word Embedding As Implicit Matrix Factorization, Proceedings of the 27th International Conference on Neural Information Processing Systems, p.2177-2185, December 08-13, 2014, Montreal, Canada
  • 99. Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving Distributional Similarity with Lessons Learned from Word Embeddings. Transactions of the Association for Computational Linguistics, 3 (0), 211-225.
  • 100. Lewis, M., & Steedman, M. (2014). Improved CCG Parsing with Semi-supervised Supertagging. Transactions of the Association for Computational Linguistics, 2 (0), 327-338.
  • 101. Li, J., Li, R., & Hovy, E. (2014). Recursive Deep Models for Discourse Parsing. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Pp. 2061-2069, Doha, Qatar. Association for Computational Linguistics.
  • 102. Ling, W., Dyer, C., Black, A. W., & Trancoso, I. (2015a). Two/Too Simple Adaptations of Word2Vec for Syntax Problems. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Pp. 1299-1304, Denver, Colorado. Association for Computational Linguistics.
  • 103. Ling, W., Dyer, C., Black, A. W., Trancoso, I., Fermandez, R., Amir, S., Marujo, L., & Luis, T. (2015b). Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Pp. 1520-1530, Lisbon, Portugal. Association for Computational Linguistics.
  • 104. Liu, Y., Wei, F., Li, S., Ji, H., Zhou, M., & Wang, H. (2015). A Dependency-Based Neural Network for Relation Classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Pp. 285-290, Beijing, China. Association for Computational Linguistics.
  • 105. Luong, M.-T., Le, Q. V., Sutskever, I., Vinyals, O., & Kaiser, L. (2015). Multi-task Sequence to Sequence Learning. arXiv:1511.06114 [cs, Stat].
  • 106. Ma, J., Zhang, Y., & Zhu, J. (2014). Tagging The Web: Building A Robust Web Tagger with Neural Network. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Pp. 144-154, Baltimore, Maryland. Association for Computational Linguistics.
  • 107. Ma, M., Huang, L., Zhou, B., & Xiang, B. (2015). Dependency-based Convolutional Neural Networks for Sentence Embedding. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Pp. 174-179, Beijing, China. Association for Computational Linguistics.
  • 108. Andrew McCallum, Dayne Freitag, Fernando C. N. Pereira, Maximum Entropy Markov Models for Information Extraction and Segmentation, Proceedings of the Seventeenth International Conference on Machine Learning, p.591-598, June 29-July 02, 2000
  • 109. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs].
  • 110. Mikolov, T., Joulin, A., Chopra, S., Mathieu, M., & Ranzato, M. (2014). Learning Longer Memory in Recurrent Neural Networks. arXiv:1412.7753 [cs].
  • 111. Mikolov, T., Karafiát, M., Burget, L., Cernocky, J., & Khudanpur, S. (2010). Recurrent Neural Network based Language Model. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, Pp. 1045-1048.
  • 112. Mikolov, T., Kombrink, S., Lukáš Burget, Černocky, J. H., & Khudanpur, S. (2011). Extensions of Recurrent Neural Network Language Model. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, Pp. 5528-5531. IEEE.
  • 113. Tomáš Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean, Distributed Representations of Words and Phrases and their Compositionality, Proceedings of the 26th International Conference on Neural Information Processing Systems, p.3111-3119, December 05-10, 2013, Lake Tahoe, Nevada
  • 114. Mikolov, T. (2012). Statistical Language Models based on Neural Networks. Ph.D. Thesis, Ph. D. Thesis, Brno University of Technology.
  • 115. Andriy Mnih, Koray Kavukcuoglu, Learning Word Embeddings Efficiently with Noise-contrastive Estimation, Proceedings of the 26th International Conference on Neural Information Processing Systems, p.2265-2273, December 05-10, 2013, Lake Tahoe, Nevada
  • 116. Mrkšic, N., Ó Séaghdha, D., Thomson, B., Gasic, M., Su, P.-H., Vandyke, D., Wen, T.-H., & Young, S. (2015). Multi-domain Dialog State Tracking Using Recurrent Neural Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Pp. 794-799, Beijing, China. Association for Computational Linguistics.
  • 117. Richard D. Neidinger, Introduction to Automatic Differentiation and MATLAB Object-Oriented Programming, SIAM Review, v.52 n.3, p.545-563, August 2010
  • 118. Nesterov, Y. (1983). A Method of Solving a Convex Programming Problem with Convergence Rate O (1/k2). In Soviet Mathematics Doklady, Vol. 27, Pp. 372-376.
  • 119. Yurii Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer Publishing Company, Incorporated, 2014
  • 120. Nguyen, T. H., & Grishman, R. (2015). Event Detection and Domain Adaptation with Convolutional Neural Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Pp. 365-371, Beijing, China. Association for Computational Linguistics.
  • 121. Joakim Nivre, Algorithms for Deterministic Incremental Dependency Parsing, Computational Linguistics, v.34 n.4, p.513-553, December 2008
  • 122. Chris Okasaki, Purely Functional Data Structures, Cambridge University Press, New York, NY, 1999
  • 123. Olah, C. (2015a). Calculus on Computational Graphs: Backpropagation. Retrieved from http://colah.github.io/posts/2015-08-Backprop/.
  • 124. Olah, C. (2015b). Understanding LSTM Networks. Retrieved from http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
  • 125. Pascanu, R., Mikolov, T., & Bengio, Y. (2012). On the Difficulty of Training Recurrent Neural Networks. arXiv:1211.5063 [cs].
  • 126. Pei, W., Ge, T., & Chang, B. (2015). An Effective Neural Network Model for Graph-based Dependency Parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Pp. 313-322, Beijing, China. Association for Computational Linguistics.
  • 127. (Peng et al., 2009) ⇒ Jian Peng, Liefeng Bo, and Jinbo Xu. (2009). “Conditional Neural Fields.” In: Proceedings of the 22nd International Conference on Neural Information Processing Systems. ISBN:978-1-61567-911-9
  • 128. Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Pp. 1532-1543, Doha, Qatar. Association for Computational Linguistics.
  • 129. J. B. Pollack, Recursive Distributed Representations, Artificial Intelligence, v.46 n.1-2, p.77-105, Nov. 1990
  • 130. Polyak, B. T. (1964). Some Methods of Speeding Up the Convergence of Iteration Methods. USSR Computational Mathematics and Mathematical Physics, 4 (5), 1-17.
  • 131. Qian, Q., Tian, B., Huang, M., Liu, Y., Zhu, X., & Zhu, X. (2015). Learning Tag Embeddings and Tag-specific Composition Functions in Recursive Neural Network. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Pp. 1365-1374, Beijing, China. Association for Computational Linguistics.
  • 132. Rong, X. (2014). Word2vec Parameter Learning Explained. arXiv:1411.2738 [cs].
  • 133. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning Representations by Back-propagating Errors. Nature, 323 (6088), 533-536.
  • 134. M. Schuster, K.K. Paliwal, Bidirectional Recurrent Neural Networks, IEEE Transactions on Signal Processing, v.45 n.11, p.2673-2681, November 1997
  • 135. Holger Schwenk, Daniel Déchelotte, Jean-Luc Gauvain, Continuous Space Language Models for Statistical Machine Translation, Proceedings of the COLING/ACL on Main Conference Poster Sessions, p.723-730, July 17-18, 2006, Sydney, Australia
  • 136. John Shawe-Taylor, Nello Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, New York, NY, 2004
  • 137. Noah A. Smith, Linguistic Structure Prediction, Morgan & Claypool Publishers, 2011
  • 138. Socher, R. (2014). Recursive Deep Learning For Natural Language Processing and Computer Vision. Ph.D. Thesis, Stanford University.
  • 139. Socher, R., Bauer, J., Manning, C. D., & Ng, A. Y. (2013). Parsing with Compositional Vector Grammars. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 455-465, Sofia, Bulgaria. Association for Computational Linguistics.
  • 140. Richard Socher, Brody Huval, Christopher D. Manning, Andrew Y. Ng, Semantic Compositionality through Recursive Matrix-vector Spaces, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, July 12-14, 2012, Jeju Island, Korea
  • 141. Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng, Christopher D. Manning, Parsing Natural Scenes and Natural Language with Recursive Neural Networks, Proceedings of the 28th International Conference on International Conference on Machine Learning, p.129-136, June 28-July 02, 2011, Bellevue, Washington, USA
  • 142. Socher, R., Manning, C., & Ng, A. (2010). Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks. In Proceedings of the Deep Learning and Unsupervised Feature Learning Workshop of NIPS 2010, pp. 1-9.
  • 143. Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631-1642, Seattle, Washington, USA. Association for Computational Linguistics.
  • 144. Søgaard, A., & Goldberg, Y. (2016). Deep Multi-task Learning with Low Level Tasks Supervised at Lower Layers. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 231-235. Association for Computational Linguistics.
  • 145. Sordoni, A., Galley, M., Auli, M., Brockett, C., Ji, Y., Mitchell, M., Nie, J.-Y., Gao, J., & Dolan, B. (2015). A Neural Network Approach to Context-Sensitive Generation of Conversational Responses. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 196-205, Denver, Colorado. Association for Computational Linguistics.
  • 146. Sundermeyer, M., Alkhouli, T., Wuebker, J., & Ney, H. (2014). Translation Modeling with Bidirectional Recurrent Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 14-25, Doha, Qatar. Association for Computational Linguistics.
  • 147. Sundermeyer, M., Schlüter, R., & Ney, H. (2012). LSTM Neural Networks for Language Modeling. In INTERSPEECH.
  • 148. Ilya Sutskever, James Martens, George Dahl, Geoffrey Hinton, On the Importance of Initialization and Momentum in Deep Learning, Proceedings of the 30th International Conference on International Conference on Machine Learning, June 16-21, 2013, Atlanta, GA, USA
  • 149. Ilya Sutskever, James Martens, Geoffrey Hinton, Generating Text with Recurrent Neural Networks, Proceedings of the 28th International Conference on International Conference on Machine Learning, p.1017-1024, June 28-July 02, 2011, Bellevue, Washington, USA
  • 150. Ilya Sutskever, Oriol Vinyals, Quoc V. Le, Sequence to Sequence Learning with Neural Networks, Proceedings of the 27th International Conference on Neural Information Processing Systems, p.3104-3112, December 08-13, 2014, Montreal, Canada
  • 151. Tai, K. S., Socher, R., & Manning, C. D. (2015). Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1556-1566, Beijing, China. Association for Computational Linguistics.
  • 152. Tamura, A., Watanabe, T., & Sumita, E. (2014). Recurrent Neural Networks for Word Alignment Model. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1470-1480, Baltimore, Maryland. Association for Computational Linguistics.
  • 153. Telgarsky, M. (2016). Benefits of Depth in Neural Networks. arXiv:1602.04485 [cs, Stat].
  • 154. Tieleman, T., & Hinton, G. (2012). Lecture 6.5 - RMSProp: Divide the Gradient by a Running Average of Its Recent Magnitude. COURSERA: Neural Networks for Machine Learning.
  • 155. Van De Cruys, T. (2014). A Neural Network Approach to Selectional Preference Acquisition. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 26-35, Doha, Qatar. Association for Computational Linguistics.
  • 156. Vaswani, A., Zhao, Y., Fossum, V., & Chiang, D. (2013). Decoding with Large-Scale Neural Language Models Improves Translation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1387-1392, Seattle, Washington, USA. Association for Computational Linguistics.
  • 157. Stefan Wager, Sida Wang, Percy Liang, Dropout Training As Adaptive Regularization, Proceedings of the 26th International Conference on Neural Information Processing Systems, p.351-359, December 05-10, 2013, Lake Tahoe, Nevada
  • 158. Wang, M., & Manning, C. D. (2013). Effect of Non-linear Deep Architecture in Sequence Labeling. In IJCNLP, pp. 1285-1291.
  • 159. Wang, P., Xu, J., Xu, B., Liu, C., Zhang, H., Wang, F., & Hao, H. (2015a). Semantic Clustering and Convolutional Neural Network for Short Text Categorization. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 352-357, Beijing, China. Association for Computational Linguistics.
  • 160. Wang, X., Liu, Y., Sun, C., Wang, B., & Wang, X. (2015b). Predicting Polarities of Tweets by Composing Word Embeddings with Long Short-Term Memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1343-1353, Beijing, China. Association for Computational Linguistics.
  • 161. Watanabe, T., & Sumita, E. (2015). Transition-based Neural Constituent Parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1169-1179, Beijing, China. Association for Computational Linguistics.
  • 162. Weiss, D., Alberti, C., Collins, M., & Petrov, S. (2015). Structured Training for Neural Network Transition-Based Parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 323-333, Beijing, China. Association for Computational Linguistics.
  • 163. Werbos, P. J. (1990). Backpropagation through Time: What It Does and how to Do It. Proceedings of the IEEE, 78 (10), 1550-1560.
  • 164. Weston, J., Bordes, A., Yakhnenko, O., & Usunier, N. (2013). Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1366-1371, Seattle, Washington, USA. Association for Computational Linguistics.
  • 165. Xu, W., Auli, M., & Clark, S. (2015). CCG Supertagging with a Recurrent Neural Network. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 250-255, Beijing, China. Association for Computational Linguistics.
  • 166. Yin, W., & Schütze, H. (2015). Convolutional Neural Network for Paraphrase Identification. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 901-911, Denver, Colorado. Association for Computational Linguistics.
  • 167. Zaremba, W., Sutskever, I., & Vinyals, O. (2014). Recurrent Neural Network Regularization. arXiv:1409.2329 [cs].
  • 168. Zeiler, M. D. (2012). ADADELTA: An Adaptive Learning Rate Method. arXiv:1212.5701 [cs].
  • 169. Zeng, D., Liu, K., Lai, S., Zhou, G., & Zhao, J. (2014). Relation Classification via Convolutional Deep Neural Network. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 2335-2344, Dublin, Ireland. Dublin City University and Association for Computational Linguistics.
  • 170. Zhang, Y., & Weiss, D. (2016). Stack-propagation: Improved Representation Learning for Syntax. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1557-1566. Association for Computational Linguistics.
  • 171. Zhou, H., Zhang, Y., Huang, S., & Chen, J. (2015). A Neural Probabilistic Structured-Prediction Model for Transition-Based Dependency Parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1213-1222, Beijing, China. Association for Computational Linguistics.
  • 172. Zhu, C., Qiu, X., Chen, X., & Huang, X. (2015a). A Re-ranking Model for Dependency Parser with Recursive Convolutional Neural Network. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1159-1168, Beijing, China. Association for Computational Linguistics.
  • 173. Xiaodan Zhu, Parinaz Sobhani, Hongyu Guo, Long Short-term Memory over Recursive Structures, Proceedings of the 32nd International Conference on International Conference on Machine Learning, July 06-11, 2015, Lille, France



Author: Yoav Goldberg. Title: A Primer on Neural Network Models for Natural Language Processing. Year: 2016.