2013 LinguisticRegularitiesinContinu

Subject Headings: Word Embedding Task, Analogy Recovery Task.



Continuous space language models have recently demonstrated outstanding results across a variety of tasks. In this paper, we examine the vector-space word representations that are implicitly learned by the input-layer weights. We find that these representations are surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset. This allows vector-oriented reasoning based on the offsets between words. For example, the male / female relationship is automatically learned, and with the induced vector representations, "King - Man + Woman" results in a vector very close to "Queen." We demonstrate that the word vectors capture syntactic regularities by means of syntactic analogy questions (provided with this paper), and are able to correctly answer almost 40% of the questions. We demonstrate that the word vectors capture semantic regularities by using the vector offset method to answer SemEval-2012 Task 2 questions. Remarkably, this method outperforms the best previous systems.
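The vector offset method described in the abstract can be illustrated with a minimal sketch. It assumes the analogy "a is to b as c is to ?" is answered by forming the vector b - a + c and returning the vocabulary word with the highest cosine similarity to it. The vectors below are hand-made toy values, not embeddings learned by any language model, and the names `TOY_VECTORS`, `cosine`, and `analogy` are hypothetical. Excluding the three query words from the candidates is an assumption of this sketch.

```python
import math

# Toy 3-dimensional vectors, hand-made for illustration only.
# They are NOT embeddings learned by a language model.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the offset vector b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Assumption of this sketch: the query words themselves are not candidates.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


if __name__ == "__main__":
    # "King - Man + Woman": man is to woman as king is to ?
    print(analogy("man", "woman", "king", TOY_VECTORS))
```

With real embeddings the same search runs over the full vocabulary, so the answer depends on the quality of the learned vectors rather than on hand-picked values.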


  • Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3(6).
  • Y. Bengio, H. Schwenk, J.S. Senécal, F. Morin, and J.L. Gauvain. 2006. Neural probabilistic language models. Innovations in Machine Learning, pages 137–186.
  • A. Bordes, X. Glorot, J. Weston, and Y. Bengio. 2012. Joint learning of words and meaning representations for open-text semantic parsing. In: Proceedings of 15th International Conference on Artificial Intelligence and Statistics.
  • R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine learning, pages 160–167. ACM.
  • S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(96).
  • J.L. Elman. 1991. Distributed representations, simple recurrent networks, and grammatical structure. Machine learning, 7(2):195–225.
  • G.E. Hinton and R.R. Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
  • G. Hinton and R. Salakhutdinov. 2010. Discovering binary codes for documents by learning deep generative models. Topics in Cognitive Science, 3(1):74–91.
  • G.E. Hinton. 1986. Learning distributed representations of concepts. In: Proceedings of the eighth annual conference of the cognitive science society, pages 1–12.
  • David Jurgens, Saif Mohammad, Peter Turney, and Keith Holyoak. 2012. Semeval-2012 task 2: Measuring degrees of relational similarity. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics (SemEval 2012), pages 356–364. Association for Computational Linguistics.
  • Hai-Son Le, I. Oparin, A. Allauzen, J.-L. Gauvain, and F. Yvon. 2011. Structured output layer neural network language model. In: Proceedings of ICASSP 2011.
  • Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of english: the penn treebank. Computational Linguistics, 19(2):313–330.
  • Tomáš Mikolov, Martin Karafiat, Jan Cernocky, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In: Proceedings of Interspeech 2010.
  • Tomáš Mikolov, Anoop Deoras, Daniel Povey, Lukas Burget, and Jan Cernocky. 2011a. Strategies for Training Large Scale Neural Network Language Models. In: Proceedings of ASRU 2011.
  • Tomáš Mikolov, Stefan Kombrink, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. 2011b. Extensions of recurrent neural network based language model. In: Proceedings of ICASSP 2011.
  • Tomáš Mikolov. 2012. RNN toolkit.
  • A. Mnih and G.E. Hinton. 2009. A scalable hierarchical distributed language model. Advances in neural information processing systems, 21:1081–1088.
  • F. Morin and Y. Bengio. 2005. Hierarchical probabilistic neural network language model. In: Proceedings of the international workshop on artificial intelligence and statistics, pages 246–252.
  • J.B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46(1):77–105.
  • Bryan Rink and Sanda Harabagiu. 2012. UTD: Determining relational similarity using lexical patterns. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics (SemEval 2012), pages 413–418. Association for Computational Linguistics.
  • Holger Schwenk. 2007. Continuous space language models. Computer Speech and Language, 21(3):492–518.
  • J. Turian, L. Ratinov, and Y. Bengio. 2010. Word representations: a simple and general method for semisupervised learning. In: Proceedings of Association for Computational Linguistics (ACL 2010).
  • P.D. Turney. 2012. Domain and function: A dual-space model of semantic relations and compositions. Journal of Artificial Intelligence Research, 44:533–585.


Title: Linguistic Regularities in Continuous Space Word Representations.
Author: Wen-tau Yih, Geoffrey Zweig, Tomáš Mikolov
Year: 2013