2014 LearningPhraseRepresentationsUs


Subject Headings: Gated Recurrent Unit (GRU), GRU-based RNN Encoder-Decoder Network, Sequence-to-Sequence Learning Task.

Notes

Cited By

Quotes

Abstract

In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.

1. Introduction

Deep neural networks have shown great success in various applications such as object recognition (see, e.g., (Krizhevsky et al., 2012)) and speech recognition (see, e.g., (Dahl et al., 2012)). Furthermore, many recent works showed that neural networks can be successfully used in a number of tasks in natural language processing (NLP). These include, but are not limited to, language modeling (Bengio et al., 2003), paraphrase detection (Socher et al., 2011) and word embedding extraction (Mikolov et al., 2013). In the field of statistical machine translation (SMT), deep neural networks have begun to show promising results. (Schwenk, 2012) summarizes a successful usage of feedforward neural networks in the framework of a phrase-based SMT system.

Along this line of research on using neural networks for SMT, this paper focuses on a novel neural network architecture that can be used as a part of the conventional phrase-based SMT system. The proposed neural network architecture, which we will refer to as an RNN Encoder–Decoder, consists of two recurrent neural networks (RNN) that act as an encoder and a decoder pair. The encoder maps a variable-length source sequence to a fixed-length vector, and the decoder maps the vector representation back to a variable-length target sequence. The two networks are trained jointly to maximize the conditional probability of the target sequence given a source sequence. Additionally, we propose to use a rather sophisticated hidden unit in order to improve both the memory capacity and the ease of training.

The proposed RNN Encoder–Decoder with a novel hidden unit is empirically evaluated on the task of translating from English to French. We train the model to learn the translation probability of an English phrase to a corresponding French phrase. The model is then used as a part of a standard phrase-based SMT system by scoring each phrase pair in the phrase table. The empirical evaluation reveals that this approach of scoring phrase pairs with an RNN Encoder–Decoder improves the translation performance.

We qualitatively analyze the trained RNN Encoder–Decoder by comparing its phrase scores with those given by the existing translation model. The qualitative analysis shows that the RNN Encoder–Decoder is better at capturing the linguistic regularities in the phrase table, indirectly explaining the quantitative improvements in the overall translation performance. The further analysis of the model reveals that the RNN Encoder–Decoder learns a continuous space representation of a phrase that preserves both the semantic and syntactic structure of the phrase.

References

BibTeX

@inproceedings{2014_LearningPhraseRepresentationsUs,
  author    = {Kyunghyun Cho and
               Bart van Merrienboer and
               {\c{C}}aglar G{\"{u}}l{\c{c}}ehre and
               Dzmitry Bahdanau and
               Fethi Bougares and
               Holger Schwenk and
               Yoshua Bengio},
  editor    = {Alessandro Moschitti and
               Bo Pang and
               Walter Daelemans},
  title     = {Learning Phrase Representations using {RNN} Encoder-Decoder for Statistical
               Machine Translation},
  booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural
               Language Processing (EMNLP 2014), October 25-29, 2014, Doha, Qatar,
               A meeting of SIGDAT, a Special Interest Group of the ACL},
  pages     = {1724--1734},
  publisher = {ACL},
  year      = {2014},
  url       = {https://www.aclweb.org/anthology/D14-1179.pdf},
  doi       = {10.3115/v1/d14-1179},
}


Author: Yoshua Bengio, Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Caglar Gulcehre
Title: Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation
Year: 2014