2017 Language Modeling with Gated Convolutional Networks


Subject Headings: Gated Linear Unit (GLU), Neural Language Modeling.

Notes

Cited By

2020

Quotes

Abstract

The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. We propose a novel simplified gating mechanism that outperforms Oord et al. (2016) and investigate the impact of key architectural decisions. The proposed approach achieves state-of-the-art on the WikiText-103 benchmark, even though it features long-term dependencies, as well as competitive results on the Google Billion Words benchmark. Our model reduces the latency to score a sentence by an order of magnitude compared to a recurrent baseline. To our knowledge, this is the first time a non-recurrent approach is competitive with strong recurrent models on these large-scale language tasks.
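The "simplified gating mechanism" the abstract refers to is the gated linear unit (GLU), h(X) = (X ∗ W + b) ⊗ σ(X ∗ V + c), where ∗ is a convolution and ⊗ an elementwise product. A minimal PyTorch sketch of one such layer follows; the class and parameter names are illustrative, not the paper's, and left-only padding is used to keep the convolution causal:

<pre>
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    """Sketch of one gated convolutional layer:
    h(X) = (X * W + b) (.) sigmoid(X * V + c)."""

    def __init__(self, channels: int, kernel_size: int):
        super().__init__()
        self.pad = kernel_size - 1                 # pad left only => causal
        # One conv produces both the linear path and the gate path.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size)

    def forward(self, x):                          # x: (batch, channels, seq_len)
        x = F.pad(x, (self.pad, 0))                # no access to future tokens
        a, b = self.conv(x).chunk(2, dim=1)        # split linear / gate halves
        return a * torch.sigmoid(b)                # GLU gating
</pre>

Because the gate multiplies a linear path rather than squashing it through a tanh, the gradient through the ungated half is not attenuated, which is the intuition behind the paper's comparison against the tanh-based gating of Oord et al. (2016).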

...

6. Conclusion

We introduce a convolutional neural network for language modeling with a novel gating mechanism. Compared to recurrent neural networks, our approach builds a hierarchical representation of the input words that makes it easier to capture long-range dependencies, similar in spirit to the tree-structured analysis of linguistic grammar formalisms. The same property eases learning since features are passed through a fixed number of layers and non-linearities, unlike for recurrent networks where the number of processing steps differs depending on the position of the word in the input. The results show that our gated convolutional network achieves a new state of the art on WikiText-103. On the Google Billion Word benchmark, we show competitive results can be achieved with significantly fewer resources.
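To make the fixed-depth property concrete: every token passes through the same number of layers, and with kernel size k and L stacked causal convolutions the model sees a bounded context of 1 + L(k − 1) tokens at every position. A small sketch under assumed layer sizes (not the paper's configuration; a plain ReLU stands in for the GLU to keep the focus on depth and context):

<pre>
import torch
import torch.nn as nn

# Each causal conv layer with kernel size k widens the receptive field by
# (k - 1) tokens, so L layers give a fixed context of 1 + L * (k - 1).
k, L, channels = 4, 5, 32
layers = []
for _ in range(L):
    layers += [nn.ConstantPad1d((k - 1, 0), 0.0),  # pad left: causal
               nn.Conv1d(channels, channels, k),
               nn.ReLU()]                          # stand-in for the GLU
stack = nn.Sequential(*layers)

x = torch.randn(1, channels, 100)   # (batch, channels, tokens)
print(stack(x).shape)               # torch.Size([1, 32, 100]); context = 16 tokens
</pre>

This is what makes the context "finite": unlike a recurrent network, whose state at position t has passed through t processing steps, every output here is a function of exactly L layers over the preceding 16 tokens.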

References

* (Dauphin et al., 2017) ⇒ Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. (2017). "Language Modeling with Gated Convolutional Networks." In: Proceedings of the 34th International Conference on Machine Learning (ICML 2017).