2015 TextUnderstandingfromScratch

Jump to navigation Jump to search

Subject Headings: Char-level CNN.


Cited By



This article demontrates that we can apply deep learning to text understanding from character-level inputs all the way up to abstract text concepts, using temporal convolutional networks (ConvNets). We apply ConvNets to various large-scale datasets, including ontology classification, sentiment analysis, and text categorization. We show that temporal ConvNets can achieve astonishing performance without the knowledge of words, phrases, sentences and any other syntactic or semantic structures with regards to a human language. Evidence shows that our models can work for both English and Chinese.

1. Introduction

Text understanding consists in reading texts formed in natural languages, determining the explicit or implicit meaning of each elements such as words, phrases, sentences and paragraphs, and making inferences about the implicit or explicit properties of these texts (Norvig, 1987). This problem has been traditionally difficult because of the extreme variability in language formation (Linell, 1982). To date, most ways to handle text understanding, be it a handcrafted parsing program or a statistically learnt model, have been resorted to the means of matching words statistics.

So far, most machine learning approaches to text understanding consist in tokenizing a string of characters into structures such as words, phrases, sentences and paragraphs, and then apply some statistical classification algorithm onto the statistics of such structures (Soderland, 2001). These techniques work well enough when applied to a narrowly defined domain, but the prior knowledge required is not cheap – they need to pre-define a dictionary of interested words, and the structural parser needs to handle many special variations such as word morphological changes and ambiguous chunking. These requirements make text understanding more or less specialized to a particular language – if the language is changed, many things must be engineered from scratch.

With the advancement of deep learning and availability of large datasets, methods of handling text understanding using deep learning techniques have gradually become available. One technique which draws great interests is word2vec (Mikolov et al., 2013b). Inspired by traditional language models, this technique constructs representation of words into a vector of fixed length trained under a large corpus. Based on the hope that machines may make sense of languages in a formal fashion, many researchers have tried to train a neural network for understanding texts based the features extracted from it or similar techniques, to name a few, (Frome et al., 2013) (Gao et al., 2013) (Le & Mikolov, 2014) (Mikolov et al., 2013a) (Pennington et al., 2014). Most of these techniques try to apply word2vec or similar techniques with an engineered language model.

On the other hand, some researchers have also tried to train a neural network from word level with little structural engineering (Collobert et al., 2011b) (Kim, 2014) (Johnson & Zhang, 2014) (dos Santos & Gatti, 2014). In these works, a word level feature extractor such as lookup table (Collobert et al., 2011b) or word2vec (Mikolov et al., 2013b) is used to feed a temporal ConvNet (LeCun et al., 1998). After training, ConvNets worked for both structured prediction tasks such as part-of-speech tagging and named entity recognition, and text understanding tasks such as sentiment analysisand sentence classification. They claim good results for various tasks, but the datasets and models are relatively small and there are still some engineered layers to represent structures such as words, phrases and sentences.

In this article we show that text understanding can be handled by a deep learning system without artificially imbedding knowledge about words, phrases, sentences or any other syntactic or semantic structures associated with a language. We apply temporal ConvNets (LeCun et al., 1998) to various large-scale text understanding tasks, in which the inputs are quantized characters and the outputs are abstract properties of the text. Our approach is one that ‘learns from scratch’, in the following 2 senses:

  1. . ConvNets do not require knowledge of words – working with characters is fine. This renders a word-based feature extractor (such as LookupTable (Collobert et al., 2011b) or word2vec (Mikolov et al., 2013b)) unnecessary. All previous works start with words instead of characters, which is difficult to apply a convolutional layer directly due to its high dimensionality.
  2. . ConvNets do not require knowledge of syntax or semantic structures – inference directly to high-level targets is fine. This also invalidates the assumption that structured predictions and language models are necessary for high-level text understanding.

Our approach is partly inspired by ConvNet’s success in computer vision. It has outstanding performance in various image recognition tasks (Girshick et al., 2013) (Krizhevsky et al., 2012) (Sermanet et al., 2013). These successful results usually involve some end-to-end ConvNet model that learns hierarchical representation from raw pixels (Girshick et al., 2013) (Zeiler & Fergus, 2014). Similarly, we hypothesize that when trained from raw characters, temporal ConvNet is able to learn the hierarchical representations of words, phrases and sentences in order to understand text.

2. ConvNet Model Design

In this section, we introduce the design of ConvNets for text understanding. The design is modular, where the gradients are obtained by back-propagation (Rumelhart et al., 1986) to perform optimization.

2.1 Key Modules



 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2015 TextUnderstandingfromScratchXiang Zhang
Yann LeCun
Text Understanding from Scratch