2015 TheNLPEngineAUniversalTuringMachineforNLP

From GM-RKB

Subject Headings: Shannon Entropy.

Notes

Cited By

Quotes

Abstract

It is commonly accepted that machine translation is a more complex task than part of speech tagging. But how much more complex? In this paper we make an attempt to develop a general framework and methodology for computing the informational and/or processing complexity of NLP applications and tasks. We define a universal framework akin to a Turing Machine that attempts to fit (most) NLP tasks into one paradigm. We calculate the complexities of various NLP tasks using measures of Shannon Entropy, and compare `simple' ones such as part of speech tagging to `complex' ones such as machine translation. This paper provides a first, though far from perfect, attempt to quantify NLP tasks under a uniform paradigm. We point out current deficiencies and suggest some avenues for fruitful research.

1. Introduction

The purpose of this paper is to suggest a unified framework in which modern NLP research can quantitatively describe and compare NLP tasks. Even though everyone agrees that some NLP tasks are more complex than others, e.g., machine translation is ‘harder’ than syntactic parsing, which in turn is ‘harder’ than part-of-speech tagging, we cannot compute the relative complexities of different NLP tasks and subtasks.

In the typical current NLP paradigm, researchers apply several machine learning algorithms to a problem, report on their performance levels, and establish the winner as setting the level to beat in the future. We have no single overall model of NLP that subsumes and regularizes its various tasks. If you were to ask NLP researchers today, they would say that no such model is possible, and that NLP is a collection of several semi-independent research directions that all focus on language and mostly use machine learning techniques. Researchers will tell you that a good summarization system on the DUC/TAC datasets obtains a ROUGE score of 0.40, a good French-English translation system achieves a BLEU score of 37.0, classifiers on the 20 Newsgroups dataset can achieve an accuracy of 0.85, and named entity recognition systems a recall of 0.95, and that these numbers are not comparable. Further, we usually pay little attention to additional important factors such as the performance curve with respect to the amount of training data, the amount of preprocessing required, the size and complexity of auxiliary information required, etc. And even when some studies do report such numbers, in NLP we don't know how to characterize these aspects in general and across applications, or how to quantify them in relation to each other.

We here describe our first attempt to develop a single generic high-level model of NLP. We adopt the model of a universal machine, akin to a Turing Machine but specific to the concerns of language processing, and show how it can be instantiated in different ways for different applications. We employ Shannon Entropy within the machine to measure the complexity of each NLP task.

In his epoch-making work, Shannon (1951) demonstrated how to compute the amount of information in a message. He considered the case in which a string of input symbols is processed one by one, and the uncertainty of the next symbol is measured by how difficult it is to guess. We make the fundamental assumption that most NLP tasks can be viewed as transformations of notation, in which a stream of input symbols is transformed and/or embellished into a stream of output symbols (for example, POS tagging is the task of embellishing each symbol with its tag, and MT is the task of outputting the appropriate translation word(s)). Under this assumption one can ask: how much uncertainty is there in making the embellishment or transformation? This clearly depends on the precise nature of the task, on the associated auxiliary knowledge resources, and on the actual algorithm employed. We discuss each of these issues below. We first describe the key challenge involved in performing uncertainty comparison using the Entropy measure in Section 2. In Section 3 we provide high-level comments on what properties a framework should have to enable fair complexity comparison. In Section 4, based on the properties identified in Section 3, we consider the theoretical nature of NLP tasks and provide suggestions for instantiating the paradigm. The framework is described in Sections 5 and 6, and our results are presented in Section 7. We point out current deficiencies and suggest avenues for fruitful research in Section 8, followed by a conclusion.
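To make the "uncertainty of the embellishment" idea concrete, the following is a minimal sketch, not taken from the paper, that estimates the conditional entropy H(tag | word) from a toy hand-tagged stream. The tiny corpus, the function names, and the unsmoothed unigram estimate are all assumptions made purely for illustration; the paper's actual framework measures such uncertainty within its universal machine.

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """Empirical H(output | input) = -sum_{x,y} p(x,y) * log2 p(y|x) over (input, output) pairs."""
    joint = Counter(pairs)                      # counts of (input, output) pairs
    marginal = Counter(x for x, _ in pairs)     # counts of inputs alone
    n = len(pairs)
    h = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n                            # joint probability p(x, y)
        p_y_given_x = c / marginal[x]           # conditional probability p(y | x)
        h -= p_xy * math.log2(p_y_given_x)
    return h

# Toy word/tag stream standing in for a POS-tagged corpus ("bark" is ambiguous).
tagged = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
          ("the", "DET"), ("bark", "NOUN"), ("peels", "VERB"),
          ("dogs", "NOUN"), ("bark", "VERB")]
print(f"H(tag | word) ~= {conditional_entropy(tagged):.2f} bits per symbol")
```

On this toy stream the only uncertainty comes from the ambiguous word "bark", so the estimate is small; a task like MT, where the output vocabulary is large and weakly constrained by each input symbol, would yield a much higher value under the same measure.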

2. The Dilemma for Shannon Entropy

2.1 Review of Entropy and Cross Entropy

Entropy, denoted as $H(Y) = -\sum_{y} p(y) \log p(y)$, quantifies the amount of information contained in a message, and can be characterized as the uncertainty of a random variable of a process. For example, Shannon (1951) reported an upper bound of 1.3 bits per character symbol for English character prediction and 5.9 bits per word symbol for English word prediction, meaning that it is highly likely that English word prediction is a harder task than English character prediction.
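As a small illustration of the quoted definition, here is a hedged sketch that computes the empirical per-symbol entropy $-\sum_{y} p(y) \log_2 p(y)$ at the character and word level. The sample string and the zeroth-order (unigram) estimate are assumptions for illustration only; Shannon's 1.3 and 5.9 bit figures come from far stronger conditional models, so numbers like these are only crude upper bounds.

```python
import math
from collections import Counter

def entropy_per_symbol(symbols):
    """Empirical H = -sum_y p(y) * log2 p(y) over observed symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Illustrative sample only; real estimates require large corpora and context conditioning.
text = "the cat sat on the mat because the cat was not on the mat"

char_entropy = entropy_per_symbol(list(text))    # character-level, unigram estimate
word_entropy = entropy_per_symbol(text.split())  # word-level, unigram estimate
print(f"characters: {char_entropy:.2f} bits/symbol, words: {word_entropy:.2f} bits/symbol")
```

The word-level estimate comes out higher than the character-level one even on this tiny sample, mirroring the comparison the quote draws between word and character prediction.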

References

Jiwei Li, and Eduard Hovy. (2015). "The NLP Engine: A Universal Turing Machine for NLP."