1995 TextChunkingUsingTransfBasedLearning


Subject Headings: Base NP Chunking Task, Phrase Chunking Task, IOB Tagging Task.


Cited By



Eric D. Brill introduced transformation-based learning and showed that it can do part-of-speech tagging with fairly high accuracy. The same method can be applied at a higher level of textual interpretation for locating chunks in the tagged text, including non-recursive "baseNP" chunks. For this purpose, it is convenient to view chunking as a tagging problem by encoding the chunk structure in new tags attached to each word. In automatic tests using Treebank-derived data, this technique achieved recall and precision rates of roughly 92% for baseNP chunks and 88% for somewhat more complex chunks that partition the sentence. Some interesting adaptations to the transformation-based learning approach are also suggested by this application.

1 Introduction

Text chunking involves dividing sentences into non-overlapping segments on the basis of fairly superficial analysis. [[Abney (1991)]] has proposed this as a useful and relatively tractable precursor to full parsing, since it provides a foundation for further levels of analysis including verb-argument identification, while still allowing more complex attachment decisions to be postponed to a later phase. Since chunking includes identifying the non-recursive portions of noun phrases, it can also be useful for other purposes including index term generation.

Most efforts at superficially extracting segments from sentences have focused on identifying low-level noun groups, either using hand-built grammars and finite state techniques or using statistical models like HMMs trained from corpora. In this paper, we target a somewhat higher level of chunk structure using Brill's (1993b) transformation-based learning mechanism, in which a sequence of transformational rules is learned from a corpus; this sequence iteratively improves upon a baseline model for some interpretive feature of the text. This technique has previously been used not only for part-of-speech tagging (Brill, 1994), but also for prepositional phrase attachment disambiguation (Brill and Resnik, 1994), and assigning unlabeled binary-branching tree structure to sentences (Brill, 1993a). Because transformation-based learning uses pattern-action rules based on selected features of the local context, it is helpful for the values being predicted to also be encoded locally. In the text-chunking application, encoding the predicted chunk structure in tags attached to the words, rather than as brackets between words, avoids many of the difficulties with unbalanced bracketings that would result if such local rules were allowed to insert or alter inter-word brackets directly.

In this study, training and test sets marked with two different types of chunk structure were derived algorithmically from the parsed data in the Penn Treebank corpus of Wall Street Journal text (Marcus et al., 1994). The source texts were then run through Brill's part-of-speech tagger (Brill, 1993c), and, as a baseline heuristic, chunk structure tags were assigned to each word based on its part-of-speech tag. Rules were then automatically learned that updated these chunk structure tags based on neighboring words and their part-of-speech and chunk tags. Applying transformation-based learning to text chunking turns out to be different in interesting ways from its use for part-of-speech tagging. The much smaller tagset calls for a different organization of the computation, and the fact that part-of-speech assignments as well as word identities are fixed suggests different optimizations.
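
The baseline step described above can be sketched as follows. The text says only that chunk tags were assigned from each word's part-of-speech tag; choosing the chunk tag that most often accompanies that part-of-speech tag in the training data is an assumption made here for illustration, and the names `train_baseline` and `apply_baseline`, the fallback tag, and the toy training data are all hypothetical.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Map each POS tag to the chunk tag it most often carries in training.

    Assumption: "most frequent chunk tag per POS tag" is one plausible
    reading of the baseline heuristic, not a confirmed detail.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for _word, pos, chunk in sentence:
            counts[pos][chunk] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


def apply_baseline(pos_to_chunk, pos_tags, default="O"):
    """Assign a chunk tag to every POS tag; unseen POS tags get `default`."""
    return [pos_to_chunk.get(pos, default) for pos in pos_tags]


# Toy training data: (word, POS tag, chunk tag) triples.
training = [
    [("The", "DT", "I"), ("government", "NN", "I"), ("has", "VBZ", "O"),
     ("other", "JJ", "I"), ("agencies", "NNS", "I")],
    [("It", "PRP", "I"), ("began", "VBD", "O"), ("in", "IN", "O"),
     ("1949", "CD", "I")],
]
baseline = train_baseline(training)
print(apply_baseline(baseline, ["DT", "JJ", "NN", "VBD"]))
```

The learned rules then only have to correct the positions where this context-free guess is wrong.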

2 Text Chunking

[[Abney (1991)]] has proposed text chunking as a useful preliminary step to parsing. His chunks are inspired in part by psychological studies of Gee and Grosjean (1983) that link pause durations in reading and naive sentence diagraming to text groupings that they called C-phrases, which very roughly correspond to breaking the string after each syntactic head that is a content word. Abney's other motivation for chunking is procedural, based on the hypothesis that the identification of chunks can be done fairly dependably by finite state methods, postponing the decisions that require higher-level analysis to a parsing phase that chooses how to combine the chunks.

2.1 Existing Chunk Identification Techniques

On the grammar-based side, [[1992_SurfaceGramAnForTheExtrOfTermNPs|Bourigault (1992)]] describes a system for extracting "terminological noun phrases" from French text. This system first uses heuristics to find "maximal length noun phrases", and then uses a grammar to extract "terminological units." For example, from the maximal NP le disque dur de la station de travail it extracts the two terminological phrases disque dur and station de travail. Bourigault claims that the grammar can parse "around 95% of the maximal length noun phrases" in a test corpus into possible terminological phrases, which then require manual validation. However, because its goal is terminological phrases, it appears that this system ignores NP chunk-initial determiners and other initial prenominal modifiers, somewhat simplifying the parsing task.

Voutilainen (1993), in his impressive NPtool system, uses an approach that is in some ways similar to the one used here, in that he adds to his part-of-speech tags a new kind of tag that shows chunk structure; the chunk tag "@>N", for example, is used for determiners and premodifiers, both of which group with the following noun head. He uses a lexicon that lists all the possible chunk tags for each word combined with hand-built constraint grammar patterns. These patterns eliminate impossible readings to identify a somewhat idiosyncratic kind of target noun group that does not include initial determiners but does include postmodifying prepositional phrases (including determiners).

NPtool parse | Apparent correct parse
less [time] | [less time]
the other hand | the [other hand]
many [advantages] | [many advantages]
[binary addressing] and [instruction formats] | [binary addressing and instruction formats]
a purely [binary computer] | a [purely binary computer]

2.2 Deriving Chunks from Treebank Parses

The goal of the "baseNP" chunks was to identify essentially the initial portions of nonrecursive noun phrases up to the head, including determiners but not including postmodifying prepositional phrases or clauses. These chunks were extracted from the Treebank parses, basically by selecting NPs that contained no nested NPs. The handling of conjunction followed that of the Treebank annotators as to whether to show separate baseNPs or a single baseNP spanning the conjunction. Possessives were treated as a special case, viewing the possessive marker as the first word of a new baseNP, thus flattening the recursive structure in a useful way. The following sentences give examples of this baseNP chunk structure:

  • During [N the third quarter N], [N Compaq N] purchased [N a former Wang Laboratories manufacturing facility N] in [N Sterling N], [N Scotland N], which will be used for [N international service and repair operations N]
  • [N The government N] has [N other agencies and instruments N] for pursuing [N these other objectives N]
  • Even [N Mao Tse-tung N] [N 's China N] began in [N 1949 N] with [N a partnership N] between [N the communists N] and [N a number N] of [N smaller, non-communist parties N]

4.1 Encoding Choices

Applying transformational learning to text chunking requires that the system's current hypotheses about chunk structure be represented in a way that can be matched against the pattern parts of rules. One way to do this would be to have patterns match tree fragments and actions modify tree geometries, as in Brill's transformational parser (1993a). In this work, we have found it convenient to do so by encoding the chunking using an additional set of tags, so that each word carries both a part-of-speech tag and also a "chunk tag" from which the chunk structure can be derived.
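
One way to picture this representation is as two parallel tag sequences that a rule's pattern part inspects at fixed offsets from the current word. The sketch below is only illustrative: the `Rule` structure, the `(offset, feature, value)` condition format, and the choice to match every position against the tags as they stood before the rule fired are assumptions, not the rule templates used in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Each condition is (offset, feature, value), where feature is
    # "pos" or "chunk" and offset is relative to the current word.
    conditions: tuple
    new_chunk: str


def matches(rule, pos_tags, chunk_tags, i):
    """Check every condition of the rule's pattern at position i."""
    for offset, feature, value in rule.conditions:
        j = i + offset
        if j < 0 or j >= len(pos_tags):
            return False
        actual = pos_tags[j] if feature == "pos" else chunk_tags[j]
        if actual != value:
            return False
    return True


def apply_rule(rule, pos_tags, chunk_tags):
    """Return new chunk tags; patterns see the tags from before this rule."""
    snapshot = list(chunk_tags)
    return [rule.new_chunk if matches(rule, pos_tags, snapshot, i) else tag
            for i, tag in enumerate(snapshot)]


# Hypothetical rule: an adjective tagged O directly before an I word becomes I.
rule = Rule(conditions=((0, "chunk", "O"), (0, "pos", "JJ"), (1, "chunk", "I")),
            new_chunk="I")
print(apply_rule(rule, ["VBZ", "JJ", "NNS"], ["O", "O", "I"]))
```

Because the action only rewrites one word's chunk tag, the pattern and the value it changes are both local, which is the property the tag encoding is meant to provide.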

In the baseNP experiments aimed at non-recursive NP structures, we use the chunk tag set {I, O, B}, where words marked I are inside some baseNP, those marked O are outside, and the B tag is used to mark the leftmost item of a baseNP which immediately follows another baseNP. In these tests, punctuation marks were tagged in the same way as words.
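
As a concrete illustration of this tag set, the following sketch converts baseNP spans into I, O, and B tags. The function name and the span format (start and end word indices, end exclusive) are hypothetical; the sample sentence is the opening of the possessive example from Section 2.2.

```python
def encode_base_nps(n_words, spans):
    """Turn baseNP spans into I/O/B tags.

    `spans` must be sorted, non-overlapping (start, end) pairs with the
    end index exclusive. B is used only where a baseNP starts exactly
    where the previous one ended.
    """
    tags = ["O"] * n_words
    previous_end = None
    for start, end in spans:
        for i in range(start, end):
            tags[i] = "I"
        if start == previous_end:
            tags[start] = "B"
        previous_end = end
    return tags


words = ["Mao", "Tse-tung", "'s", "China", "began", "in", "1949"]
spans = [(0, 2), (2, 4), (6, 7)]  # [Mao Tse-tung] ['s China] ... [1949]
for word, tag in zip(words, encode_base_nps(len(words), spans)):
    print(word, tag)
```

Only the possessive marker receives B here, since it is the only baseNP-initial word that directly follows another baseNP.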

In the experiments that partitioned text into N and V chunks, we use the chunk tag set {BN, N, BV, V, P}, where BN marks the first word and N the succeeding words in an N-type group while BV and V play the same role for V-type groups. Punctuation marks, which are ignored in Abney's chunk grammar, but which the Treebank data treats as normal lexical items with their own part-of-speech tags, are unambiguously assigned the chunk tag P. Items tagged P are allowed to appear within N or V chunks; they are irrelevant as far as chunk boundaries are concerned, but they are still available to be matched against as elements of the left hand sides of rules.

Encoding chunk structure with tags attached to words rather than non-recursive bracket markers inserted between words has the advantage that it limits the dependence between different elements of the encoded representation. While brackets must be correctly paired in order to derive a chunk structure, it is easy to define a mapping that can produce a valid chunk structure from any sequence of chunk tags; the few hard cases that arise can be handled completely locally. For example, in the baseNP tag set, whenever a B tag immediately follows an O, it must be treated as an I, and, in the partitioning chunk tag set, wherever a V tag immediately follows an N tag without any intervening BV, it must be treated as a BV.
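
A minimal sketch of such a mapping for both tag sets is given below. The two repairs named in the text are implemented as stated; several other choices are assumptions made for the sake of a runnable example: a sentence-initial B is treated like a B after O, an N tag following a V chunk opens a new N chunk (the mirror image of the stated V-after-N case), and a P item is attached to the chunk that precedes it. The function names and span formats are hypothetical.

```python
def decode_base_np(tags):
    """Return (start, end) baseNP spans, end exclusive, from any I/O/B list."""
    spans, start, previous = [], None, "O"
    for i, tag in enumerate(tags):
        if tag == "B" and previous == "O":
            tag = "I"  # local repair: a B right after an O is read as I
        if tag == "O":
            if start is not None:
                spans.append((start, i))
                start = None
        elif tag == "B":
            spans.append((start, i))  # close the adjacent baseNP
            start = i
        elif start is None:  # an I that opens a new baseNP
            start = i
        previous = tag
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def decode_partition(tags):
    """Return (kind, start, end) chunks from any BN/N/BV/V/P list."""
    chunks, kind, start = [], None, None
    for i, tag in enumerate(tags):
        if tag == "P":
            continue  # punctuation never opens or closes a chunk
        tag_kind = tag[-1]  # "N" or "V"
        # A B-prefixed tag, or a change of kind without one, starts a chunk;
        # the latter is the local repair (V after N is read as BV).
        if tag.startswith("B") or tag_kind != kind:
            if kind is not None:
                chunks.append((kind, start, i))
            kind, start = tag_kind, i
    if kind is not None:
        chunks.append((kind, start, len(tags)))
    return chunks


print(decode_base_np(["O", "B", "I", "B", "I", "O"]))
print(decode_partition(["BN", "N", "V", "V", "P", "BN"]))
```

Neither function can fail on a sequence drawn from its tag set, which is the property that lets local rules rewrite individual tags without ever producing an invalid chunk structure.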

8 Conclusions

By representing text chunking as a kind of tagging problem, it becomes possible to easily apply transformation-based learning. We have shown that this approach is able to automatically induce a chunking model from supervised training that achieves recall and precision of 92% for baseNP chunks and 88% for partitioning N and V chunks. Such chunking models provide a useful and feasible next step in textual interpretation that goes beyond part-of-speech tagging, and that serve as a foundation both for larger-scale grouping and for direct extraction of subunits like index terms. In addition, some variations in the transformation-based learning algorithm are suggested by this application that may also be useful in other settings.
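
For readers who want to reproduce figures of this kind, chunk-level recall and precision can be computed as sketched below. The scoring convention is an assumption: a predicted chunk counts as correct only when both of its boundaries match a reference chunk exactly. The function name and the span format are hypothetical.

```python
def chunk_precision_recall(gold_spans, predicted_spans):
    """Score predicted chunks against reference chunks by exact span match."""
    gold, predicted = set(gold_spans), set(predicted_spans)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = [(0, 2), (2, 4), (6, 7)]
predicted = [(0, 2), (2, 4), (5, 7), (8, 9)]
precision, recall = chunk_precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the four predicted chunks are exact matches, and two of the three reference chunks are found.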


  • (Abney, 1989) ⇒ Steven P. Abney. (1989). “Parsing By Chunks.” The MIT Parsing Volume, 1988-89. Center for Cognitive Science, MIT.
  • (Bourigault, 1992) ⇒ Didier Bourigault. (1992). “Surface Grammatical Analysis for the Extraction of Terminological Noun Phrases.” In: Proceedings of the Fifteenth International Conference on Computational Linguistics. doi:10.3115/993079.993111
  • (Brill, 1993a) ⇒ Eric Brill. (1993a). “Automatic grammar induction and parsing free text: A transformation-based approach.” In: Proceedings of the DARPA Speech and Natural Language Workshop, 1993, pages 237-242.
  • (Brill, 1993b) ⇒ Eric Brill. (1993b). “A Corpus-based Approach to Language Learning.” Ph.D. thesis, University of Pennsylvania.
  • (Brill, 1993c) ⇒ Eric Brill. (1993c). “Rule based tagger, version 1.14.” Available from ftp.cs.jhu.edu in the directory /pub/brill/programs/.
  • (Brill, 1994) ⇒ Eric Brill. (1994). “Some advances in transformation-based part of speech tagging.” In: Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 722-727. (cmp-lg/9406010).
  • (Brill & Resnik, 1994) ⇒ Eric Brill, and Philip Resnik. (1994). “A rule-based approach to prepositional attachment disambiguation.” In: Proceedings of the Sixteenth International Conference on Computational Linguistics. (cmp-lg/9410026).
  • (Church, 1988) ⇒ Kenneth Church. (1988). “A stochastic parts program and noun phrase parser for unrestricted text.” In: Second Conference on Applied Natural Language Processing. ACL.
  • (Ejerhed, 1988) ⇒ Eva I. Ejerhed. (1988). “Finding clauses in unrestricted text by finitary and stochastic methods.” In: Second Conference on Applied Natural Language Processing, pages 219-227. ACL.
  • (Gee & Grosjean, 1983) ⇒ James Paul Gee, and Francois Grosjean. (1983). “Performance structures: A psycholinguistic and linguistic appraisal.” Cognitive Psychology, 15:411-458.
  • (Kupiec, 1993) ⇒ Julian Kupiec. (1993). “An algorithm for finding noun phrase correspondences in bilingual corpora.” In: Proceedings of the 31st Annual Meeting of the ACL, pages 17-22.
  • (Marcus et al., 1994) ⇒ Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. (1994). “The Penn Treebank: A revised corpus design for extracting predicate argument structure.” In: Human Language Technology, ARPA March 1994 Workshop.
  • (Ramshaw & Marcus, 1994) ⇒ Lance A. Ramshaw, and Mitchell P. Marcus. (1994). “Exploring the Statistical Derivation of Transformational Rule Sequences for Part-of-Speech Tagging.” In: Proceedings of the ACL Balancing Act Workshop on Combining Symbolic and Statistical Approaches to Language. (cmp-lg/9406011).
  • (Voutilainen, 1993) ⇒ Atro Voutilainen. (1993). “NPTool, a detector of English noun phrases.” In: Proceedings of the Workshop on Very Large Corpora, pages 48-57. ACL, June. (cmp-lg/9502010).


1995 TextChunkingUsingTransfBasedLearning
Author: Lance A. Ramshaw, Mitchell P. Marcus
Title: Text Chunking Using Transformation-based Learning
Journal: Proceedings of the Third ACL Workshop on Very Large Corpora
Title URL: http://www.aclweb.org/anthology-new/W/W95/W95-0107.pdf
Year: 1995