- (Abney, 1989) ⇒ Steven P. Abney. (1989). “Parsing By Chunks.” In: The MIT Parsing Volume, 1988-89. Center for Cognitive Science, MIT.
- It can also be cited as S. Abney. (1991). “Parsing by chunks.” In: R. Berwick, S. Abney, and C. Tenny, editors, Principle-based Parsing. Kluwer Academic.
- (Sha & Pereira, 2003a) ⇒ Fei Sha, and Fernando Pereira. (2003). “Shallow Parsing with Conditional Random Fields.” In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL 2003). doi:10.3115/1073445.1073473
- (Ramshaw & Marcus, 1995) ⇒ Lance A. Ramshaw, and Mitch P. Marcus. (1995). “Text Chunking Using Transformation-based Learning.” In: Proceedings of the Third ACL Workshop on Very Large Corpora (WVLC 1995).
- (1) [I begin] [with an intuition]: [when I read] [a sentence], [I read it] [a chunk] [at a time]
These chunks correspond in some way to prosodic patterns. It appears, for instance, that the strongest stresses in the sentence fall one to a chunk, and pauses are most likely to fall between chunks. Chunks also represent a grammatical watershed of sorts. The typical chunk consists of a single content word surrounded by a constellation of function words, matching a fixed template. A simple context-free grammar is quite adequate to describe the structure of chunks. By contrast, the relationships between chunks are mediated more by lexical selection than by rigid templates. Co-occurence of chunks is determined not just by their syntactic categories, but is sensitive to the precise words that head them; and the order in which chunks occur is much more flexible than the order of words within chunks.
The work I would like to describe is an attempt to give content to these intuitions, and to show that parsing by chunks has distinct processing advantages, advantages that help explain why the human parser might adopt a chunk-by-chunk strategy.
There is psychological evidence for the existence of chunks. Gee and Grosjean 1983 examine what they call performance structures . These are structures of word clustering that emerge from a variety of types of experimental data, such as pause durations in reading, and naive sentence diagramming. Gee and Grosjean argue that performance structur es are best predicted by what they call φ-phrases . φ-phrases are created by breaking the input string after each syntactic head that is a content word (with the exception that function words syntactically associated with a preceding content word – in particular, object pronouns – group with the preceding content word). The chunks of sentence (1) are φ-phrases.
Unfortunately, Gee and Grosjean must make some undesirable syntactic assumptions. For example, they assume that prenominal adjectives do not qualify as syntactic heads – otherwise, phrases like a big dog would not comprise one chunk, but two. Also, Gee and Grosjean do not assign syntactic structure to chunks.
2 Structure of the Parser
A typical natural language parser processes text in two stages. A tokenizer/morphological analyzer converts a stream of characters into a stream of words, and the parser proper converts a stream of words into a parsed sentence, or a stream of parsed sentences. In a chunking parser, the syntactic analyzer is decomposed into two separate stages, which I call the chunker and the attacher. The chunker converts a stream of words into a stream of chunks, and the attacher converts the stream of chunks into a stream of sentences.
The chunker is a non-deterministic version of an LR parser (Knuth 1965), employing a best-first search. I first give a brief description of LR parsing, for those unfamiliar with it. (A much more detailed discussion can be found in Aho and Ullman 1972.)
3.1 LR Parsing
An LR parser is a deterministic bottom-up parser. It is possible to automatically generate an LR parser for any of a large class of context-free grammars. The parser shifts words from the input string onto the stack until it recognizes a sequence of words matching the right-hand side of a rule from the grammar. At that point, it reduces the sequence to a single node, whose category is given in the left-hand side of the rule.
4.1 Attachment Ambiguities and Lexical Selection
The attacher’s main job is dealing with attachment ambiguities. In basic construction, it is identical to the chunker. It simulates a non-deterministic LR parser, using the four heuristic factors given earlier. But in accordance with the importance of attachment ambiguity resolution, the attacher has two additional factors in its scores:
- 5. Prefer argument attachment, prefer verb attachment
- 6. Prefer low attachment
These factors are used to rate alternative attachment sites. Finding an attachment as argument is more important than finding an attachment as verb, so potential attachment sites are ranked as follows: attachment as verb argument (best), attachment as argument of non-verb, attachment as verb modifier, attachment as modifier of non-verb. The second factor is relative height of attachment sites, counted as number of sentence (IP) nodes below the attachment site in the rightmost branch of the tree at the time of attachment.
5 Comparison to Related Models
5.1 The Chunker and Chart Parsing
An issue I have skirted to now is ambiguity in the output of the chunker. There are at least two sorts of ambiguity that arise that cannot be satisfactorily resolved by the heuristics we have discussed. First, it is possible for the same stretch of words to be analyzed as a chunk in more than one category. This arises especially with single-word chunks. For example, described may represent either a single-word VP or a single-word PtcP, and the ambiguity can be resolved only in context. In this case, both readings are passed to the attacher.
By way of conclusion, I would like to reiterate the advantages of a chunking parser. First, one of the most difficult problems for context-free parsing techniques is attachment ambiguities. But within chunks, (syntactic) attachment ambiguities do not arise, and simple context-free parsing techniques are very effective. By having separate chunker and attacher, we can limit the use of expensive techniques for dealing with attachment ambiguities to the parts of grammar where they are really necessary – i.e., in the attacher.
Another motivation is modularity. Since the chunker is insensitive to the state of the attacher, we can develop and debug it separately from the attacher. The chunker also simplifies the task that the attacher faces: many lexical ambiguities can be resolved within chunks, relieving the attacher of that task, and there is less clutter to deal with at the level of chunks than at the level of words.
A related motivation is that the chunker-attacher division keeps attachment ambiguities from being multiplied with chunk ambiguities. The chunker evaluates chunks as well as it can on its own, instead of making decisions relative to one or another branch of the attacher’s nondeterministic computation.
As we have seen, there is also some psychological evidence for chunks. Gee and Grosjean argue that the ‘performance structures’ that emerge from a range of diverse experiments are best predicted by what they call φ-phrases. Apart from their structure – that is, seen simply as strings – chunks and φ-phrases are nearly identical.
A fifth motivation is related to Gee and Grosjean’s work. They show that φ-phrases are a good predictor of intonation. Given the correspondence between φ-phrases and chunks, there is a possibility of using the chunker in determining intonation, for speech synthesis.
And last, but not least, we can account for a range of otherwise inexplicable syntactic constraints if we assume the existence of chunks. For example, we can explain why *the proud of his son man is odd, by observing that it involves a chunk, of his son, embedded in another chunk, the proud man. (See Abney, to appear.) If the chunks are produced in a stream, it is not possible to interleave them.
- Abney, Steven, (1987) The English Noun Phrase in Its Sentential Aspect, unpublished doctoral dissertation, MIT, Cambridge, MA.
- Abney, Steven, (to appear) ‘Syntactic Affixation and Performance Structures,’ in Views on Phrase Structure, Kluwer.
- Aho, Alfred V., and Jeffrey D. Ullman (1972) The Theory of Parsing, Translation, and Compiling, in two volumes, Prentice-Hall, Inc., Englewood Cliffs, NJ.
- Noam Chomsky, (1986) Barriers, MIT Press, Cambridge, MA.
- Kenneth W. Church, and Ramesh Patil (1982) ‘Coping with Syntactic Ambiguity or How to Put the Block in the Box on the Table,’ American Journal of Computation Linguistics, 8.3-4.
- Frazier, L., and J.D. Fodor (1978) ‘The Sausage Machine: A new two-stage parsing model,’ Cognition 6, 291-325.
- Gee, James Paul, and François Grosjean (1983) ‘Performance Structures: A Psycholinguistic and Linguistic Appraisal,’ Cognitive Psychology 15, 411-458.
- Hindle, Donald, (1983) ‘User manual for Fidditch, a deterministic parser,’ Naval Research Laboratory Technical Memorandum #7590-142.
- Knuth, D.E. (1965) ‘On the translation of languages from left to right,’ Information and Control 8.6, 607-639.
- Williams, George E. (1986) ‘The Solar Cycle in Precambrian Time,’ Scientific American 255.2.
|1989 ParsingByChunks||Steven P. Abney||Parsing By Chunks||http://www.vinartus.net/spa/89d.pdf|