Noun Compound Bracketing Algorithm
A Noun Compound Bracketing Algorithm is a Syntactic Parsing Algorithm that can solve a Noun Compound Bracketing Task.
- AKA: Noun Phrase Parsing Algorithm.
- Context:
- It can be implemented by a Noun Compound Bracketing System to solve a Noun Compound Bracketing Task.
- It can range from being a Noun-Noun Compound Bracketing Algorithm, to being a Two-Word Noun Compound Bracketing Algorithm, to being a Multi-Word Compound Bracketing Algorithm.
- It can involve Attachment Disambiguation.
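- Example(s):
- an Adjacency Model-based Noun Compound Bracketing Algorithm (Marcus, 1980; Lauer, 1995),
- a Dependency Model-based Noun Compound Bracketing Algorithm (Lauer, 1995),
- a Web Count-based Noun Compound Bracketing Algorithm (Lapata & Keller, 2005; Nakov & Hearst, 2005).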
- Counter-Example(s):
- a Root Bracketing Algorithm (a Root-Finding Algorithm based on the Bracketing Method),
- a Word Sense Disambiguation Algorithm,
- a Partial Parsing Algorithm,
- a Corpus Tagging Algorithm,
- a Named Entity Recognition Algorithm,
- a Part-of-Speech Tagging Algorithm,
- a Noun Phrase Chunking Algorithm,
- a Text Tokenization Algorithm,
- a Word Segmentation Algorithm.
- See: Prepositional Phrase Attachment, Noun Phrase Coordination, Natural Language Processing Task, Computational Linguistics, Computer Speech Processing, Word Sense Disambiguation, Sentiment Analysis, Compound Noun, Natural Language Processing, Complex Nominal, Head Noun, Prenominal Modifier, Word Similarity, Keyword Extraction, Text Summarization, Text Analysis.
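The three-word methods quoted in the References reduce bracketing to a binary choice, but a Multi-Word Compound Bracketing Algorithm has to choose among many more candidate structures. A minimal sketch of one simple greedy extension (not the method of any work cited here) repeatedly merges the adjacent pair of constituents whose heads are most strongly associated. The function name `greedy_bracket`, the rule that a constituent's head is its rightmost noun, and the toy compound counts are all illustrative assumptions.

```python
from collections import Counter

def greedy_bracket(nouns, counts):
    """Greedy multi-word bracketing (illustrative sketch, not a published method).

    Assumptions: the head of a constituent is its rightmost noun, and the
    association between two adjacent constituents is the corpus count of
    their heads seen together as a two-noun compound.
    """
    # Each item is (bracketed string, head noun); start with one item per noun.
    items = [(noun, noun) for noun in nouns]
    while len(items) > 1:
        # Score every adjacent pair of constituents by their heads' count.
        scores = [counts[(items[i][1], items[i + 1][1])] for i in range(len(items) - 1)]
        # max() returns the first maximum, so ties go to the leftmost pair.
        best = max(range(len(scores)), key=lambda i: scores[i])
        (left_str, _), (right_str, right_head) = items[best], items[best + 1]
        # The merged constituent keeps the head of its right-hand part.
        items[best:best + 2] = [(f"[{left_str} {right_str}]", right_head)]
    return items[0][0]

# Hypothetical two-noun compound counts.
counts = Counter({
    ("computer", "science"): 50,
    ("science", "department"): 20,
    ("department", "chair"): 15,
})
print(greedy_bracket(["computer", "science", "department", "chair"], counts))
# prints [[[computer science] department] chair]
```

A greedy merge never revisits an earlier decision, so it is fast but can lock in an early mistake; scoring every candidate bracketing globally is the usual alternative when that matters.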
References
2016
- (Fares, 2016) ⇒ Murhaf Fares. (2016). “A Dataset for Joint Noun-Noun Compound Bracketing and Interpretation.” In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics - ACL 2016 Student Research Workshop.
- QUOTE: Noun-noun compound bracketing can be defined as the disambiguation of the internal structure of compounds with three nouns or more. For example, we can bracket the compound noon fashion show in two ways:
- 1. Left-bracketing: [[noon fashion] show]
- 2. Right-bracketing: [noon [fashion show]]
- In this example, the right-bracketing interpretation (a fashion show happening at noon) is more likely than the left-bracketing one (a show of noon fashion). However, the correct bracketing need not always be as obvious, some compounds can be subtler to bracket, e.g. car radio equipment (Girju et al., 2005).
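Because the definition above covers compounds of three or more nouns, it may help to see how quickly the candidate set grows: a compound of n nouns has C(n-1) full binary bracketings, where C is the Catalan number (2 for three nouns, 5 for four, 14 for five). The sketch below, a hypothetical helper rather than code from Fares (2016), enumerates them and prints them in the bracket notation used in the quote.

```python
from functools import lru_cache

def bracketings(nouns):
    """Enumerate every full binary bracketing of a noun sequence as nested tuples."""
    nouns = tuple(nouns)

    @lru_cache(maxsize=None)
    def span(i, j):
        # A single noun is already a complete constituent.
        if j - i == 1:
            return (nouns[i],)
        trees = []
        # Split nouns[i:j] into a left part nouns[i:k] and a right part nouns[k:j].
        for k in range(i + 1, j):
            for left in span(i, k):
                for right in span(k, j):
                    trees.append((left, right))
        return tuple(trees)

    return list(span(0, len(nouns)))

def render(tree):
    """Render a nested tuple using the [..] notation from the quote above."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"[{render(left)} {render(right)}]"

for tree in bracketings(["noon", "fashion", "show"]):
    print(render(tree))
# prints:
# [noon [fashion show]]
# [[noon fashion] show]
```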
2014a
- (Barriere & Menard, 2014) ⇒ Caroline Barriere, and Pierre Andre Menard. (2014). “Multiword Noun Compound Bracketing Using Wikipedia.” In: Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014).
- QUOTE: The noun compound bracketing task consists in determining related subgroups of nouns within a larger compound. For example (from Lauer (1995)), (woman (aid worker)) requires a right-bracketing interpretation, contrarily to ((copper alloy) rod) requiring a left-bracketing interpretation. When only three words are used, n1 n2 n3, bracketing is defined as a binary decision between grouping (n1,n2) or grouping (n2,n3). Two models, described in early work by Lauer (1995), are commonly used to inform such decision: the adjacency model and the dependency model. The former compares probabilities (or more loosely, strength of association) of two alternative adjacent noun compounds, that of n1 n2 and of n2 n3. The latter compares probabilities of two alternative dependencies, either between n1 and n3 or between n2 and n3.
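The adjacency model described in the quote can be sketched in a few lines. This is a minimal illustration, assuming that association is measured by raw counts of two-noun compounds in some corpus; because both candidate pairs would share the same maximum-likelihood denominator, comparing counts is equivalent to comparing their estimated probabilities. The function name and the counts are hypothetical, and ties are resolved as right-branching by assumption.

```python
from collections import Counter

def adjacency_bracket(n1, n2, n3, compound_counts):
    """Adjacency model: compare the association of (n1, n2) with that of (n2, n3)."""
    left_score = compound_counts[(n1, n2)]   # support for grouping n1 with n2
    right_score = compound_counts[(n2, n3)]  # support for grouping n2 with n3
    if left_score > right_score:
        return f"[[{n1} {n2}] {n3}]"
    # Assumption: ties (including two unseen pairs) default to right-branching.
    return f"[{n1} [{n2} {n3}]]"

# Hypothetical counts of two-noun compounds observed in a corpus.
counts = Counter({
    ("copper", "alloy"): 12, ("alloy", "rod"): 3,
    ("woman", "aid"): 1, ("aid", "worker"): 25,
})
print(adjacency_bracket("copper", "alloy", "rod", counts))   # [[copper alloy] rod]
print(adjacency_bracket("woman", "aid", "worker", counts))   # [woman [aid worker]]
```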
2014b
- (Menard & Barriere, 2014) ⇒ Pierre Andre Menard, and Caroline Barriere. (2014). “Linked Open Data and Web Corpus Data for Noun Compound Bracketing.” In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014).
- QUOTE: In the field of computational linguistics, large corpora have been shown to be quite good for the task of noun compound bracketing. Such task consists in determining which nouns within a larger noun compound form subgroups. For example (from Lauer (1995)), woman aid worker would be bracketed as woman [aid worker], called a right-bracketing, contrarily to copper alloy rod, which would be bracketed as [copper alloy] rod, called a left-bracketing. In compound bracketing, when only three words are used, [math]\displaystyle{ n_1\; n_2\; n_3 }[/math], the task becomes a binary decision between grouping [math]\displaystyle{ n_1 }[/math] and [math]\displaystyle{ n_2 }[/math] or grouping [math]\displaystyle{ n_2 }[/math] and [math]\displaystyle{ n_3 }[/math]. Two models, described in early work by Lauer (1995) and still used in recent work, are the adjacency model and the dependency model. The former compares probabilities (or more loosely strength of association) of two alternative adjacent noun compounds, that of [math]\displaystyle{ n_1\; n_2 }[/math] and of [math]\displaystyle{ n_2\; n_3 }[/math]. The latter compares probabilities of two alternative attachment (modifying) noun relations, that of [math]\displaystyle{ n_1\; n_3 }[/math] and of [math]\displaystyle{ n_2\; n_3 }[/math] (...)
Noun compound bracketing, sometimes referred to as NP parsing (Pitler et al., 2010), has been studied as a task in itself (e.g. Lauer (1995), Vadas and Curran (2007a), Nakov and Hearst (2005)). It is also studied as the first step of semantic analysis of NPs (Girju et al., 2005) where not only subgroups of words are found within the compound, but semantic relations between these groups are looked at (Nastase et al., 2013).
2007
- (Vadas & Curran, 2007b) ⇒ David Vadas, and James R. Curran. (2007). “Large-Scale Supervised Models for Noun Phrase Bracketing.” In: Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACLING).
- QUOTE: Noun phrase (NP) bracketing is a requirement for the syntactic and semantic analysis of NPs. In the literature, e.g. Marcus (1980, p253) and Lauer (1995), the task is generally framed as follows: given a 3 word noun phrase like those below, decide whether it is left branching (1) or right branching (2).
((crude oil) prices) (1)
(world (oil prices)) (2)
NP bracketing is crucial for many Natural Language Processing (NLP) tasks. For example, question answering (QA) and anaphora resolution both require (potentially nested) candidate NPs, typically identified using a parser ...
NP bracketing is similar to chunking (Ramshaw and Marcus, 1995), as both tasks aim to identify NP structure...
A basic method for solving the simple NP bracketing task was first described in Marcus (1980). This adjacency model compares the semantic association of words 1–2 to that between words 2–3. If the former is more likely, then the compound is left branching, otherwise it is right branching ...
Lauer (1995) proposes a new variation: the dependency model. In this case, we compare the semantic association of words 1–2 to that of words 1–3. This change is motivated by the dependencies that arise from the structure of the NP. We would expect a dependency between words 2–3 whether the compound was left or right branching, so there is no reason to analyse it.
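The dependency model described in the quote changes only which pair is compared with words 1–2: it asks whether word 1 modifies word 2 (left-branching) or word 3 (right-branching), since the word 2–3 dependency holds under both analyses. A minimal sketch, assuming association is a raw count of two-noun compounds in some corpus (so a count for (n1, n3) is read as evidence that n1 modifies n3); the function name and counts are hypothetical, and ties default to left-branching by assumption.

```python
from collections import Counter

def dependency_bracket(n1, n2, n3, compound_counts):
    """Dependency model: does n1 modify n2 (left-branching) or n3 (right-branching)?"""
    # The n2 -> n3 dependency holds under both analyses, so it is not compared.
    left_score = compound_counts[(n1, n2)]   # evidence that n1 modifies n2
    right_score = compound_counts[(n1, n3)]  # evidence that n1 modifies n3
    if right_score > left_score:
        return f"[{n1} [{n2} {n3}]]"
    # Assumption: ties (including two unseen pairs) default to left-branching.
    return f"[[{n1} {n2}] {n3}]"

# Hypothetical counts of two-noun compounds observed in a corpus.
counts = Counter({
    ("copper", "alloy"): 12, ("copper", "rod"): 2,
    ("woman", "aid"): 1, ("woman", "worker"): 9,
})
print(dependency_bracket("copper", "alloy", "rod", counts))   # [[copper alloy] rod]
print(dependency_bracket("woman", "aid", "worker", counts))   # [woman [aid worker]]
```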
2005a
- (Girju et al., 2005) ⇒ Roxana Girju, Dan Moldovan, Marta Tatu, and Daniel Antohe. (2005). “On the Semantics of Noun Compounds.” In: Computer Speech & Language, 19(4).
- ABSTRACT: This paper provides new insights on the semantic characteristics of two and three noun compounds. An analysis is performed using two sets of semantic classification categories: a list of 8 prepositional paraphrases previously proposed by Lauer (Designing statistical language learners: experiments on noun compounds, Ph.D. Thesis, Macquarie University, Australia) and a new set of 35 semantic relations introduced by us. We show the distribution of these semantic categories on a corpus of noun compounds and present several models for the bracketing and the semantic classification of noun compounds. The results are compared against state-of-the-art models reported in the literature.
- NOTES: supervised model
- NOTES: bracketing in context
- NOTES: requires WordNet senses
2005b
- (Nakov & Hearst, 2005) ⇒ Preslav Nakov, and Marti Hearst. (2005). “Search Engine Statistics Beyond the n-gram: Application to Noun Compound Bracketing.” In: Proceedings of CoNLL-2005.
- QUOTE: An important but understudied language analysis problem is that of noun compound bracketing, which is generally viewed as a necessary step towards noun compound interpretation. Consider the following contrastive pair of noun compounds: ...
2005c
- (Lapata & Keller, 2005) ⇒ Mirella Lapata, and Frank Keller. (2005). “Web-based Models for Natural Language Processing.” In: ACM Transactions on Speech and Language Processing (TSLP), 2(1).
- QUOTE: The first analysis task we consider is the syntactic disambiguation of compound nouns, which has received a fair amount of attention in the NLP literature (Pustejovsky et al. 1993; Resnik 1993; Lauer 1995).
- Previous approaches typically compare different bracketings and choose the most likely one. The adjacency model compares [n1 n2] against [n2 n3] and adopts a right branching analysis if [n2 n3] is more likely than [n1 n2]. The dependency model compares [n1 n2] against [n1 n3] and adopts a right branching analysis if [n1 n3] is more likely than [n1 n2].
- The simplest model of compound noun disambiguation compares the frequencies of the two competing analyses and opts for the most frequent one (Pustejovsky et al. 1993). Lauer (1995) proposes an unsupervised method for estimating the frequencies of the competing bracketings based on a taxonomy or a thesaurus. He uses a probability ratio to compare the probability of the left-branching analysis to that of the right-branching ...
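The probability-ratio decision mentioned above can be illustrated with word-level counts. This sketch swaps in a simplification: Lauer (1995) estimated the probabilities from a taxonomy or thesaurus, whereas the code below uses add-alpha smoothed counts of two-noun compounds, so an unseen pair does not produce a zero or undefined ratio. The function names, the `alpha` value, and the tie rule (a ratio of exactly 1 counts as left-branching) are assumptions. The `model` argument selects which pair is compared with [n1 n2], following the adjacency and dependency descriptions in the quote.

```python
import math
from collections import Counter

def log_ratio(n1, n2, n3, counts, model="dependency", alpha=0.5):
    """Log of P(left) / P(right) estimated from add-alpha smoothed compound counts."""
    if model == "adjacency":
        left_pair, right_pair = (n1, n2), (n2, n3)
    elif model == "dependency":
        left_pair, right_pair = (n1, n2), (n1, n3)
    else:
        raise ValueError(f"unknown model: {model!r}")
    # Both smoothed estimates share one denominator (total count plus alpha times
    # the number of possible pairs), so it cancels out of the ratio.
    return math.log(counts[left_pair] + alpha) - math.log(counts[right_pair] + alpha)

def choose_bracketing(n1, n2, n3, counts, model="dependency", alpha=0.5):
    """Return 'left' or 'right'; a ratio of exactly 1 is treated as left (assumption)."""
    return "left" if log_ratio(n1, n2, n3, counts, model, alpha) >= 0 else "right"

# Hypothetical counts; ("copper", "rod") is unseen, which smoothing tolerates.
counts = Counter({("copper", "alloy"): 12, ("alloy", "rod"): 3,
                  ("aid", "worker"): 25, ("woman", "worker"): 9})
print(choose_bracketing("copper", "alloy", "rod", counts))                # left
print(choose_bracketing("woman", "aid", "worker", counts, "adjacency"))   # right
```

Working in log space keeps the comparison numerically stable when counts are large, and the sign of the log ratio carries the decision directly.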
1995
- (Lauer, 1995) ⇒ Mark Lauer. (1995). “Corpus Statistics Meet the Noun Compound: Some empirical results.” In: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics.
- ABSTRACT: A variety of statistical methods for noun compound analysis are implemented and compared. The results support two main conclusions. First, the use of conceptual association not only enables a broad coverage, but also improves the accuracy. Second, an analysis model based on dependency grammar is substantially more accurate than one based on deepest constituents, even though the latter is more prevalent in the literature.
1993a
- (Pustejovsky et al., 1993) ⇒ James Pustejovsky, Peter Anick, and Sabine Bergler. (1993). “Lexical Semantic Techniques for Corpus Analysis.” In: Computational Linguistics Journal, 19(2).
- QUOTE: Noun compound recognition and bracketing. In technical sublanguages, noun compounds are often employed to expand the working vocabulary without the invention of new word forms. It is therefore useful in applications such as lexicon-assisted full-text information retrieval (Anick 1992) to include such noun compounds as lexical items for both querying and thesaurus browsing. We construct bracketed noun compounds from our database of partial parses in a two-step process. The first simply searches the corpus for (recurring) contiguous sequences of nouns. Then, to bracket each compound that includes more than two nouns, we test whether possible subcomponents of the phrase exist on their own (as complete noun compounds) elsewhere in the corpus. Sample bracketed compounds derived from the computer troubleshooting database include [[system management] utility], [TK50 [tape drive]], [[database management] system].
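The two-step procedure described by Pustejovsky et al. (1993) can be sketched as follows. This is a minimal illustration under stated assumptions, not their implementation: POS-tagged sentences with Penn Treebank-style noun tags stand in for their database of partial parses, no frequency threshold is applied for "recurring" sequences, a subcomponent counts as attested only if it occurs elsewhere as a maximal noun run, split points are tried from left to right, and a compound with no attested split is left flat. The function names and the toy corpus are hypothetical.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # assumption: Penn Treebank-style tags

def noun_sequences(tagged_sentences):
    """Step 1: count maximal contiguous runs of two or more nouns."""
    found = Counter()
    for sentence in tagged_sentences:
        run = []
        for word, tag in list(sentence) + [("", "")]:  # sentinel flushes the last run
            if tag in NOUN_TAGS:
                run.append(word)
            else:
                if len(run) >= 2:
                    found[tuple(run)] += 1
                run = []
    return found

def bracket(compound, known):
    """Step 2: split a compound where each multi-noun part occurs on its own."""
    if len(compound) == 1:
        return compound[0]
    if len(compound) == 2:
        return f"[{compound[0]} {compound[1]}]"
    for k in range(1, len(compound)):
        left, right = compound[:k], compound[k:]
        if all(len(part) == 1 or part in known for part in (left, right)):
            return f"[{bracket(left, known)} {bracket(right, known)}]"
    return "[" + " ".join(compound) + "]"  # no attested subcomponent: leave flat

# Hypothetical tagged sentences from a troubleshooting corpus.
corpus = [
    [("the", "DT"), ("system", "NN"), ("management", "NN"), ("utility", "NN"), ("failed", "VBD")],
    [("system", "NN"), ("management", "NN"), ("is", "VBZ"), ("enabled", "VBN")],
    [("the", "DT"), ("TK50", "NNP"), ("tape", "NN"), ("drive", "NN"), ("hangs", "VBZ")],
    [("replace", "VB"), ("the", "DT"), ("tape", "NN"), ("drive", "NN")],
]
compounds = noun_sequences(corpus)
known = set(compounds)
for compound in compounds:
    if len(compound) > 2:
        print(bracket(compound, known))
# prints:
# [[system management] utility]
# [TK50 [tape drive]]
```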
1993b
- (Resnik, 1993) ⇒ Philip S. Resnik. (1993). “Selection and information: A Class-based Approach to Lexical Relationships.” Ph.D. thesis, University of Pennsylvania.