2003 HierHiddenMarkovModelsForIE

From GM-RKB

Subject Headings: Markov Model, Hierarchical HMM.

Notes

Cited By

  • ~120++ …

Quotes

Abstract

Information extraction can be defined as the task of automatically extracting instances of specified classes or relations from text. We consider the case of using machine learning methods to induce models for extracting relation instances from biomedical articles. We propose and evaluate an approach that is based on using hierarchical hidden Markov models to represent the grammatical structure of the sentences being processed. Our approach first uses a shallow parser to construct a multi-level representation of each sentence being processed. Then we train hierarchical HMMs to capture the regularities of the parses for both positive and negative sentences. We evaluate our method by inducing models to extract binary relations in three biomedical domains. Our experiments indicate that our approach results in more accurate models than several baseline HMM approaches.
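The multi-level input representation described in the abstract can be pictured as a phrase-level sequence of typed segments produced by the shallow parser, each containing its own word-level sequence. The following is a minimal Python sketch of such a structure; the `Phrase` class and the example segmentation are illustrative assumptions, not the paper's actual data format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Phrase:
    """One phrase-level segment produced by a shallow parser."""
    ptype: str          # phrase type, e.g. noun phrase (NP) or verb phrase (VP)
    words: List[str]    # word-level observations emitted within this phrase

# Illustrative two-level representation of one sentence: the upper level is the
# sequence of typed phrases, the lower level is the word sequence inside each phrase.
sentence = [
    Phrase("NP", ["the", "UBC6", "protein"]),
    Phrase("VP", ["localizes", "to"]),
    Phrase("NP", ["the", "endoplasmic", "reticulum"]),
]

for phrase in sentence:
    print(phrase.ptype, phrase.words)
```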

Empirical Evaluation

In this section we present experiments testing the hypothesis that our hierarchical HMMs are able to provide more accurate models than HMMs that incorporate less grammatical information. In particular we empirically compare two types of hierarchical HMMs with three baseline HMMs.

  • Context HHMMs: hierarchical HMMs with context features, as described in the previous section.
  • HHMMs: hierarchical HMMs without context features.
  • Phrase HMMs: single-level HMMs in which states are typed (as in the phrase level of an HHMM) and emit whole phrases. These HMMs were introduced by Ray and Craven (2001). Unlike hierarchical HMMs, the states of Phrase HMMs do not have embedded HMMs which emit words. Instead, each state has a single multinomial distribution to represent its emissions, and each emitted phrase is treated as a bag of words. (A simplified sketch contrasting this bag-of-words emission model with an HHMM's embedded word-level HMM follows this list.)
  • POS HMMs: single-level HMMs in which states emit words, but are typed with part-of-speech tags so that a given state can emit words with only a single POS.
  • Token HMMs: single-level HMMs in which untyped states emit words.
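To make the distinction between Phrase HMMs and hierarchical HMMs concrete, the sketch below contrasts two ways a state could score an emitted phrase: a single bag-of-words multinomial (as in Phrase HMM states) versus an embedded word-level HMM run over the phrase with the forward algorithm (as in HHMM phrase states). This is a minimal Python illustration under assumed data structures and made-up parameters, not the authors' implementation.

```python
import math
from typing import Dict, List

def phrase_state_loglik(words: List[str], multinomial: Dict[str, float]) -> float:
    """Phrase-HMM style: the emitted phrase is scored as a bag of words
    drawn from a single multinomial distribution."""
    return sum(math.log(multinomial.get(w, 1e-6)) for w in words)

def embedded_hmm_loglik(words: List[str],
                        init: Dict[str, float],
                        trans: Dict[str, Dict[str, float]],
                        emit: Dict[str, Dict[str, float]]) -> float:
    """HHMM-style phrase state: the whole phrase is scored by an embedded
    word-level HMM via the forward algorithm, so word order matters."""
    states = list(init)
    alpha = {s: init[s] * emit[s].get(words[0], 1e-6) for s in states}
    for w in words[1:]:
        alpha = {s: sum(alpha[r] * trans[r][s] for r in states) * emit[s].get(w, 1e-6)
                 for s in states}
    return math.log(sum(alpha.values()))

# Tiny usage example with made-up parameters for a two-state embedded HMM.
init = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"the": 0.5, "protein": 0.2}, "s2": {"the": 0.1, "protein": 0.6}}
print(phrase_state_loglik(["the", "protein"], {"the": 0.3, "protein": 0.3}))
print(embedded_hmm_loglik(["the", "protein"], init, trans, emit))
```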

We evaluate our hypotheses on three data sets that we have assembled from the biomedical literature. The data sets are composed of abstracts gathered from the MEDLINE database [National Library of Medicine, 2003]. The first set contains instances of the subcellular-localization relation. It is composed of 769 positive and 6,360 negative sentences. The positive sentences contain 949 total tuple instances. The number of actual tuples is 404 since some tuples occur multiple times either in the same sentence or in multiple sentences. The second, which we refer to as the disorder-association data set, characterizes a binary relation between genes and disorders. It contains 829 positive and 11,771 negative sentences. The positive sentences represent 878 instances of 145 tuples. The third, which we refer to as the protein-interaction data set, characterizes physical interactions between pairs of proteins. It is composed of 5,457 positive and 42,015 negative sentences. It contains 8,088 instances of 819 tuples.

Conclusion

We have presented an approach to learning models for information extraction that is based on using hierarchical HMMs to represent the grammatical structure of the sentences being processed. We employ a shallow parser to obtain parse trees for sentences and then use these trees to construct the input representation for the hierarchical HMMs.

Our approach builds on previous work on hierarchical HMMs and incorporating grammatical knowledge into information-extraction models. The application of HHMMs to IE is novel and has required us to modify HHMM learning algorithms to operate on a hierarchical input representation. In particular our methods take into account that phrases and states must have matching types, and that phrase states must emit complete phrases. We have also introduced a novel modification of HHMMs in which observations can be feature vectors. With respect to previous work on incorporating grammatical knowledge into IE models, our main contribution is an approach that takes advantage of grammatical information represented at multiple scales. An appealing property of our approach is that it generalizes to additional levels of description of the input text.
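One way to realize the feature-vector observations mentioned above (for example, a word paired with its part-of-speech tag) is to factor a state's emission probability into a product of independent per-feature multinomials. The sketch below illustrates that factorization; the feature choice and the independence assumption are assumptions made for illustration, not a verbatim account of the paper's model.

```python
import math
from typing import Dict, Tuple

# An observation is a feature vector, e.g. (word, part-of-speech tag).
Observation = Tuple[str, str]

def emission_logprob(obs: Observation,
                     word_probs: Dict[str, float],
                     pos_probs: Dict[str, float]) -> float:
    """Log-probability of a feature-vector observation under one state,
    factored (by assumption) into independent per-feature multinomials."""
    word, pos = obs
    # A small floor stands in for proper smoothing of unseen feature values.
    return math.log(word_probs.get(word, 1e-6)) + math.log(pos_probs.get(pos, 1e-6))

# Made-up per-feature distributions for one hypothetical state.
state_word_probs = {"protein": 0.05, "localizes": 0.02}
state_pos_probs = {"NN": 0.6, "VBZ": 0.2}
print(emission_logprob(("protein", "NN"), state_word_probs, state_pos_probs))
```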

We have evaluated our approach in the context of learning IE models to extract instances of three biomedical relations from the abstracts of scientific articles. These experiments demonstrate that incorporating a hierarchical representation of grammatical structure improves extraction accuracy in hidden Markov models.

References

  • [Bikel et al., 1999] D. Bikel, R. Schwartz, and R. Weischedel. An algorithm that learns what's in a name. Machine Learning, 34(1):211–231, 1999.
  • [Cestnik, 1990] B. Cestnik. Estimating probabilities: A crucial task in machine learning. In: Proceedings of the 9th European Conference on Artificial Intelligence, pages 147–150, Stockholm, Sweden, (1990). Pitman.
  • [Fine et al., 1998] S. Fine, Yoram Singer, and N. Tishby. The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32:41–62, 1998.
  • [Freitag and McCallum, 2000] Dayne Freitag and Andrew McCallum. Information extraction with HMM structures learned by stochastic optimization. In: Proceedings of the 17th National Conference on Artificial Intelligence, (2000). AAAI Press.
  • [Hirschman et al., 2002] Lynette Hirschman, J. Park, Jun'ichi Tsujii, L. Wong, and C. Wu. Accomplishments and challenges in literature data mining for biology. Bioinformatics, 18:1553–1561, 2002.
  • [Krogh, 1994] A. Krogh. Hidden Markov models for labeled sequences. In: Proceedings of the 12th International Conference on Pattern Recognition, pages 140–144, Jerusalem, Israel, (1994). IEEE Computer Society Press.
  • [Lafferty et al., 2001] John D. Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning, pages 282–289, Williamstown, MA, (2001). Morgan Kaufmann.
  • [Leek, 1997] T. Leek. Information extraction using hidden Markov models. M.S. thesis, Dept. of Computer Science and Engineering, Univ. of California, San Diego, 1997.
  • [McCallum et al., 2000] Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. In: Proceedings of the 17th International Conference on Machine Learning, pages 591–598, Stanford, CA, (2000). Morgan Kaufmann.
  • [Miller et al., 2000] S. Miller, H. Fox, L. Ramshaw, and R. Weischedel. A novel use of statistical parsing to extract information from text. In: Proceedings of the 6th Applied Natural Language Processing Conf., pages 226–233, Seattle, WA, (2000). Association for Computational Linguistics.
  • [National Library of Medicine, 2003] National Library of Medicine. The MEDLINE database, (2003). http://www.ncbi.nlm.nih.gov/PubMed/.
  • [Porter, 1980] M. F. Porter. An algorithm for suffix stripping. Program, 14(3):127–130, 1980.
  • [Rabiner, 1989] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
  • [Ray and Craven, 2001] S. Ray and M. Craven. Representing sentence structure in hidden Markov models for information extraction. In: Proceedings of the 17th International Joint Conference on Artificial Intelligence, pages 1273–1279, Seattle, WA, (2001). Morgan Kaufmann.
  • [Riloff, 1998] Ellen Riloff. The sundance sentence analyzer, (1998). http://www.cs.utah.edu/projects/nlp/.

Author(s): Mark Craven, Soumya Ray, Marios Skounakis
Year: 2003
Title: Hierarchical Hidden Markov Models for Information Extraction
URL: http://www.biostat.wisc.edu/~craven/papers/ijcai03.pdf