2004 SemiMarkovCRFforIE

Subject Headings: Semi-Markov Conditional Random Field, Sequence Segmentation Statistical Models, Supervised Sequence Segmentation Task.

Notes

(Daume III & Marcu, 2005) ⇒ Hal Daumé, III, and Daniel Marcu. (2005). “Learning as Search Optimization: approximate large margin methods for structured prediction.” In: Proceedings of the 22nd International Conference on Machine Learning (ICML 2005) doi:10.1145/1102351.1102373

We describe semi-Markov conditional random-fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semi-CRF on an Input sequence [math]\displaystyle{ \bf{x} }[/math] outputs a “segmentation” of [math]\displaystyle{ \bf{x} }[/math], in which labels are assigned to segments (i.e., subsequences) of [math]\displaystyle{ \bf{x} }[/math] rather than to individual elements [math]\displaystyle{ x_i }[/math] of [math]\displaystyle{ \bf{x} }[/math]. Importantly, features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian. In spite of this additional power, exact learning and inference algorithms for semi-CRFs are polynomial-time — often only a small constant factor slower than conventional CRFs. In experiments on five named entity recognition problems, semi-CRFs generally outperform conventional CRFs.

Conditional random fields (CRFs) are a recently-introduced formalism [12] for representing a conditional model Pr(y|x), where both x and y have non-trivial structure (often sequential). Here we introduce a generalization of sequential CRFs called semi-Markov conditional random fields (or semi-CRFs). Recall that semi-Markov chain models extend hidden Markov models (HMMs) by allowing each state s_i to persist for a non-unit length of time d_i. After this time has elapsed, the system will transition to a new state s₀, which depends only on s_i ; however, during the “segment” of time between [math]\displaystyle{ i }[/math] to [math]\displaystyle{ i }[/math] + d_i, the behavior of the system may be non-Markovian. Semi-Markov models are fairly common in certain applications of statistics [8, 9], and are also used in reinforcement learning to model hierarchical Markov decision processes [19].

Semi-CRFs are a conditionally trained version of semi-Markov chains. In this paper, we present inference and learning methods for semi-CRFs. We also argue that segments often have a clear intuitive meaning, and hence semi-CRFs are more natural than conventional CRFs. We focus here on named entity recognition (NER), in which a segment corresponds to an extracted entity; however, similar arguments might be made for several other tasks, such as gene-finding [11] or NP-chunking [16].

In NER, a semi-Markov formulation allows one to easily construct entity-level features (such as “entity length” and …

,

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2004 SemiMarkovCRFforIE	William W. Cohen Shun-Zheng Yu Sunita Sarawagi			Semi-Markov Conditional Random Fields for Information Extraction		Proceedings of Advances in Neural Information Processing System	http://books.nips.cc/papers/files/nips17/NIPS2004 0427.pdf			2004