2009 GeneralizedExpectCritforBootstrap...


Subject Headings:

Notes

Cited By

Quotes

Abstract

Traditionally, machine learning approaches for information extraction require human annotated data that can be costly and time-consuming to produce. However, in many cases, there already exists a database (DB) with schema related to the desired output, and records related to the expected input text. We present a conditional random field (CRF) that aligns tokens of a given DB record and its realization in text. The CRF model is trained using only the available DB and unlabeled text with generalized expectation criteria. An annotation of the text induced from inferred alignments is used to train an information extractor. We evaluate our method on a citation extraction task in which alignments between DBLP database records and citation texts are used to train an extractor. Experimental results demonstrate an error reduction of 35% over a previous state-of-the-art method that uses heuristic alignments.
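The step in which an annotation of the text is induced from record-text alignments can be illustrated with a minimal sketch. This is not the paper's CRF trained with generalized expectation criteria. It swaps in a simple exact-match heuristic as a stand-in for the learned alignment, only to show how field labels from a DB record can be projected onto text tokens. The function name `project_labels`, the field names, and the sample citation string are hypothetical.

```python
from typing import Dict, List


def project_labels(record: Dict[str, str], text_tokens: List[str]) -> List[str]:
    """Label each text token with the DB field whose value contains it, else 'O'.

    Assumption: exact (case-insensitive) token match stands in for the
    learned alignment described in the abstract.
    """
    # Map each lowercased record token to its field (first field wins on ties).
    token_to_field: Dict[str, str] = {}
    for field, value in record.items():
        for tok in value.lower().split():
            token_to_field.setdefault(tok, field)

    labels = []
    for tok in text_tokens:
        # Strip surrounding punctuation before looking the token up.
        key = tok.lower().strip(".,;:()")
        labels.append(token_to_field.get(key, "O"))
    return labels


# Hypothetical DB record and citation text, for illustration only.
record = {
    "author": "Kedar Bellare",
    "title": "Generalized Expectation Criteria",
    "year": "2009",
}
tokens = "K. Bellare . Generalized expectation criteria . 2009".split()
labels = project_labels(record, tokens)
```

Token/label pairs produced this way could serve as noisy training data for a sequence extractor. Note that the abbreviated first name `K.` is left unlabeled, which is the kind of mismatch between a record and its realization in text that a learned alignment is meant to handle better than exact matching.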

References


2009 GeneralizedExpectCritforBootstrap...
Author: Kedar Bellare
title: Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment
titleUrl: http://www.cs.umass.edu/~kedarb/papers/dbie ge align.pdf