2003 BibliographicAttrExtractFromErroRefBasedOnAStatModel

(Takasu, 2003) ⇒ Atsuhiro Takasu. (2003). “Bibliographic Attribute Extraction from Erroneous References Based on a Statistical Model.” In: Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2003). doi:10.1109/JCDL.2003.1204843

Subject Headings:

Notes

Cited By

~63 http://scholar.google.com/scholar?cites=18182970713474273177

Quotes

Abstract

In this paper, we propose a method for extracting bibliographic attributes from reference strings captured using Optical Character Recognition (OCR) and an extended hidden Markov model. Bibliographic attribute extraction can be used in two ways. One is reference parsing in which attribute values are extracted from OCR-processed references for bibliographic matching. The other is reference alignment in which attribute values are aligned to the bibliographic record to enrich the vocabulary of the bibliographic database. In this paper, we first propose a statistical model for attribute extraction that represents both the syntactical structure of references and OCR error patterns. Then, we perform experiments using bibliographic references obtained from scanned images of papers in journals and transactions and show that useful attribute values are extracted from OCR-processed references. We also show that the proposed model has advantages in reducing the cost of preparing training data, a critical problem in rule-based systems.
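As an illustration of the reference parsing use described in the abstract, the following is a minimal sketch of labeling the tokens of a reference string with a plain first-order hidden Markov model and Viterbi decoding. It is not the paper's extended model: the four states, the coarse token classes, and every probability below are hypothetical, hand-set values, and the OCR handling is only a crude normalization of digit look-alikes (for example "O" read for "0") rather than a learned error model.

```python
import math

# Hypothetical attribute states, in the left-to-right order assumed for a reference.
STATES = ["author", "title", "journal", "year"]

# Hand-set illustrative probabilities (assumptions, not values from the paper).
START = {"author": 1.0}
TRANS = {
    "author": {"author": 0.5, "title": 0.5},
    "title": {"title": 0.8, "journal": 0.2},
    "journal": {"journal": 0.5, "year": 0.5},
    "year": {"year": 1.0},
}
EMIT = {
    "author": {"initial": 0.60, "capword": 0.30, "word": 0.05, "number": 0.05},
    "title": {"initial": 0.05, "capword": 0.30, "word": 0.60, "number": 0.05},
    "journal": {"initial": 0.05, "capword": 0.70, "word": 0.20, "number": 0.05},
    "year": {"initial": 0.01, "capword": 0.01, "word": 0.01, "number": 0.97},
}

# Crude stand-in for OCR confusions: letters often misread in place of digits.
OCR_DIGITS = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def token_class(token):
    """Map a token to a coarse observation class."""
    core = token.strip(".,;:()\"'")
    if len(core) == 1 and core.isalpha() and core.isupper():
        return "initial"
    # Require at least one real digit before undoing digit look-alikes.
    if any(c.isdigit() for c in core) and core.translate(OCR_DIGITS).isdigit():
        return "number"
    if core[:1].isupper():
        return "capword"
    return "word"


def lg(p):
    """Logarithm that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens):
    """Return the most probable state label for each token (non-empty input)."""
    obs = [token_class(t) for t in tokens]
    score = {s: lg(START.get(s, 0.0)) + lg(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor of state s for this step.
            prev = max(STATES, key=lambda p: score[p] + lg(TRANS[p].get(s, 0.0)))
            new_score[s] = score[prev] + lg(TRANS[prev].get(s, 0.0)) + lg(EMIT[s][o])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


reference = "Takasu, A. Bibliographic attribute extraction from erroneous references. Proc. JCDL 2OO3"
tokens = reference.split()
labels = viterbi(tokens)
for token, label in zip(tokens, labels):
    print(f"{label:8s} {token}")
```

With these toy parameters the misread year "2OO3" is still classed as a number and labeled `year`. Because the observation classes are so coarse, the boundary between attributes depends heavily on the hand-set numbers; a model trained on real references would need richer features such as delimiters and vocabulary.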

References

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2003 BibliographicAttrExtractFromErroRefBasedOnAStatModel	Atsuhiro Takasu			Bibliographic Attribute Extraction from Erroneous References Based on a Statistical Model		Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries		10.1109/JCDL.2003.1204843		2003