2006 TheDifficultiesOfTexNamExtr

From GM-RKB

Subject Headings: Organism NER.

Notes

Quotes

Abstract

In modern biology, digitization of biosystematics publications is an important task. Extraction of taxonomic names from such documents is one of its major issues. This is because these names identify the various genera and species. This article reports on our experiences with learning techniques for this particular task. We say why established Named-Entity Recognition techniques are somewhat difficult to use in our context. One reason is that we have only very little training data available. Our experiments show that a combining approach that relies on regular expressions, heuristics, and word-level language recognition achieves very high precision and recall and allows to cope with those difficulties.
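The abstract names three ingredients: regular expressions, heuristics, and word-level language recognition. The sketch below illustrates how such a combination might look in principle. It is not the authors' system: the pattern `CANDIDATE`, the word list `COMMON_ENGLISH`, and the function `extract_candidates` are hypothetical names invented for this example, and the small stop-word set is only a crude stand-in for real word-level language recognition.

```python
import re

# Hypothetical pattern (an assumption, not from the paper): a capitalized
# genus or an abbreviated genus such as "E.", followed by a lowercase
# epithet of at least three letters.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+|[A-Z]\.)\s+([a-z]{3,})\b")

# Crude stand-in for word-level language recognition: a tiny list of
# common English words that should not be part of a taxonomic name.
COMMON_ENGLISH = {
    "the", "this", "that", "and", "was", "were", "are", "with",
    "from", "has", "have", "found", "near", "also", "species",
}


def extract_candidates(text):
    """Return genus/epithet pairs that pass the regex and the word filter."""
    names = []
    for match in CANDIDATE.finditer(text):
        genus, epithet = match.group(1), match.group(2)
        # Heuristic: reject a candidate if either part is a common English word.
        if genus.lower() in COMMON_ENGLISH or epithet in COMMON_ENGLISH:
            continue
        names.append(f"{genus} {epithet}")
    return names


if __name__ == "__main__":
    sample = "The ant Formica rufa was found near Lasius niger nests."
    print(extract_candidates(sample))
```

The regular expression alone is deliberately permissive, so it also matches ordinary phrases such as "The ant". The word filter then removes candidates that look like English, which mirrors the general idea of combining a high-recall pattern with precision-oriented checks.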


Key: 2006 TheDifficultiesOfTexNamExtr
Author: Guido Sautter, Klemens Böhm
Title: The Difficulties of Taxonomic Name Extraction and a Solution
Url: http://acl.ldc.upenn.edu/W/W06/W06-3325.pdf