
GM-RKB β

LingPipe NER System

References

2010

  • (Lingpipe, 2010) ⇒ LingPipe online demo http://alias-i.com/lingpipe/web/demo-ne.html
    • Named entity recognition finds mentions of things in text. LingPipe's interface represents these mentions as chunkings, in which each chunk is identified by its character offsets in the text.
    • Genre-Specific Models: Named entity recognizers in LingPipe are trained from a corpus of data. The examples below extract mentions of people, locations or organizations in English news texts, and mentions of genes and other biological entities of interest in biomedical research literature.
    • Language-Specific Models: LingPipe provides three statistical named-entity recognizers:

com.aliasi.chunk.         Size        1st-best           n-best             Confidence
                                      speed   accuracy   speed   accuracy   speed    accuracy
TokenShapeChunker         small       fast    medium     n/a     n/a        n/a      n/a
CharLmHmmChunker          medium      fast    low        medium  medium     slow     high
CharLmRescoringChunker    very large  slow    high       slower  high       slowest  low
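To make the chunking representation and the three output modes in the table (1st-best, n-best, confidence) concrete, here is a minimal Python sketch. It is not LingPipe's Java API: the `Chunk` class, the helper functions, the example sentence and all scores are hypothetical. It treats each candidate analysis as a set of character-offset chunks with a joint log score, takes the first-best as the highest-scoring analysis and the n-best as the top n, and defines a chunk's confidence as the normalized probability mass of every analysis that contains it, which is one common way to obtain per-chunk confidences. A real chunker searches a lattice of analyses rather than enumerating a short explicit list as done here.

```python
from dataclasses import dataclass
from math import exp, log
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Chunk:
    """Hypothetical analogue of a chunk: a typed span [start, end) of character offsets."""
    start: int
    end: int
    type: str


# A candidate analysis: the set of chunks it proposes plus a joint log score.
Candidate = Tuple[FrozenSet[Chunk], float]


def spans_text(text: str, chunk: Chunk) -> str:
    """Recover the mention string from its character offsets."""
    return text[chunk.start:chunk.end]


def first_best(candidates: List[Candidate]) -> Candidate:
    """The single highest-scoring analysis."""
    return max(candidates, key=lambda c: c[1])


def n_best(candidates: List[Candidate], n: int) -> List[Candidate]:
    """The n highest-scoring analyses, best first."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:n]


def chunk_confidence(candidates: List[Candidate]) -> List[Tuple[Chunk, float]]:
    """P(chunk | text): summed normalized probability of every analysis containing the chunk."""
    total = sum(exp(score) for _, score in candidates)
    conf: Dict[Chunk, float] = {}
    for chunks, score in candidates:
        p = exp(score) / total
        for chunk in chunks:
            conf[chunk] = conf.get(chunk, 0.0) + p
    return sorted(conf.items(), key=lambda kv: kv[1], reverse=True)


# Toy input: offsets index into `text`; the joint scores are invented.
text = "p53 binds the P4 promoter."
p53 = Chunk(0, 3, "GENE")
p4 = Chunk(14, 16, "GENE")
p4_promoter = Chunk(14, 25, "GENE")
candidates = [
    (frozenset({p53, p4_promoter}), log(6e-9)),
    (frozenset({p53, p4}), log(3e-9)),
    (frozenset({p53}), log(1e-9)),
]

if __name__ == "__main__":
    best_chunks, _ = first_best(candidates)
    print(sorted(spans_text(text, c) for c in best_chunks))
    for chunk, p in chunk_confidence(candidates):
        print(f"{p:.2f} {spans_text(text, chunk)!r}")
```

In this toy setup the first-best analysis commits to a single reading, while the confidence view keeps both `P4` and `P4 promoter` as overlapping candidates with separate probabilities, which mirrors the kind of trade-off the speed and accuracy columns describe.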


  • http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html
    • Running this program on the same input provides the following results … The entity recognizer is 99.99% confident that p53 is a mention of a gene. It is only 73.28% confident that P4 promoter is a gene, and even less confident about insulin-like growth factor II gene. The list gets interesting when names overlap, such as the fourth result (index 3), which prefixes human to the third result, or the fifth result, which prefixes active to the second result. These confidences reflect the recognizer's uncertainty about each candidate mention.

      The first requirement for training a named entity recognizer is gathering the data.
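The confidence-ranked, partly overlapping gene mentions discussed in the tutorial excerpt can be post-processed in different ways. The following Python sketch is hypothetical and is not LingPipe code: it uses its own `Chunk` class, an invented sentence, and made-up confidences that only imitate the shape of the tutorial's ranked list. It first lists overlapping pairs (such as a mention and the same mention with a prefix), then applies one possible policy, a greedy filter that keeps the most confident non-overlapping chunks at or above a threshold. Whether to discard overlaps at all is an application decision; the ranked confidence output itself keeps them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Chunk:
    """Hypothetical typed span [start, end) of character offsets."""
    start: int
    end: int
    type: str


def span(text: str, phrase: str, type_: str) -> Chunk:
    """Build a chunk from the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return Chunk(start, start + len(phrase), type_)


def overlaps(a: Chunk, b: Chunk) -> bool:
    """True if the half-open character ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def overlapping_pairs(ranked: List[Tuple[Chunk, float]]) -> List[Tuple[Chunk, Chunk]]:
    """All pairs of chunks in the ranked list whose spans overlap, in list order."""
    return [
        (a, b)
        for i, (a, _) in enumerate(ranked)
        for b, _ in ranked[i + 1:]
        if overlaps(a, b)
    ]


def greedy_non_overlapping(
    ranked: List[Tuple[Chunk, float]], threshold: float
) -> List[Tuple[Chunk, float]]:
    """Keep the most confident chunks at or above `threshold` that do not overlap a kept chunk."""
    kept: List[Tuple[Chunk, float]] = []
    for chunk, conf in sorted(ranked, key=lambda x: x[1], reverse=True):
        if conf < threshold:
            break  # everything after this is even less confident
        if not any(overlaps(chunk, k) for k, _ in kept):
            kept.append((chunk, conf))
    return kept


text = "Active p53 binds the human IGF2 gene."
p53 = span(text, "p53", "GENE")
active_p53 = span(text, "Active p53", "GENE")
igf2 = span(text, "IGF2 gene", "GENE")
human_igf2 = span(text, "human IGF2 gene", "GENE")
# Invented confidences, listed best first.
ranked = [(p53, 0.95), (igf2, 0.70), (human_igf2, 0.40), (active_p53, 0.20)]

if __name__ == "__main__":
    for a, b in overlapping_pairs(ranked):
        print(f"{text[a.start:a.end]!r} overlaps {text[b.start:b.end]!r}")
    for chunk, conf in greedy_non_overlapping(ranked, threshold=0.3):
        print(f"{conf:.2f} {text[chunk.start:chunk.end]!r}")
```

Greedy selection by confidence is only one simple choice; it can drop a longer, correct mention (here "human IGF2 gene") in favour of a shorter, more confident one, so an application may prefer to keep the overlaps or to pick a different threshold.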