LingPipe NER System

References

(LingPipe, 2010) ⇒ LingPipe online named entity demo: http://alias-i.com/lingpipe/web/demo-ne.html
- Named entity recognition finds mentions of things in text. LingPipe's interface represents these mentions as chunkings: sets of chunks, each identified by its character offsets in the text.
- Genre-Specific Models: Named entity recognizers in LingPipe are trained from a corpus of data. The examples below extract mentions of people, locations or organizations in English news texts, and mentions of genes and other biological entities of interest in biomedical research literature.
- Language-Specific Models: LingPipe provides three statistical named-entity recognizers:

Class (`com.aliasi.chunk.`)	Size	1st-best speed	1st-best accuracy	n-best speed	n-best accuracy	confidence speed	confidence accuracy
`TokenShapeChunker`	small	fast	medium	n/a	n/a	n/a	n/a
`CharLmHmmChunker`	medium	fast	low	medium	medium	slow	high
`CharLmRescoringChunker`	very large	slow	high	slower	high	slowest	low
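To make the idea of a chunking concrete, here is a minimal Java sketch of mentions represented as typed character-offset spans. The `ChunkingSketch` and `Mention` classes, the example sentence, and its labels are hypothetical illustrations, not LingPipe's own `com.aliasi.chunk` API; the sketch only assumes that end offsets are exclusive, as with Java's `substring`.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkingSketch {
    // Hypothetical, simplified stand-in for a chunk: a typed span identified
    // by character offsets into the original text, with an exclusive end.
    static final class Mention {
        final int start;
        final int end;
        final String type;

        Mention(int start, int end, String type) {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("invalid span " + start + ".." + end);
            }
            this.start = start;
            this.end = end;
            this.type = type;
        }

        // Recover the mention's surface text by slicing the original input.
        String text(String input) {
            return input.substring(start, end);
        }
    }

    public static void main(String[] args) {
        String input = "John Smith works for IBM in London.";
        // A "chunking" here is simply the list of mentions found in one text.
        List<Mention> chunking = new ArrayList<>();
        chunking.add(new Mention(0, 10, "PERSON"));
        chunking.add(new Mention(21, 24, "ORGANIZATION"));
        chunking.add(new Mention(28, 34, "LOCATION"));
        for (Mention m : chunking) {
            System.out.println(m.start + "-" + m.end + " " + m.type + ": " + m.text(input));
        }
    }
}
```

Storing offsets rather than copies of the text keeps each mention tied to its exact position, so two identical strings at different places in a document remain distinct mentions.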

http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html
- Running this program on the same input provides the following results … The entity recognizer is 99.99% confident that p53 is a mention of a gene. It's only 73.28% confident that P4 promoter is a gene, and even less confident about insulin-like growth factor II gene. The list gets interesting when we see names that overlap, such as the fourth (index 3) result, which prefixes human to the third result, or the fifth result, which prefixes active to the second result. These confidences reflect the uncertainty of the recognizer.
  The first requirement for training a named entity recognizer is gathering the data.
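The confidence-ranked, overlapping results described in the tutorial excerpt can be illustrated with a small Java sketch that sorts candidate mentions by confidence and reports which candidate spans overlap. The `ConfidenceSketch` and `Candidate` classes, the sentence, and the confidence values are all made up for illustration; this is not LingPipe's confidence chunker API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class ConfidenceSketch {
    // Hypothetical candidate mention: character span [start, end) plus a
    // confidence in [0, 1] that the span is a mention of the given type.
    static final class Candidate {
        final int start;
        final int end;
        final String type;
        final double confidence;

        Candidate(int start, int end, String type, double confidence) {
            this.start = start;
            this.end = end;
            this.type = type;
            this.confidence = confidence;
        }
    }

    // Two half-open spans overlap when each one starts before the other ends.
    static boolean overlaps(Candidate a, Candidate b) {
        return a.start < b.end && b.start < a.end;
    }

    // Return a copy of the candidates ordered from most to least confident.
    static List<Candidate> rank(List<Candidate> candidates) {
        List<Candidate> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble((Candidate c) -> c.confidence).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        String input = "The human p53 gene is active.";
        List<Candidate> candidates = new ArrayList<>();
        // Made-up confidences, for illustration only.
        candidates.add(new Candidate(4, 13, "GENE", 0.40));  // "human p53"
        candidates.add(new Candidate(10, 13, "GENE", 0.95)); // "p53"
        candidates.add(new Candidate(10, 18, "GENE", 0.30)); // "p53 gene"

        List<Candidate> ranked = rank(candidates);
        for (int i = 0; i < ranked.size(); i++) {
            Candidate c = ranked.get(i);
            System.out.printf(Locale.ROOT, "%d %.2f %s %s%n",
                    i, c.confidence, c.type, input.substring(c.start, c.end));
        }
        // Report every pair of ranked candidates whose spans overlap.
        for (int i = 0; i < ranked.size(); i++) {
            for (int j = i + 1; j < ranked.size(); j++) {
                if (overlaps(ranked.get(i), ranked.get(j))) {
                    System.out.println("rank " + i + " overlaps rank " + j);
                }
            }
        }
    }
}
```

Because such candidates overlap, they cannot all be kept in one non-overlapping chunking. A caller wanting a single chunking would need a selection rule, for example greedily keeping the most confident span and discarding anything that overlaps it; that rule is an assumption of this sketch, not something stated in the tutorial.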