CoNLL-2000 Text String Labeled Segmentation Format

From GM-RKB

A CoNLL-2000 Text String Labeled Segmentation Format is a BIO-style Text String Labeled Segmentation Format that was introduced in the CoNLL-2000 shared task.
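As an illustration of what "BIO-style" means, the following minimal sketch decodes a tag sequence into labeled segments. The function name `bio_to_chunks` is hypothetical, and the handling of an `I-` tag with no preceding `B-` tag (treated here as the start of a new chunk) is an assumption of this sketch rather than something the definition above specifies.

```python
def bio_to_chunks(tags):
    """Return (chunk_type, start, end) spans, with end exclusive.

    B-X opens a chunk of type X, I-X continues an open chunk of type X,
    and O marks a token outside any chunk.
    """
    chunks = []
    start, ctype = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # Close the open chunk unless this tag continues it.
        continues = prefix == "I" and label == ctype
        if ctype is not None and not continues:
            chunks.append((ctype, start, i))
            start, ctype = None, None
        # Assumption: an I- tag with no open chunk also starts a chunk.
        if prefix == "B" or (prefix == "I" and ctype is None):
            start, ctype = i, label
    if ctype is not None:
        chunks.append((ctype, start, len(tags)))
    return chunks


gold_chunks = bio_to_chunks(["B-NP", "B-NP", "I-NP", "I-NP", "O"])
```

For the tag sequence above, the first `B-NP` forms a one-token chunk and the second `B-NP` opens a three-token chunk, so two NP segments are recovered.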



References

2008

2003

  • http://www.clips.ua.ac.be/conll2003/ner/
    • Output example of the evaluation program for this shared task: conlleval. The example deals with text chunking, a task which uses the same output format as this named entity task. The program requires the output of the NER system for each word to be appended to the corresponding line in the test file, with a single space between the line and the output tag. Make sure you keep the empty lines in the test file; otherwise the software may mistakenly regard separate entities as one big entity.
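The step described in this excerpt (appending one predicted tag per token line while leaving empty lines untouched) might be sketched as follows. The helper name `append_predictions` and its error handling are hypothetical and are not part of conlleval.

```python
def append_predictions(test_lines, predicted_tags):
    """Append one predicted tag to each token line, separated by one space.

    Empty lines are passed through unchanged so that sentence
    boundaries survive in the output.
    """
    tags = iter(predicted_tags)
    out = []
    for line in test_lines:
        line = line.rstrip("\n")
        if not line.strip():
            out.append("")  # sentence boundary: keep the empty line
            continue
        tag = next(tags, None)
        if tag is None:
            raise ValueError("fewer predicted tags than token lines")
        out.append(f"{line} {tag}")
    if next(tags, None) is not None:
        raise ValueError("more predicted tags than token lines")
    return out


test_lines = ["Boeing NNP B-NP\n", "'s POS B-NP\n", "\n", "Rockwell NNP B-NP\n"]
result = append_predictions(test_lines, ["I-NP", "B-NP", "I-NP"])
```

The mismatch checks are a design choice of this sketch: a tagger that emits too few or too many tags would otherwise shift every later tag onto the wrong token.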

2000

  • http://www.cnts.ua.ac.be/conll2000/chunking/output.html
    • This is an output example for the Perl script conlleval, which can be used for measuring the performance of a system that has processed the CoNLL-2000 shared task data. The input of this script should consist of lines similar to the shared task data files.

      Each line contains four symbols: the current word, its part-of-speech tag (POS), the chunk tag according to the corpus and the predicted chunk tag. Sentences have been separated by empty lines.

      Here is an example:

  Boeing NNP B-NP I-NP
  's POS B-NP B-NP
  747 CD I-NP I-NP
  jetliners NNS I-NP I-NP
  . . O O

  Rockwell NNP B-NP I-NP
  said VBD B-VP B-VP
  the DT B-NP B-NP
  agreement NN I-NP I-NP
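
The four-column layout shown above can be read with a few lines of Python. This is a minimal sketch: the names `Token` and `read_sentences` are hypothetical, and the token-level comparison at the end is only an illustration, not the chunk-level scoring that conlleval performs.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    gold: str       # chunk tag according to the corpus
    predicted: str  # chunk tag produced by the system


def read_sentences(text: str) -> List[List[Token]]:
    """Split the text into sentences at empty lines and parse each token line."""
    sentences, current = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            if current:
                sentences.append(current)
                current = []
            continue
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        current.append(Token(*fields))
    if current:
        sentences.append(current)
    return sentences


EXAMPLE = """\
Boeing NNP B-NP I-NP
's POS B-NP B-NP
747 CD I-NP I-NP
jetliners NNS I-NP I-NP
. . O O

Rockwell NNP B-NP I-NP
said VBD B-VP B-VP
the DT B-NP B-NP
agreement NN I-NP I-NP
"""

sentences = read_sentences(EXAMPLE)
tokens = [tok for sent in sentences for tok in sent]
# Count the tokens whose predicted tag equals the corpus tag.
matches = sum(tok.gold == tok.predicted for tok in tokens)
```

In this example the tags differ only on the two sentence-initial tokens, Boeing and Rockwell, so 7 of the 9 tokens carry identical corpus and predicted tags.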