IOB Tag Set

References

(Smith et al., 2008) ⇒ Larry Smith, Lorraine K. Tanabe, Rie J. Ando, Cheng-Ju Kuo, I-Fang Chung, Chun-Nan Hsu, Yu-Shi Lin, Roman Klinger, Christoph M. Friedrich, Kuzman Ganchev, Manabu Torii, Hongfang Liu, Barry Haddow, Craig A. Struble, Richard J. Povinelli, Andreas Vlachos, William A. Baumgartner, Lawrence Hunter, Bob Carpenter, Richard T. Tsai, Hong-Jie Dai, Feng Liu, Yifei Chen, Chengjie Sun, Sophia Katrenko, Pieter Adriaans, Christian Blaschke, Rafael Torres, Mariana Neves, Preslav Nakov, Anna Divoli, Manuel Maña-López, Jacinto Mata, and W. John Wilbur. (2008). “Overview of BioCreative II Gene Mention Recognition.” In: Genome Biology, 9(Suppl 2). doi:10.1186/gb-2008-9-s2-s2
- QUOTE: NER is frequently accomplished with B-I-O tagging, which classifies each token as being at the beginning of the named entity (B), continuing the entity (I), or outside of any entity to be tagged (O).
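
The B-I-O classification described in the quote can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited paper: the function name `spans_to_bio`, the end-exclusive `(start, end)` span convention, and the example sentence are all assumptions made for the sketch.

```python
from typing import List, Tuple


def spans_to_bio(num_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Assign B, I, or O to each token.

    `spans` holds (start, end) token indices with `end` exclusive
    (an assumed convention for this sketch).
    """
    tags = ["O"] * num_tokens          # default: outside any entity
    for start, end in spans:
        tags[start] = "B"              # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I"              # remaining tokens continue the entity
    return tags


# Invented example sentence with two hypothetical gene mentions.
tokens = ["The", "BRCA1", "gene", "and", "p53", "interact"]
entity_spans = [(1, 3), (4, 5)]
for token, tag in zip(tokens, spans_to_bio(len(tokens), entity_spans)):
    print(f"{token}\t{tag}")
```

Because every entity starts with B, two adjacent entities remain distinguishable even when no O token separates them.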

(Sarawagi, 2006) ⇒ Sunita Sarawagi. (2006). “Efficient Inference on Sequence Segmentation Models.” In: Proceedings of the 23rd International Conference on Machine Learning (ICML 2006). doi:10.1145/1143844.1143944
- QUOTE: Traditionally many of these applications have been artificially formulated as sequence labeling tasks at the expense of a loss of flexibility of features that can be exploited. This limitation is partly addressed by expanding the label set — for example, a popular choice in named entity recognition tasks (NER) is the Begin-Continue-End-Unique-other (BCEUO) encoding of entity labels (Borthwick et al., 1998), and in syntactic chunking tasks is the Begin-Inside-Outside (BIO) encoding of labels (Zhang et al., 2002).
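
To make the expanded label set concrete, the following hedged sketch encodes entity spans with the five BCEUO labels named in the quote. It is an illustration only: the function name `spans_to_bceuo` and the end-exclusive `(start, end)` span convention are assumptions, not taken from Sarawagi (2006) or Borthwick et al. (1998).

```python
from typing import List, Tuple


def spans_to_bceuo(num_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Encode entity spans as Begin / Continue / End / Unique / Other labels.

    `spans` holds (start, end) token indices with `end` exclusive
    (an assumed convention for this sketch).
    """
    tags = ["O"] * num_tokens              # O: other, not part of an entity
    for start, end in spans:
        if end - start == 1:
            tags[start] = "U"              # U: single-token (unique) entity
        else:
            tags[start] = "B"              # B: first token
            for i in range(start + 1, end - 1):
                tags[i] = "C"              # C: interior tokens
            tags[end - 1] = "E"            # E: last token
    return tags


# A three-token entity followed by a one-token entity.
print(spans_to_bceuo(7, [(0, 3), (4, 5)]))
```

Compared with B-I-O, the extra End and Unique labels let a token-level model condition on entity boundaries directly, at the cost of a larger label set.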

(Ramshaw & Marcus, 1995) ⇒ Lance A. Ramshaw, and Mitch P. Marcus. (1995). “Text Chunking Using Transformation-based Learning.” In: Proceedings of the Third ACL Workshop on Very Large Corpora (WVLC 1995).
- QUOTE: In the baseNP experiments aimed at non-recursive NP structures, we use the chunk tag set {I, O, B}, where words marked I are inside some baseNP, those marked O are outside, and the B tag is used to mark the left most item of a baseNP which immediately follows another baseNP. In these tests, punctuation marks were tagged in the same way as words.
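
Note that in this quoted scheme B does not mark every chunk start: it appears only where one baseNP immediately follows another. A minimal sketch of that rule, assuming non-overlapping chunks given as end-exclusive `(start, end)` token spans (the function name `spans_to_iob_ramshaw_marcus` is hypothetical):

```python
from typing import List, Optional, Tuple


def spans_to_iob_ramshaw_marcus(
    num_tokens: int, spans: List[Tuple[int, int]]
) -> List[str]:
    """Tag tokens with {I, O, B} as described in the quote.

    I: inside a chunk; O: outside; B: first token of a chunk that
    immediately follows another chunk. Spans are assumed non-overlapping.
    """
    tags = ["O"] * num_tokens
    prev_end: Optional[int] = None
    for start, end in sorted(spans):
        for i in range(start, end):
            tags[i] = "I"
        if prev_end == start:          # previous chunk ends where this one begins
            tags[start] = "B"
        prev_end = end
    return tags


# Two adjacent chunks, then a separate chunk after one outside token.
print(spans_to_iob_ramshaw_marcus(6, [(0, 2), (2, 4), (5, 6)]))
```

Under this rule a chunk preceded by an O token starts with I rather than B, which is the main difference from variants that put B on the first token of every chunk.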