Sentence Boundary Detection Task

A Sentence Boundary Detection Task is a text segmentation task that requires the segmentation of a linguistic expression into its component natural language sentences.




  • (Wikipedia, 2011) ⇒
    • QUOTE: Sentence segmentation is the problem of dividing a string of written language into its component sentences. In English and some other languages, using punctuation, particularly the full stop character is a reasonable approximation. However even in English this problem is not trivial due to the use of the full stop character for abbreviations, which may or may not also terminate a sentence. For example Mr. is not its own sentence in "Mr. Smith went to the shops in Jones Street." When processing plain text, tables of abbreviations that contain periods can help prevent incorrect assignment of sentence boundaries.

      As with word segmentation, not all written languages contain punctuation characters which are useful for approximating sentence boundaries.
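The abbreviation-table approach described in the quote can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `ABBREVIATIONS` set and the `split_sentences` function are hypothetical names introduced here, the table is deliberately tiny, and the sketch assumes English-style text in which a sentence ends with `.`, `!`, or `?` followed by whitespace or the end of the string.

```python
import re

# Hypothetical, deliberately small abbreviation table (an assumption for
# illustration); a practical system would need a far larger list.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "prof.", "e.g.", "i.e."}

# Candidate boundary: '.', '!' or '?' followed by whitespace or end of text.
CANDIDATE = re.compile(r"[.!?](?=\s|$)")


def split_sentences(text):
    """Split text at candidate boundaries, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        end = match.end()
        # Last whitespace-delimited token before the candidate boundary.
        token = text[start:end].split()[-1].lower()
        if token in ABBREVIATIONS:
            # Treat the full stop as part of the abbreviation, not a boundary.
            continue
        sentences.append(text[start:end].strip())
        start = end
    # Keep any trailing text that lacks terminal punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    example = "Mr. Smith went to the shops in Jones Street. He bought milk."
    for sentence in split_sentences(example):
        print(sentence)
```

The sketch never splits after a listed abbreviation, so it misses the case the quote points out, where an abbreviation also terminates a sentence. It also does nothing useful for languages whose writing systems lack boundary-marking punctuation.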