Written Sentence Boundary Detection Task

From GM-RKB

A Written Sentence Boundary Detection Task is a sentence boundary detection task (and therefore a text segmentation task) that requires identifying the start and end of each linguistic sentence in a written text item.

  • AKA: Written Sentence Segmentation Task, Sentence Boundary Detection, Text Segmentation, Sentence Segmentation.
  • Context:
  • Example(s):
    • "I walked home with Ms. Smith. She ate breakfast." ⇒ <SENT>I walked home with Ms. Smith.</SENT> <SENT>She ate breakfast.</SENT>
    • "The virA and virG genes control the induction of vir genes by plant signals. virA encodes a membrane-bound sensor kinase protein and virG encodes a cytoplasmic regulator protein." (a challenging example, because the first letter of the second sentence is not capitalized).
    • "Plant signal molecules such as acetosyringone and certain monosaccharides induce the expression of Agrobacterium tumefaciens virulence (vir) genes, which are required for the processing, transfer, and possibly integration of a piece of the bacterial plasmid DNA (T-DNA) into the plant genome. Two of the vir genes, virA and virG, belonging to the bacterial two-component regulatory system family, control the induction of vir genes by plant signals. virA encodes a membrane-bound sensor kinase protein and virG encodes a cytoplasmic regulator protein." ⇒
      <PSID=8611.0>Plant signal molecules such as acetosyringone and certain monosaccharides induce the expression of Agrobacterium tumefaciens virulence (vir) genes, which are required for the processing, transfer, and possibly integration of a piece of the bacterial plasmid DNA (T-DNA) into the plant genome. <PSID=8611.1>Two of the vir genes, virA and virG, belonging to the bacterial two-component regulatory system family, control the induction of vir genes by plant signals. <PSID=8611.2>virA encodes a membrane-bound sensor kinase protein and virG encodes a cytoplasmic regulator protein.
  • Counter-Example(s):
  • See: PPLRE Project, Full Stop, Sentences.
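The lowercase-initial example above can be made concrete with a minimal sketch. It assumes two hypothetical regular-expression rules (`CAPITALIZED` and `ANY_CASE`) and two helper functions (`segment`, `tag`) that are not part of any referenced system; the `<PSID=…>` markup simply imitates the tagged example shown in the list. It is an illustration of why capitalization is an unreliable cue, not a complete segmenter.

```python
import re

# Hypothetical rule 1: boundary = terminal punctuation, whitespace, uppercase letter.
CAPITALIZED = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
# Hypothetical rule 2: boundary = terminal punctuation followed by whitespace.
ANY_CASE = re.compile(r"(?<=[.!?])\s+")


def segment(text, rule):
    """Split text at every position matched by the given boundary rule."""
    return [s for s in rule.split(text.strip()) if s]


def tag(sentences, doc_id):
    """Prefix each sentence with a <PSID=doc.index> marker, as in the example."""
    return " ".join(f"<PSID={doc_id}.{i}>{s}" for i, s in enumerate(sentences))


text = (
    "The virA and virG genes control the induction of vir genes by plant signals. "
    "virA encodes a membrane-bound sensor kinase protein and virG encodes a "
    "cytoplasmic regulator protein."
)

print(len(segment(text, CAPITALIZED)))  # 1: the boundary before "virA" is missed
print(len(segment(text, ANY_CASE)))     # 2
print(tag(segment(text, ANY_CASE), 8611))
```

The second rule recovers the boundary here only because this passage contains no abbreviations; on text such as "Ms. Smith" it would over-split, so neither rule is adequate on its own.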


References

2015

  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Text_segmentation#Sentence_segmentation Retrieved:2015-4-11.
    • Sentence segmentation is the problem of dividing a string of written language into its component sentences. In English and some other languages, using punctuation, particularly the full stop character is a reasonable approximation. However even in English this problem is not trivial due to the use of the full stop character for abbreviations, which may or may not also terminate a sentence. For example Mr. is not its own sentence in "Mr. Smith went to the shops in Jones Street." When processing plain text, tables of abbreviations that contain periods can help prevent incorrect assignment of sentence boundaries.

      As with word segmentation, not all written languages contain punctuation characters which are useful for approximating sentence boundaries.
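The abbreviation-table idea in the passage above can be sketched as follows. The table `ABBREVIATIONS` and the function `split_sentences` are hypothetical names introduced for illustration, and the table is deliberately tiny; a practical system would need a far larger list. The sketch also never places a boundary after a listed abbreviation, so it cannot handle the case the passage mentions where an abbreviation also ends a sentence.

```python
# Hypothetical, deliberately small table of abbreviations that contain periods.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "prof.", "st."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split on tokens ending in . ! or ? unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # Ignore closing quotes or brackets when looking for terminal punctuation.
        ends = token.rstrip("\"'”’)").endswith((".", "!", "?"))
        if ends and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing text that lacks terminal punctuation
        sentences.append(" ".join(current))
    return sentences


example = "Mr. Smith went to the shops in Jones Street."
print(split_sentences(example, abbreviations=set()))  # full stop alone: "Mr." is split off
print(split_sentences(example))                       # with the table: one sentence
```

Passing an empty table reproduces the naive full-stop approximation, which makes the effect of the table easy to compare on the same input.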
