2006 APhrasebasedStatisticalModelfor


Subject Headings: Lexical Normalization, SMS, Texting Language, Phrase-level Language Model, Short Messaging Service (SMS) Text, SMS Normalization Task.

Notes

Cited By

Quotes

Author Keywords

Abstract

Short Messaging Service (SMS) texts behave quite differently from normal written texts and exhibit some very special phenomena. To translate SMS texts, traditional approaches model such irregularities directly in Machine Translation (MT). However, such approaches suffer from a customization problem, as tremendous effort is required to adapt the language model of an existing translation system to handle the SMS text style. We offer an alternative approach that resolves such irregularities by normalizing SMS texts before MT. In this paper, we view the task of SMS normalization as a translation problem from the SMS language to the English language, and we propose to adapt a phrase-based statistical MT model for the task. Evaluation by 5-fold cross validation on a parallel SMS normalized corpus of 5000 sentences shows that our method can achieve a BLEU score of 0.80702 against a baseline BLEU score of 0.6958. Another experiment, translating SMS texts from English to Chinese on a separate SMS text corpus, shows that using SMS normalization as MT preprocessing can substantially boost SMS translation performance, from 0.1926 to 0.3770 in BLEU score.
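As a rough illustration of treating normalization as phrase-level translation, the sketch below runs a monotone dynamic-programming search over a tiny phrase table. It is a toy under stated assumptions, not the paper's model: the table entries, their probabilities, the `normalize` function, and the `max_len` parameter are all invented for illustration, and no language model is included.

```python
import math

# Hypothetical phrase table (not from the paper):
# SMS phrase -> list of (English phrase, assumed probability).
PHRASE_TABLE = {
    ("c", "u"): [("see you", 0.9)],
    ("c",): [("see", 0.6), ("c", 0.4)],
    ("u",): [("you", 0.8), ("u", 0.2)],
    ("2moro",): [("tomorrow", 0.95)],
}


def normalize(tokens, max_len=3):
    """Return the highest-scoring monotone segmentation of `tokens`.

    best[i] holds (log score, output phrases) for the prefix tokens[:i].
    Single tokens missing from the table are copied through unchanged.
    """
    n = len(tokens)
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        score_i, out_i = best[i]
        if score_i == -math.inf:
            continue
        for j in range(i + 1, min(i + max_len, n) + 1):
            src = tuple(tokens[i:j])
            candidates = PHRASE_TABLE.get(src)
            if candidates is None and j == i + 1:
                # Assumption: unknown single tokens pass through at no cost.
                candidates = [(tokens[i], 1.0)]
            for tgt, prob in candidates or []:
                score = score_i + math.log(prob)
                if score > best[j][0]:
                    best[j] = (score, out_i + [tgt])
    return " ".join(best[n][1])


print(normalize("c u 2moro".split()))  # prints "see you tomorrow"
```

Here the two-token phrase "c u" wins over translating "c" and "u" separately because its assumed probability (0.9) exceeds the product of the single-token ones (0.6 × 0.8), which is the kind of benefit phrase-level units offer over word-by-word substitution.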

References

BibTeX

@inproceedings{2006_APhrasebasedStatisticalModelfo,
  author    = {AiTi Aw and
               Min Zhang and
               Juan Xiao and
               Jian Su},
  editor    = {Nicoletta Calzolari and
               Claire Cardie and
               Pierre Isabelle},
  title     = {A Phrase-Based Statistical Model for SMS Text Normalization},
  booktitle = {Proceedings of the 21st International Conference on Computational Linguistics
               and 44th Annual Meeting of the Association for Computational Linguistics
               (ACL2006)},
  publisher = {The Association for Computational Linguistics},
  year      = {2006},
  url       = {https://www.aclweb.org/anthology/P06-2005/},
}


2006 APhrasebasedStatisticalModelfor
Author: Min Zhang, Jian Su, AiTi Aw, Juan Xiao
Title: A Phrase-based Statistical Model for SMS Text Normalization
Year: 2006