2007 AnalysisofMorphBasedSpeechRecog
- (Creutz et al., 2007) ⇒ Mathias Creutz, Teemu Hirsimaki, Mikko Kurimo, Antti Puurula, Janne Pylkkonen, Vesa Siivola, Matti Varjokallio, Ebru Arisoy, Murat Saraclar, and Andreas Stolcke. (2007). “Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages.” In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2007).
Subject Headings: Subword-Level Language Model; OOV Word; In-Vocabulary Word; OOV Rate; OOV Detection; Morfessor; MAP Optimization Criterion; Grapheme-To-Phoneme Mapping; Morph Language Model; Word Language Model.
Notes
Cited By
- Google Scholar: ~ 36 Citations.
Quotes
Abstract
We analyze subword-based language models (LMs) in large-vocabulary continuous speech recognition across four "morphologically rich" languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. By estimating n-gram LMs over sequences of morphs instead of words, better vocabulary coverage and reduced data sparsity is obtained. Standard word LMs suffer from high out-of-vocabulary (OOV) rates, whereas the morph LMs can recognize previously unseen word forms by concatenating morphs. We show that the morph LMs generally outperform the word LMs and that they perform fairly well on OOVs without compromising the accuracy obtained for in-vocabulary words.
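The contrast the abstract draws between word-level and morph-level vocabulary coverage can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the tiny Finnish word and morph lexicons are hand-picked toy data (not taken from the paper), the helper names `oov_rate` and `can_compose` are hypothetical, and the morph lexicon stands in for a segmentation that the paper obtains from a learned model such as Morfessor. The sketch only checks coverage; it does not build or evaluate an n-gram LM.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of running tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def can_compose(word, morphs):
    """Return True if a non-empty word is a concatenation of known morphs.

    Dynamic programming over prefixes: reachable[i] is True when
    word[:i] can be written as a sequence of morphs.
    """
    if not word:
        return False
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        reachable[end] = any(
            reachable[start] and word[start:end] in morphs
            for start in range(end)
        )
    return reachable[len(word)]


# Toy data (hypothetical, for illustration only).
word_vocab = {"talo", "talossa", "auto"}
morph_vocab = {"talo", "auto", "ssa", "i", "n"}
test_tokens = ["talossa", "autossa", "taloissa", "auton"]

# A word LM can only output word forms it has seen as whole units.
word_oov = oov_rate(test_tokens, word_vocab)

# A morph LM can, in principle, output any concatenation of its morphs.
uncovered = [w for w in test_tokens if not can_compose(w, morph_vocab)]
morph_oov = len(uncovered) / len(test_tokens)

print(f"word-level OOV rate:  {word_oov:.2f}")
print(f"morph-level OOV rate: {morph_oov:.2f}")
```

In this toy setting three of the four test tokens are unseen as whole words, yet all four can be assembled from the five morphs, which is the coverage effect the abstract describes. Whether a morph LM actually recognizes such words correctly is an empirical question that the paper addresses and this sketch does not.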
References
BibTeX
@inproceedings{2007_AnalysisofMorphBasedSpeechRecog,
author = {Mathias Creutz and
Teemu Hirsimaki and
Mikko Kurimo and
Antti Puurula and
Janne Pylkkonen and
Vesa Siivola and
Matti Varjokallio and
Ebru Arisoy and
Murat Saraclar and
Andreas Stolcke},
editor = {Candace L. Sidner and
Tanja Schultz and
Matthew Stone and
ChengXiang Zhai},
title = {Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary
Words Across Languages},
booktitle = {Proceedings of the Human Language Technology Conference of the North American Chapter
of the Association for Computational Linguistics (NAACL-HLT 2007)},
pages = {380--387},
publisher = {The Association for Computational Linguistics},
year = {2007},
url = {https://www.aclweb.org/anthology/N07-1048/},
}
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2007 AnalysisofMorphBasedSpeechRecog | Andreas Stolcke, Mathias Creutz, Teemu Hirsimaki, Mikko Kurimo, Antti Puurula, Janne Pylkkonen, Vesa Siivola, Matti Varjokallio, Ebru Arisoy, Murat Saraclar | | | Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages | | | | | | 2007 |