2010 ContextualInformationImprovesOO

Subject Headings: OOV Word; OOV Word Detection System; Large Vocabulary Continuous Speech Recognition (LVCSR) System, Maximum Entropy OOV Detection System, Word Error Rate (WER), OOV Corpus, Parada-HLTCOE MaxEnt OOV Detection System.

Notes

Cited By

Quotes

Abstract

Out-of-vocabulary (OOV) words represent an important source of error in large vocabulary continuous speech recognition (LVCSR) systems. These words cause recognition failures, which propagate through pipeline systems, impacting the performance of downstream applications. The detection of OOV regions in the output of an LVCSR system is typically addressed as a binary classification task, where each region is independently classified using local information. In this paper, we show that jointly predicting OOV regions and including contextual information from each region leads to substantial improvement in OOV detection. Compared to the state of the art, we reduce the missed OOV rate from 42.6% to 28.4% at a 10% false alarm rate.
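
As a quick check on the size of the reported gain, the relative reduction in missed OOV rate at the 10% false alarm operating point follows directly from the two figures quoted in the abstract:

```latex
\[
\frac{42.6 - 28.4}{42.6} = \frac{14.2}{42.6} \approx 0.333
\]
```

That is a drop of 14.2 points absolute, or roughly one third in relative terms.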

1. Introduction

2. Maximum Entropy OOV Detection

Our baseline system is the Maximum Entropy model with features from filler and confidence estimation models proposed by Rastrow et al. (2009a). Based on filler models, this approach models OOVs by constructing a hybrid system which combines words and sub-word units. Sub-word units, or fragments, are variable-length phone sequences selected using statistical methods (Siohan and Bacchiani, 2005). The vocabulary contains a word lexicon and a fragment lexicon; fragments are used to represent OOVs in the language model text. Language model training text is obtained by replacing low-frequency words (assumed OOVs) with their fragment representation. Pronunciations for OOVs are obtained using grapheme-to-phoneme models (Chen, 2003).
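
The text-preparation step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `min_count` threshold, and the `toy_fragmenter` are all assumptions. In particular, the paper derives fragments from phone sequences produced by a grapheme-to-phoneme model, whereas the stand-in here simply chunks letters so that the example is self-contained.

```python
from collections import Counter


def build_hybrid_text(sentences, fragmenter, min_count=2):
    """Replace low-frequency words (assumed OOVs) with fragment tokens.

    sentences:  list of token lists (the language model training text)
    fragmenter: callable mapping a word to a list of fragment tokens
    min_count:  hypothetical threshold; words seen fewer times are replaced
    """
    counts = Counter(word for sent in sentences for word in sent)
    hybrid = []
    for sent in sentences:
        out = []
        for word in sent:
            if counts[word] < min_count:
                out.extend(fragmenter(word))  # OOV stand-in: fragments
            else:
                out.append(word)              # in-vocabulary word kept as-is
        hybrid.append(out)
    return hybrid


def toy_fragmenter(word, size=3):
    """Hypothetical stand-in: chunk letters instead of G2P phone sequences."""
    return [f"<{word[i:i + size]}>" for i in range(0, len(word), size)]


corpus = [
    ["the", "cat", "sat"],
    ["the", "cat", "saw", "slobodan"],
]
hybrid = build_hybrid_text(corpus, toy_fragmenter, min_count=2)
for sent in hybrid:
    print(" ".join(sent))
```

A language model trained on text like `hybrid` sees both word and fragment tokens, which is what lets the recognizer emit fragments where an OOV word was spoken.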

...

Figure 1: Example confusion network from the hybrid system with OOV regions and BIO encoding. Hypotheses are ordered by decreasing value of posterior probability. The best hypothesis is the concatenation of the top word/fragments in each bin. We omit posterior probabilities due to spacing.
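
To make the caption concrete, here is a small sketch of how a best hypothesis and BIO labels might be read off a confusion network. The data layout (a list of bins, each a list of `(token, posterior)` pairs sorted by decreasing posterior), the tokens, the posterior values, and the function names are all invented for illustration and are not taken from the paper. The labels follow the usual BIO convention: B for the first bin of an OOV region, I for the bins inside it, and O for bins outside any OOV region.

```python
def best_hypothesis(confusion_network):
    """Concatenate the top-ranked token of each bin.

    Assumes every bin is already sorted by decreasing posterior.
    """
    return [bin_[0][0] for bin_ in confusion_network]


def bio_encode(num_bins, oov_regions):
    """Label each bin as B, I, or O.

    oov_regions: list of (start, end) bin indices, end exclusive.
    """
    labels = ["O"] * num_bins
    for start, end in oov_regions:
        labels[start] = "B"            # first bin of the OOV region
        for i in range(start + 1, end):
            labels[i] = "I"            # remaining bins inside the region
    return labels


# Illustrative network: angle-bracketed tokens stand for fragments.
network = [
    [("we", 0.8), ("he", 0.2)],
    [("met", 0.7), ("let", 0.3)],
    [("<s_l_ow>", 0.5), ("slow", 0.3), ("low", 0.2)],
    [("<b_ow>", 0.6), ("bow", 0.4)],
    [("today", 0.9), ("to", 0.1)],
]

best = best_hypothesis(network)
labels = bio_encode(len(network), oov_regions=[(2, 4)])
for token, label in zip(best, labels):
    print(f"{label}\t{token}")
```

Framing detection as per-bin BIO labels is what allows a sequence model to predict neighbouring bins jointly rather than classifying each region on its own.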

...

3. Experimental Setup

4. From MaxEnt to CRFs

5. Context for OOV Detection

6. Local Lexical Context

7. Global Utterance Context

8. Final System

9. Related Work

10. Conclusion and Future Work

Acknowledgments

The authors thank Ariya Rastrow for providing the baseline system code, Abhinav Sethy and Bhuvana Ramabhadran for providing the data used in the experiments and for many insightful discussions.

References

...

2009

2005

2003

BibTeX

@inproceedings{2010_ContextualInformationImprovesOO,
  author    = {Carolina Parada and
               Mark Dredze and
               Denis Filimonov and
               Frederick Jelinek},
  title     = {Contextual Information Improves {OOV} Detection in Speech},
  booktitle = {Proceedings of the Human Language Technologies: Conference of the North American Chapter
               of the Association for Computational Linguistics (HLT-NAACL 2010)},
  pages     = {216--224},
  publisher = {The Association for Computational Linguistics},
  year      = {2010},
  url       = {https://www.aclweb.org/anthology/N10-1025/},
}


Author: Mark Dredze, Carolina Parada, Denis Filimonov, Frederick Jelinek
Title: Contextual Information Improves OOV Detection in Speech
Year: 2010