Automated Speech-to-Text Transcription Task


An Automated Speech-to-Text Transcription Task is an automated transcription task that requires converting spoken utterances into machine-processable text.




    • In computer science, speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition", "ASR", "computer speech recognition", "speech to text", or just "STT". Some SR systems use "training", in which an individual speaker reads sections of text into the SR system. These systems analyze the person's specific voice and use it to fine-tune the recognition of that person's speech, resulting in more accurate transcription. Systems that use training are called "speaker-dependent" systems; systems that do not are called "speaker-independent" systems.

      Speech recognition applications include voice user interfaces such as voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control, search (e.g., finding a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft control (usually termed Direct Voice Input).
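The voice dialing and call routing examples above can be illustrated with a minimal sketch of what happens after transcription, assuming the recognizer has already produced a text transcript. The intent names, patterns, and the `route` function are hypothetical and chosen only for illustration; deployed systems typically rely on trained language-understanding components rather than hand-written patterns.

```python
import re
from typing import Dict, Tuple

# Hypothetical intent patterns, ordered by priority (illustrative only).
INTENT_PATTERNS = [
    ("voice_dialing", re.compile(r"^call\s+(?P<target>.+)$")),
    ("call_routing", re.compile(r"\bcollect call\b")),
]


def normalize(transcript: str) -> str:
    """Lowercase the transcript and drop punctuation before matching."""
    return re.sub(r"[^a-z0-9\s]", "", transcript.lower()).strip()


def route(transcript: str) -> Tuple[str, Dict[str, str]]:
    """Map a transcript to an (intent, slots) pair, or ("unknown", {})."""
    text = normalize(transcript)
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(text)
        if match:
            return intent, match.groupdict()
    return "unknown", {}
```

For instance, `route("Call home.")` yields the `voice_dialing` intent with `home` as the target, while an utterance that matches no pattern falls through to `unknown`.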

      The term voice recognition[1][2] refers to finding the identity of "who" is speaking, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on specific people's voices, or it can be used to authenticate or verify the identity of a speaker as part of a security process.
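As a rough illustration of verifying a speaker's identity, the sketch below compares an enrolled voice feature vector with one extracted from a new utterance. It assumes some front end (not shown, and not described in the source) has already turned audio into fixed-length numeric vectors; the cosine-similarity comparison, the function names, and the 0.8 threshold are illustrative assumptions, not a description of any particular system.

```python
import math
from typing import Sequence

# Assumed decision threshold; a real system would tune this on held-out data.
SIMILARITY_THRESHOLD = 0.8


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def verify_speaker(
    enrolled: Sequence[float],
    candidate: Sequence[float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> bool:
    """Accept the claimed identity if the vectors are similar enough."""
    return cosine_similarity(enrolled, candidate) >= threshold
```

The threshold trades off false acceptances against false rejections, so a security process would set it according to how costly each kind of error is.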


  • Jennifer Lai, Clare-Marie Karat, and Nicole Yankelovich. "Conversational speech interfaces and technologies." Human-Computer Interaction: Design Issues, Solutions, and Applications (2009): 53.