Optical Character Recognition (OCR) Task

An Optical Character Recognition (OCR) Task is a visual entity recognition task that requires the recognition of the graphemes in a written text.

Context:
- It can be evaluated by an OCR Performance Measure (such as transcription error or word error rate).
- It can (typically) be composed of an Optical Character Segmentation Task and an Optical Character Classification Task.
- It can be solved by an OCR System (that applies an OCR algorithm).
- It can range from being a Heuristic Optical Character Recognition Task to being a Data-Driven Optical Character Recognition Task (such as supervised OCR).
- It can range from being a Language-Dependent OCR Task to being a Language-Independent OCR Task.
- It can range from being a Hand-Written OCR Task to being a Machine-Written OCR Task.
- It can range from being a Script-Dependent OCR Task to being a Script-Independent OCR Task.
- ...
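The decomposition into segmentation and classification, and scoring by an error rate, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 3x3 glyph bitmaps in TEMPLATES, the blank-column segmentation rule, the pixel-mismatch classifier, and all function names are hypothetical and do not describe any particular OCR System.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical 3x3 glyph templates ('#' = ink, '.' = background).
TEMPLATES: Dict[str, Tuple[str, ...]] = {
    "I": ("###", ".#.", "###"),
    "L": ("#..", "#..", "###"),
    "T": ("###", ".#.", ".#."),
}


def segment_columns(image: List[str]) -> List[List[str]]:
    """Segmentation step: split a text-line bitmap at blank columns."""
    width = len(image[0])
    glyphs, start = [], None
    for x in range(width + 1):
        has_ink = x < width and any(row[x] == "#" for row in image)
        if has_ink and start is None:
            start = x  # a glyph begins here
        elif not has_ink and start is not None:
            glyphs.append([row[start:x] for row in image])
            start = None  # the glyph ended at the previous column
    return glyphs


def classify(glyph: Sequence[str]) -> str:
    """Classification step: pick the template with the fewest pixel mismatches."""
    def mismatches(template: Sequence[str]) -> float:
        if len(template) != len(glyph) or len(template[0]) != len(glyph[0]):
            return float("inf")  # shapes differ, so this template cannot match
        return sum(a != b for t_row, g_row in zip(template, glyph)
                   for a, b in zip(t_row, g_row))

    best = min(TEMPLATES, key=lambda label: mismatches(TEMPLATES[label]))
    return best if mismatches(TEMPLATES[best]) != float("inf") else "?"


def recognize(image: List[str]) -> str:
    """Segment the line, then classify each glyph."""
    return "".join(classify(g) for g in segment_columns(image))


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalized by reference length (ref must be non-empty)."""
    return edit_distance(ref, hyp) / len(ref)


# A toy text line containing the glyphs L, I, T separated by blank columns.
LINE = [
    "#...###.###",
    "#....#...#.",
    "###.###..#.",
]

text = recognize(LINE)
word_error_rate = error_rate("the cat sat".split(), "the bat sat".split())
```

Passing word lists to error_rate gives a word error rate, while passing plain strings gives a character-level rate. A data-driven variant would replace the fixed templates with a classifier trained on labeled glyph images.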
Example(s):
- Scanned Document OCR, such as printed receipt recognition, digitized book recognition, or historical document transcription.
- Scanned Photo OCR, such as license plate number recognition, street sign recognition, or product label extraction.
- Real-time OCR, such as augmented reality recognition, mobile text recognition, or assistive technology for the visually impaired.
- ...
Counter-Example(s):
- An OCR Recognition Task.
- A Face Recognition Task.
- A Phoneme Recognition Task or Text Understanding Task.
See: Multiclass Classification Task, Machine Translation, Text-to-Speech, Computer Vision, Document Image Analysis, Natural Language Processing.

References

http://www.mkgandhi.org/images/lefthand.JPG

2014

(Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Optical_character_recognition Retrieved:2014-9-28.
- Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned or photographed images of typewritten or printed text into machine-encoded/computer-readable text. It is widely used as a form of data entry from some sort of original paper data source, whether passport documents, invoices, bank statement, receipts, business card, mail, or any number of printed records. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech, key data extraction and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
  Early versions needed to be programmed with images of each character, and worked on one font at a time. "Intelligent" systems with a high degree of recognition accuracy for most fonts are now common. Some commercial systems are capable of reproducing formatted output that closely approximates the original scanned page including images, columns and other non-textual components.

2005

(Strasburger, 2005) ⇒ Hans Strasburger. (2005). “Unfocussed Spatial Attention Underlies the Crowding Effect in Indirect Form Vision.” In: Journal of Vision, 5(11):8. doi:10.1167/5.11.8
- QUOTE: In a comprehensive analysis, Pelli et al. (2004) have characterized crowding as a process of impaired feature integration occurring in the visual periphery, in contradistinction to (lateral) masking as occurring from impaired feature detection anywhere in the visual field. We have ourselves characterized the visual periphery — where the interesting cases of crowding occur (Strasburger et al., 1991) — as differing from the fovea by the architecture of feature integration (Strasburger & Rentschler, 1996). That argument was based on the differing dependence-on-eccentricity functions of contrast sensitivity for grating detection and for character recognition (Strasburger, 2003b; Strasburger, Gothe, & Lutz, 2000; Strasburger, Rentschler, & Harvey, 1994) and by showing that the difference between the two cannot be explained by a spatial scaling concept (M scaling, cortical-magnification scaling). We concluded that there must be architectural differences across the visual field — in particular between the fovea and the rest of the field — that concern feature integration not feature detection. In a hierarchy of task complexity ranging from
  - (1) pattern detection (present/nonpresent),
  - (2) coarse grating discrimination^[1] (horizontal/vertical),
  - (3) fine grating discrimination (orientation threshold), and
  - (4) character recognition or identification,

[1] The term “discrimination task” is sometimes used in a different meaning, implying the judgement of a quantity being larger or smaller than another (the corresponding psychometric function then goes from −1 to 1). This is not implied here, the intended meaning being that the observer can discriminate between two broadly different stimuli and thereby identify each. The term “identification task” is sometimes used for that case but is avoided here to reserve the concept of identification for those tasks where discrimination between a few cases will not solve the identification.
