Image Classification Task
- See: YouTube-8M Dataset, Face Recognition.
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Computer_vision#Recognition Retrieved:2018-5-23.
The classical problem in computer vision, image processing, and machine vision is that of determining whether or not the image data contains some specific object, feature, or activity. Different varieties of the recognition problem are described in the literature:
- Object recognition (also called object classification) – one or several pre-specified or learned objects or object classes can be recognized, usually together with their 2D positions in the image or 3D poses in the scene. Blippar, Google Goggles and LikeThat provide stand-alone programs that illustrate this functionality.
- Identification – an individual instance of an object is recognized. Examples include identification of a specific person's face or fingerprint, identification of handwritten digits, or identification of a specific vehicle.
- Detection – the image data are scanned for a specific condition. Examples include detection of possible abnormal cells or tissues in medical images or detection of a vehicle in an automatic road toll system. Detection based on relatively simple and fast computations is sometimes used for finding smaller regions of interesting image data which can be further analyzed by more computationally demanding techniques to produce a correct interpretation.
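The coarse-to-fine detection strategy in the last item (a cheap scan that narrows the image down to regions of interest, followed by a costlier analysis of only those regions) can be sketched roughly as follows. This is a minimal illustration, not a method taken from the source: the window statistic, the threshold, and the names `window_mean`, `all_bright`, and `cascade_detect` are all hypothetical.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]   # grayscale image as rows of pixel intensities in [0, 1]
Window = Tuple[int, int]    # (row, col) of a window's top-left corner


def window_mean(image: Image, row: int, col: int, size: int) -> float:
    """Cheap first-stage statistic: mean intensity of a size x size window."""
    total = sum(
        image[r][c]
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    return total / (size * size)


def all_bright(image: Image, row: int, col: int, size: int) -> bool:
    """Stand-in for an expensive second-stage check (assumed for illustration):
    every pixel in the window must exceed 0.5."""
    return all(
        image[r][c] > 0.5
        for r in range(row, row + size)
        for c in range(col, col + size)
    )


def cascade_detect(
    image: Image,
    size: int,
    cheap_threshold: float,
    expensive_check: Callable[[Image, int, int, int], bool],
) -> List[Window]:
    """Scan non-overlapping windows; run the expensive check only on
    windows that pass the cheap mean-intensity filter."""
    hits: List[Window] = []
    for row in range(0, len(image) - size + 1, size):
        for col in range(0, len(image[0]) - size + 1, size):
            if window_mean(image, row, col, size) < cheap_threshold:
                continue  # rejected cheaply; the expensive check is skipped
            if expensive_check(image, row, col, size):
                hits.append((row, col))
    return hits


example_image = [
    [0.9, 0.9, 0.0, 0.0],
    [0.9, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.6],
]
detections = cascade_detect(example_image, 2, 0.25, all_bright)
```

In this toy image the bottom-right window passes the cheap filter but is rejected by the second stage, which is the situation the two-stage design is meant to handle.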
Currently, the best algorithms for such tasks are based on convolutional neural networks. An illustration of their capabilities is given by the ImageNet Large Scale Visual Recognition Challenge; this is a benchmark in object classification and detection, with millions of images and hundreds of object classes. Performance of convolutional neural networks, on the ImageNet tests, is now close to that of humans. The best algorithms still struggle with objects that are small or thin, such as a small ant on a stem of a flower or a person holding a quill in their hand. They also have trouble with images that have been distorted with filters (an increasingly common phenomenon with modern digital cameras). By contrast, those kinds of images rarely trouble humans. Humans, however, tend to have trouble with other issues. For example, they are not good at classifying objects into fine-grained classes, such as the particular breed of dog or species of bird, whereas convolutional neural networks handle this with ease.
Several specialized tasks based on recognition exist, such as:
- Content-based image retrieval – finding all images in a larger set of images which have a specific content. The content can be specified in different ways, for example in terms of similarity relative to a target image (give me all images similar to image X), or in terms of high-level search criteria given as text input (give me all images which contain many houses, are taken during winter, and have no cars in them).
- Pose estimation – estimating the position or orientation of a specific object relative to the camera. An example application for this technique would be assisting a robot arm in retrieving objects from a conveyor belt in an assembly line situation or picking parts from a bin.
- Optical character recognition (OCR) – identifying characters in images of printed or handwritten text, usually with a view to encoding the text in a format more amenable to editing or indexing (e.g. ASCII).
- 2D code reading – reading of 2D codes such as data matrix and QR codes.
- Facial recognition
- Shape Recognition Technology (SRT) in people counter systems differentiating human beings (head and shoulder patterns) from objects
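The similarity-based form of content-based image retrieval listed above ("give me all images similar to image X") can be sketched as a nearest-neighbour search over per-image feature vectors. This is only an illustrative sketch under stated assumptions: the feature vectors are assumed to come from some unspecified feature extractor, cosine similarity is one possible choice of measure, and the names `cosine_similarity` and `retrieve_similar` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[str]:
    """Return the names of the top_k indexed images most similar to the query."""
    scored = [(cosine_similarity(query, feats), name) for name, feats in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy index: image names mapped to made-up 2-D feature vectors.
toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nearest = retrieve_similar([1.0, 0.1], toy_index, top_k=2)
```

A linear scan like this is adequate only for small collections; larger sets typically need an approximate nearest-neighbour index, which is outside the scope of this sketch.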
- (Abu-El-Haija et al., 2016) ⇒ Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. (2016). “YouTube-8M: A Large-Scale Video Classification Benchmark.” In: arXiv preprint arXiv:1609.08675.
- QUOTE: With this sample application, you create HITs asking workers to categorize images based on a list of pre-defined categories. This sample comes with a list of 200 images and asks 3 workers to evaluate each image. You will be creating 200 HITs with 3 assignments each with a total of up to 600 assignments.
- O. Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge", 2014.
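The bookkeeping in the crowdsourcing quote above (200 images, 3 workers per image, up to 600 assignments) can be checked with a short sketch. The assignment count follows directly from the quote. The majority-vote step is an assumption added for illustration, since the quote does not say how the three answers per image are combined, and the names `total_assignments` and `majority_label` are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def total_assignments(num_images: int, workers_per_image: int) -> int:
    """One HIT per image, each answered by a fixed number of workers."""
    return num_images * workers_per_image


def majority_label(labels: List[str]) -> Optional[str]:
    """Return the most frequent label, or None when the top two labels tie.
    Majority voting is an assumed aggregation rule, not stated in the quote."""
    if not labels:
        return None
    ranked = Counter(labels).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # no clear majority among the workers
    return ranked[0][0]


max_assignments = total_assignments(num_images=200, workers_per_image=3)
agreed = majority_label(["cat", "cat", "dog"])
undecided = majority_label(["cat", "dog", "bird"])
```

With three workers per image, a two-to-one split still yields a label, while three distinct answers leave the image undecided.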