Semi-Supervised Learning Task


A semi-supervised learning task is a supervised learning task that, in addition to its labeled training set, has access to an unlabeled training set.



  • (Wikipedia, 2016) ⇒ Retrieved:2016-2-6.
    • Semi-supervised learning is a class of supervised learning tasks and techniques that also make use of unlabeled data for training - typically a small amount of labeled data with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy. The acquisition of labeled data for a learning problem often requires a skilled human agent (e.g. to transcribe an audio segment) or a physical experiment (e.g. determining the 3D structure of a protein or determining whether there is oil at a particular location). The cost associated with the labeling process thus may render a fully labeled training set infeasible, whereas acquisition of unlabeled data is relatively inexpensive. In such situations, semi-supervised learning can be of great practical value. Semi-supervised learning is also of theoretical interest in machine learning and as a model for human learning.

      As in the supervised learning framework, we are given a set of [math] l [/math] independent and identically distributed examples [math] x_1,\dots,x_l \in X [/math] with corresponding labels [math] y_1,\dots,y_l \in Y [/math]. Additionally, we are given [math] u [/math] unlabeled examples [math] x_{l+1},\dots,x_{l+u} \in X [/math]. Semi-supervised learning attempts to make use of this combined information to surpass the classification performance that could be obtained either by discarding the unlabeled data and doing supervised learning or by discarding the labels and doing unsupervised learning.

      Semi-supervised learning may refer to either transductive learning or inductive learning. The goal of transductive learning is to infer the correct labels for the given unlabeled data [math] x_{l+1},\dots,x_{l+u} [/math] only. The goal of inductive learning is to infer the correct mapping from [math] X [/math] to [math] Y [/math].

      Intuitively, we can think of the learning problem as an exam and labeled data as the few example problems that the teacher solved in class. The teacher also provides a set of unsolved problems. In the transductive setting, these unsolved problems are a take-home exam and you want to do well on them in particular. In the inductive setting, these are practice problems of the sort you will encounter on the in-class exam.

      It is unnecessary (and, according to Vapnik's principle, imprudent) to perform transductive learning by way of inferring a classification rule over the entire input space; however, in practice, algorithms formally designed for transduction or induction are often used interchangeably.
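
The setup and the transductive/inductive distinction described in the quoted passage can be illustrated with a small sketch. It is not taken from the quoted source: it assumes one-dimensional inputs, integer labels, and a greedy nearest-neighbour label-spreading rule chosen only for illustration, and the names `nearest_label`, `transduce`, and `induce` are hypothetical. The transductive function returns labels for exactly the given unlabeled points, while the inductive one returns a mapping from X to Y that can be applied to unseen inputs.

```python
def nearest_label(points, x):
    """1-nearest-neighbour rule: label of the (x_i, y_i) pair closest to x."""
    return min(points, key=lambda p: abs(p[0] - x))[1]


def transduce(x_lab, y_lab, x_unlab):
    """Transductive: return labels for exactly the given unlabeled points.

    Illustrative greedy rule (an assumption, not a standard library API):
    repeatedly take the unlabeled point closest to any already-labeled
    point and copy that point's label.
    """
    labeled = list(zip(x_lab, y_lab))
    labels = [None] * len(x_unlab)
    remaining = set(range(len(x_unlab)))
    while remaining:
        # Smallest distance between a remaining point and a labeled point.
        _, i, y = min(
            (abs(x_unlab[i] - x), i, y) for i in remaining for x, y in labeled
        )
        labels[i] = y
        labeled.append((x_unlab[i], y))
        remaining.remove(i)
    return labels


def induce(x_lab, y_lab, x_unlab):
    """Inductive: return a function X -> Y usable on inputs not seen in training."""
    pseudo = transduce(x_lab, y_lab, x_unlab)
    train = list(zip(x_lab, y_lab)) + list(zip(x_unlab, pseudo))
    return lambda x: nearest_label(train, x)


# l = 2 labeled examples and u = 7 unlabeled examples (made-up toy data).
x_lab, y_lab = [0.0, 10.0], [0, 1]
x_unlab = [1.0, 2.0, 3.0, 4.0, 4.8, 6.0, 9.5]

# Baseline that discards the unlabeled data: 1-NN on the labeled points only.
supervised_only = [nearest_label(list(zip(x_lab, y_lab)), x) for x in x_unlab]

transductive = transduce(x_lab, y_lab, x_unlab)  # labels for x_unlab only
classify = induce(x_lab, y_lab, x_unlab)         # rule for any new x

print(supervised_only)  # [0, 0, 0, 0, 0, 1, 1]
print(transductive)     # [0, 0, 0, 0, 0, 0, 1]
print(classify(7.0), classify(8.5))  # 0 1
```

On this toy data the two approaches disagree on the point 6.0: the labeled-only baseline assigns it to class 1, while label spreading follows the chain of nearby unlabeled points and assigns it to class 0. Whether that is an improvement depends on whether the data really form such clusters, which the sketch simply assumes. The `induce` function also shows one way an inductive rule can be built on top of a transductive result, in line with the remark that the two kinds of algorithm are often used interchangeably in practice.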