Self-Training Algorithm


A Self-Training Algorithm is a Semi-Supervised Learning Algorithm that uses an existing Predictive Model to label instances from an Unlabeled Dataset, adding its high-confidence predictions as additional Training Cases.



References

2007

  • (Zhu, 2007) ⇒ Xiaojin Zhu. (2007). “Semi-Supervised Learning.” Tutorial at ICML 2007.
    • Self-training algorithm
      • Assumption: One’s own high-confidence predictions are correct.
      • Self-training algorithm (a code sketch follows this list):
        • 1. Train f from (X_l, Y_l).
        • 2. Predict on x ∈ X_u.
        • 3. Add (x, f(x)) to the labeled data.
        • 4. Repeat.
    • Variations in self-training
    • Advantages of self-training
      • The simplest semi-supervised learning method.
      • A wrapper method that applies to existing (possibly complex) classifiers.
      • Often used in real tasks like natural language processing.
    • Disadvantages of self-training
      • Early mistakes could reinforce themselves.
        • Heuristic fixes exist, e.g., “un-label” an instance if its prediction confidence falls below a threshold.
      • Little can be said about convergence in general.
        • But there are special cases when self-training is equivalent to the Expectation-Maximization (EM) algorithm.
        • There are also special cases (e.g., linear functions) when the closed-form solution is known.
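
The loop above can be written as a short, self-contained sketch. The snippet below is a minimal illustration, not the tutorial's own implementation: it assumes scikit-learn's LogisticRegression as the base classifier f and a hand-picked confidence threshold, and each round it moves only high-confidence pseudo-labeled instances from the unlabeled pool into the labeled set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def self_train(X_l, y_l, X_u, threshold=0.95, max_rounds=10):
    """Minimal self-training loop (a sketch; threshold and max_rounds are illustrative)."""
    X_l, y_l, X_u = np.asarray(X_l), np.asarray(y_l), np.asarray(X_u)
    f = None
    for _ in range(max_rounds):
        # 1. Train f from (X_l, Y_l).
        f = LogisticRegression(max_iter=1000).fit(X_l, y_l)
        if len(X_u) == 0:
            break
        # 2. Predict on x in X_u, with confidence scores.
        proba = f.predict_proba(X_u)
        conf = proba.max(axis=1)
        labels = f.predict(X_u)
        # Self-training assumption: high-confidence predictions are correct.
        mask = conf >= threshold
        if not mask.any():
            break
        # 3. Add (x, f(x)) to the labeled data.
        X_l = np.vstack([X_l, X_u[mask]])
        y_l = np.concatenate([y_l, labels[mask]])
        # 4. Repeat on the remaining unlabeled pool.
        X_u = X_u[~mask]
    return f, X_l, y_l


# Example usage (illustrative only):
# f, X_aug, y_aug = self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.9)
```

The threshold (0.95 here) is an illustrative choice; the “un-label” heuristic mentioned above could be layered on by re-scoring previously added instances each round and returning those whose confidence drops below the threshold to the unlabeled pool.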
