# Self-Training Algorithm


A Self-Training Algorithm is a Semi-Supervised Learning Algorithm that makes use of an existing Predictive Model to extract more Training Cases from an Unlabeled Dataset.

**See:** Co-Training Algorithm.

## References

### 2007

- (Zhu, 2007) ⇒ Xiaojin Zhu. (2007). “Semi-Supervised Learning.” Tutorial at ICML 2007.
  - Self-training algorithm
    - Assumption: One’s own high-confidence predictions are correct.
    - Self-training algorithm:
      1. Train f from (Xl, Yl)
      2. Predict on x ∈ Xu
      3. Add (x, f(x)) to labeled data
      4. Repeat
  - Variations in self-training
    - Add a few most confident (x, f(x)) to labeled data
    - Add all (x, f(x)) to labeled data
    - Add all (x, f(x)) to labeled data, weighting each by confidence
  - Advantages of self-training
    - The simplest semi-supervised learning method.
    - A wrapper method: it applies to existing (complex) classifiers.
    - Often used in real tasks like natural language processing.
  - Disadvantages of self-training
    - Early mistakes can reinforce themselves.
      - Heuristic remedies exist, e.g., “un-label” an instance if its confidence falls below a threshold.
    - Not much can be said about convergence.
      - But there are special cases in which self-training is equivalent to the Expectation-Maximization (EM) algorithm.
      - There are also special cases (e.g., linear functions) for which the closed-form solution is known.
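The four steps above can be sketched as a generic wrapper around any base learner, following the "add a few most confident" variation. This is only an illustrative sketch: the nearest-centroid base learner, the softmax-of-distances confidence, and the parameters `k` and `max_iter` are assumptions for the demo, not part of Zhu's tutorial, and labels are assumed to be integers `0..C-1`.

```python
import numpy as np

def self_train(X_l, y_l, X_u, fit, predict_proba, k=5, max_iter=10):
    """Self-training wrapper: train f on (Xl, Yl), predict on Xu, move the
    k most confident (x, f(x)) pairs into the labeled set, and repeat."""
    X_l = np.asarray(X_l, float)
    y_l = np.asarray(y_l)
    X_u = np.asarray(X_u, float)
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        model = fit(X_l, y_l)                    # 1. Train f from (Xl, Yl)
        proba = predict_proba(model, X_u)        # 2. Predict on x in Xu
        conf = proba.max(axis=1)
        pick = np.argsort(-conf)[:k]             # "a few most confident" variation
        X_l = np.vstack([X_l, X_u[pick]])        # 3. Add (x, f(x)) to labeled data
        y_l = np.concatenate([y_l, proba[pick].argmax(axis=1)])
        X_u = np.delete(X_u, pick, axis=0)       # 4. Repeat
    return fit(X_l, y_l)

# Toy base learner (an assumption for this sketch): nearest class centroid,
# with a softmax over negative squared distances as the confidence score.
def fit_centroids(X, y):
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def centroid_proba(model, X):
    classes, cents = model
    d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

# Usage: two labeled seed points, forty unlabeled points in two clusters.
rng = np.random.default_rng(0)
X_l = np.array([[0.0, 0.0], [4.0, 4.0]])
y_l = np.array([0, 1])
X_u = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
                 rng.normal(4.0, 0.5, (20, 2))])
model = self_train(X_l, y_l, X_u, fit_centroids, centroid_proba, k=4)
```

Because the wrapper only touches the base learner through `fit` and `predict_proba`, any classifier exposing a confidence score can be plugged in, which is the "wrapper method" advantage noted above.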

### 2006

- (McClosky et al., 2006) ⇒ David McClosky, Eugene Charniak, and Mark Johnson. (2006). “Effective Self-training for Parsing.” In: Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. doi:10.3115/1220835.1220855
- QUOTE: A simple method of incorporating unlabeled data into a new model is self-training. In self-training, the existing model first labels unlabeled data. The newly labeled data is then treated as truth and combined with the actual labeled data to train a new model. This process can be iterated over different sets of unlabeled data if desired. It is not surprising that self-training is not normally effective: Charniak (1997) and Steedman et al. (2003) report either minor improvements or significant damage from using self-training for parsing. Clark et al. (2003) applies self-training to POS-tagging and reports the same outcomes. One would assume that errors in the original model would be amplified in the new model.

### 2004

- (Mihalcea, 2004) ⇒ Rada Mihalcea. (2004). “Co-training and Self-training for Word Sense Disambiguation.” In: Proceedings of NAACL Conference (NAACL 2004).
- QUOTE: This paper investigated the application of co-training and self-training to supervised word sense disambiguation.

### 1997

- (Charniak, 1997a) ⇒ Eugene Charniak. (1997). “Statistical Parsing with a Context-free Grammar and Word Statistics.” In: Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence, (AAAI 1997).