# CRF Training Algorithm

(Redirected from CRF training algorithm)

A CRF Training Algorithm is a graphical model training algorithm that accepts a CRF model family and can be applied by a CRF training system (to solve a CRF training task to produce a CRF structure).

**AKA:**Conditional Random Field Trainer.**Context:**- It can range from being a Fully-Supervised CRF Training Algorithm to being a Semi-Supervised CRF Training Algorithm.
- It can be a Part Of a CRF-based Tagging Algorithm.
- It can require a CRF Feature Template.
- …

**Example(s):**- It can be based on an Iterative Scaling Algorithm (Lafferty et al., 2001)
- It can be based on a Limited-Memory Quasi-Newton Algorithm/L-BFGS. (Tasi et al., 2006)
- It can be based on a Convex Optimization Algorithm (Sha & Pereira, 2003a)
- It can be based on a Stochastic Gradient Optimization Algorithm (Vishwanathan et al., 2006).
- …

**Counter-Example(s):****See:**CRF Label Bias, CRF Length Bias.

## References

### 2016

- http://www.datasciencecentral.com/profiles/blogs/conditional-random-fields-crf-short-survey
- QUOTE: The time complexity of the training process is large enough: [math]\begin{equation*} O(mNTQ2nS), \end{equation*}[/math], where:
- m is the number of training iterations
- N is the number of training data sequences
- T is the average length of training sequences
- Q is the number of class labels
- n is the number of CRF features
- S is the searching time of the optimization algorithm (for example, L-BFGS algorithm, which is considered good for this).

- In practical implementation, the computational time is often larger due to many other operations like numerical scaling, smoothing etc.

- QUOTE: The time complexity of the training process is large enough: [math]\begin{equation*} O(mNTQ2nS), \end{equation*}[/math], where:

### 2010

- (Lavergne et al., 2010) ⇒ Thomas Lavergne, Olivier Cappé, and François Yvon. (2010). “Practical Very Large Scale CRFs.” In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, (ACL 2010).
- QUOTE: In this paper, we address the issue of training very large CRFs, containing up to hundreds output labels and several billion features. Efficiency stems here from the sparsity induced by the use of a
*l*^{1}penalty term. Based on our own implementation, we compare three recent proposals for implementing this regularization strategy.

- QUOTE: In this paper, we address the issue of training very large CRFs, containing up to hundreds output labels and several billion features. Efficiency stems here from the sparsity induced by the use of a

### 2006

- (Vishwanathan et al., 2006) ⇒ S. V. N. Vishwanathan, Nicol N. Schraudolph, Mark W. Schmidt, and Kevin P. Murphy. (2006). “Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods.” In: Proceedings of the 23rd International Conference on Machine learning (ICML-2006). doi:10.1145/1143844.1143966
- QUOTE: We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limited-memory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques.

### 2003

- (Sha & Pereira, 2003a) ⇒ Fei Sha, and Fernando Pereira. (2003). “Shallow Parsing with Conditional Random Fields.” In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL 2003). doi:10.3115/1073445.1073473
- QUOTE: To obtain these results, we had to abandon the original iterative scaling CRF training algorithm for convex optimization algorithms with better convergence properties.