# Supervised Sequence-Member Classification Algorithm

A Supervised Sequence-Member Classification Algorithm is a supervised classification algorithm that can solve a supervised sequence-member labeling task.

**Context:**
- It can range from (typically) being a Supervised Model-based Structured-Input Classification Algorithm to being a Supervised Instance-based Structured-Input Classification Algorithm (such as kNN).
- It can be a Supervised Tuple-based Classification Algorithm.
- It can make use of a Supervised Tagging Feature.

**Example(s):**

**Counter-Example(s):**

**See:** Supervised Classification Algorithm.
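To make the task concrete: a sequence-member labeling task assigns exactly one label to each member of an input sequence, as in part-of-speech tagging, where each token receives one tag. A minimal illustrative sketch follows (the tag inventory and lookup table are hypothetical, not a real tagger):

```python
# Toy sequence-member labeling: each token in the input sequence
# receives exactly one label (here, a coarse POS tag), aligned by position.
# This lookup table is a hypothetical illustration only.
TAG_TABLE = {"the": "DET", "dog": "NOUN", "barks": "VERB"}

def label_sequence(tokens):
    # one output label per input member, in the same order
    return [TAG_TABLE.get(t, "UNK") for t in tokens]

print(label_sequence(["the", "dog", "barks"]))  # → ['DET', 'NOUN', 'VERB']
```

A supervised sequence-member classification algorithm learns such a mapping from labeled example sequences rather than from a hand-written table.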

## References

### 2015

- (Huang et al., 2015) ⇒ Zhiheng Huang, Wei Xu, and Kai Yu. (2015). “Bidirectional LSTM-CRF Models for Sequence Tagging.” In: arXiv preprint arXiv:1508.01991.
- QUOTE: In this paper, we propose a variety of Long Short-Term Memory (LSTM) based models for sequence tagging. These models include LSTM networks, bidirectional LSTM (BI-LSTM) networks, LSTM with a Conditional Random Field (CRF) layer (LSTM-CRF) and bidirectional LSTM with a CRF layer (BI-LSTM-CRF). Our work is the first to apply a bidirectional LSTM CRF (denoted as BI-LSTM-CRF) model to NLP benchmark sequence tagging data sets. ...

### 2012

- (Graves, 2012) ⇒ Alex Graves. (2012). “Supervised Sequence Labelling with Recurrent Neural Networks.” Springer Berlin Heidelberg.

### 2010

- (Mejer & Crammer, 2010) ⇒ Avihai Mejer, and Koby Crammer. (2010). “Confidence in Structured-prediction Using Confidence-weighted Models.” In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP 2010).
- QUOTE: We employ a general approach (Collins, 2002; Crammer et al., 2009a) to generalize binary classification and use a joined feature mapping of an instance [math]x[/math] and a labeling [math]y[/math] into a common vector space, [math]\Phi(x, y) \in \mathbb{R}^d[/math].
Given an input instance [math]x[/math] and a model [math]\mu \in \mathbb{R}^d[/math] we predict the labeling with the highest score, [math]\hat{y} = \text{arg max}_z \mu \cdot \Phi(x,z)[/math]. A brute-force approach evaluates the value of the score [math]\mu \cdot \Phi(x, z)[/math] for each possible labeling [math]z \in \mathcal{Y}^n[/math], which is not feasible for large values of [math]n[/math]. Instead, we follow standard factorization and restrict the joint mapping to be of the form, [math]\Phi(x,y) = \sum^{n}_{p=1} \Phi(x,y_p) + \sum^{n}_{q=2} \Phi(x, y_q, y_{q-1})[/math]. That is, the mapping is a sum of mappings, each taking into consideration only a label of a single part, or two consecutive parts. The time required to compute the max operator is linear in [math]n[/math] and quadratic in [math]K[/math] using the dynamic-programming Viterbi algorithm.
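The factorized scoring described in the quote admits exact decoding in [math]O(nK^2)[/math] time by dynamic programming. A minimal NumPy sketch of Viterbi decoding follows; the array names, shapes, and the split into unary (per-position) and pairwise (transition) score tables are illustrative assumptions, not details from the paper:

```python
import numpy as np

def viterbi_decode(unary, pairwise):
    """Return the highest-scoring label sequence under a factorized score.

    unary:    (n, K) array; unary[p, y] is the score of label y at position p.
    pairwise: (K, K) array; pairwise[y_prev, y] is the transition score.
    Runs in O(n * K^2) time, matching the complexity stated in the quote.
    """
    n, K = unary.shape
    score = unary[0].copy()                 # best score ending in each label at position 0
    backptr = np.zeros((n, K), dtype=int)   # backpointers for recovering the argmax path
    for p in range(1, n):
        # cand[y_prev, y] = score[y_prev] + pairwise[y_prev, y] + unary[p, y]
        cand = score[:, None] + pairwise + unary[p][None, :]
        backptr[p] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # backtrack from the best final label
    labels = [int(score.argmax())]
    for p in range(n - 1, 0, -1):
        labels.append(int(backptr[p, labels[-1]]))
    return labels[::-1]
```

For example, with three positions, two labels, and a transition table that strongly rewards repeating the previous label, the decoder trades off unary preferences against transition scores rather than picking each position's best label independently.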

### 1999

- (Tufis, 1999) ⇒ Dan Tufis. (1999). “Tiered Tagging and Combined Language Models Classifiers.” In: Proceedings of the Second International Workshop on Text, Speech and Dialogue.