# Forward-Backward Algorithm

A Forward-Backward Algorithm is an dynamic programming-based statistical inference algorithm that calculates the posterior marginal probability of all hidden state variables given an observed sequence.

**Context:**- It can be viewed as a generalization of the Viterbi algorithm where instead of choosing the optimal state sequence it evaluates all possible state sequences given the observation sequence.
- It can range from being a Constrained Forward-Backward Algorithm to being an Unconstrained Forward-Backward Algorithm.
- It can be implemented by a Forward Backward-based System.
- It can have Time Complexity of: [math]O(N^2 T)[/math], for N states and length T.
- It is an algorithm for marginalization and inference in HMMs.

**Example(s):**- a Forward-Backward Smoothing Algorithm such as in (Russell & Norvig, 2016):
- aima-python repository source code:
`forward_backward(HMM, ev, prior)`

[1]

- aima-python repository source code:
- a Forward-Backward Splitting Algorithm such as in Garrigos, Rosasco & Villa, 2017 and 2017|Lorenz & Pock, 2015.
- an Island Algorithm,
- a Kalman Smoothing Algorithm,

- a Forward-Backward Smoothing Algorithm such as in (Russell & Norvig, 2016):
**Counter-Example(s):****See:**Hidden Markov Model, Posterior Probability, Marginal Probability, Smoothing Task, Time Complexity, Multivariate Distribution, Conditional Independence.

## References

### 2018a

- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Forward–backward_algorithm Retrieved:2018-4-23.
- The
**forward–backward algorithm**is an inference algorithm for hidden Markov models which computes the posterior marginals of all hidden state variables given a sequence of observations emissions [math] o_{1:t}:= o_1,\dots,o_t [/math] , i.e. it computes, for all hidden state variables [math] X_k \in \{X_1, \dots, X_t\} [/math], the distribution [math] P(X_k\ |\ o_{1:t}) [/math] . This inference task is usually called*smoothing*. The algorithm makes use of the principle of dynamic programming to efficiently compute the values that are required to obtain the posterior marginal distributions in two passes. The first pass goes forward in time while the second goes backward in time; hence the name*forward–backward algorithm*.The term

*forward–backward algorithm*is also used to refer to any algorithm belonging to the general class of algorithms that operate on sequence models in a forward–backward manner. In this sense, the descriptions in the remainder of this article refer but to one specific instance of this class.

- The

### 2018b

- (Metacademy, 2018) ⇒ https://metacademy.org/graphs/concepts/forward_backward_algorithm Retrieved:2018-9-23
- QUOTE: The forward-backward algorithm is an algorithm for computing posterior marginals in a hidden Markov model (HMM). It is based on dynamic programming, and has linear complexity in the length of the sequence. It is used as a component of several other algorithms, such as the Baum-Welch algorithm and block Gibbs sampling in factorial HMMs.

### 2017

- (Garrigos, Rosasco & Villa, 2017) ⇒ Guillaume Garrigos, Lorenzo Rosasco, and Silvia Villa (2017). "Convergence of the Forward-Backward Algorithm: Beyond the Worst Case with the Help of Geometry". arXiv preprint arXiv:1703.09477.

### 2016

- (Russell & Norvig, 2016) ⇒ Stuart J. Russell, and Peter Norvig (2016). "Artificial intelligence: a modern approach". Malaysia; Pearson Education Limited. AIMA Project.
- QUOTE:
**Figure 15.4**The forward–backward algorithm for smoothing: computing posterior probabilities of a sequence of states given a sequence of observations. The FORWARD and BACKWARD operators are defined by Equations (--) and (--), respectively.

- QUOTE:

### 2015

- (Lorenz & Pock, 2015) ⇒ Dirk A. Lorenz, and Thomas Pock (2015). "An inertial forward-backward algorithm for monotone inclusions". Journal of Mathematical Imaging and Vision, 51(2), 311-325. DOI:10.1007/s10851-014-0523-2

### 2013a

- (Collins, 2013) ⇒ Michael Collins. (2013). "The Forward-backward Algorithm". Columbia Columbia Univ
- QUOTE: The forward-backward algorithm has very important applications to both hidden Markov models (HMMs) and conditional random fields (CRFs). It is a dynamic programming algorithm, and is closely related to the Viterbi algorithm for decoding with HMMs or CRFs(...)
Figure 1: The forward-backward algorithm.

- QUOTE: The forward-backward algorithm has very important applications to both hidden Markov models (HMMs) and conditional random fields (CRFs). It is a dynamic programming algorithm, and is closely related to the Viterbi algorithm for decoding with HMMs or CRFs(...)

### 2013b

- (Rao at al., 2013) ⇒ Nikhil Rao, Parikshit Shah, Stephen Wright, and Robert Nowak (2013, May). "A greedy forward-backward algorithm for atomic norm constrained minimization". In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 5885-5889). IEEE. DOI: 10.1109/ICASSP.2013.6638793
- QUOTE: In this paper, we propose a Forward-Backward scheme that adds atoms greedily in the same manner as in [15] , while allowing atoms to be purged later if their contribution to the observations is superseded by atoms selected later in the process. Our scheme also admits flexibility in adjusting the current iterate, for example, by sparsifying its representation in terms of the current basis. Our algorithm enjoys similar convergence properties to the method of [15] (as we can show by making minor adjustments to the analysis of that paper) while producing solutions that are significantly sparser, as can be seen from our experiments(...)
The forward step of our algorithm below chooses a new atom and adjusts the coefficients to the basis using the same strategy as in [15]. The backward step removes one of the existing basis elements if the value of [math]f[/math] is not degraded by more than a certain fraction (less than 1) of the improvement gained during the forward step. All iterates remain feasible with respect to the constraint [math]||x||_A \leq \tau[/math] .

Source Code: CoGEnT

- QUOTE: In this paper, we propose a Forward-Backward scheme that adds atoms greedily in the same manner as in [15] , while allowing atoms to be purged later if their contribution to the observations is superseded by atoms selected later in the process. Our scheme also admits flexibility in adjusting the current iterate, for example, by sparsifying its representation in terms of the current basis. Our algorithm enjoys similar convergence properties to the method of [15] (as we can show by making minor adjustments to the analysis of that paper) while producing solutions that are significantly sparser, as can be seen from our experiments(...)

### 2011

- (Tewari, Ravikumar, & Dhillon, 2011) ⇒ Ambuj Tewari, Pradeep K. Ravikumar, and Inderjit S. Dhillon (2011). "Greedy algorithms for structurally constrained high dimensional problems". In Advances in Neural Information Processing Systems (pp. 882-890).

### 2010

- (Sridharan, 2010) ⇒ Ramesh Sridharan (2010). "HMMs and the forward-backward algorithm"
- QUOTE: These notes give a short review of Hidden Markov Models (HMMs) and the forward-backward algorithm. They’re written assuming familiarity with the sum-product belief propagation algorithm, but should be accessible to anyone who’s seen the fundamentals of HMMs before.

### 2004

- (Culotta & McCallum, 2004) ⇒ Aron Culotta, and Andrew McCallum. (2004). “Confidence Estimation for Information Extraction.” In: Proceedings of HLT-NAACL (NAACL 2004).
- QUOTE: The Forward-Backward algorithm can be viewed as a generalization of the Viterbi algorithm: instead of choosing the optimal state sequence, Forward-Backward evaluates all possible state sequences given the observation sequence. The “forward values” [math]\alpha_{t+1}(s_i)[/math] are recursively defined similarly as in Eq. 2, except the max is replaced by a summation. Thus we have [math]\alpha_{t+1}(s_i) = \sum_{s'} \bigl[ \alpha_t(s') \operatorname{exp} \bigl( \sum_k \lambda_k f_k(s',s_i,\bf{o},t)\bigr) \bigr][/math]. (3) terminating in [math]Z_o = \sum_i \alpha_T(s_i)[/math] from Eq. 1. To estimate the probability that a field is extracted correctly, we constrain the Forward-Backward algorithm such that each path conforms to some subpath of constraints [math]C = \lt s_q … s_r\gt [/math] from time step [math]q[/math] to [math]r[/math].

### 1972

- (Baum, 1972) ⇒ Leonard E. Baum. (1972). “An equality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes.” In: Inequalities, 3.

### 1970

- (Baum et al., 1970) ⇒ Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss. (1970). “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains.” In: The annals of mathematical statistics 41, no. 1.