Deterministic Attention Mechanism


A Deterministic Attention Mechanism is an Attention Mechanism that computes its context vector as a deterministic function of its inputs (rather than by sampling attention locations from a probability distribution).



References

2017

$c_{i}=\sum_{j \in \mathcal{D}_{i}} \mathbf{A}_{m} h_{j}$ (6)
Here, $\mathcal{D}_{i}$ is a list that stores the indices of the words to be attended to at time step $i$ while generating the target-side sequence. $\mathbf{A}_{m}$ is a deterministic alignment matrix of shape $dim(c) \times dim(h)$, where $dim(c)$ and $dim(h)$ are the dimensions of the $c_i$ and $h_j$ vectors, respectively, and $1 \leq m \leq |\mathcal{D}_{i}|$ denotes the index of the parameter matrix (i.e., the position of $j$ within $\mathcal{D}_{i}$).
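As a minimal NumPy sketch of Eq. (6) (the dimensions, the index list $\mathcal{D}_i$, and all variable names below are illustrative assumptions, not the quoted paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
dim_h, dim_c, L = 4, 3, 7        # assumed sizes of h_j, c_i, and the source length

h = rng.standard_normal((L, dim_h))    # source-side hidden states h_1..h_L
D_i = [1, 2, 4]                        # indices attended to at target step i
# One deterministic alignment matrix A_m per position m in D_i,
# each of shape dim(c) x dim(h).
A = rng.standard_normal((len(D_i), dim_c, dim_h))

# Eq. (6): c_i = sum over j in D_i of A_m h_j, with m the position of j in D_i.
c_i = sum(A[m] @ h[j] for m, j in enumerate(D_i))
print(c_i.shape)   # (dim_c,)
```

Note that no attention distribution is computed here: which positions contribute, and through which matrix, is fixed by $\mathcal{D}_i$ and $m$, so the context vector is a deterministic linear function of the selected $h_j$.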

Compared with the probabilistic attention mechanism, our deterministic attention mechanism has the following characteristics: …

2015

$\displaystyle \mathbb{E}_{p\left(s_t\vert a\right)}\big[\hat{\mathbf{z}}_t\big]=\sum_{i=1}^L\alpha_{t,i}\mathbf{a}_i$ (8)

and formulate a deterministic attention model by computing a soft attention weighted annotation vector $\phi\left(\{\mathbf{a}_i\},\{\alpha_i\}\right)=\sum_{i=1}^L\alpha_i\mathbf{a}_i$, as proposed by Bahdanau et al. (2014). This corresponds to feeding a soft $\alpha$-weighted context into the system. The whole model is smooth and differentiable under the deterministic attention, so end-to-end learning is trivial using standard back-propagation.
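A minimal NumPy sketch of this soft, deterministic read in Eq. (8); the dot-product scoring used to produce the $\alpha_{t,i}$ below is an assumption made for brevity, whereas Xu et al. (2015) use a small learned alignment network:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
L, dim_a = 6, 5
a = rng.standard_normal((L, dim_a))   # annotation vectors a_1..a_L
h_t = rng.standard_normal(dim_a)      # decoder state at step t (size assumed equal)

alpha_t = softmax(a @ h_t)            # attention weights alpha_{t,1..L}, sum to 1
z_hat_t = alpha_t @ a                 # Eq. (8): expected context sum_i alpha_{t,i} a_i
print(alpha_t.sum(), z_hat_t.shape)   # ~1.0, (dim_a,)
```

Because z_hat_t is a smooth function of the weights, gradients flow through the entire expression, which is the differentiability property the passage refers to.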

Learning the deterministic attention can also be understood as approximately optimizing the marginal likelihood in Eq. (5) over the attention location random variable $s_t$ from Sec. 4.1.
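As a one-line sketch of that reading (notation as in the quote; the step below simply moves the expectation over the location variable inside the likelihood, an approximation Xu et al. (2015) motivate with a normalized weighted geometric mean argument):

$\displaystyle \log p\left(\mathbf{y}\vert\mathbf{a}\right)=\log\sum_{s}p\left(s\vert\mathbf{a}\right)p\left(\mathbf{y}\vert s,\mathbf{a}\right)\approx\log p\left(\mathbf{y}\,\vert\,\mathbb{E}_{p\left(s\vert\mathbf{a}\right)}\big[\hat{\mathbf{z}}\big],\mathbf{a}\right)$

so training with the soft context vector approximately optimizes the marginal likelihood over attention locations.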