Long Short-Term Memory Unit Activation Function


A Long Short-Term Memory Unit Activation Function is a neuron activation function that implements an LSTM Neuron with forget gates.



References

2018a

First, the function splits the incoming signal into four arrays, which correspond to:
  • [math]\displaystyle{ a }[/math] : sources of cell input
  • [math]\displaystyle{ i }[/math] : sources of input gate
  • [math]\displaystyle{ f }[/math] : sources of forget gate
  • [math]\displaystyle{ o }[/math] : sources of output gate
Second, it computes the updated cell state c and the outgoing signal h as:
[math]\displaystyle{ c = \tanh(a)\,\sigma(i) + c_{\text{prev}}\,\sigma(f) }[/math],

[math]\displaystyle{ h = \tanh(c)\,\sigma(o) }[/math],

where [math]\displaystyle{ \sigma }[/math] is the elementwise sigmoid function. These are returned as a tuple of two variables.

This function supports variable-length inputs. The mini-batch size of the current input must be equal to or smaller than that of the previous one. When the mini-batch size of x is smaller than that of c, this function only updates c[0:len(x)] and does not change the rest of c, c[len(x):]. So, please sort input sequences in descending order of length before applying the function (...)
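The two update equations above can be captured in a short, self-contained sketch. The snippet below is a minimal NumPy illustration, not the quoted library's API: it assumes the incoming signal has already been split into the four source arrays a, i, f, o, and all names and shapes are chosen for illustration only.

import numpy as np

def sigmoid(x):
    # Elementwise logistic sigmoid used for the gates.
    return 1.0 / (1.0 + np.exp(-x))

def lstm_activation(c_prev, a, i, f, o):
    # c = tanh(a) * sigma(i) + c_prev * sigma(f)
    # h = tanh(c) * sigma(o)
    c = np.tanh(a) * sigmoid(i) + c_prev * sigmoid(f)
    h = np.tanh(c) * sigmoid(o)
    return c, h

# Example: mini-batch of 2 samples with 3 hidden units each (illustrative shapes).
rng = np.random.default_rng(0)
c_prev = np.zeros((2, 3))
a, i, f, o = (rng.standard_normal((2, 3)) for _ in range(4))
c, h = lstm_activation(c_prev, a, i, f, o)

Note that the forget-gate term c_prev * sigmoid(f) is what lets the cell carry values forward unchanged when the forget gate saturates near 1, which is the mechanism the surrounding text describes.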

2018b

2018c

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Long_short-term_memory Retrieved:2018-2-25.
    • Long short-term memory (LSTM) units (or blocks) are a building unit for layers of a recurrent neural network (RNN). An RNN composed of LSTM units is often called an LSTM network. A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell is responsible for "remembering" values over arbitrary time intervals; hence the word "memory" in LSTM. Each of the three gates can be thought of as a "conventional" artificial neuron, as in a multi-layer (or feedforward) neural network: that is, they compute an activation (using an activation function) of a weighted sum. Intuitively, they can be thought of as regulators of the flow of values that goes through the connections of the LSTM; hence the denotation "gate". There are connections between these gates and the cell.

      The expression long short-term refers to the fact that LSTM is a model for the short-term memory which can last for a long period of time. An LSTM is well-suited to classify, process and predict time series given time lags of unknown size and duration between important events. LSTMs were developed to deal with the exploding and vanishing gradient problems that can be encountered when training traditional RNNs. Relative insensitivity to gap length gives an advantage to LSTM over alternative RNNs, hidden Markov models and other sequence learning methods in numerous applications.
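To make the quoted description concrete, here is a hedged NumPy sketch of one full LSTM time step, showing how each gate computes an activation of a weighted sum of the current input and the previous hidden state. The parameter names (W_*, U_*, b_*) and the params layout are assumptions made for this illustration and are not drawn from the quoted source.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    # One LSTM time step: each gate applies an activation function to a
    # weighted sum of the current input x_t and the previous hidden state h_prev.
    W_a, U_a, b_a = params["a"]  # cell-input transformation
    W_i, U_i, b_i = params["i"]  # input gate
    W_f, U_f, b_f = params["f"]  # forget gate
    W_o, U_o, b_o = params["o"]  # output gate

    a = x_t @ W_a + h_prev @ U_a + b_a            # candidate cell input
    i = sigmoid(x_t @ W_i + h_prev @ U_i + b_i)   # how much new input to admit
    f = sigmoid(x_t @ W_f + h_prev @ U_f + b_f)   # how much old cell state to keep
    o = sigmoid(x_t @ W_o + h_prev @ U_o + b_o)   # how much of the cell to expose

    c_t = np.tanh(a) * i + c_prev * f             # updated cell ("memory")
    h_t = np.tanh(c_t) * o                        # outgoing hidden signal
    return h_t, c_t

# Tiny usage example with assumed sizes: input size 2, hidden size 3, batch of 1.
rng = np.random.default_rng(0)
params = {g: (rng.standard_normal((2, 3)),   # W: input-to-gate weights
              rng.standard_normal((3, 3)),   # U: hidden-to-gate weights
              np.zeros(3))                   # b: bias
          for g in "aifo"}
h, c = lstm_step(rng.standard_normal((1, 2)), np.zeros((1, 3)), np.zeros((1, 3)), params)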

2001