Memory-based Neural Network
A Memory-based Neural Network is an artificial neural network that incorporates memory modules to store and process information across sequential inputs or time steps.
- AKA: Neural Network with Memory, Memory-enhanced Neural Architecture.
- Context:
- It can typically maintain internal state representations that persist beyond individual inputs, enabling processing of sequential inputs and capturing of temporal dependencies.
- It can typically store contextual information from previous inputs to inform processing of current and future inputs, facilitating temporal pattern recognition.
- It can typically mitigate the vanishing gradient problem through specialized memory-based gating mechanisms that regulate information flow across time steps.
- It can often leverage memory-based attention mechanisms to selectively focus on relevant parts of stored information when generating outputs.
- It can often outperform standard neural network architectures on tasks requiring temporal reasoning such as language modeling, sequence prediction, and time series analysis.
- It can range from being an Internal Memory-based Neural Network to being an External Memory-based Neural Network, depending on its memory-based information storage approach.
- It can range from being a Simple Memory-based Neural Network to being a Complex Memory-based Neural Network, depending on its memory-based architectural complexity.
- It can range from being a Short-term Memory-based Neural Network to being a Long-term Memory-based Neural Network, depending on its memory-based temporal retention capacity.
- It can implement various memory-based access patterns including sequential access mechanisms and random access mechanisms for retrieving stored information.
- It can be trained through backpropagation through time to optimize its ability to maintain and utilize relevant information across temporal spans.
- It can incorporate different types of memory-based control mechanisms, such as memory-based gating functions or memory-based addressing systems, to regulate information flow (a gated state update is sketched after this list).
- It can be utilized for memory-based generative modeling tasks where generating coherent outputs requires maintaining consistent context.
- It can be applied to memory-based reinforcement learning scenarios where decisions depend on historical states and actions.
- It can be trained using a Memory-based Neural Network Training System that accounts for temporal dependencies in the learning process.
- ...
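To make the gated, stateful processing described above concrete, here is a minimal illustrative sketch (not taken from any cited source) of a GRU-style update in NumPy: an internal state vector h persists across time steps, and gates decide how much of the old memory to keep versus overwrite. All dimensions, weights, and names (e.g., gru_step) are hypothetical.

```python
# Minimal sketch: a GRU-style gated update illustrating how a memory-based
# neural network carries an internal state h across time steps and uses gates
# to regulate information flow. Dimensions and initialization are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8                      # hypothetical input and hidden sizes

# Parameters for the update gate (z), reset gate (r), and candidate state.
Wz, Uz = rng.normal(0, 0.1, (d_h, d_in)), rng.normal(0, 0.1, (d_h, d_h))
Wr, Ur = rng.normal(0, 0.1, (d_h, d_in)), rng.normal(0, 0.1, (d_h, d_h))
Wh, Uh = rng.normal(0, 0.1, (d_h, d_in)), rng.normal(0, 0.1, (d_h, d_h))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev):
    """One time step: the gates decide how much old memory to keep vs. overwrite."""
    z = sigmoid(Wz @ x_t + Uz @ h_prev)          # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev)          # reset gate
    h_cand = np.tanh(Wh @ x_t + Uh @ (r * h_prev))
    return (1 - z) * h_prev + z * h_cand         # blended new memory state

h = np.zeros(d_h)                                 # internal memory, persists across inputs
for x_t in rng.normal(size=(5, d_in)):            # a toy sequence of 5 inputs
    h = gru_step(x_t, h)
print(h.shape)  # (8,)
```

In a trained network the weight matrices above would be learned with backpropagation through time rather than sampled at random.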
- Examples:
- Internal Memory-based Neural Networks, such as:
- Recurrent Memory-based Neural Networks, such as:
- Standard Recurrent Neural Network, maintaining a simple hidden state as implicit memory that accumulates information across time steps.
- Long Short-Term Memory Network, implementing specialized memory-based cell states and memory-based gating mechanisms to control information flow across time.
- Gated Recurrent Unit Network, featuring simplified memory-based gating architecture compared to LSTMs while maintaining effective long-term dependency modeling.
- Temporal Convolutional Memory-based Neural Networks, such as:
- Dilated Temporal Convolutional Network, using dilated convolutions to capture long-range temporal dependencies through hierarchical memory structures.
- WaveNet Memory-based Architecture, implementing memory-based causal convolutions for modeling sequential audio data with long-range dependencies.
- External Memory-based Neural Networks, such as:
- Memory-Augmented Neural Networks (MANNs), such as:
- Neural Turing Machine, incorporating explicit memory-based external storage with content-based and location-based addressing mechanisms (content-based addressing is sketched after this list).
- Differentiable Neural Computer, extending the Neural Turing Machine with dynamic memory allocation and temporal link tracking.
- Memory Networks, such as:
- End-to-End Memory Network, utilizing multiple memory components with learned attention mechanisms for question answering tasks.
- Key-Value Memory Network, separating addressing and content components in external memory for improved information retrieval.
- Hybrid Memory-based Neural Networks, such as:
- Hierarchical Memory-based Neural Networks, such as:
- Hierarchical Attention Network, combining memory-based word-level attention and memory-based sentence-level attention mechanisms for document classification.
- Transformer Neural Network, implementing memory-based self-attention mechanisms that serve as content-addressable memory across sequence elements.
- ...
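To make the internal-versus-external memory distinction above concrete, the following sketch (illustrative only, not the exact mechanism of any cited architecture) shows a content-based read over an external memory matrix in the spirit of the Neural Turing Machine and Memory Network examples: a controller-produced query is compared against every memory slot, the similarities are softmax-normalized into differentiable read weights, and the read vector is the weighted sum of slots. Function and variable names are hypothetical.

```python
# Illustrative sketch of content-based addressing over an external memory matrix,
# in the spirit of Neural Turing Machines / Memory Networks. Sizes are hypothetical.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_read(memory, query, beta=1.0):
    """Read from memory by cosine similarity between the query and each slot.

    memory: (n_slots, slot_dim) external memory matrix
    query:  (slot_dim,) key emitted by the controller network
    beta:   sharpening factor for the read weights
    """
    sims = memory @ query / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(query) + 1e-8
    )
    weights = softmax(beta * sims)        # differentiable, soft addressing
    return weights @ memory, weights      # read vector and attention weights

rng = np.random.default_rng(1)
M = rng.normal(size=(16, 32))             # 16 memory slots of width 32
q = rng.normal(size=32)                   # controller-produced query key
read_vec, w = content_read(M, q, beta=2.0)
print(read_vec.shape, w.sum())            # (32,) ~1.0
```

Location-based addressing, write heads, and dynamic memory allocation (as in the Differentiable Neural Computer) build on this kind of soft, differentiable access.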
- Counter-Examples:
- Feedforward Neural Network, which processes inputs independently without maintaining any form of state or memory between inputs, unlike memory-based neural networks that store information across time steps (the contrast is sketched after this list).
- Convolutional Neural Network without recurrent connections, which captures spatial patterns through filter operations but lacks the temporal memory capabilities of memory-based neural networks.
- Perceptron Network, which implements simple linear threshold functions with no capacity to maintain information across separate inputs as memory-based neural networks do.
- Neocognitron, which uses a hierarchical structure of feature detectors but lacks explicit memory mechanisms for temporal information that memory-based neural networks provide.
- Radial Basis Function Neural Network, which performs pattern recognition based on distance measures in feature space without the temporal memory capabilities of memory-based neural networks.
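The defining contrast with the counter-examples above can be seen in a few lines (an illustrative sketch with made-up weights): a feedforward map has no state, so each output depends only on its own input, while a recurrent update threads a state through the sequence, so the final representation depends on input order.

```python
# Sketch contrasting a stateless feedforward map with a stateful recurrent update;
# sizes and weights are illustrative.
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(3, 3))

def feedforward(x_t):
    # No memory: each output depends only on the current input.
    return np.tanh(W @ x_t)

def recurrent(x_t, h_prev):
    # Memory: h_prev carries information from all earlier inputs.
    return np.tanh(W @ x_t + h_prev)

xs = rng.normal(size=(4, 3))

# Per-step feedforward outputs are the same no matter how the sequence is ordered.
ff = [feedforward(x) for x in xs]

# The recurrent state after the sequence depends on the order of the inputs.
def run(seq):
    h = np.zeros(3)
    for x in seq:
        h = recurrent(x, h)
    return h

print(np.allclose(run(xs), run(xs[::-1])))  # typically False: order matters
```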
- See: Artificial Neural Network, Neural Natural Language Translation, Attention Mechanism, Deep Learning Neural Network, Speech Recognition, Document Classification, Temporal Sequence Processing, Memory-Augmented Neural Network.
References
2018a
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/long_short-term_memory Retrieved:2018-3-27.
- Long short-term memory (LSTM) units (or blocks) are a building unit for layers of a recurrent neural network (RNN). An RNN composed of LSTM units is often called an LSTM network. A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell is responsible for "remembering" values over arbitrary time intervals; hence the word "memory" in LSTM. Each of the three gates can be thought of as a "conventional" artificial neuron, as in a multi-layer (or feedforward) neural network: that is, they compute an activation (using an activation function) of a weighted sum. Intuitively, they can be thought of as regulators of the flow of values that goes through the connections of the LSTM; hence the denotation "gate". There are connections between these gates and the cell.
The expression long short-term refers to the fact that LSTM is a model of short-term memory that can last for a long period of time. An LSTM is well-suited to classify, process and predict time series given time lags of unknown size and duration between important events. LSTMs were developed to deal with the exploding and vanishing gradient problems encountered when training traditional RNNs. Relative insensitivity to gap length gives LSTM an advantage over alternative RNNs, hidden Markov models, and other sequence learning methods in numerous applications.
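For reference, one common formulation of the LSTM step described in the quote above, written as a sketch in standard notation (σ is the logistic function, ⊙ is element-wise multiplication; bias placement and peephole variants differ across presentations):

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state / output)}
\end{aligned}
```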
2016a
- (Santoro et al., 2016) ⇒ Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. (2016). “Meta-Learning with Memory-Augmented Neural Networks.” In: Proceedings of the 33rd International Conference on Machine Learning (ICML'16).
- QUOTE: (...) memory-augmented neural network (MANN) (note: here on, the term MANN will refer to the class of external-memory equipped networks, and not other “internal” memory-based architectures, such as LSTMs).