Bidirectional Recurrent Neural Network (BiRNN) Training System


A Bidirectional Recurrent Neural Network (BiRNN) Training System is a Recurrent Neural Network Training System that implements a BRNN Forward Pass Algorithm and a BRNN Backward Pass Algorithm (to create a BiRNN).



References

2018a

  • (GitHub, 2018) ⇒ Theano-Recurrence Training System: https://github.com/uyaseen/theano-recurrence#training Retrieved: 2018-07-01
    • train.py provides a convenient method train(..) to train each model. The recurrent model can be selected with the rec_model parameter, which is set to gru by default (possible options include rnn, gru, lstm, birnn, bigru & bilstm). The number of hidden neurons in each layer (at the moment only single-layer models are supported to keep things simple, although adding more layers is trivial) can be adjusted with the n_h parameter in train(..), which defaults to 100. As the model is trained, it stores the current best state of the model, i.e., the set of weights with the least training error; the stored model is kept at data\models\MODEL-NAME-best_model.pkl and can later be used for resuming training from the last point or just for prediction/sampling. If you don't want to start training from scratch and would rather use the already trained model, set use_existing_model=True in the arguments to train(..). Optimization strategies can also be specified to train(..) via the optimizer parameter; currently supported optimizers are rmsprop, adam, and vanilla stochastic gradient descent, and they can be found in utilities\optimizers.py. The b_path, learning_rate, and n_epochs arguments of train(..) specify the base path to store the model (default = data\models\), the initial learning rate of the optimizer, and the number of epochs, respectively. During training, some logs (current epoch, sample, cross-entropy error, etc.) are shown on the console to give an idea of how well learning is proceeding; the logging frequency can be specified via logging_freq in train(..). At the end of training, a plot of cross-entropy error vs. the number of iterations gives an overview of the overall training process and is also stored in the b_path.
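
The following call is a minimal usage sketch assembled from the parameter names mentioned in the README excerpt above; the import path and the concrete values are illustrative assumptions, not documented defaults of the repository.

    # Hypothetical usage sketch: parameter names follow the README excerpt above,
    # but the import path and the specific values chosen here are assumptions.
    from train import train

    train(
        rec_model='birnn',          # one of: rnn, gru, lstm, birnn, bigru, bilstm
        n_h=100,                    # hidden units in the single recurrent layer
        use_existing_model=False,   # True resumes from the stored best model (.pkl)
        optimizer='rmsprop',        # rmsprop, adam, or vanilla SGD (utilities\optimizers.py)
        b_path='data\\models\\',    # base path for the best model and the error plot
        learning_rate=0.001,        # initial learning rate (illustrative value)
        n_epochs=50,                # illustrative value
        logging_freq=10,            # how often training progress is printed
    )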

2018b

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Bidirectional_recurrent_neural_networks Retrieved: 2018-07-01.
    • Bidirectional Recurrent Neural Networks (BRNNs) were invented in 1997 by Schuster and Paliwal. BRNNs were introduced to increase the amount of input information available to the network. For example, multilayer perceptrons (MLPs) and time delay neural networks (TDNNs) have limited flexibility with respect to their input data, as they require it to be fixed. Standard recurrent neural networks (RNNs) also have restrictions, as future input information cannot be reached from the current state. In contrast, BRNNs do not require their input data to be fixed, and future input information is reachable from the current state. The basic idea of BRNNs is to connect two hidden layers of opposite directions to the same output. With this structure, the output layer can get information from past and future states.

      BRNNs are especially useful when the context of the input is needed. For example, in handwriting recognition, performance can be enhanced by knowledge of the letters located before and after the current letter.
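
As a concrete illustration of the idea described above (two hidden layers running in opposite time directions, both feeding the same output layer), here is a minimal NumPy sketch; the function name, weight names, and the tanh activation are assumptions made for illustration, not details taken from any particular implementation.

    import numpy as np

    def birnn_forward(x, Wf, Uf, Wb, Ub, V):
        """x: (T, n_in) input sequence; returns outputs of shape (T, n_out)."""
        T, n_h = x.shape[0], Wf.shape[0]
        h_f = np.zeros((T, n_h))        # forward-direction hidden states
        h_b = np.zeros((T, n_h))        # backward-direction hidden states
        for t in range(T):              # left-to-right hidden layer
            prev = h_f[t - 1] if t > 0 else np.zeros(n_h)
            h_f[t] = np.tanh(Wf @ x[t] + Uf @ prev)
        for t in reversed(range(T)):    # right-to-left hidden layer
            nxt = h_b[t + 1] if t < T - 1 else np.zeros(n_h)
            h_b[t] = np.tanh(Wb @ x[t] + Ub @ nxt)
        # each output sees past context (h_f) and future context (h_b)
        return np.concatenate([h_f, h_b], axis=1) @ V.T

Here Wf and Uf parameterize the forward hidden layer, Wb and Ub the backward one, and V maps the concatenated hidden states of both directions to the output.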

2018c

Fig. 3 Unfolded architecture of bidirectional LSTM with three consecutive steps

2017

2015

2008

The forward pass for the BRNN hidden layers is the same as for a unidirectional RNN, except that the input sequence is presented in opposite directions to the two hidden layers, and the output layer is not updated until both hidden layers have processed the entire input sequence:

for t = 1 to T do
    Do forward pass for the forward hidden layer, storing activations at each timestep
for t = T to 1 do
    Do forward pass for the backward hidden layer, storing activations at each timestep
for t = 1 to T do
    Do forward pass for the output layer, using the stored activations from both hidden layers

Algorithm 3.1: BRNN Forward Pass
Similarly, the backward pass proceeds as for a standard RNN trained with BPTT, except that all the output layer δ terms are calculated first, then fed back to the two hidden layers in opposite directions:
for t = 1 to T do
    Do BPTT backward pass for the output layer only, storing δ terms at each timestep
for t = T to 1 do
    Do BPTT backward pass for the forward hidden layer, using the stored δ terms from the output layer
for t = 1 to T do
    Do BPTT backward pass for the backward hidden layer, using the stored δ terms from the output layer

Algorithm 3.2: BRNN Backward Pass
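
To make the ordering of Algorithms 3.1 and 3.2 concrete, the sketch below accumulates weight gradients for a simple tanh BiRNN with a squared-error output layer, reusing the stored activations and the weight names (Wf, Uf, Wb, Ub, V) from the forward-pass sketch given earlier; it is an illustrative reading of the pseudocode above, not Graves' original code or notation.

    import numpy as np

    def birnn_bptt(x, targets, h_f, h_b, Wf, Uf, Wb, Ub, V):
        """Gradients for a tanh BiRNN with squared-error loss (illustrative)."""
        T, n_h = h_f.shape
        V_f, V_b = V[:, :n_h], V[:, n_h:]            # output weights per direction
        dWf, dUf = np.zeros_like(Wf), np.zeros_like(Uf)
        dWb, dUb = np.zeros_like(Wb), np.zeros_like(Ub)
        dV = np.zeros_like(V)

        # Step 1: output-layer delta terms for every timestep first
        y = np.concatenate([h_f, h_b], axis=1) @ V.T
        delta_out = y - targets                      # d(loss)/d(output)
        for t in range(T):
            dV += np.outer(delta_out[t], np.concatenate([h_f[t], h_b[t]]))

        # Step 2: BPTT through the forward hidden layer, t = T..1
        da = np.zeros(n_h)
        for t in reversed(range(T)):
            dh = V_f.T @ delta_out[t] + Uf.T @ da
            da = dh * (1.0 - h_f[t] ** 2)            # tanh derivative
            dWf += np.outer(da, x[t])
            if t > 0:
                dUf += np.outer(da, h_f[t - 1])

        # Step 3: BPTT through the backward hidden layer, t = 1..T
        da = np.zeros(n_h)
        for t in range(T):
            dh = V_b.T @ delta_out[t] + Ub.T @ da
            da = dh * (1.0 - h_b[t] ** 2)
            dWb += np.outer(da, x[t])
            if t < T - 1:
                dUb += np.outer(da, h_b[t + 1])

        return dWf, dUf, dWb, dUb, dV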

2005a

2005b

1999

1997

  • (Schuster & Paliwal, 1997) ⇒ Mike Schuster, and Kuldip K. Paliwal. (1997). "Bidirectional Recurrent Neural Networks." In: IEEE Transactions on Signal Processing, 45(11).