Bidirectional Encoder Representations from Transformers (BERT) Network Architecture


A Bidirectional Encoder Representations from Transformers (BERT) Network Architecture is a transformer-based, encoder-only architecture that consists of a Multi-layer Bidirectional Transformer Encoder Neural Network.
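The defining structural choice (a stack of bidirectional Transformer encoder layers over token and position embeddings) can be sketched in a few lines. The following is a minimal illustration, assuming PyTorch's built-in nn.TransformerEncoder as a stand-in for BERT's own layers (which additionally use segment embeddings, GELU activations, and a different dropout/LayerNorm arrangement); the default sizes match BERT-Base (12 layers, 768 hidden units, 12 attention heads).

```python
import torch
import torch.nn as nn

class BertStyleEncoder(nn.Module):
    """Encoder-only stack: token + position embeddings feeding a
    multi-layer bidirectional Transformer encoder (no causal mask)."""

    def __init__(self, vocab_size=30522, hidden=768, layers=12,
                 heads=12, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, hidden)
        self.pos_emb = nn.Embedding(max_len, hidden)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, dim_feedforward=4 * hidden,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, token_ids, pad_mask=None):
        # No attention mask restricts direction: every position attends
        # to both its left and right context in every layer.
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.tok_emb(token_ids) + self.pos_emb(positions)
        return self.encoder(x, src_key_padding_mask=pad_mask)

model = BertStyleEncoder()
hidden_states = model(torch.randint(0, 30522, (2, 16)))  # shape (2, 16, 768)
```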



References

2019a

(Devlin et al., 2019) ⇒ Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. (2019). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." In: Proceedings of NAACL-HLT 2019.

...

Figure 3: Differences in pre-training model architectures. BERT uses a bidirectional Transformer. OpenAI GPT uses a left-to-right Transformer. ELMo uses the concatenation of independently trained left-to-right and right-to-left LSTMs to generate features for downstream tasks. Among the three, only BERT representations are jointly conditioned on both left and right context in all layers. In addition to the architecture differences, BERT and OpenAI GPT are fine-tuning approaches, while ELMo is a feature-based approach.
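The bidirectional-versus-left-to-right distinction described in this caption comes down to the self-attention mask. A small illustrative sketch, assuming PyTorch-style additive attention masks (an assumption for illustration, not something the quoted paper specifies):

```python
import torch

seq_len = 5

# BERT-style bidirectional self-attention: an all-zeros additive mask,
# so every token attends to the full left and right context.
bidirectional_mask = torch.zeros(seq_len, seq_len)

# OpenAI GPT-style left-to-right attention: -inf above the diagonal
# blocks each position from attending to tokens on its right.
causal_mask = torch.triu(
    torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
```

ELMo differs along a third axis: rather than one jointly conditioned stack, it concatenates the outputs of separately trained left-to-right and right-to-left LSTMs, which is why its representations are not jointly conditioned on both contexts in all layers.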