File:2018 BERTPreTrainingofDeepBidirectio Fig1.png

From GM-RKB

Original file (1,194 × 274 pixels, file size: 85 KB, MIME type: image/png)

Summary

Figure 1: Differences in pre-training model architectures. BERT uses a bidirectional Transformer. OpenAI GPT uses a left-to-right Transformer. ELMo uses the concatenation of independently trained left-to-right and right-to-left LSTMs to generate features for downstream tasks. Among the three, only BERT representations are jointly conditioned on both left and right context in all layers.

Copyright: Devlin et al. (2018).
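
The caption contrasts how each model conditions on context. As a rough illustration (not code from the paper), the bidirectional versus left-to-right Transformer distinction can be expressed as a difference in the attention mask, and the ELMo approach as a concatenation of independently computed forward and backward states; the shapes and names below are illustrative assumptions only.

import numpy as np

def bidirectional_mask(seq_len):
    # BERT-style: every position may attend to every other position.
    return np.ones((seq_len, seq_len), dtype=bool)

def left_to_right_mask(seq_len):
    # GPT-style causal mask: position i may attend only to positions <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def elmo_style_concat(forward_states, backward_states):
    # ELMo-style: concatenate per-token hidden states from independently
    # trained forward and backward LSTMs (arrays of shape (seq_len, hidden_dim)).
    return np.concatenate([forward_states, backward_states], axis=-1)

print(bidirectional_mask(4))   # all True: full bidirectional context
print(left_to_right_mask(4))   # lower-triangular: left context only

In this sketch, only the bidirectional mask lets every layer condition each token on both left and right context, which is the property the caption attributes to BERT.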

File history


Current version: 05:03, 28 April 2019
Dimensions: 1,194 × 274 (85 KB)
User: Omoreira (talk | contribs)
Comment: Figure 1: Differences in pre-training model architectures. BERT uses a bidirectional Transformer. OpenAI GPT uses a left-to-right Transformer. ELMo uses the concatenation of independently trained left-to-right and rig...
