DistilBERT Model

A DistilBERT Model is a transformer-based language model that is derived from BERT through a process called knowledge distillation.

  • Context:
    • It can retain 97% of BERT's performance on the General Language Understanding Evaluation (GLUE) benchmark despite having 40% fewer parameters.
    • It can perform comparably to BERT on downstream tasks such as IMDb sentiment classification and SQuAD v1.1 question answering, while significantly reducing model size and inference time (see the usage sketch after this list).
    • It introduces a triple loss that combines language modeling, distillation, and cosine-distance losses during the pre-training phase (see the loss sketch after this list).
    • It can be an efficient option for edge applications, as demonstrated by its substantially faster inference times on mobile devices compared to BERT-base (a rough latency comparison follows this list).
    • ...
  • Example(s):
    • ...
  • Counter-Example(s):
    • a BERT-base Model, the larger teacher model from which DistilBERT is distilled.
  • See: Transformer Architecture, BERT, Knowledge Distillation, GLUE Benchmark.
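
The downstream-task bullet above can be illustrated with a minimal sketch using the Hugging Face transformers pipeline API. The checkpoint name distilbert-base-uncased-finetuned-sst-2-english is an assumption chosen for illustration (it is fine-tuned on SST-2 rather than IMDb); any DistilBERT sequence-classification checkpoint would serve the same purpose.

```python
# Minimal sketch: DistilBERT for sentence-level sentiment classification
# via the Hugging Face transformers pipeline API.
# Assumption: the SST-2 fine-tuned checkpoint below stands in for an
# IMDb-style sentiment task; swap in an IMDb fine-tuned checkpoint if available.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("A surprisingly sharp and economical little model."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```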
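
The triple loss mentioned above can be sketched as a weighted sum of three terms: a masked language modeling loss on the student's own predictions, a temperature-scaled distillation (KL-divergence) loss against the teacher's output distribution, and a cosine-embedding loss aligning student and teacher hidden states. The weights, temperature, and masking conventions below are illustrative assumptions, not the exact training configuration of DistilBERT.

```python
# Sketch of a DistilBERT-style triple loss: masked LM + distillation + cosine alignment.
# All weights (alpha_*) and the temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def triple_loss(student_logits, teacher_logits, student_hidden, teacher_hidden,
                labels, temperature=2.0, alpha_mlm=1.0, alpha_distil=1.0, alpha_cos=1.0):
    # (1) Masked language modeling loss on the student's predictions.
    #     labels use -100 at non-masked positions (Hugging Face convention).
    loss_mlm = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    # (2) Distillation loss: KL divergence between temperature-softened
    #     student and teacher distributions over the vocabulary.
    loss_distil = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # (3) Cosine-distance loss pulling student hidden states toward the teacher's.
    target = torch.ones(student_hidden.size(0) * student_hidden.size(1),
                        device=student_hidden.device)
    loss_cos = F.cosine_embedding_loss(
        student_hidden.view(-1, student_hidden.size(-1)),
        teacher_hidden.view(-1, teacher_hidden.size(-1)),
        target,
    )

    return alpha_mlm * loss_mlm + alpha_distil * loss_distil + alpha_cos * loss_cos
```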
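
As a rough way to probe the efficiency claim above, the following sketch times forward passes of distilbert-base-uncased against bert-base-uncased on a single input. It is a desktop/CPU approximation only and does not reproduce the on-device measurements reported for DistilBERT.

```python
# Rough latency comparison sketch: DistilBERT vs. BERT-base.
# Timings are illustrative only; mobile/on-device numbers will differ.
import time
import torch
from transformers import AutoModel, AutoTokenizer

def mean_latency(model_name, text, n_runs=20):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        model(**inputs)  # warm-up run
        start = time.perf_counter()
        for _ in range(n_runs):
            model(**inputs)
    return (time.perf_counter() - start) / n_runs

text = "DistilBERT trades a small amount of accuracy for a large speedup."
for name in ("distilbert-base-uncased", "bert-base-uncased"):
    print(name, f"{mean_latency(name, text) * 1000:.1f} ms per forward pass")
```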


References

2019
  • (Sanh et al., 2019) ⇒ Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. (2019). "DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter." arXiv:1910.01108.