20 January 2021

BERT is described in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".

RoBERTa is described in the paper "RoBERTa: A Robustly Optimized BERT Pretraining Approach".

Three years have now passed. Are there any pretrained language models that surpass them on most tasks, under the same or comparable resources?

A model that offers a speedup without losing accuracy would also count as an improvement.
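For concreteness, here is a minimal sketch of how one might check the "same or comparable resources" side of the comparison: it loads two checkpoints through the Hugging Face transformers library and reports parameter count and rough CPU forward-pass latency. The checkpoint names (bert-base-uncased, roberta-base) and the timing setup are illustrative assumptions, not a benchmark; task accuracy would still have to be measured separately (e.g. on GLUE).

```python
# Minimal sketch: compare two pretrained checkpoints on parameter count and
# rough CPU inference latency. Checkpoint names are examples only; any
# candidate model from the Hugging Face Hub can be substituted.
import time

import torch
from transformers import AutoModel, AutoTokenizer


def profile_checkpoint(name: str,
                       text: str = "The quick brown fox jumps over the lazy dog.") -> None:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    model.eval()

    # Total trainable + non-trainable parameters.
    n_params = sum(p.numel() for p in model.parameters())

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        model(**inputs)  # warm-up pass
        start = time.perf_counter()
        for _ in range(10):
            model(**inputs)
        latency_ms = (time.perf_counter() - start) / 10 * 1000

    print(f"{name}: {n_params / 1e6:.1f}M parameters, "
          f"~{latency_ms:.1f} ms per forward pass (CPU)")


if __name__ == "__main__":
    for checkpoint in ("bert-base-uncased", "roberta-base"):
        profile_checkpoint(checkpoint)
```

A candidate model would then "surpass" BERT/RoBERTa in the sense asked here if it matches or beats them on downstream accuracy while staying within a similar parameter and latency budget.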
