I have used BERT in a few papers and plan to use it in an upcoming project, so I need to understand its drawbacks when it is deployed in a real, ongoing project.

I read somewhere that its biggest disadvantage is that it is very compute-intensive at inference time, meaning that if you want to use it in production at scale, it can become costly.
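To get a rough sense of where that inference cost comes from, here is a back-of-the-envelope sketch. It assumes the standard BERT-base configuration (12 layers, hidden size 768, feed-forward size 3072, vocabulary of 30522 tokens, 512 positions); the function names are my own, and the FLOP figure is only an approximation that counts matrix multiplications and ignores softmax, LayerNorm, and activation costs.

```python
def bert_param_count(vocab=30522, hidden=768, layers=12, ffn=3072,
                     max_pos=512, type_vocab=2):
    """Count parameters of a BERT encoder with pooler (no task head)."""
    # Word, position, and segment embeddings plus one LayerNorm (weight + bias).
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    # Q, K, V, and output projections, each with a bias, plus a LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Two feed-forward projections with biases, plus a LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + 2 * hidden
    # Pooler: one dense layer applied to the [CLS] vector.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


def approx_forward_flops(seq_len, hidden=768, layers=12, ffn=3072):
    """Rough FLOPs for one forward pass over a single sequence.

    Assumption: 2 FLOPs (multiply + add) per weight per token for the
    dense layers, plus the two attention matmuls (QK^T and scores * V).
    """
    dense_weights = 4 * hidden * hidden + 2 * hidden * ffn
    dense_flops = 2 * seq_len * dense_weights
    attention_flops = 4 * seq_len * seq_len * hidden
    return layers * (dense_flops + attention_flops)


if __name__ == "__main__":
    print(f"parameters: {bert_param_count():,}")
    for n in (32, 128, 512):
        print(f"seq_len={n}: ~{approx_forward_flops(n) / 1e9:.1f} GFLOPs")
```

Under these assumptions a single 128-token input costs on the order of tens of GFLOPs, and the attention term grows quadratically with sequence length, which is why serving many requests per second can get expensive.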
