You can use MobileBERT, a compact BERT model open-sourced on GitHub ... or the other Google open-source models that use projection methods, namely SGNN, PRADO, and pQRNN. pQRNN in particular is quantized and roughly 300x smaller than BERT, yet achieves near BERT-level performance while being trained only on supervised data (no pre-training step).
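For MobileBERT specifically, here is a minimal sketch using the Hugging Face transformers library (assuming the `google/mobilebert-uncased` checkpoint; `num_labels` and the example sentence are placeholders for your task):

```python
# Minimal sketch: MobileBERT with a sequence-classification head via transformers.
# Assumes: pip install transformers torch; num_labels=2 is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "google/mobilebert-uncased", num_labels=2
)

inputs = tokenizer(
    "This phone has great battery life.",  # placeholder input text
    return_tensors="pt", truncation=True, padding=True,
)
with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, num_labels)
probs = torch.softmax(logits, dim=-1)      # head is untrained here, so fine-tune first
print(probs)
```

The classification head is randomly initialized, so you would fine-tune on your labeled data before the probabilities mean anything.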
An ensemble of deep learning models, with the multilabel problem decomposed into independent binary classifications (one sigmoid output per label), can also be used for text classification.
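A rough sketch of that idea, assuming Keras; the vocabulary size, sequence length, label count, and random data are all stand-ins for your corpus:

```python
# Ensemble of small deep models for multilabel text classification: each label is
# an independent binary (sigmoid) output, and per-label probabilities from several
# independently initialized models are averaged, then thresholded.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

NUM_LABELS, VOCAB, SEQ_LEN = 5, 20000, 100   # hypothetical task dimensions

def build_model():
    model = tf.keras.Sequential([
        layers.Embedding(VOCAB, 64),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_LABELS, activation="sigmoid"),  # one binary output per label
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

# Toy stand-ins; replace with your tokenized texts and binary label matrix.
X = np.random.randint(0, VOCAB, size=(1000, SEQ_LEN))
Y = np.random.randint(0, 2, size=(1000, NUM_LABELS))

ensemble = [build_model() for _ in range(3)]
for m in ensemble:
    m.fit(X, Y, epochs=2, batch_size=32, verbose=0)

# Average per-label probabilities across the ensemble, then threshold at 0.5.
probs = np.mean([m.predict(X[:4], verbose=0) for m in ensemble], axis=0)
print((probs > 0.5).astype(int))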
Further, I suggest combining a gated recurrent unit (GRU) with a support vector machine (SVM); the pairing can give remarkable results. GRUs perform well on sequential learning tasks and mitigate the vanishing and exploding gradient problems that standard recurrent neural networks suffer from when capturing long-term dependencies.
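One common way to wire the two together is to use the GRU as a sequence encoder and the SVM as the classifier on top of its fixed-size output. A hedged sketch, again with placeholder shapes and random data (in practice the encoder would be trained or pre-trained first, then frozen for feature extraction):

```python
# GRU + SVM pairing: a small GRU encodes each token sequence into a fixed vector,
# and an SVM classifies those vectors. All shapes and data here are placeholders.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.svm import SVC

VOCAB, SEQ_LEN, NUM_CLASSES = 20000, 100, 3  # hypothetical task dimensions

# Toy stand-ins for a tokenized corpus and integer class labels.
X = np.random.randint(0, VOCAB, size=(500, SEQ_LEN))
y = np.random.randint(0, NUM_CLASSES, size=500)

# GRU encoder; its gating is what mitigates vanishing/exploding gradients.
encoder = tf.keras.Sequential([
    layers.Embedding(VOCAB, 64),
    layers.GRU(128),     # returns the final hidden state: one 128-d vector per text
])

features = encoder.predict(X, verbose=0)     # (500, 128) sequence embeddings

# SVM on top of the GRU features.
clf = SVC(kernel="rbf")
clf.fit(features, y)
print(clf.predict(features[:5]))
```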