I wonder whether word embeddings such as fastText or word2vec really improve classification results. Are there any studies that report a significant improvement (>5%) in the classification of natural-language texts?
What is still not clear to me is why these studies report results with word embeddings (e.g., CBOW) when plain BOW might already be sufficient. Why don't they use BOW as a baseline?
Yes, I am aware of the context argument. But still, why should I accept the overhead of word embeddings if they don't improve on the BOW results? To take context into account, it is also possible to use grammar rules.
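For reference, the kind of BOW baseline I have in mind is simply raw term counts fed into a linear classifier, with no embeddings at all. A minimal sketch with scikit-learn (the toy texts and labels are only placeholders for illustration):

```python
# Minimal plain-BOW baseline: raw term counts + linear classifier, no embeddings.
# The tiny toy corpus below is a placeholder for a real labelled dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "the team won the match in extra time",
    "a late goal decided the game",
    "heavy rain is expected over the weekend",
    "temperatures will drop sharply tonight",
]
labels = ["sports", "sports", "weather", "weather"]

bow_baseline = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
bow_baseline.fit(texts, labels)
print(bow_baseline.predict(["rain delayed the match"]))
```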
I recently worked on a research project to classify the "risk type" in construction documents. The paper is under review; once it is published, I can share it.
Basically, I compared the following NLP models:
1. TF-IDF + (support vector machine (SVM) trained with stochastic gradient descent (SGD), logistic regression (LR), and Bernoulli naïve Bayes (BNB))
2. Word2vec + (SVM, LR, BNB)
3. fastText + (SVM, LR, BNB)
4. Bidirectional Encoder Representations from Transformers (BERT)
The BERT model significantly outperformed the other NLP models.
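To give a rough idea of how the first two model families can be wired up (this is only a sketch, not the exact configuration from the paper; the toy corpus, the "safety"/"financial" risk labels, and the hyperparameters are placeholders):

```python
# Sketch: compare TF-IDF features vs. averaged word2vec features,
# both fed into a linear SVM trained with SGD (hinge loss).
# Scores on this toy corpus are meaningless; it only shows the wiring.
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = [
    "crane collapsed on the construction site",
    "worker injured by a falling scaffold",
    "scaffold inspection failed before the lift",
    "harness missing during roof work",
    "contract penalty for late material delivery",
    "budget overrun caused by supplier price increase",
    "subcontractor invoice dispute delayed payment",
    "currency fluctuation raised steel costs",
]
labels = ["safety"] * 4 + ["financial"] * 4

# 1) TF-IDF + SVM(SGD)
tfidf_svm = make_pipeline(TfidfVectorizer(), SGDClassifier(loss="hinge", random_state=0))
print("TF-IDF + SVM(SGD):", cross_val_score(tfidf_svm, texts, labels, cv=2).mean())

# 2) word2vec + SVM(SGD): train embeddings on the corpus, then represent
#    each document as the average of its word vectors.
tokenized = [t.split() for t in texts]
w2v = Word2Vec(sentences=tokenized, vector_size=50, window=5, min_count=1,
               seed=0, workers=1)

def doc_vector(tokens):
    vecs = [w2v.wv[w] for w in tokens if w in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X_w2v = np.vstack([doc_vector(t) for t in tokenized])
w2v_svm = SGDClassifier(loss="hinge", random_state=0)
print("word2vec + SVM(SGD):", cross_val_score(w2v_svm, X_w2v, labels, cv=2).mean())
```

The same feature matrices can be reused with LR and BNB by swapping the classifier, and pretrained fastText vectors can replace the locally trained word2vec model in the averaging step.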