There are newer models such as BERT that require pre-training and fine-tuning, and there are traditional models, such as decision trees and SVMs, which require us to extract features from the text and train on them.
If I want to compare BERT's results with those of the traditional models, do I need to extract features (perform feature engineering) from the text? Or can I somehow use the pre-trained values or vectors from BERT as features?
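For example, would something along these lines be a reasonable comparison? This is just a sketch of what I have in mind, assuming the Hugging Face transformers library, bert-base-uncased, and scikit-learn:

```python
# Sketch: use pre-trained BERT vectors as features for a traditional classifier,
# instead of hand-engineered features (assumes transformers, torch, scikit-learn).
import torch
from transformers import BertTokenizer, BertModel
from sklearn.svm import SVC

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(texts):
    # One fixed-size vector per text: the [CLS] token's last hidden state.
    with torch.no_grad():
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        out = model(**enc)
    return out.last_hidden_state[:, 0, :].numpy()

train_texts = ["a positive example", "a negative example"]  # placeholder data
train_labels = [1, 0]

clf = SVC()
clf.fit(embed(train_texts), train_labels)
print(clf.predict(embed(["an unseen example"])))
```

Would comparing an SVM trained on these BERT vectors against an SVM trained on hand-crafted features (and against fine-tuned BERT itself) be a fair way to set up the comparison?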