Dear all,
I have annotated a dataset of 7200 tweets with three sentiment classes: positive, negative and neutral.
After training and testing multiple classification models on this dataset (6000 tweets for training and 1200 for testing), I used the best model to automatically predict the sentiment classes of an unlabeled dataset of almost 500K tweets.
Now I would like to ask how to evaluate models trained on this automatically annotated dataset: should I split it 80/20 for training and testing the new models, or should I train the new models on the entire automatically annotated dataset and test them on the 20% (1200 tweets) of the manually annotated dataset?
I want to compare the results obtained from the manually and automatically annotated datasets, so I am thinking of using the same testing set for both.
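
To make the two options concrete, here is a minimal sketch of the setups I have in mind. The code is only an illustration: the tweets and labels are placeholders, and the TF-IDF + logistic regression pipeline is a stand-in for whichever classifier performed best in my experiments.

# Sketch of the two evaluation setups (placeholder data, stand-in model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder for the ~500K automatically (pseudo-)labelled tweets.
auto_texts = [
    "great service today", "love this airline", "so happy with the crew", "amazing flight",
    "worst flight ever", "terrible delay again", "lost my luggage", "awful customer service",
    "just landed", "boarding starts soon", "gate changed to b12", "waiting at the airport",
]
auto_labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

# Placeholder for the 1200 manually annotated test tweets held out earlier.
manual_test_texts = ["what a wonderful trip", "this delay is unacceptable", "flight at noon"]
manual_test_labels = ["positive", "negative", "neutral"]

def new_model():
    # Stand-in for the best-performing classifier from my experiments.
    return make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Option A: split the automatically annotated data 80/20 and test on the 20%.
tr_x, te_x, tr_y, te_y = train_test_split(
    auto_texts, auto_labels, test_size=0.2, stratify=auto_labels, random_state=42)
model_a = new_model().fit(tr_x, tr_y)
print("Option A -- tested on the automatically annotated 20%:")
print(classification_report(te_y, model_a.predict(te_x), zero_division=0))

# Option B: train on all automatically annotated data and test on the
# manually annotated test set (the same 1200 tweets used before).
model_b = new_model().fit(auto_texts, auto_labels)
print("Option B -- tested on the manually annotated test set:")
print(classification_report(manual_test_labels, model_b.predict(manual_test_texts), zero_division=0))

My thinking is that Option B would let me report both the manually trained and the automatically trained models on the same manually annotated test set, which seems more comparable.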
Please guide me.