Hello,

I have built a Naive Bayes classifier in Java to help sort through publications for a meta-analysis project. While developing the classifier I used an available training set of positive and negative reviews, and it predicted which was which with about 90% accuracy.

However, now that I have switched to a data set more representative of my "real" data (mostly irrelevant articles with a few relevant ones scattered throughout), I am running into trouble. Overall accuracy is still high (~96%), but the classifier never labels any article as relevant. I suspect this is because, statistically, it always pays to guess negative, since negatives are by far the most abundant class. Is there any way to deal with this problem? Any suggestions would be helpful. Thanks!
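
For reference, here is a minimal sketch of the kind of log-space scoring I am doing (class and variable names here are simplified and hypothetical, not my exact code). It shows where the class prior enters the score, which I think is why the majority class almost always wins:

    import java.util.List;
    import java.util.Map;

    public class NaiveBayesScorer {

        // log P(c): roughly log(0.96) for "irrelevant" vs. log(0.04) for "relevant"
        private final Map<String, Double> logPrior;
        // log P(word | c) with smoothing, per class
        private final Map<String, Map<String, Double>> logLikelihood;

        public NaiveBayesScorer(Map<String, Double> logPrior,
                                Map<String, Map<String, Double>> logLikelihood) {
            this.logPrior = logPrior;
            this.logLikelihood = logLikelihood;
        }

        /** Returns the class with the highest log posterior for the given tokens. */
        public String classify(List<String> tokens) {
            String best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (String c : logPrior.keySet()) {
                // The prior term strongly favors the majority ("irrelevant") class,
                // so the word likelihoods rarely overcome it for the rare class.
                double score = logPrior.get(c);
                for (String t : tokens) {
                    score += logLikelihood.get(c).getOrDefault(t, Math.log(1e-6));
                }
                if (score > bestScore) {
                    bestScore = score;
                    best = c;
                }
            }
            return best;
        }
    }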
