I have a training set of about 2,500 manually categorized abstracts and want to automatically categorize 4 million papers. What is the best tool to download for this? So far, I have reached only 90% accuracy with a kappa of 0.8 in cross-validation. I used NaiveBayesMultinomial with 2,500 features: statistics of the most frequent words. As preprocessing, I removed stop words and stemmed the text.
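To make the setup concrete, here is a minimal pure-Python sketch of the kind of pipeline described above: stop-word removal, a vocabulary capped at the most frequent words, a multinomial Naive Bayes classifier with Laplace smoothing, and Cohen's kappa for evaluation. It is not the NaiveBayesMultinomial implementation I actually ran. The stop-word list, function names, and toy abstracts are illustrative assumptions, and stemming is left out for brevity.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (assumption; a real list is much longer).
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "we", "on"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming here)."""
    words = "".join(c if c.isalpha() else " " for c in text.lower()).split()
    return [w for w in words if w not in STOP_WORDS]

def build_vocabulary(docs, max_features=2500):
    """Keep the max_features most frequent words across the training documents."""
    counts = Counter(w for d in docs for w in tokenize(d))
    return {w for w, _ in counts.most_common(max_features)}

def train_multinomial_nb(docs, labels, vocab, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log word likelihoods per class."""
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for d, c in zip(docs, labels):
        word_counts[c].update(w for w in tokenize(d) if w in vocab)
    model = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
        model[c] = (math.log(class_docs[c] / len(docs)), log_lik)
    return model

def predict(model, vocab, doc):
    """Return the class with the highest posterior log score for one document."""
    words = [w for w in tokenize(doc) if w in vocab]
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(log_lik[w] for w in words)
    return max(sorted(model), key=score)

def cohen_kappa(y_true, y_pred):
    """Agreement between true and predicted labels, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1:
        return 1.0  # degenerate case: a single class on both sides
    return (observed - expected) / (1 - expected)

# Toy data standing in for the manually categorized abstracts (hypothetical).
train_docs = [
    "protein folding in the cell membrane",
    "gene expression and protein synthesis",
    "quantum field theory of particles",
    "particle collisions in quantum physics",
]
train_labels = ["biology", "biology", "physics", "physics"]

vocab = build_vocabulary(train_docs)
model = train_multinomial_nb(train_docs, train_labels, vocab)
print(predict(model, vocab, "protein expression in the cell"))
```

With the real data, the same functions would be called inside a cross-validation loop, collecting the predictions for each held-out fold and passing them to `cohen_kappa` together with the manual labels.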
