08 September 2015

I am working on text classification using an ant colony algorithm, but I am confused about how to compute the feature vectors for the test set.

For the training feature vectors, I computed a TF-IDF vector for each training document and built a feature matrix [docs x terms] from the TF-IDF values.
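To make the setup concrete, here is a minimal sketch of how such a training matrix could be built. It assumes already-tokenized documents, raw term counts for TF, and log(N / df) for IDF; the function names and the toy documents are made up for illustration and are not my actual code or data.

```python
import math
from collections import Counter

def fit_tfidf(docs):
    """Collect the vocabulary and document frequencies from tokenized docs."""
    vocab = sorted({term for doc in docs for term in doc})
    # Count each term once per document it appears in.
    df = Counter(term for doc in docs for term in set(doc))
    return vocab, df, len(docs)

def tfidf_matrix(docs, vocab, df, n_docs):
    """Return a [docs x terms] matrix: raw count TF times log(N / df) IDF."""
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        row = [counts[t] * math.log(n_docs / df[t]) if counts[t] else 0.0
               for t in vocab]
        matrix.append(row)
    return matrix

# Hypothetical toy training set (three tokenized documents).
train_docs = [["apple", "pie"], ["apple", "juice"], ["orange", "juice"]]
vocab, df, n_docs = fit_tfidf(train_docs)
X_train = tfidf_matrix(train_docs, vocab, df, n_docs)
```

Each row of `X_train` is one training document and each column is one term of `vocab`.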

But what about computing the test set's feature vectors? Should I just use the TF-IDF values from the training set to compute them?

E.g., in the training set, the document frequency of a particular word, "apple", is 5. For the test set, should I use the value 5 for "apple", or recompute the TF-IDF based on the test set? Or am I going about computing the feature vectors the wrong way altogether?
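The two options I am asking about can be sketched side by side as follows. This is only an illustration under assumptions: tokenized documents, raw-count TF, log(N / df) IDF, and made-up helper names and toy data. In both options the columns are fixed to the training vocabulary, so test terms that never occurred in training are dropped.

```python
import math
from collections import Counter

def doc_freq(docs):
    """Document frequency of each term: number of docs containing it."""
    return Counter(t for doc in docs for t in set(doc))

def tfidf_vector(doc, vocab, df, n_docs):
    """TF-IDF vector over a fixed vocabulary; terms with df == 0 get 0."""
    counts = Counter(doc)
    return [counts[t] * math.log(n_docs / df[t]) if counts[t] and df[t] else 0.0
            for t in vocab]

# Hypothetical toy data.
train_docs = [["apple", "pie"], ["apple", "juice"], ["orange", "juice"]]
test_docs = [["apple", "apple", "kiwi"], ["juice"]]

df_train = doc_freq(train_docs)
vocab = sorted(df_train)  # columns come from the training set only

# Option A: reuse the training document frequencies and training N.
X_test_a = [tfidf_vector(d, vocab, df_train, len(train_docs)) for d in test_docs]

# Option B: recompute document frequencies and N from the test set itself.
df_test = doc_freq(test_docs)
X_test_b = [tfidf_vector(d, vocab, df_test, len(test_docs)) for d in test_docs]
```

With these toy documents the weight of "apple" in the first test document differs between the two options, because Option A uses the training document frequency while Option B uses the test one. Only Option A puts the test vectors on the same scale as the training matrix.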

Thanks in advance!
