01 January 2013

I have a text classification task. I used a sliding-window method to populate my data set. The problem is that the data set is huge and many of its data points are very similar to one another. I would like to reduce the data set without losing informative data points. I am aware of variable selection techniques such as "kruskal.test", "limma", "rfe", "rf", "lasso", .... But how can I choose a suitable method for my problem without running computationally intensive operations?
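The techniques listed above select variables (columns), while the stated goal is to thin out near-identical data points (rows). One inexpensive option, sketched here under assumptions, is to drop each sliding window that is too similar to the last window kept. Because consecutive windows overlap heavily, comparing only against the last kept window keeps the pass linear in the number of windows. The function names `sliding_windows`, `jaccard` and `drop_near_duplicates`, the use of Jaccard similarity over token sets, and the threshold values are all hypothetical illustrations, not the poster's method or any library's API.

```python
from typing import List, Sequence, Tuple

Window = Tuple[str, ...]


def sliding_windows(tokens: Sequence[str], size: int, step: int = 1) -> List[Window]:
    """Return every run of `size` consecutive tokens, advancing by `step`."""
    return [tuple(tokens[i:i + size]) for i in range(0, len(tokens) - size + 1, step)]


def jaccard(a: Window, b: Window) -> float:
    """Jaccard similarity of the token sets of two windows (assumed measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def drop_near_duplicates(windows: List[Window], threshold: float = 0.8) -> List[Window]:
    """Keep a window only if its similarity to the last kept window is below
    `threshold`. The default 0.8 is an arbitrary illustrative value."""
    kept: List[Window] = []
    for w in windows:
        # Compare with the most recently kept window only: neighbouring
        # sliding windows are the ones most likely to be near-duplicates.
        if not kept or jaccard(w, kept[-1]) < threshold:
            kept.append(w)
    return kept


if __name__ == "__main__":
    tokens = "the cat sat on the mat and the dog sat on the rug".split()
    windows = sliding_windows(tokens, size=4)
    kept = drop_near_duplicates(windows, threshold=0.5)
    print(f"{len(windows)} windows -> {len(kept)} kept")  # prints: 10 windows -> 4 kept
```

This heuristic misses near-duplicates that are far apart in the sequence; catching those would require comparing against more of the kept windows, at a higher cost. If each window carries a class label, one possible choice is to run the filter separately per class so that a similar window with a different label is not discarded.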
