Your question is very general. The answer depends on the problem, on your data set, and on how the data are presented. Please explain more so that others can help you in the right way. Good luck.
Cross-validation comes after the classifier is selected. I meant to ask about initial tests that could be performed on the data itself to guide me in selecting a particular type of classifier.
Dear Negar,
I wanted to know, very generally, whether there is a set of rules that would help me choose a classifier for any given dataset.
As far as I know, there is no well-defined rule for this task. In general, it depends on the kind of data and on the number of samples and features. For instance, I would recommend naive Bayes or a linear SVM for text classification/categorization. For datasets with numerical attributes, I would suggest a linear SVM, neural networks, or logistic regression if the number of features is much greater than the number of samples. On the other hand, I would recommend neural networks or an SVM with an RBF or polynomial kernel if the number of samples is not too large but is greater than the number of features. Finally, if the number of samples is huge, I would suggest neural networks or a linear SVM. Obviously, there are other options for each scenario besides those I have mentioned.
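To make these rules of thumb concrete, here is a minimal sketch that encodes them as a lookup function. The function name `suggest_classifiers`, the `huge` threshold of 100,000 samples, and the use of a plain "more features than samples" comparison are my own assumptions for illustration; the advice above does not fix any numeric cut-offs.

```python
def suggest_classifiers(n_samples, n_features, is_text=False, huge=100_000):
    """Return candidate classifier families for a dataset.

    Hypothetical helper: it only restates the rules of thumb from this
    thread. The `huge` cut-off is an assumed value, not an established one.
    """
    if is_text:
        # Text classification/categorization
        return ["naive Bayes", "linear SVM"]
    if n_features > n_samples:
        # More features than samples: prefer simple, linear models
        return ["linear SVM", "neural network", "logistic regression"]
    if n_samples >= huge:
        # Very many samples: methods that scale well with sample count
        return ["neural network", "linear SVM"]
    # Moderate number of samples, more samples than features
    return ["neural network", "SVM (RBF kernel)", "SVM (polynomial kernel)"]


print(suggest_classifiers(n_samples=200, n_features=5000))
print(suggest_classifiers(n_samples=2000, n_features=20))
```

The output is only a shortlist of candidates to try; it does not replace an empirical comparison on your own data.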
I have read about the evaluation techniques that you mentioned. I would like to know more about meta-learning, or other methods in which the data itself picks the classifier with the least effort on the user's side. I would be grateful if you could suggest some papers on this.
Dear Tiago,
Thanks for such an explanatory answer with examples. I would appreciate it if you could suggest some papers (a review paper, for instance) that explain how to select a classifier based on the dataset.
Thanks for pointing out that the dimensionality of the data often restricts the visualization process. Is it good practice to visualize higher-dimensional data by dividing them into lower dimensions?
I have attached a sample plot for a two-class problem depicting the distribution of four different features. What inferences can be drawn about the choice of classifier from the scatter plots?
There are two possible strategies. a) Use what you know about the underlying process that produced your data. If your data have a dynamic component, meaning that their characteristics also depend on time, the classifier has to account for it. In speech, for example, I can say something "veeery" slowly or "very" fast; the characteristics are the same, but my classifier should be aware of this time warping. In that case I should use a dynamic classifier such as Hidden Markov Models. If this is not the case, I can try static classifiers such as neural networks or SVMs.
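To illustrate what time warping means here, the following is a minimal sketch of the classic dynamic time warping (DTW) distance in plain Python. It is not a Hidden Markov Model; it only shows that the same pattern produced slowly or quickly can be matched once the time axis is allowed to stretch. The function name `dtw_distance` and the toy sequences are assumptions made for this example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


fast = [0, 1, 2, 1, 0]                  # pattern produced quickly
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]   # same pattern at half speed
other = [2, 1, 0, 1, 2]                 # a different pattern

print(dtw_distance(fast, slow))    # same shape, different speed
print(dtw_distance(fast, other))   # different shape
```

A static classifier fed the raw samples would see `fast` and `slow` as vectors of different lengths, which is why time-dependent data call for a method that models the warping.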
As far as I know, though, there is no statistical test that decides this question. One method that is often mentioned in this discussion is to compare different classifiers on the same features and then use the one with the best performance. But this requires knowing the optimal parameter settings for all of the classifiers under investigation, which one would normally determine only after a suitable classifier has been selected.
For static classification tasks you can also use WEKA. It is a data mining tool that also includes tools for data pre-processing, classification, regression, clustering, association rules, and visualization (http://www.cs.waikato.ac.nz/ml/weka/).
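The compare-and-pick approach can be sketched in a few lines. The example below is only an illustration under stated assumptions: the data are synthetic (two Gaussian classes), the two classifiers (nearest centroid and 1-nearest-neighbour) are chosen because they need no parameter tuning, and all function names are hypothetical. In practice you would plug in your own features and candidate classifiers.

```python
import random


def make_data(n_per_class=50, seed=0):
    """Synthetic two-class, two-feature data (assumed for illustration)."""
    rng = random.Random(seed)
    data = []
    for label, center in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            data.append(([rng.gauss(c, 1.0) for c in center], label))
    rng.shuffle(data)
    return data


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_centroid(train):
    """Fit class means; predict the class whose mean is closest."""
    sums, counts = {}, {}
    for x, y in train:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))


def one_nn(train):
    """Predict the label of the single closest training sample."""
    return lambda x: min(train, key=lambda t: sq_dist(x, t[0]))[1]


def cv_accuracy(fit, data, k=5):
    """Mean accuracy over k folds; every classifier sees the same folds."""
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        train = [row for j, f in enumerate(folds) if j != i for row in f]
        predict = fit(train)
        correct += sum(predict(x) == y for x, y in test)
    return correct / len(data)


data = make_data()
candidates = [("nearest centroid", nearest_centroid), ("1-NN", one_nn)]
scores = {name: cv_accuracy(fit, data) for name, fit in candidates}
best = max(scores, key=scores.get)
print(scores, "-> best:", best)
```

Classifiers with tunable parameters would need an inner tuning loop inside each fold for the comparison to be fair, which is exactly the extra effort mentioned above.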