We will compute cross-validation (CV) accuracy for candidate feature subsets using an SVM and select the subsets that achieve high CV accuracy; that is why I am considering SVM for feature selection. We also want to apply a heuristic technique for the optimization, i.e., combine a meta-heuristic with some classification technique, and I am starting with SVM as the classifier. Please tell me: is SVM a good choice to start with?
You can combine an SVM with a genetic algorithm to find the most effective features for a classification task. See this paper: C.-L. Huang, C.-J. Wang, "A GA-based feature selection and parameters optimization for support vector machines", Expert Systems with Applications 31 (2006) 231–240.
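As a rough illustration of the GA-wrapper idea (not the exact procedure of Huang & Wang 2006), here is a minimal sketch assuming scikit-learn and numpy, with bit-mask chromosomes and SVM cross-validation accuracy as the fitness function; the GA operators are deliberately simple and all parameter values are placeholders.

```python
# Minimal sketch of GA-based feature selection with an SVM fitness function.
# Synthetic data and simple GA operators are assumptions for illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)

def fitness(mask):
    """CV accuracy of an RBF-SVM trained on the selected features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="rbf", C=1.0, gamma="scale"),
                           X[:, mask.astype(bool)], y, cv=5).mean()

pop_size, n_gen, n_feat = 20, 15, X.shape[1]
pop = rng.integers(0, 2, size=(pop_size, n_feat))     # random bit-mask chromosomes

for gen in range(n_gen):
    scores = np.array([fitness(ind) for ind in pop])
    # tournament selection: keep the better of two randomly drawn individuals
    parents = np.array([pop[max(rng.integers(0, pop_size, 2), key=lambda i: scores[i])]
                        for _ in range(pop_size)])
    # single-point crossover between consecutive parents
    children = parents.copy()
    for i in range(0, pop_size - 1, 2):
        cut = rng.integers(1, n_feat)
        children[i, cut:], children[i + 1, cut:] = parents[i + 1, cut:], parents[i, cut:]
    # bit-flip mutation
    flip = rng.random(children.shape) < 0.02
    children[flip] = 1 - children[flip]
    pop = children

best = max(pop, key=fitness)
print("selected features:", np.flatnonzero(best), "CV accuracy:", round(fitness(best), 3))
```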
Training an SVM in a wrapper approach with backward selection as the feature selection method would work well, but the width of the radial basis function should be re-optimized as the feature vectors get shorter (i.e., with fewer features). Furthermore, training an SVM on many examples can become quite slow compared to some other methods. Starting immediately with an SVM instead of, say, 1-nearest neighbour is therefore a much harder engineering job if you want optimal results; see the sketch below for one way to set up such a wrapper.
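A minimal sketch of a backward-selection wrapper around an RBF-SVM, assuming scikit-learn >= 0.24 (for SequentialFeatureSelector); wrapping the SVC in a small grid search re-tunes the RBF width for each candidate subset, as suggested above, at the cost of considerably longer runtimes. Dataset, grid values, and the target number of features are placeholders.

```python
# Hedged sketch: backward-elimination wrapper around an RBF-SVM.
# The inner GridSearchCV re-tunes gamma for each candidate feature subset,
# which is exactly what makes this approach expensive on larger problems.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # stand-in for your own data

inner = GridSearchCV(make_pipeline(StandardScaler(), SVC(kernel="rbf")),
                     param_grid={"svc__gamma": [1e-3, 1e-2, 1e-1, 1.0]},
                     cv=3)

selector = SequentialFeatureSelector(inner, n_features_to_select=10,
                                     direction="backward", cv=5, n_jobs=-1)
selector.fit(X, y)
print("kept feature indices:", selector.get_support(indices=True))
```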
I tend to agree with Alexey and Marco. In my own experience, Random Forests are much easier to engineer and understand, and they do not suffer from the painfully long training times you get with SVMs on large datasets. But as Richard points out, you won't know whether SVM or RF is the best for your particular problem unless you try (and understand) them both! A quick comparison like the sketch below is usually enough to get a first impression.
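A hedged sketch of such a comparison, assuming scikit-learn; the dataset and hyperparameters are placeholders for your own data and settings, and both models share the same cross-validation folds so the scores are directly comparable.

```python
# Quick sketch: compare an RBF-SVM and a Random Forest on the same CV folds.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # stand-in for your own dataset
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```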
As Marco pointed out, an SVM can be used in a wrapper approach for feature selection, and I fully agree that, while this is fine for small-scale problems, it quickly becomes intractable for bigger tasks. A well-known embedded feature selection method using SVMs is to use the L1-norm (or an approximation of the L0-"norm") as the weight regularizer in a linear SVM, which leads to a sparse weight vector whose non-zero components correspond to the features of interest.
The trade-off parameter can be set via cross-validation. This can be very fast even for large-scale problems when using fast linear solvers such as liblinear; a minimal sketch is given below.
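A minimal sketch of this embedded approach, assuming scikit-learn (whose LinearSVC uses the liblinear solver); the synthetic data and the grid of C values are placeholders.

```python
# Embedded feature selection with an L1-regularized linear SVM (liblinear backend).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=100, n_informative=8, random_state=0)

# The L1 penalty requires dual=False; C (the trade-off parameter) is set via CV.
search = GridSearchCV(LinearSVC(penalty="l1", dual=False, max_iter=10000),
                      param_grid={"C": np.logspace(-2, 2, 9)}, cv=5)
search.fit(X, y)

w = search.best_estimator_.coef_.ravel()
selected = np.flatnonzero(np.abs(w) > 1e-6)   # non-zero weights mark the selected features
print("best C:", search.best_params_["C"], "| selected features:", selected)
```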
Whenever someone mentions "feature selection" and "classifier" in one sentence, I have a knee-jerk reflex to say "PLS-DA" (partial least squares discriminant analysis), which I have found really great for working with high-dimensional data sets. Previously I was using random forests, which are also very nice and offer out-of-the-box error estimates and useful variable importance scores. However, in my context, PLS-DA worked better.
There is a variant of PLS called sparse PLS, including sparse PLS-DA, which performs lasso- or elastic-net-like variable selection so that each component has only a limited number of variables with non-zero loadings. In R, it can be found in the mixOmics package; a rough Python sketch of plain PLS-DA is given below.
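For sparse PLS-DA itself, the R mixOmics package mentioned above is the direct route. As a rough Python approximation of plain (non-sparse) PLS-DA, one can fit a PLS regression on a dummy-coded class response and classify by the largest predicted column; this sketch assumes a recent scikit-learn and uses loadings only as a crude importance ranking, not as a substitute for the sparse variable selection of sPLS-DA.

```python
# Rough sketch of plain PLS-DA: PLS regression on one-hot class labels,
# classification by the column with the largest predicted score.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X, y = load_wine(return_X_y=True)            # stand-in for your own data
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

Y_tr = OneHotEncoder(sparse_output=False).fit_transform(y_tr.reshape(-1, 1))
pls = PLSRegression(n_components=3).fit(X_tr, Y_tr)

pred = pls.predict(X_te).argmax(axis=1)       # class = column with the largest score
print("PLS-DA test accuracy:", (pred == y_te).mean())

# Absolute loadings give a crude variable-importance ranking per component.
print("top variables on component 1:", np.argsort(-np.abs(pls.x_loadings_[:, 0]))[:5])
```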
SVM is a good method, but Random Forest often performs nearly as well. You can test both in the Weka software on your own data and compare the evaluations. I suggest you take the data you are planning to use, run it with SVM, and also check how Random Forest performs, even though RF is a tree-based method.
LS-SVM (least-squares SVM) finds good solutions in many applications, and many researchers have reported that it gives better performance than the classical SVM algorithm. Refer to the research literature on LS-SVM for details; a minimal sketch of the classifier is given below.
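As a hedged illustration of what LS-SVM does, here is a minimal numpy sketch of the Suykens-style LS-SVM classifier with an RBF kernel: training reduces to solving a single linear (KKT) system instead of a QP. The toy data, gamma (regularization) and sigma (kernel width) are placeholders and would normally be tuned by cross-validation; no claim is made here that it beats a classical SVM on your data.

```python
# Minimal LS-SVM classifier sketch (Suykens-style formulation) with an RBF kernel.
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_train(X, y, gamma=1.0, sigma=1.0):
    """y must be in {-1, +1}. Solves the LS-SVM KKT linear system for (b, alpha)."""
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.block([[np.zeros((1, 1)), y[None, :]],
                  [y[:, None],       Omega + np.eye(n) / gamma]])
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                      # bias b, dual variables alpha

def lssvm_predict(X_new, X, y, b, alpha, sigma=1.0):
    return np.sign(rbf_kernel(X_new, X, sigma) @ (alpha * y) + b)

# Tiny usage example on toy Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
b, alpha = lssvm_train(X, y, gamma=10.0, sigma=1.0)
print("training accuracy:", (lssvm_predict(X, X, y, b, alpha) == y).mean())
```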
Quick aside (more of an FYI than likely a help). Different learning methods have different built-in assumptions -- this leads to the issue (already mentioned) that no one classifier is always good. Most feature selection methods also have these biases (wrapper approaches included). Hence, the "best" choice for evaluating features is often a matter of experimentation, and the impact can be fairly dramatic at times.
The methods discussed above are good starting points, but if you have the time/resources, you may want to experiment with a few to see *if* one of them performs better for you.
Classification accuracy depends heavily on many factors, such as the feature extraction methods used, but SVM tends to be used more often than ANN, and so far that is what researchers generally report.
It depends on many conditions, including the case study, window estimation, time horizons, the bias-variance trade-off, and how expert you are in using the classifier :) My opinion.
It can work very well but needs to be handled with care. You need to be careful in selecting an appropriate kernel and tuning its parameters; otherwise the results can be disappointing. A minimal tuning sketch is given below.
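A hedged sketch of that kernel and parameter tuning, assuming scikit-learn: a grid search over kernel type, C, and gamma with feature scaling; the dataset and grid values are placeholders for your own data and search ranges.

```python
# Minimal sketch: grid-search over kernel, C and gamma for an SVM with scaled features.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # stand-in for your own data
pipe = make_pipeline(StandardScaler(), SVC())

param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.01, 0.1, 1, 10, 100]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10, 100],
     "svc__gamma": [1e-4, 1e-3, 1e-2, 1e-1, 1]},
]
search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```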
Dear Dr. Michael Kemmler, can we use LASSO (L1-norm) penalization with a linear SVM in Weka? I have tried many feature selection methods for LibSVM (linear kernel) in Weka but could not find out exactly how to use LASSO with an SVM for feature selection...