Today a variety of mature machine learning techniques are available and used on real-life problems. Among all of these, which is the best and most commonly used technique with a good accuracy rate?
This depends on the problem at hand in a way that is hard to predict, so the practical approach is to evaluate many different methods via cross-validation and go with the best. Sometimes an SVM will be best; other times Random Forests will give you better out-of-sample accuracy. You can also use a weighted combination of different methods (weighted by cross-validation accuracy), a.k.a. Super Learners. One important part of fitting these classifiers is to optimize the tuning parameters for your particular data/problem; otherwise accuracy will suffer. I have some slides on fitting such models for text data here: https://dl.dropbox.com/u/25710348/CSSscraping/text2data.pdf, and an R lab demonstrating various techniques here: https://dl.dropbox.com/u/25710348/CSSscraping/scripts/Text2Data.R
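As a rough illustration of that workflow (the linked materials above are in R), here is a minimal Python/scikit-learn sketch: tune each candidate's hyperparameters, estimate out-of-sample accuracy by cross-validation, and keep whichever does best. The dataset and parameter grids are illustrative placeholders, not recommendations.

```python
# Minimal sketch: compare tuned candidate models by nested cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset

candidates = {
    "svm_rbf": GridSearchCV(
        make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
        cv=5,
    ),
    "random_forest": GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [200, 500], "max_features": ["sqrt", 0.3]},
        cv=5,
    ),
}

# The outer cross-validation gives an honest estimate of each tuned model's accuracy.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The model with the best outer-CV score (or a weighted combination of the candidates, as in Super Learners) is then the one to deploy.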
You can also try the k-nearest neighbour (kNN) algorithm, which is an instance-based machine learning approach. The more training samples you use, the better the results tend to be.
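A short sketch of kNN in scikit-learn (illustrative dataset): the classifier stores the training instances and predicts by majority vote among the k closest points, so k itself should be chosen by cross-validation.

```python
# Small kNN sketch: try a few values of k and compare cross-validated accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # placeholder dataset

for k in (1, 3, 5, 11):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k}: mean CV accuracy {acc:.3f}")
```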
It depends on your goal. If you need a very accurate classifier, a Random Forest or a well-tuned RBF-kernel SVM will usually give you a very suitable model. If you are also interested in interpreting feature importance, a simple linear SVM is the best trade-off, and its computational cost is reasonably low.
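A sketch of the interpretability point, assuming scikit-learn and an illustrative dataset: a linear SVM's learned coefficients give a per-feature weight that can be ranked as a rough importance measure.

```python
# Fit a linear SVM and rank features by the absolute value of their weights.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

data = load_breast_cancer()  # placeholder dataset
model = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
model.fit(data.data, data.target)

weights = np.abs(model.named_steps["linearsvc"].coef_.ravel())
for idx in np.argsort(weights)[::-1][:5]:
    print(f"{data.feature_names[idx]}: {weights[idx]:.3f}")
```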
In research there is no such thing as "better" in the abstract; if there were a single best method, it would be the only one anyone used. For any problem you must first determine whether you can get enough labelled data, which decides between supervised and unsupervised learning. Second, choose based on your goal: is it classification, prediction, etc.? Third, consider which method is easier to program; sometimes it is easier to choose linear regression over an SVM simply because it is easy to implement. I hope this was helpful.
No technique is better in general. Analysis of different techniques has shown that every technique has limitations; it depends on your application domain. For example, genetic algorithms (GA) and genetic programming (GP) are well suited to scientific applications. A proper analysis of the application at hand is required before choosing one. Finding the best technique is a very good area of research, but the learning here is largely empirical, essentially trial and error; you can discover new approaches that way, but it takes time.
This question was originally posed by David Wolpert. In his work he derived the No Free Lunch theorem, which states that without prior assumptions about the nature of the problem we cannot expect one algorithm to be superior to another. To my knowledge this theorem has not been proven incorrect.
A good introduction to the theorem is the book "Pattern Classification" by Duda, Hart and Stork, along with Wolpert's original papers.
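For reference, a rough rendering of the search/optimization form of the theorem (Wolpert and Macready, 1997), in their notation, is:

\[
\sum_{f} P\!\left(d_m^{y} \mid f, m, a_1\right) \;=\; \sum_{f} P\!\left(d_m^{y} \mid f, m, a_2\right),
\]

where the sum runs over all objective functions f, m is the number of distinct points sampled, d_m^y is the observed sequence of cost values, and a_1, a_2 are any two algorithms. Averaged over all possible problems, no algorithm outperforms any other.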
I feel the best approach depends on the purpose of the study, i.e. it is problem-dependent. It also depends on the quality of the features you extract from the datasets, the availability of a good amount of quality training data, etc.