Hi, there is no simple answer to your question. Random Forests are very good, but someone else might prefer a different method. Generally, it is a good idea to use several methods and then ensemble their results (e.g. by voting or averaging).
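As a minimal sketch of that idea, here is a soft-voting ensemble built with scikit-learn (the thread is about Weka, but the principle is tool-independent; the dataset and the three base classifiers are illustrative assumptions, not recommendations):

```python
# Combine several classifiers by soft voting (averaging predicted
# class probabilities); classifiers and dataset are illustrative.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",  # average probabilities instead of counting votes
)

scores = cross_val_score(ensemble, X, y, cv=5)
print("Cross-validated accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```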
What Michael describes is known as the 'no free lunch theorem' of machine learning: simply put, there is no single best learning algorithm, only a best algorithm for a particular dataset.
I think the question should be refined first. The standard definition of accuracy in the statistics literature says that an estimate that is less likely to be proven wrong is more accurate. For example, if I say the temperature tomorrow will be between -40 and +50 degrees Celsius, that is accurate, but not useful or informative for making a decision. In contrast, a prediction of 10-11 degrees is very precise but can be wrong with high probability. In machine learning we prefer to talk about 'precision' and 'recall', based on true positives, true negatives, false positives, and false negatives, and about the 'F score' and 'G score' when a ground truth is available to compare against. This is the case for classification.
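For reference, precision = TP / (TP + FP), recall = TP / (TP + FN), and the F1 score is their harmonic mean. A tiny sketch computing them, using scikit-learn for brevity and made-up labels:

```python
# Precision, recall and F1 on invented binary labels.
# precision = TP / (TP + FP); recall = TP / (TP + FN)
# F1 = 2 * precision * recall / (precision + recall)
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print("precision=%.2f recall=%.2f F1=%.2f" % (precision, recall, f1))
```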
The quality of an algorithm depends on the question and the data you have in hand. How big is the data? Is it longitudinal or not? Is it high-dimensional or low-dimensional? Does it consist of regularly shaped classes or not? What are the rate and structure of the missing values, and can you curate them? What type are the majority of the variables?
Based on the answers to these questions you would take different strategies for data curation, feature selection, and classification, and all of those decisions affect your model's performance.
It helps to know the basics of the classification algorithms and how to tune their parameters. For good accuracy, selecting a classifier is often less important than tuning its parameters. I recommend LIBSVM; you can find the manual, practical guide, and FAQ on the LIBSVM authors' website. The authors describe a "standard" procedure for newcomers that you can try, although it is not based on Weka (by the way, Weka can use LIBSVM for classification).
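For anyone who does not want to dig through the guide: that "standard" procedure amounts to scaling the features and then grid-searching C and gamma with cross-validation. A rough sketch of the same recipe via scikit-learn's SVC (which wraps libsvm); the grid ranges follow the guide's suggested form, and the dataset is a placeholder:

```python
# LIBSVM practical-guide recipe: scale features, then run a
# cross-validated grid search over C and gamma for an RBF kernel.
# Grid ranges and dataset below are illustrative placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])
grid = GridSearchCV(
    pipe,
    param_grid={
        "svm__C": [2 ** k for k in range(-5, 16, 2)],
        "svm__gamma": [2 ** k for k in range(-15, 4, 2)],
    },
    cv=5,
)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("best CV accuracy: %.3f" % grid.best_score_)
```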
Hi Vaishali, if you have a high-dimensional dataset then SVM is a good choice. Even so, there is no single best classification algorithm, but if you combine two or more algorithms you can often get better results.
If you are searching for the best algorithm, don't limit yourself to Weka; you should not restrict yourself to one piece of software. There are many good algorithms that are not supported in Weka. You should use an ensemble approach of algorithms. Wishing you all the best.
ZeroR is the baseline to beat; however, the Kappa statistic should be examined to understand the bias in the data, much like entropy. Random forests are resilient to overfitting, and drawing a random subset of predictors at each split gives better accuracy for most outcomes.
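To make the ZeroR/Kappa point concrete: a majority-class baseline can score decent accuracy while its Kappa stays near zero. A small sketch with scikit-learn stand-ins (DummyClassifier plays the role of Weka's ZeroR); the dataset and split are illustrative:

```python
# DummyClassifier(strategy="most_frequent") is the scikit-learn
# equivalent of Weka's ZeroR; Cohen's kappa shows how much of a
# model's accuracy goes beyond what class frequencies alone give.
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf in [
    ("ZeroR baseline", DummyClassifier(strategy="most_frequent")),
    ("Random forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]:
    y_hat = clf.fit(X_tr, y_tr).predict(X_te)
    print("%s: accuracy=%.3f kappa=%.3f"
          % (name, accuracy_score(y_te, y_hat), cohen_kappa_score(y_te, y_hat)))
```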
Selecting a suitable machine learning algorithm is problem-dependent. For example, the Naive Bayes classifier, which does not require complex parameter settings, is commonly used in text classification problems, while a logistic regression classifier may give better solutions in cost-sensitive learning problems.
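As a small illustration of why Naive Bayes is such a common first choice for text: a bag-of-words pipeline needs essentially no parameter tuning. The tiny corpus below is invented:

```python
# Minimal text classification with Multinomial Naive Bayes;
# the five training sentences and their labels are made-up examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "cheap pills buy now",
    "meeting agenda for tomorrow",
    "win a free prize today",
    "project report attached",
    "free offer limited time",
]
labels = ["spam", "ham", "spam", "ham", "spam"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)
print(model.predict(["free meeting prize"]))
```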
I don't think this question can be answered in this manner. The choice of an algorithm depends on the dataset: it basically depends on the distribution and type of the data you are handling. To the best of my knowledge, the only option is to try several and evaluate them.
I have used Weka intensively in my research. In my case, Random Forest (using 100 base learners), AdaBoost.M1 (using J4.8 and 100 base learners), SVM (using SMO with a polynomial kernel of degree p = 3), and SVM (using an RBF kernel with suitably tuned C and gamma parameters) attained the best results. However, it all depends on the problem you are tackling, and your results may differ. I would suggest trying these classifiers, as well as an artificial neural network (ANN), to check your performance.
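For comparison outside Weka, here is a rough scikit-learn translation of that lineup evaluated by cross-validation. The mapping is approximate (J4.8 is stood in for by a generic decision tree, SMO with p = 3 by a degree-3 polynomial SVM, and an MLP plays the ANN role), and the dataset is a placeholder for your own:

```python
# Rough scikit-learn stand-ins for the Weka classifiers named above,
# compared by 5-fold cross-validation. The mapping is approximate and
# the dataset is just a placeholder.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "Random forest (100 trees)": RandomForestClassifier(
        n_estimators=100, random_state=0),
    # 'estimator' is named 'base_estimator' in scikit-learn < 1.2
    "AdaBoost (100 trees)": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(), n_estimators=100, random_state=0),
    "SVM (poly, degree 3)": make_pipeline(
        StandardScaler(), SVC(kernel="poly", degree=3)),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "ANN (MLP)": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
}

for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print("%-26s %.3f +/- %.3f" % (name, scores.mean(), scores.std()))
```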
Check the following paper:
A Combination of Feature Extraction Methods with an Ensemble of Different Classifiers for Protein Structural Class Prediction Problem