This is a very big topic that sits at the center of a long-running debate. The short answer is: it depends. In practice, some form of cross-validation is typically applied (see the sketch after the list below). However, there are ways to make an informed pre-selection.
1) What was the process that generated the data? Can the data be assumed to be Independent and Identically Distributed (IID) or not? If yes, most methods are applicable in principle; if not, you need to consider methods that explicitly deal with non-IID data, for example Hidden Markov Models (HMMs).
2) How many training points do you have? More complex models require more training points to determine their free parameters.
3) How many features do you have? Maximum likelihood methods often do not do very well with a very large number of features.
4) How important is it to be able to interpret the results of the model at each step? Some methods, such as decision trees, are very good for that but perhaps not so good in terms of accuracy (they often suffer from high variance). The latter is addressed by random forests, which offer much better accuracy but are not as good in terms of interpretability.
5) Do you expect outliers, and how many of them? Some models are more robust to outliers than others.
6) Is computational complexity an issue (related also to the number of samples and features)? For example, training a non-linear SVM scales roughly as N^3 in the number of points, which can be a problem when you have more than a few hundred thousand.
7) Do you already have well-engineered, highly informative features, or would you prefer the model to learn them as well? Deep learning architectures may be of interest in the latter case (but they can also be computationally expensive).
8) Application-specific considerations: can your model closely match the assumptions of the problem at hand? For example, various kernels for SVMs capture properties such as rotation invariance that might be important for a given application. If a model can capture these, it will likely perform better than one that does not (deep learning will also try to find these relations, but it needs a lot of data to do so).
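To make the cross-validation advice concrete, here is a minimal sketch (using scikit-learn on synthetic stand-in data; the hyperparameters are placeholders, not recommendations) that compares an interpretable single decision tree against a random forest, in the spirit of points 4) and 5):

```python
# Cross-validated comparison of an interpretable tree vs. a random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for your own feature matrix X and labels y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = [
    ("decision tree", DecisionTreeClassifier(random_state=0)),
    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]
for name, model in models:
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

The same loop extends naturally to any other candidates suggested by the checklist above.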
As some here have mentioned, it truly is a tricky question to answer... But in my experience, I would say go with a maximum-margin classifier such as a support vector machine. It can be considered the best off-the-shelf classifier to date.
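If you take that route, a minimal sketch (scikit-learn, synthetic data, placeholder parameter grid) of the usual off-the-shelf recipe looks like this: scale the features and tune the SVM's C and gamma by grid search, since a maximum-margin classifier is only as good as its regularisation and kernel settings.

```python
# RBF-kernel SVM with hyperparameters chosen by cross-validated grid search.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # stand-in data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}  # illustrative grid
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print("best params:", grid.best_params_, "| test accuracy:", grid.score(X_test, y_test))
```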
That's a very interesting question and, unfortunately, there is no precise methodology for tackling the problem. However, there are some guidelines, as Dr. Merentitis has pointed out. I will add another one: there is a field, data complexity analysis, that studies this issue from a practical point of view, showing the sweet spot of every learner in a systematic way. See the following article for a comprehensive review:
Núria Macià, Ester Bernadó-Mansilla, Albert Orriols-Puig, Tin Kam Ho: Learner excellence biased by data set selection: A case for data characterisation and artificial data sets. Pattern Recognition 46(3): 1054-1066 (2013)
This field tries to characterize the performance of every learner using distinct metrics that capture different facets of the problem space, thereby showing when one learner is better than another and why. In addition, the same authors wrote a program that computes these metrics and can be used freely. This program, DCoL, can be downloaded from the following link: http://dcol.sourceforge.net
I can only agree with the thread; however, assuming you want a concrete recommendation, my five cents are the following:
1. Start with a Bayesian classifier (e.g. naive Bayes) -> if it does no better than random, then look at your data by applying the following tests: a. are there any obvious patterns that you can encode (e.g. boundaries; sequences meaningful for a human expert)? -> if yes, then encode them; b. are the data imbalanced (positive vs. negative instances)? -> if yes, then compensate for the imbalance (see cross-validation, bagging, bootstrapping methods); see the sketch after this list for this step;
2. If none of the above proves effective, then consider more advanced learning algorithms: for large multi-class classification problems (e.g. millions of classes x millions of features), consider lazy learning and low-complexity algorithms (e.g. k-NN); for small- to medium-scale classification problems, consider kernel-based algorithms (e.g. SVM, NN...).
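A minimal sketch of step 1 (scikit-learn on synthetic, deliberately imbalanced data; all names and settings are placeholders): fit a naive Bayes classifier, compare it against a random-guessing baseline, and inspect the class balance before deciding whether to compensate.

```python
# Naive Bayes vs. a random baseline, plus a quick class-imbalance check.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data with a 90/10 class imbalance.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0)

print("class counts:", np.bincount(y))  # how imbalanced are the data?

baseline = DummyClassifier(strategy="stratified", random_state=0)  # ~random guessing
nb = GaussianNB()

print("random baseline accuracy:", cross_val_score(baseline, X, y, cv=5).mean())
print("naive Bayes accuracy:    ", cross_val_score(nb, X, y, cv=5).mean())
# If naive Bayes does not clearly beat the baseline, revisit the data as described in step 1.
```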
Recall and precision, sensitivity and specificity are what you are looking for.
These are accuracy-measurement metrics that we use heavily in industry (where the algorithms must meet the customer's correctness criteria). If you are looking for practical implementations, check out ROC curves and TPR/FPR; a short sketch follows below.
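Here is a minimal sketch (scikit-learn; the label and score arrays are hypothetical placeholders) computing the metrics just mentioned: precision, recall (= sensitivity), specificity, and the ROC curve built from TPR/FPR at different thresholds.

```python
# Precision, recall/sensitivity, specificity, and ROC (TPR vs. FPR) for a binary classifier.
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                     # ground-truth labels (placeholder)
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])   # classifier scores (placeholder)
y_pred = (y_score >= 0.5).astype(int)                            # hard predictions at a 0.5 threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("precision:           ", precision_score(y_true, y_pred))
print("recall / sensitivity:", recall_score(y_true, y_pred))
print("specificity:         ", tn / (tn + fp))

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # TPR and FPR at every score threshold
print("ROC AUC:", roc_auc_score(y_true, y_score))
```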
Have you considered an algorithm portfolio approach? This type of approach tests many algorithms and gives you an idea of which ones perform best.
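A minimal sketch of such a portfolio run (scikit-learn, synthetic data; the particular models and default settings are just illustrative): cross-validate a handful of very different classifiers and rank them by mean score.

```python
# Small algorithm portfolio: cross-validate several classifiers and rank them.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)  # stand-in data

portfolio = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-NN": KNeighborsClassifier(),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(random_state=0),
    "RBF SVM": SVC(),
}
results = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in portfolio.items()}
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean accuracy {score:.3f}")
```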