Every machine learning algorithm has its advantages and disadvantages compared with other algorithms. Now, if we are asked to use an algorithm for classification, how do we decide whether we should go for SVM, GMM, or some other technique such as topic modelling?
It depends on the use case. For example, if you aim to build something that works in real time, you have to start with the ML algorithms that are fastest and easiest to embed. The best way to answer your question is to try all of them and check which one is fastest and gives the best classification results. Almost all of these ML algorithms are available in MATLAB, and they expect the input data in the same format. You can then write a small script that computes the sensitivity (SEN) and positive predictive value (PPV) for each algorithm on the same input data. Then you can make the decision. Good luck.
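A rough sketch of that comparison, written in Python rather than MATLAB and assuming binary labels with 1 as the positive class. The two classifiers and their fit/predict interface are hypothetical stand-ins (a majority-class baseline and a nearest-centroid classifier), not SVM or GMM implementations; any model exposing the same two methods could be dropped in. The data are made up for illustration.

```python
import time
from statistics import mean


def sen_ppv(y_true, y_pred, positive=1):
    """Sensitivity TP / (TP + FN) and positive predictive value TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    sen = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sen, ppv


class MajorityClass:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Hypothetical stand-in: predicts the class whose mean vector is closest."""

    def fit(self, X, y):
        self.centroids = {
            c: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == c])]
            for c in set(y)
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])))
            for x in X
        ]


def compare(classifiers, X_train, y_train, X_test, y_test):
    """Fit and evaluate every classifier on the same data; report SEN, PPV and wall time."""
    results = {}
    for name, clf in classifiers.items():
        start = time.perf_counter()
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        elapsed = time.perf_counter() - start  # training + prediction time
        sen, ppv = sen_ppv(y_test, y_pred)
        results[name] = {"SEN": sen, "PPV": ppv, "seconds": elapsed}
    return results


if __name__ == "__main__":
    # Tiny made-up 2-D data set, for illustration only.
    X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y_train = [0, 0, 0, 1, 1, 1]
    X_test = [[0.5, 0.5], [5.5, 5.5], [6, 6], [1, 1]]
    y_test = [0, 1, 1, 0]
    models = {"majority": MajorityClass(), "centroid": NearestCentroid()}
    for name, r in compare(models, X_train, y_train, X_test, y_test).items():
        print(f"{name}: SEN={r['SEN']:.2f} PPV={r['PPV']:.2f} time={r['seconds']:.6f}s")
```

The timing here lumps training and prediction together; for a real-time use case it may be more informative to time prediction per sample separately.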
Many factors can be considered when choosing an algorithm, but I think the priority is the nature of the data. Most ML techniques are suited to particular data distributions; for example, some assume the classes can be separated linearly, while others can handle nonlinear boundaries. Therefore we need to scrutinize the data first. To analyze it, we first apply a simple model and then try more complex ones one by one. I hope this helps.
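One way to put "simple model first, then more complex ones" into practice, sketched in Python on made-up XOR-shaped data. The choice of models is an assumption: a nearest-centroid classifier (whose decision boundary is linear for two classes) as the simple model, and 1-nearest-neighbour as the more flexible, nonlinear one.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid(X_train, y_train):
    """Simple model: one mean vector per class (linear boundary for two classes)."""
    centroids = {}
    for c in sorted(set(y_train)):
        members = [x for x, t in zip(X_train, y_train) if t == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def one_nn(X_train, y_train):
    """More flexible model: label of the single closest training point."""
    pairs = list(zip(X_train, y_train))
    return lambda x: min(pairs, key=lambda p: sq_dist(x, p[0]))[1]


def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def simple_to_complex(models, X_train, y_train, X_test, y_test):
    """Fit models in order of increasing complexity; report held-out accuracy for each."""
    return [(name, accuracy(build(X_train, y_train), X_test, y_test))
            for name, build in models]


if __name__ == "__main__":
    # Made-up XOR-shaped data: the classes sit on opposite diagonals of the unit square.
    X_train = [[0, 0], [0.2, 0.1], [1, 1], [0.9, 0.8],
               [0, 1], [0.1, 0.9], [1, 0], [0.9, 0.2]]
    y_train = [0, 0, 0, 0, 1, 1, 1, 1]
    X_test = [[0.1, 0.0], [0.95, 0.95], [0.05, 0.95], [0.95, 0.05]]
    y_test = [0, 0, 1, 1]
    models = [("nearest centroid (linear)", nearest_centroid),
              ("1-NN (nonlinear)", one_nn)]
    for name, acc in simple_to_complex(models, X_train, y_train, X_test, y_test):
        print(f"{name}: accuracy = {acc:.2f}")
```

On this toy data the linear model stays at chance level (0.50) while 1-NN reaches 1.00, which is the kind of hint that the data call for a nonlinear method.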
In fact, there is a field that tries to answer your question; see (1) T.K. Ho and M. Basu, "Complexity measures of supervised classification problems," IEEE Transactions on Pattern Analysis and Machine Intelligence (2002), and (2) N. Macià, A. Orriols-Puig, E. Bernado, and T.K. Ho, "Learner excellence biased by data set selection: A case for data characterisation and artificial data sets," Pattern Recognition.
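As a flavour of those complexity measures, here is a Python sketch of the maximum Fisher's discriminant ratio (often called F1), one of the measures discussed by Ho and Basu, for a two-class problem: for each feature it computes (mu1 - mu2)^2 / (var1 + var2) and keeps the largest value, so a higher score means at least one feature separates the classes well on its own. Using the population variance, and returning infinity for a zero-variance feature that still separates the class means, are assumptions of this sketch.

```python
from statistics import mean, pvariance


def fisher_ratio_f1(X, y):
    """Maximum Fisher's discriminant ratio over all features (two classes only)."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = [x for x, t in zip(X, y) if t == classes[0]]
    b = [x for x, t in zip(X, y) if t == classes[1]]
    ratios = []
    # zip(*a) yields the columns (features) of class a; likewise for b.
    for fa, fb in zip(zip(*a), zip(*b)):
        diff2 = (mean(fa) - mean(fb)) ** 2
        denom = pvariance(fa) + pvariance(fb)
        if denom == 0:
            ratios.append(float("inf") if diff2 > 0 else 0.0)
        else:
            ratios.append(diff2 / denom)
    return max(ratios)


if __name__ == "__main__":
    # Made-up data: feature 0 separates the classes, feature 1 does not.
    X = [[0, 0], [2, 1], [4, 0], [6, 1]]
    y = [0, 0, 1, 1]
    print(fisher_ratio_f1(X, y))  # 8.0
```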
A small amendment regarding Luca Parisi's comment. I agree with you that my proposed method can be affected by bias. And I strongly believe that there is no ML algorithm that works perfectly on all data; in the literature, almost everyone has their own preference, for objective or subjective reasons. I suggested that Muhammad start with something and then decide whether he needs another ML algorithm. Moreover, if feature selection is not done properly, this will also affect the performance of the classifier. Regarding my comment on speed, I stand by it. In my work I develop classifiers for real-time processing; in this case the training and classification time needs to be virtually zero. In the end, it depends on what data you process.
To summarize: what I found works (at least in my case, with physiological signals) is an ML approach based on physiology combined with decision trees. Good luck, and thank you, Luca, for your comments.
The best and simplest practical way to compare classifiers, i.e., different classification methods (algorithms, approaches), is to use double (nested) cross-validation, a.k.a. double resampling. It can take quite a lot of CPU time, but it is worth trying in order to find the best algorithm for a particular type of classification task.
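A minimal, dependency-free Python sketch of double (nested) cross-validation. Each classifier family (here k-NN with k = 1, 3 or 5, and a nearest-centroid classifier, both hypothetical stand-ins) is tuned in the inner loop using only the outer-training data, and the outer loop scores that choice on a fold it never saw. The fold counts, the seed and the toy data are assumptions.

```python
import random
from collections import Counter


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn_builder(k):
    """Hypothetical candidate: k-nearest neighbours with majority vote."""
    def fit(X, y):
        def predict(x):
            nearest = sorted(zip(X, y), key=lambda p: sq_dist(x, p[0]))[:k]
            return Counter(t for _, t in nearest).most_common(1)[0][0]
        return predict
    return fit


def centroid_fit(X, y):
    """Hypothetical candidate: nearest class mean."""
    cents = {}
    for c in set(y):
        members = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return lambda x: min(cents, key=lambda c: sq_dist(x, cents[c]))


def k_folds(indices, k, rng):
    """Shuffle a copy of the indices and deal them into k disjoint folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def holdout_accuracy(fit, X, y, train, test):
    predict = fit([X[i] for i in train], [y[i] for i in train])
    return sum(predict(X[i]) == y[i] for i in test) / len(test)


def nested_cv(candidates, X, y, outer_k=5, inner_k=3, seed=0):
    """Inner loop picks the best candidate on outer-training data only;
    outer loop scores that pick on the untouched outer test fold.
    Assumes every training split has at least inner_k examples."""
    rng = random.Random(seed)
    scores, chosen = [], []
    for outer_test in k_folds(range(len(X)), outer_k, rng):
        held_out = set(outer_test)
        outer_train = [i for i in range(len(X)) if i not in held_out]
        inner = k_folds(outer_train, inner_k, rng)  # same inner folds for every candidate

        def inner_score(name):
            accs = []
            for fold in inner:
                fold_set = set(fold)
                train = [i for i in outer_train if i not in fold_set]
                accs.append(holdout_accuracy(candidates[name], X, y, train, fold))
            return sum(accs) / len(accs)

        best = max(candidates, key=inner_score)
        scores.append(holdout_accuracy(candidates[best], X, y, outer_train, outer_test))
        chosen.append(best)
    return sum(scores) / len(scores), chosen


if __name__ == "__main__":
    # Made-up, well-separated two-class data.
    X = [[(i % 3) * 0.1, (i // 3) * 0.1] for i in range(10)]
    X += [[10 + (i % 3) * 0.1, 10 + (i // 3) * 0.1] for i in range(10)]
    y = [0] * 10 + [1] * 10
    families = {
        "k-NN": {f"k={k}": knn_builder(k) for k in (1, 3, 5)},
        "nearest centroid": {"centroid": centroid_fit},
    }
    for family, cands in families.items():
        score, picks = nested_cv(cands, X, y, seed=1)
        print(f"{family}: nested-CV accuracy {score:.2f}, inner picks {picks}")
```

Because both families use the same seed, they are scored on the same outer folds, so their nested-CV accuracies can be compared directly; the CPU cost grows with outer_k × inner_k × the number of candidates, which is the price mentioned above.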
I think it is difficult to choose a classifier for a given problem, or to judge that one classifier is better than another, without testing it on different test sets related to the problem. Another issue that I think is very important is which features you select for a given problem; this is a very critical choice. Regarding dataset size, most of the time the classifier that has more training data tends to give better results than the other classifiers.
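The effect of training-set size can be checked with a learning curve: train the same classifier on growing subsets and score each one on a single fixed test set. Below is a Python sketch with a 1-nearest-neighbour stand-in and made-up 1-D data; a real curve would average over several random subsets (and several test sets) rather than simply taking the first n examples.

```python
def one_nn_fit(X, y):
    """Stand-in classifier: 1-nearest neighbour with squared Euclidean distance."""
    pairs = list(zip(X, y))
    return lambda x: min(pairs, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p[0])))[1]


def learning_curve(fit, X_pool, y_pool, X_test, y_test, sizes):
    """Train on the first n pool examples for each n in sizes; score on one fixed test set."""
    curve = []
    for n in sizes:
        predict = fit(X_pool[:n], y_pool[:n])
        correct = sum(predict(x) == t for x, t in zip(X_test, y_test))
        curve.append((n, correct / len(y_test)))
    return curve


if __name__ == "__main__":
    # Made-up 1-D data: the true class is 1 when x >= 6.
    X_pool = [[0], [19], [2], [12], [4], [8], [5], [7]]
    y_pool = [0, 1, 0, 1, 0, 1, 0, 1]
    X_test = [[1], [3.5], [5.4], [6.6], [9], [15]]
    y_test = [0, 0, 0, 1, 1, 1]
    for n, acc in learning_curve(one_nn_fit, X_pool, y_pool, X_test, y_test, [2, 4, 8]):
        print(f"{n} training examples: accuracy {acc:.2f}")
```

On this toy data the accuracy rises from 0.67 (2 examples) to 0.83 (4) to 1.00 (8).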