You want to compare algorithms in terms of their functional similarity, but you are referring to algorithms from different domains. k-means and EM are clustering algorithms, Apriori is used for mining association rules, and PageRank is a ranking algorithm. So you cannot compare them directly with the remaining algorithms (C4.5, SVM, AdaBoost, kNN, Naive Bayes, and CART), which are used to solve classification problems.
As for comparing the classification algorithms among themselves, that depends mainly on the characteristics of the data and of the algorithms.
According to the No Free Lunch theorem, there is no algorithm/method/classifier that is the best for all types of problems/data. You will find that an algorithm performs well on one data set and poorly on another, depending on the characteristics of the data and of the algorithm. That is why researchers combine more than one classifier in a single framework (ensemble methods, for example) to tackle this problem.
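To make the No Free Lunch point concrete, here is a minimal sketch (assuming scikit-learn is available; the classifier settings and data sets are just illustrative choices, not from the thread) that cross-validates several of the classifiers mentioned above on two different data sets. The ranking of the classifiers can change from one data set to the other.

```python
# Sketch: "no free lunch" in practice -- cross-validate several classifiers
# on different data sets and observe that no single one always wins.
from sklearn.datasets import load_iris, load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

classifiers = {
    "kNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "CART": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}

for data_name, load in [("iris", load_iris), ("breast cancer", load_breast_cancer)]:
    X, y = load(return_X_y=True)
    # mean 5-fold cross-validation accuracy for each classifier
    scores = {name: cross_val_score(clf, X, y, cv=5).mean()
              for name, clf in classifiers.items()}
    best = max(scores, key=scores.get)
    print(f"{data_name}: best = {best}; " +
          ", ".join(f"{k}={v:.3f}" for k, v in scores.items()))
```

Running something like this on your own data, rather than trusting a fixed ranking, is the practical takeaway of the theorem.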
Noura, I often use linear classifiers for high-dimensional data. If a linear classifier is not sufficient, I try kernel methods and neural networks. In addition, I have found that Random Forest, bagging, and LIBLINEAR also produce excellent results. But again, there is no single best method that works perfectly all the time.
I think that neural networks (MLPs) should be added to this list. They are universal approximators and are widely used in regression and classification tasks.
It depends on the problem being addressed (complexity, size, ...), the objective you have defined, the type of data, and so on. It is difficult to choose a single method, whether for prediction or description. For example, would you choose SOM, k-means, or hierarchical clustering (CAH) to cluster groups of clients according to some criteria?
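As a quick illustration of the client-clustering question above, here is a hedged sketch (assuming scikit-learn; the synthetic "clients" and cluster centers are made up for illustration) comparing k-means against agglomerative hierarchical clustering (the CAH mentioned above). SOM is left out because it needs an extra library.

```python
# Sketch: cluster the same synthetic "clients" with k-means and with
# agglomerative hierarchical clustering (CAH), then compare the partitions.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# three well-separated client groups (illustrative centers)
X, y_true = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [10, 0]],
                       cluster_std=1.0, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
hc = AgglomerativeClustering(n_clusters=3).fit(X)

print("k-means vs. truth:", adjusted_rand_score(y_true, km.labels_))
print("CAH vs. truth:    ", adjusted_rand_score(y_true, hc.labels_))
print("k-means vs. CAH:  ", adjusted_rand_score(km.labels_, hc.labels_))
```

On such clean data the two methods agree almost perfectly; on real client data with uneven cluster sizes or non-spherical groups, the choice starts to matter.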
I think that the choice of learning algorithm depends on the type of application. You can consult these documents, which are interesting:
NEAGA, I., & HAO, Y. (2013). Towards Big Data Mining and Discovery. In Short Research Papers on Knowledge, Innovation and Enterprise, Part 2-Innovation, KIE Conference Book Series (pp. 35-43). 2013 International Conference on Knowledge, Innovation & Enterprise.
Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., ... & Zhou, Z. H. (2008). Top 10 algorithms in data mining. Knowledge and information systems, 14(1), 1-37.
Dogan, N., & Tanrikulu, Z. (2013). A comparative analysis of classification algorithms in data mining for accuracy, speed and robustness. Information Technology and Management, 14(2), 105-124.
I think Extreme Learning Machine (ELM) is a hot choice nowadays.
Compared with SVMs and traditional neural networks, ELM trains very fast, requires less manual tuning, and generalizes well on heterogeneous data sets.
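The speed claim follows from how ELM works: the input weights are random and never trained, so fitting reduces to one linear least-squares solve for the output weights. A minimal NumPy sketch (the toy data and hidden-layer size are illustrative assumptions, not a reference implementation):

```python
# Minimal ELM sketch: random fixed hidden layer + least-squares readout.
# Training is a single linear solve -- no backpropagation, no iterations.
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, Y_onehot, n_hidden=200):
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, Y_onehot, rcond=None)  # the only "training" step
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# toy two-class problem: two Gaussian clouds
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
y = np.repeat([0, 1], 100)
W, b, beta = elm_train(X, np.eye(2)[y])
acc = (elm_predict(X, W, b, beta) == y).mean()
print("training accuracy:", acc)
```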
You might be interested in the comprehensive comparative classifier evaluation by Fernández-Delgado et al. (2014) (179 classifiers, 121 data sets; please see the link). Overall, they found that Random Forest and SVM classifiers performed best.
Thanks, Uwe Reichel, for the link. Yes, most researchers would start with SVM. However, it surprises me that Random Forest (RF) beats the decision tree (DT), since DT seems the more sophisticated algorithm.
I conducted a set of experiments using different classifiers, and I obtained the best results with Random Forest. In my case, Random Forest performed best.
@Noura H. Al Nuaimi: I don't quite get your point that a DT is more sophisticated than an RF, since the latter actually consists of DTs. Each tree is trained on a bootstrap sample and considers only a random subset of the features at each split, so the ensemble of these weak classifiers turns out to be pretty robust against overfitting to the training data. This makes RFs quite powerful.
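This point is easy to check empirically. A hedged sketch (assuming scikit-learn; the data set and forest size are illustrative choices): cross-validate a single decision tree against a forest of 100 such trees on the same data.

```python
# Sketch: a random forest is an ensemble of decision trees, and the
# ensemble typically generalizes better than any single deep tree.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

dt = DecisionTreeClassifier(random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)

dt_acc = cross_val_score(dt, X, y, cv=5).mean()
rf_acc = cross_val_score(rf, X, y, cv=5).mean()
print(f"single tree: {dt_acc:.3f}  random forest: {rf_acc:.3f}")
```

The single tree overfits its training folds; averaging many decorrelated trees cancels much of that variance, which is why the forest scores higher.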