What is the difference between Statistics, Machine Learning, Data Mining and Pattern Recognition?

Ji He

In layman's terms, statistics is a way to infer patterns from data based on an existing model; machine learning is a set of heuristics for having the computer form its own model from the data; data mining and pattern recognition are applications (not methods) that can be carried out with either statistics or machine learning; and pattern recognition is a sub-field of data mining. Many people would just claim they do all of them, I guess.

I do woodworking and carpentry using routers and saws, etc., BTW ;)
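One way to picture the distinction in this answer is to contrast fitting a model chosen in advance with letting an algorithm shape its own model from the data. The sketch below is only an illustration under stated assumptions: the toy data are hypothetical, ordinary least squares stands in for "statistics with an existing model", and k-nearest neighbours stands in for "the computer forming its own model". Neither choice comes from the answer itself.

```python
# Hypothetical toy data (illustrative only): y grows roughly linearly with x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.0]


def fit_linear(xs, ys):
    """'Statistics' view (assumed): fix the model y = a + b*x in advance,
    then estimate a and b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def knn_predict(xs, ys, x_new, k=2):
    """'Machine learning' view (assumed): no fixed functional form;
    predict by averaging the targets of the k nearest training points."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


a, b = fit_linear(xs, ys)
print(f"fitted model: y = {a:.2f} + {b:.2f}x")
print("linear prediction at x=2.5:", round(a + b * 2.5, 3))
print("2-NN prediction at x=2.5:", round(knn_predict(xs, ys, 2.5), 3))
```

The linear fit can only ever return a straight line, whatever the data look like; the k-NN predictor has no such commitment, which is the contrast the answer draws between the two fields.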

Hoang Thanh Lam

For me, data mining is a process that discovers useful and surprising knowledge from data. Data miners get raw data from users, and users may ask them questions such as:

Tell me what is important in the data. In this case, we have frequent pattern mining or association rule mining.

Tell me what is unexpected or surprising in the data. In this case, we have outlier, change, or anomaly detection.

I want to see something about the data. In this case, we have visual analytics.
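The first two kinds of question above can be made concrete with a small sketch. It assumes a hypothetical market-basket dataset and uses deliberately simple techniques: brute-force counting of item pairs (not a full frequent-pattern algorithm such as Apriori or FP-Growth) with rule confidence, and a z-score rule for outliers, which is only one of many outlier-detection methods. The function names are illustrative, not from any library.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical market-basket data (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]


def frequent_pairs(transactions, min_support):
    """'What is important?': count every item pair and keep those whose
    support (fraction of transactions containing the pair) >= min_support."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, i.e. the fraction of
    transactions containing the antecedent that also contain the consequent."""
    with_a = [t for t in transactions if antecedent <= t]
    if not with_a:
        return 0.0
    return sum(1 for t in with_a if consequent <= t) / len(with_a)


def zscore_outliers(values, threshold=2.0):
    """'What is surprising?': flag values lying more than `threshold`
    population standard deviations away from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


print(frequent_pairs(transactions, min_support=0.5))
print(confidence(transactions, {"butter"}, {"bread"}))
print(zscore_outliers([10, 12, 11, 13, 12, 11, 50]))
```

Here {bread, milk} and {bread, butter} each appear in 3 of the 5 baskets, every basket with butter also has bread, and the value 50 stands out from the rest of the hypothetical series.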

Much of the useful knowledge discovered from the data is then exploited to build prediction, recommendation, or classification models.

Joël Quinqueton

I think that the difference is basically an historical one.

Statistics is the oldest of these four fields; it began as applied mathematics. There was already work on classification at the beginning of the 19th century, long before Fisher's seminal 1936 paper, "The use of multiple measurements in taxonomic problems".

Then came Pattern Recognition (PR), in the 1970s, when computer science was centered on perception problems (OCR, speech recognition, image processing, ...). Machine Learning (ML) emerged in the 1980s as a field of Artificial Intelligence.

Data Mining (DM) appeared later, as a subfield of database engineering.

Of course, from a functional point of view, Ji He is right: PR and DM can be considered applications of ML, just as ML can be considered an application of statistics to computer science.

Hassan Nasser

Machine learning focuses on prediction, based on KNOWN properties learned from the training data.

Data mining (which is the analysis step of Knowledge Discovery in Databases) focuses on the discovery of (previously) UNKNOWN properties in the data.

The two areas overlap in many ways: data mining uses many machine learning methods, but often with a slightly different goal in mind. On the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in Knowledge Discovery and Data Mining (KDD) the key task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.

Source: Wikipedia.
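The passage above contrasts learning to reproduce KNOWN labels with discovering previously UNKNOWN structure. A minimal sketch of the two settings, assuming hypothetical one-dimensional measurements: a nearest-centroid classifier stands in for the supervised case and 1-D k-means (Lloyd's algorithm) for the unsupervised case. These algorithm choices are illustrative assumptions, not something the quoted text prescribes.

```python
from statistics import mean

# Hypothetical 1-D measurements; labels exist only for the supervised case.
labelled = [(1.0, "A"), (1.2, "A"), (0.8, "A"), (5.0, "B"), (5.3, "B"), (4.9, "B")]
unlabelled = [1.1, 0.9, 1.3, 5.1, 4.8, 5.4]


def train_nearest_centroid(data):
    """Supervised: learn one centroid per KNOWN class label."""
    by_label = {}
    for x, label in data:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised: discover k groups with no labels (Lloyd's algorithm in 1-D)."""
    # Simple deterministic initialisation: evenly spaced picks from the sorted values.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(centers[i] - v))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


centroids = train_nearest_centroid(labelled)
print(predict(centroids, 2.0), predict(centroids, 4.0))
centers, clusters = kmeans_1d(unlabelled)
print(clusters)
```

The classifier can only be scored against labels someone already provided, while k-means produces groups that nobody named in advance, which mirrors the evaluation difference the passage describes.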

In my own work, I do maximum entropy inference on neural network data, and I suppose it is a kind of mixture between statistics and machine learning.
