Bayesian inference is a machine learning model that is not as widely used as deep learning or regression models. Why is it not as widely used, and how does it compare to the more widely used models?
It's widely used in machine learning. Bayesian model averaging is a common supervised learning technique. Naïve Bayes classifiers are common in classification tasks. Bayesian methods are also used in deep learning these days, which allows deep learning algorithms to learn from small datasets.
The goal of Bayesian optimization is to find an optimal configuration of a system with a limited budget of experimental trials. These methods employ a probabilistic surrogate model to make predictions about possible outcomes of unobserved configurations. To search for optimal configurations, we define an acquisition function that uses the surrogate model to assign each configuration a utility. Configurations with the highest utility are tested on the system, and the process repeats. The performance of a Bayesian optimization algorithm is therefore determined by three components: the surrogate model, the acquisition function, and the methods that numerically optimize the acquisition function.
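To make those three components concrete, here is a minimal sketch of one such loop in Python/NumPy. The toy objective, the RBF-kernel Gaussian-process surrogate, the UCB acquisition function, and the grid search used to optimize it are all my own illustrative choices, not a reference implementation:

```python
import numpy as np

# Toy objective we want to maximize with a small budget of trials
# (in practice this would be an expensive, unknown system).
def objective(x):
    return -(x - 2.0) ** 2 + np.sin(5.0 * x)

def rbf_kernel(a, b, length=0.5, var=1.0):
    # Squared-exponential covariance between two sets of 1-D points.
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    # Surrogate model: Gaussian-process posterior mean and variance at x_test.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    K_inv = np.linalg.inv(K)
    mean = K_s.T @ K_inv @ y_train
    var = np.diag(K_ss - K_s.T @ K_inv @ K_s)
    return mean, np.maximum(var, 0.0)

def ucb(mean, var, beta=2.0):
    # Acquisition function: upper confidence bound (exploit the mean, explore the variance).
    return mean + beta * np.sqrt(var)

# Start with a couple of observed configurations.
x_obs = np.array([0.0, 4.0])
y_obs = objective(x_obs)
candidates = np.linspace(-1.0, 5.0, 200)   # crude "numerical optimizer" = grid search

for trial in range(10):                     # limited budget of experimental trials
    mean, var = gp_posterior(x_obs, y_obs, candidates)
    x_next = candidates[np.argmax(ucb(mean, var))]   # configuration with highest utility
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))      # test it on the system, then repeat

print("best configuration found:", x_obs[np.argmax(y_obs)])
```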
Strictly speaking, Bayesian inference is not machine learning. It is a statistical paradigm (an alternative to frequentist statistical inference) that defines probabilities via conditional logic (Bayes' theorem), rather than as long-run frequencies. Since the calculus of conditional probability is usually intractable, numerical methods such as Markov Chain Monte Carlo (MCMC) are used to determine the ("posterior") distributions of the parameters of interest. The output of a Bayesian model is therefore a probability distribution, not a point estimate, as mentioned by others in this thread. It has its advantages and disadvantages, depending on what you compare it to, and can be built into various existing algorithms, such as ANNs, Random Forests, regression, etc. It can also be applied to model selection (Bayesian optimization) and many other problems, because it is not an algorithm like the standard ML algorithms: it is a different way of thinking about probability.
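As a toy illustration of the "distribution, not a point estimate" part (a minimal sketch of my own, assuming a Gaussian likelihood and a Gaussian prior on the mean of some data), a random-walk Metropolis sampler, one of the simplest MCMC methods, returns a whole set of samples from the posterior rather than a single value:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=20)   # small synthetic dataset

def log_posterior(mu):
    # log prior: mu ~ Normal(0, 10^2); log likelihood: data ~ Normal(mu, 1).
    log_prior = -0.5 * (mu / 10.0) ** 2
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    return log_prior + log_lik          # unnormalized: the evidence cancels in the ratio below

# Random-walk Metropolis sampling.
samples, mu = [], 0.0
for _ in range(20000):
    proposal = mu + rng.normal(scale=0.5)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal                    # accept the proposed move
    samples.append(mu)

samples = np.array(samples[5000:])       # discard burn-in
# The output is a distribution over mu, not a single point estimate.
print("posterior mean ± sd:", samples.mean(), samples.std())
```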
Thanks Masab! Bayesian inference is a rigorous method for inference, which can incorporate both data (in the likelihood) and theory (in the prior). However, its use has been clouded by many scientists' and engineers' lack of understanding of probability as an assignment based on what you know, rather than (merely) a measurable frequency. Have a look at some of my tutorial slides here to read more about this, and make sure you read Jaynes 2003.
Some of us are working in this direction to fix this anomaly!
Bayesian inference is a method used to perform statistical inference (e.g. inferring the values of unknowns given some data). It is not a machine learning model; it is much more than that. The learning process based on Bayesian inference is called Bayesian learning. Try this to understand the concepts related to Bayesian learning: https://wso2.com/blog/research/part-one-introduction-to-bayesian-learning
Conceptually, you can perform Bayesian inference to train almost all of the machine learning models that we know (from linear regression to complex deep learning models), once we design the corresponding probabilistic model. This explains how to perform Bayesian inference with the traditional linear regression model: https://wso2.com/blog/research/part-two-linear-regression
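For a rough idea of what that looks like (a minimal sketch of my own, assuming a known observation-noise variance and a zero-mean Gaussian prior on the weights, not the code from the blog post above), Bayesian linear regression has a closed-form posterior over the weights:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from y = 1 + 2x + noise.
x = rng.uniform(-3, 3, size=30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)
X = np.column_stack([np.ones_like(x), x])        # design matrix with an intercept column

sigma2 = 0.5 ** 2     # assumed known observation-noise variance
tau2 = 10.0 ** 2      # prior variance: w ~ Normal(0, tau2 * I)

# Conjugate Gaussian posterior over the weights: p(w | X, y) = Normal(w_mean, w_cov).
w_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
w_mean = w_cov @ X.T @ y / sigma2

print("posterior mean of [intercept, slope]:", w_mean)
print("posterior sd   of [intercept, slope]:", np.sqrt(np.diag(w_cov)))
```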
However, Bayesian inference is not tractable for most models (due to the normalization term in Bayes' theorem). Therefore, in practice, we mostly use approximate inference techniques (which give answers close to exact Bayesian inference), such as variational inference (deep probabilistic models based on variational inference - http://dustintran.com/papers/TranHoffmanSaurousBrevdoMurphyBlei2017.pdf). You may be surprised to find that many complex tasks are now solved using probabilistic models that rely on approximate inference techniques rather than classical machine learning (and deep learning). However, it is not easy to understand and implement such models; it requires understanding a lot of statistical concepts. Therefore, the community that adopts Bayesian inference for machine learning (or deep learning) is limited to those few who have sufficient knowledge to handle these models effectively.
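To see where the intractability comes from (a one-dimensional toy illustration of my own), the normalization term is an integral of likelihood × prior over all parameter values. Brute-force integration works in one dimension, but the cost grows exponentially with the number of parameters, which is why MCMC and variational inference are used instead:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=0.8, scale=1.0, size=15)

def prior(mu):
    # Normal(0, 5^2) prior density on the unknown mean.
    return np.exp(-0.5 * (mu / 5.0) ** 2) / (5.0 * np.sqrt(2 * np.pi))

def likelihood(mu):
    # Product of Normal(mu, 1) densities over the data, evaluated for each mu in the grid.
    return np.prod(np.exp(-0.5 * (data[None, :] - mu[:, None]) ** 2) / np.sqrt(2 * np.pi), axis=1)

# The "normalization term" (evidence) is an integral over all parameter values.
grid = np.linspace(-10, 10, 4001)
unnormalized = likelihood(grid) * prior(grid)
evidence = np.sum(unnormalized) * (grid[1] - grid[0])   # brute-force 1-D integration
posterior = unnormalized / evidence                     # posterior density on the grid

print("evidence p(D):", evidence)
print("posterior mean:", np.sum(grid * posterior) * (grid[1] - grid[0]))
```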
Also, it is important to point out that Bayesian learning may perform better in terms of accuracy in certain scenarios compared to traditional machine learning and deep learning models, yet for simple regression and classification tasks it may exhibit similar performance. Moreover, Bayesian learning enables the following features, in contrast to conventional ML:
Incorporating prior knowledge or beliefs alongside the observed data when training models
Incrementally updating the model as new data arrive, via Bayesian updating (see the small sketch after this list)
Flexible modeling of features via hierarchical models
Expressing the uncertainty of estimated model parameters and predictions.
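To illustrate the incremental-updating point from the list above (a toy coin-flip example of my own, using a conjugate Beta-Bernoulli model), each posterior simply becomes the prior for the next batch of data:

```python
# Incremental Bayesian updating with a Beta prior and Bernoulli observations
# (the batches of flips below are purely illustrative).
alpha, beta = 1.0, 1.0                 # Beta(1, 1) prior = no strong initial belief

for batch in ([1, 0, 1], [1, 1, 0, 1], [0, 1]):     # data arriving in chunks
    alpha += sum(batch)                # add observed successes
    beta += len(batch) - sum(batch)    # add observed failures
    mean = alpha / (alpha + beta)      # posterior mean after this batch
    print(f"after {len(batch)} new flips: Beta({alpha}, {beta}), mean = {mean:.3f}")
    # The updated posterior becomes the prior for the next batch of data.
```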
The final feature in the list above is only possible if we perform full Bayesian learning, in which we estimate the probability distribution of the model coefficients (representing how plausible every potential value is as the model parameter). The uncertainty of predictions can be useful for understanding how confident the model is about each prediction (notice that this is different from the confidence intervals of frequentist statistics), and for detecting regions that lack sufficient training data (Bayesian optimization evolved around this concept).
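As a small, self-contained sketch of that last point (reusing the same conjugate linear-regression formulas as the earlier example; the numbers are illustrative), the posterior predictive spread grows as we move away from the region covered by the training data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 2.0, size=25)               # training inputs confined to [0, 2]
y = 0.5 + 1.5 * x + rng.normal(scale=0.3, size=25)
X = np.column_stack([np.ones_like(x), x])

sigma2, tau2 = 0.3 ** 2, 10.0 ** 2
w_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)   # posterior over the weights
w_mean = w_cov @ X.T @ y / sigma2

def predictive(x_new):
    # Posterior predictive mean and standard deviation at a new input.
    phi = np.array([1.0, x_new])
    mean = phi @ w_mean
    var = phi @ w_cov @ phi + sigma2    # parameter uncertainty + observation noise
    return mean, np.sqrt(var)

for x_new in (1.0, 5.0, 10.0):          # 1.0 lies inside the data region; 5 and 10 do not
    m, s = predictive(x_new)
    print(f"x = {x_new}: prediction {m:.2f} ± {s:.2f}")
# The ± term is smallest near the training data and grows further away from it.
```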
At a very high level, machine learning (ML) mostly concerns the solution algorithms used for fitting, typically maximum likelihood estimation. Bayesian inference (BI) has to do with how probability is interpreted (as a measure of belief) and the consequences of this (e.g., parametric randomness, as opposed to random samples in frequentist probability). So, they are not mutually exclusive notions. BI involves determining a posterior distribution, which requires solving an integral. Depending on how you cast the numerical integration problem, ML techniques can come in handy. A key similarity is that BI involves updating a posterior distribution as more data become available; you can think of this as a learning process. Also, prior distributions in BI can help produce regularizers (in a natural way) in ML.
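On that last point, here is a small sketch of my own showing the standard connection: a zero-mean Gaussian prior on the weights of a linear model turns the MAP estimate into ridge regression, i.e. an L2 regularizer appears naturally from the prior:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, size=10)                  # deliberately small, noisy sample
y = 3.0 * x - 1.0 + rng.normal(scale=2.0, size=10)
X = np.column_stack([np.ones_like(x), x])

sigma2 = 2.0 ** 2    # observation-noise variance (likelihood)
tau2 = 1.0 ** 2      # prior variance on the weights: w ~ Normal(0, tau2 * I)

# MAP estimate: minimize ||y - Xw||^2 / (2*sigma2) + ||w||^2 / (2*tau2).
# Setting the gradient to zero gives (X'X + lam*I) w = X'y with lam = sigma2 / tau2,
# which is exactly ridge regression: the Gaussian prior acts as an L2 regularizer.
lam = sigma2 / tau2
w_map = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Plain maximum likelihood (no prior) for comparison.
w_mle = np.linalg.solve(X.T @ X, X.T @ y)
print("MLE (no prior):      ", w_mle)
print("MAP (Gaussian prior):", w_map, "-- shrunk toward zero by the prior")
```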