Bayesian inference is a machine learning model that is not as widely used as deep learning or regression models. Why is it not as widely used, and how does it compare to the more widely used models?
It's widely used in machine learning. Bayesian model averaging is a common supervised learning technique. Naïve Bayes classifiers are common in classification tasks. Bayesian methods are also used in deep learning these days, which allows deep learning algorithms to learn from small datasets.
The goal of Bayesian optimization is to find an optimal configuration of a system with a limited budget of experimental trials. These methods employ a probabilistic surrogate model to make predictions about possible outcomes of unobserved configurations. To search for optimal configurations, we define an acquisition function that uses the surrogate model to assign each configuration a utility. Configurations with the highest utility are tested on the system, and the process repeats. The performance of a Bayesian optimization algorithm is therefore determined by three components: the surrogate model, the acquisition function, and the methods that numerically optimize the acquisition function.
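To make those three components concrete, here is a minimal sketch of one such loop in Python/NumPy. The toy objective, the RBF-kernel Gaussian-process surrogate, the UCB acquisition function, and the grid search used to optimize it are all my own illustrative choices, not a reference implementation:

```python
import numpy as np

# Toy objective we want to maximize with a small budget of trials
# (in practice this would be an expensive, unknown system).
def objective(x):
    return -(x - 2.0) ** 2 + np.sin(5.0 * x)

def rbf_kernel(a, b, length=0.5, var=1.0):
    # Squared-exponential covariance between two sets of 1-D points.
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    # Surrogate model: Gaussian-process posterior mean and variance at x_test.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    K_inv = np.linalg.inv(K)
    mean = K_s.T @ K_inv @ y_train
    var = np.diag(K_ss - K_s.T @ K_inv @ K_s)
    return mean, np.maximum(var, 0.0)

def ucb(mean, var, beta=2.0):
    # Acquisition function: upper confidence bound (exploit the mean, explore the variance).
    return mean + beta * np.sqrt(var)

# Start with a couple of observed configurations.
x_obs = np.array([0.0, 4.0])
y_obs = objective(x_obs)
candidates = np.linspace(-1.0, 5.0, 200)   # crude "numerical optimizer" = grid search

for trial in range(10):                     # limited budget of experimental trials
    mean, var = gp_posterior(x_obs, y_obs, candidates)
    x_next = candidates[np.argmax(ucb(mean, var))]   # configuration with highest utility
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))      # test it on the system, then repeat

print("best configuration found:", x_obs[np.argmax(y_obs)])
```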
Strictly speaking, Bayesian inference is not machine learning. It is a statistical paradigm (an alternative to frequentist statistical inference) that defines probabilities via conditional logic (Bayes' theorem), rather than as long-run frequencies. Since the calculus of conditional probability is usually intractable, numerical methods such as Markov Chain Monte Carlo (MCMC) are used to determine the ("posterior") distributions of the parameters of interest. The output of a Bayesian model is therefore a probability distribution, not a point estimate, as mentioned by others in this thread. It has its advantages and disadvantages, depending on what you compare it to, and can be built into various existing algorithms, such as ANNs, Random Forests, regression, etc. It can also be applied to model selection (Bayesian optimization) and many other problems, because it is not an algorithm like the standard ML algorithms: it is a different way of thinking about probability.
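As a toy illustration of the "distribution, not a point estimate" part (a minimal sketch of my own, assuming a Gaussian likelihood and a Gaussian prior on the mean of some data), a random-walk Metropolis sampler, one of the simplest MCMC methods, returns a whole set of samples from the posterior rather than a single value:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=20)   # small synthetic dataset

def log_posterior(mu):
    # log prior: mu ~ Normal(0, 10^2); log likelihood: data ~ Normal(mu, 1).
    log_prior = -0.5 * (mu / 10.0) ** 2
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    return log_prior + log_lik          # unnormalized: the evidence cancels in the ratio below

# Random-walk Metropolis sampling.
samples, mu = [], 0.0
for _ in range(20000):
    proposal = mu + rng.normal(scale=0.5)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal                    # accept the proposed move
    samples.append(mu)

samples = np.array(samples[5000:])       # discard burn-in
# The output is a distribution over mu, not a single point estimate.
print("posterior mean ± sd:", samples.mean(), samples.std())
```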
Thanks Masab! Bayesian inference is a rigorous method for inference, which can incorporate both data (in the likelihood) and theory (in the prior). However, its use has been clouded by many scientists' and engineers' lack of understanding of probability as an assignment based on what you know, rather than (merely) a measurable frequency. Have a look at some of my tutorial slides here to read more about this, and make sure you read Jaynes 2003.
Some of us are working in this direction to fix this anomaly!
Bayesian inference is a method used to perform statistical inference (e.g. inferring the values of unknowns given some data). It is not a machine learning model; it is much more than that. The learning process based on Bayesian inference is called Bayesian learning. Try this to understand the concepts related to Bayesian learning: https://wso2.com/blog/research/part-one-introduction-to-bayesian-learning
Conceptually, you can perform Bayesian inference to train almost all of the machine learning models that we know (from linear regression to complex deep learning models), once we design the corresponding probabilistic model. This explains how to perform Bayesian inference with the traditional linear regression model: https://wso2.com/blog/research/part-two-linear-regression
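For a rough idea of what that looks like (a minimal sketch of my own, assuming a known observation-noise variance and a zero-mean Gaussian prior on the weights, not the code from the blog post above), Bayesian linear regression has a closed-form posterior over the weights:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from y = 1 + 2x + noise.
x = rng.uniform(-3, 3, size=30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)
X = np.column_stack([np.ones_like(x), x])        # design matrix with an intercept column

sigma2 = 0.5 ** 2     # assumed known observation-noise variance
tau2 = 10.0 ** 2      # prior variance: w ~ Normal(0, tau2 * I)

# Conjugate Gaussian posterior over the weights: p(w | X, y) = Normal(w_mean, w_cov).
w_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
w_mean = w_cov @ X.T @ y / sigma2

print("posterior mean of [intercept, slope]:", w_mean)
print("posterior sd   of [intercept, slope]:", np.sqrt(np.diag(w_cov)))
```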
However, Bayesian inference is not tractable for most models (due to the normalization term in Bayes' theorem). Therefore, in practice, we mostly use approximate inference techniques (which give answers close to exact Bayesian inference), such as variational inference (deep probabilistic models based on variational inference - http://dustintran.com/papers/TranHoffmanSaurousBrevdoMurphyBlei2017.pdf). You may be surprised to find that many complex tasks are now solved using probabilistic models that rely on approximate inference techniques rather than classical machine learning (and deep learning). However, it is not easy to understand and implement such models; it requires understanding a lot of statistical concepts. Therefore, the community that adopts Bayesian inference for machine learning (or deep learning) is limited to those few who have sufficient knowledge to handle these models effectively.
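To see where the intractability comes from (a one-dimensional toy illustration of my own), the normalization term is an integral of likelihood × prior over all parameter values. Brute-force integration works in one dimension, but the cost grows exponentially with the number of parameters, which is why MCMC and variational inference are used instead:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=0.8, scale=1.0, size=15)

def prior(mu):
    # Normal(0, 5^2) prior density on the unknown mean.
    return np.exp(-0.5 * (mu / 5.0) ** 2) / (5.0 * np.sqrt(2 * np.pi))

def likelihood(mu):
    # Product of Normal(mu, 1) densities over the data, evaluated for each mu in the grid.
    return np.prod(np.exp(-0.5 * (data[None, :] - mu[:, None]) ** 2) / np.sqrt(2 * np.pi), axis=1)

# The "normalization term" (evidence) is an integral over all parameter values.
grid = np.linspace(-10, 10, 4001)
unnormalized = likelihood(grid) * prior(grid)
evidence = np.sum(unnormalized) * (grid[1] - grid[0])   # brute-force 1-D integration
posterior = unnormalized / evidence                     # posterior density on the grid

print("evidence p(D):", evidence)
print("posterior mean:", np.sum(grid * posterior) * (grid[1] - grid[0]))
```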
Also, it is important to point out that Bayesian learning may perform better in terms of accuracy in certain scenarios compared to traditional machine learning and deep learning models, yet for simple regression and classification tasks it may exhibit similar performance. Moreover, Bayesian learning enables the following features, in contrast to conventional ML:
Incorporating prior knowledge or beliefs alongside the observed data when training models
Incrementally updating the model as new data arrive, via Bayesian updating (see the small sketch after this list)
Flexible modeling of features via hierarchical models
Expressing the uncertainty of estimated model parameters and predictions.
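To illustrate the incremental-updating point from the list above (a toy coin-flip example of my own, using a conjugate Beta-Bernoulli model), each posterior simply becomes the prior for the next batch of data:

```python
# Incremental Bayesian updating with a Beta prior and Bernoulli observations
# (the batches of flips below are purely illustrative).
alpha, beta = 1.0, 1.0                 # Beta(1, 1) prior = no strong initial belief

for batch in ([1, 0, 1], [1, 1, 0, 1], [0, 1]):     # data arriving in chunks
    alpha += sum(batch)                # add observed successes
    beta += len(batch) - sum(batch)    # add observed failures
    mean = alpha / (alpha + beta)      # posterior mean after this batch
    print(f"after {len(batch)} new flips: Beta({alpha}, {beta}), mean = {mean:.3f}")
    # The updated posterior becomes the prior for the next batch of data.
```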
The final feature in the list above is only possible if we perform full Bayesian learning, in which we estimate the probability distribution of the model coefficients (representing how plausible every potential value is as the model parameter). The uncertainty of predictions can be useful for understanding how confident the model is about each prediction (notice that this is different from the confidence intervals of frequentist statistics), and for detecting regions that lack sufficient training data (Bayesian optimization evolved around this concept).
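As a small, self-contained sketch of that last point (reusing the same conjugate linear-regression formulas as the earlier example; the numbers are illustrative), the posterior predictive spread grows as we move away from the region covered by the training data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 2.0, size=25)               # training inputs confined to [0, 2]
y = 0.5 + 1.5 * x + rng.normal(scale=0.3, size=25)
X = np.column_stack([np.ones_like(x), x])

sigma2, tau2 = 0.3 ** 2, 10.0 ** 2
w_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)   # posterior over the weights
w_mean = w_cov @ X.T @ y / sigma2

def predictive(x_new):
    # Posterior predictive mean and standard deviation at a new input.
    phi = np.array([1.0, x_new])
    mean = phi @ w_mean
    var = phi @ w_cov @ phi + sigma2    # parameter uncertainty + observation noise
    return mean, np.sqrt(var)

for x_new in (1.0, 5.0, 10.0):          # 1.0 lies inside the data region; 5 and 10 do not
    m, s = predictive(x_new)
    print(f"x = {x_new}: prediction {m:.2f} ± {s:.2f}")
# The ± term is smallest near the training data and grows further away from it.
```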
At a very high level, machine learning (ML) mostly concerns the solution algorithms used for fitting, typically maximum likelihood estimation. Bayesian inference (BI) has to do with how probability is interpreted (as a measure of belief) and the consequences of this (e.g., parametric randomness, as opposed to random samples in frequentist probability). So, they are not mutually exclusive notions. BI involves determining a posterior distribution, which requires solving an integral. Depending on how you cast the numerical integration problem, ML techniques can come in handy. A key similarity is that BI involves updating a posterior distribution as more data become available; you can think of this as a learning process. Also, prior distributions in BI can help produce regularizers (in a natural way) in ML.
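On that last point, here is a small sketch of my own showing the standard connection: a zero-mean Gaussian prior on the weights of a linear model turns the MAP estimate into ridge regression, i.e. an L2 regularizer appears naturally from the prior:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, size=10)                  # deliberately small, noisy sample
y = 3.0 * x - 1.0 + rng.normal(scale=2.0, size=10)
X = np.column_stack([np.ones_like(x), x])

sigma2 = 2.0 ** 2    # observation-noise variance (likelihood)
tau2 = 1.0 ** 2      # prior variance on the weights: w ~ Normal(0, tau2 * I)

# MAP estimate: minimize ||y - Xw||^2 / (2*sigma2) + ||w||^2 / (2*tau2).
# Setting the gradient to zero gives (X'X + lam*I) w = X'y with lam = sigma2 / tau2,
# which is exactly ridge regression: the Gaussian prior acts as an L2 regularizer.
lam = sigma2 / tau2
w_map = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Plain maximum likelihood (no prior) for comparison.
w_mle = np.linalg.solve(X.T @ X, X.T @ y)
print("MLE (no prior):      ", w_mle)
print("MAP (Gaussian prior):", w_map, "-- shrunk toward zero by the prior")
```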