In Bayesian statistics a parameter is treated as a random variable and is therefore assigned a prior probability distribution. A sample is drawn, and on the basis of the likelihood of that sample a posterior distribution is obtained for the parameter. The Bayes solution is the posterior mean under a quadratic loss function, or the posterior median under an absolute loss function.
In the classical approach to statistical inference, on the other hand, the parameter is assumed to be an unknown constant.
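A minimal numerical sketch (my own, not from the answer above) of that loss-function point: for a discretised posterior, the point estimate minimising expected quadratic loss is the posterior mean, while the one minimising expected absolute loss is the posterior median.

```python
import numpy as np

# Hypothetical discretised posterior over a parameter theta on [0, 1] (illustration only).
theta = np.linspace(0.0, 1.0, 1001)
post = np.exp(-0.5 * ((theta - 0.3) / 0.1) ** 2) * (1 + 2 * theta)  # deliberately asymmetric
post /= post.sum()                                                  # normalise to a probability mass

# Posterior expected loss of reporting the point estimate a under each loss function.
quad_loss = lambda a: np.sum(post * (theta - a) ** 2)
abs_loss = lambda a: np.sum(post * np.abs(theta - a))

best_quad = theta[np.argmin([quad_loss(a) for a in theta])]
best_abs = theta[np.argmin([abs_loss(a) for a in theta])]

post_mean = np.sum(theta * post)
post_median = theta[np.searchsorted(np.cumsum(post), 0.5)]

print(best_quad, post_mean)    # minimiser of quadratic loss agrees with the posterior mean
print(best_abs, post_median)   # minimiser of absolute loss agrees with the posterior median
```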
Bayesian inference is a different perspective from classical (frequentist) statistics.
Simply put (and probably too simply):
For a frequentist, the probability of an event is the proportion of times that event occurs in the long run. Most frequentist concepts come from this idea (e.g. p-values, confidence intervals).
For a Bayesian, probability is more epistemological: it is his/her belief about the chance of an event occurring. This belief, also known as the prior probability, comes from previous experience, knowledge of the literature, etc.
Bayesian inference uses Bayes' theorem to combine the prior probability and the likelihood from the data to obtain the posterior probability of the event.
The posterior probability (in lay terms) is the updated belief about the probability of an event happening, given the prior and the observed data.
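As a concrete toy illustration of that updating step (the hypotheses and numbers are invented): suppose we entertain two hypotheses about a coin, fair versus biased towards heads, and then observe some flips.

```python
from math import comb

# Hypothetical example: prior beliefs about a coin, updated after observing data.
# Two hypotheses: the coin is fair (p = 0.5) or biased towards heads (p = 0.8).
priors = {"fair": 0.9, "biased": 0.1}        # subjective prior beliefs
p_heads = {"fair": 0.5, "biased": 0.8}

heads, flips = 8, 10                          # observed data

# Binomial likelihood of the data under each hypothesis.
likelihood = {h: comb(flips, heads) * p**heads * (1 - p)**(flips - heads)
              for h, p in p_heads.items()}

# Bayes' theorem: posterior is proportional to prior times likelihood, then normalise.
unnormalised = {h: priors[h] * likelihood[h] for h in priors}
evidence = sum(unnormalised.values())
posterior = {h: v / evidence for h, v in unnormalised.items()}
print(posterior)                              # updated beliefs after seeing the data
```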
When I started off with Bayesian inference, articles like this (see below) helped me a lot, so I am passing the goodwill forward.
The whole theory of classical statistics is based on the classical theory of probability, in which probability is a long-run or limiting frequency (hence the name "frequentist"). In practice, however, it is neither possible nor feasible to repeat an experiment infinitely many times.
In this framework, subjective belief has no computational significance.
In classical statistics we use sample data to make inferences about the population parameter (e.g. via MLE), so it is inductive inference/reasoning.
When it comes to testing and interval estimation, we use confidence intervals for the fixed but unknown parameter (theta, say).
In Bayesian statistics, by contrast, we also admit subjective probability.
We quantify our beliefs and can later update them on the basis of the available data, using Bayes' rule.
In the Bayesian approach, we quantify our prior belief about the population parameter and give it a functional form called the prior distribution (this is why the parameter is a random variable in Bayesian statistics). From the data we then obtain the likelihood function, and we multiply these two functions to arrive at a final distribution called the posterior distribution. In this case we are dealing not only with the data but also with past/prior information about the parameters, which is why Bayesian inference is described as deductive reasoning.
In Bayesian statistics we use credible intervals; parameters are random variables and have probability distributions.
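A minimal sketch of this prior × likelihood → posterior step, assuming (for illustration only) a conjugate Beta prior with a binomial likelihood so that the posterior has a closed form, together with the corresponding 95% credible interval:

```python
from scipy import stats

# Hypothetical prior belief about a success probability theta: Beta(2, 2).
a_prior, b_prior = 2, 2

# Observed data: 7 successes in 20 trials (binomial likelihood).
successes, trials = 7, 20

# Conjugacy: Beta prior times binomial likelihood gives a Beta posterior.
a_post = a_prior + successes
b_post = b_prior + (trials - successes)
posterior = stats.beta(a_post, b_post)

print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```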
Because of this, Bayesian methods are used in some crucial cases where large amounts of data are unavailable.
Bayesian inference also has pros and cons: 1. one point of contention is the selection of the prior; 2. sampling is complex when there is no convenient functional form, although MCMC techniques are useful in most such cases (a sketch follows below).
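A rough sketch of the MCMC idea mentioned in point 2: a plain random-walk Metropolis sampler targeting an unnormalised posterior. The target, data and tuning values are illustrative assumptions of mine, not anything from the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([0.8, 1.2, 0.5, 1.0])           # hypothetical observations

def log_unnormalised_posterior(theta):
    """Standard normal prior on theta times a Normal(theta, 1) likelihood for the data."""
    log_prior = -0.5 * theta ** 2
    log_likelihood = -0.5 * np.sum((data - theta) ** 2)
    return log_prior + log_likelihood

# Random-walk Metropolis: propose a move, accept it with probability min(1, posterior ratio).
samples, theta = [], 0.0
for _ in range(20000):
    proposal = theta + rng.normal(scale=0.5)
    log_ratio = log_unnormalised_posterior(proposal) - log_unnormalised_posterior(theta)
    if np.log(rng.uniform()) < log_ratio:
        theta = proposal
    samples.append(theta)

samples = np.array(samples[5000:])               # discard burn-in
print(samples.mean(), np.percentile(samples, [2.5, 97.5]))
```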
The given answers have touched on various relevant aspects of the interplay of the discussed theories. I would like to recall a significant (and, I believe, decisive) decision-making viewpoint (precision and detail are sacrificed for brevity).
L. J. Savage's formalisation of decision making under uncertainty led to the conclusion that any uncertain (unknown, vaguely specified, fuzzy, random...) quantity has to be modelled by probabilities in order to make adequate decisions. These probabilities inevitably have a subjective character, as the uncertainty is relative to the subject making the decisions. The fact that any standard statistical problem can be formulated as a decision task recommends the Bayesian view as the primary one. This does not diminish the significance of the frequentist view, which has played an important historical, technical and standardisation role and which has produced many deep (primarily asymptotic) results. The good news is that, with informative data, Bayesian outcomes generically converge to those obtained via frequentist methodology, but as somebody said, "in the long run we are all dead".
Reference: Savage, L. J. (1954). The Foundations of Statistics. New York: Wiley.
I've tried to visualize some of the ideas discussed in the thread so far in this Demonstration of Bayesian Ideas: https://agrogan.shinyapps.io/shinyBayes/
It may be important to understand that Andrew's demonstration is about Bayes' theorem, which is not the "Bayesian idea". In the demo, Bayes' theorem is used to adjust purely frequentist estimates according to frequencies in the population, using the connection between conditional probabilities (via Bayes' theorem).
But the "Bayesian idea", or "Bayesian statistics", is about the definition of a random variable. A frequentist would not accept a parameter as a random variable, because randomness, for a frequentist, is associated with variation in replicated observations (and a parameter is neither observed nor can it vary). A Bayesian, in contrast, can see a parameter as a random variable, because for a Bayesian randomness is "lack of knowledge" or "uncertainty in judgement", which can refer to unseen data just as well as to observable but unreplicable events and to unobservable things like parameters. The connection to Bayes' theorem is that the probability distribution over a parameter is modified by the likelihood of the data (which is based on the probability distribution of the data).
In classical inference, parameters are fixed, non-random quantities and the probability statements concern only the data, whereas Bayesian analysis makes use of our prior beliefs about the parameters before any data are analysed. It applies Bayes' rule in order to estimate the probability of a hypothesis given the observed information.
A major difference between classical and Bayesian inference is that in Bayesian inference the parameter is itself a random variable and follows some specific distribution. We choose a distribution for the parameter such that our results are likely to be close to the true values, so one can say it is all about the belief we hold about the parameter's behaviour.
In classical inference there is no notion of the parameter being a random variable.
There is a good answer by Michael Lanier of Southern Illinois University at https://www.quora.com/What-is-the-difference-between-Bayesian-and-frequentist-statisticians .
"Frequentists are usually looking at P(data| parameter), note the parameter is fixed, the data is random. The Bayesian is looking at the P(parameter|data) the parameter is random and the data is fixed.
Note that what we usually want are inferences about parameters, not data. We want to know “What does the data say about the parameter.” not “what does the parameter say about the data”. Traditional frequentists methods often are misunderstood as they conflate these two (which are the same only under certain situations).
To illustrate this, consider the well-known confidence interval. A 95% confidence interval is an interval produced by a procedure that, over repeated samples, would contain the true parameter value 95% of the time; it is not a statement that the parameter lies in this particular interval with probability 0.95."
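A small simulation (my own illustration, not part of the quoted answer) of what that repeated-sampling definition means in practice: the 95% refers to the long-run coverage of the interval-constructing procedure, not to a probability statement about any single computed interval.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu, sigma, n = 5.0, 2.0, 30     # assumed "true" state of nature and sample size
z = 1.96                             # normal quantile for a 95% interval
half_width = z * sigma / np.sqrt(n)  # known-sigma interval, for simplicity

reps, covered = 10_000, 0
for _ in range(reps):
    sample = rng.normal(true_mu, sigma, n)
    lo, hi = sample.mean() - half_width, sample.mean() + half_width
    covered += (lo <= true_mu <= hi)

print(covered / reps)                # close to 0.95: the long-run coverage of the procedure
```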
The differences have their roots in the definition of probability: Bayesian statistics defines it as a degree of belief, while classical statistics defines it as a long-run relative frequency of occurrence.
One important consequence of this definition is that Bayesian statisticians assign probability distributions to uncertain quantities such as parameters and predictions, while classical statisticians cannot assign probabilities to all uncertain quantities (e.g. parameters); instead, confidence intervals and regions are used to quantify the uncertainty in the parameters.
I recently demonstrated that it was a mistake to use the likelihood function in Bayes' theorem, which leads to the so-called reformulated Bayes' theorem (by some authors) that is the basis of Bayesian statistics. If this mistake is corrected, frequentist and Bayesian inference may unify, and in fact reunite. Certainly, this is an extraordinary claim, but please refer to the preprint, which may provide extraordinary evidence.
Huang, H. (2020). A new Bayesian method for measurement uncertainty analysis and the unification of frequentist and Bayesian inference. Preprint, DOI: 10.13140/RG.2.2.35338.08646. Available on ResearchGate: https://www.researchgate.net/publication/344552280_A_new_Bayesian_method_for_measurement_uncertainty_analysis_and_the_unification_of_frequentist_and_Bayesian_inference?channel=doi&linkId=5f7fd8a5458515b7cf71d5ec&showFulltext=true
The real problem arises, in my opinion, when people try to make uncertainty estimates for "derived parameters" using classical statistics as the starting point. Say that A, B, and C are three parameters of interest, of which A and B are "measurable", so there are data from which one can estimate them, whereas C is not directly measurable; we know, however, that the relation C = f(A, B) holds, where f is some known function.
If one operates in the Bayesian probability framework, the problem of estimating C and its uncertainty characteristics is quite clean and straightforward. One applies Bayesian parameter estimation to obtain A and B in the form of uncertainty distributions, and then applies "distribution math" to obtain the uncertainty distribution representing C. In most cases, numerical Monte Carlo routines implemented in software are the simplest way to obtain a numerical representation of the C distribution, over which one may, if so desired, fit an appropriately assumed, analytically formulated distribution (e.g. of the beta, lognormal or normal variety).
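A bare-bones sketch of that "distribution math" step, assuming (purely for illustration) that the posteriors for A and B have already been summarised as normal distributions and that f(A, B) = A * B:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed posterior summaries for A and B (illustrative numbers only).
A = rng.normal(10.0, 0.5, size=100_000)   # draws representing the uncertainty distribution of A
B = rng.normal(4.0, 0.2, size=100_000)    # draws representing the uncertainty distribution of B

C = A * B                                  # propagate the draws through f(A, B) = A * B

print("C mean:", C.mean())
print("C standard deviation:", C.std())
print("95% interval for C:", np.percentile(C, [2.5, 97.5]))
```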
If one is instead operating in the classical statistics realm, it becomes very problematic to define and calculate, from the confidence intervals estimated for A and B, "confidence intervals" and "confidence bounds" for C. What I have seen people routinely do, however, is apply a contradictory mix of classical and Bayesian techniques, i.e. a) calculate "classical" confidence intervals for A and B, then b) treat these as percentile ranges of some assumed types of distributions representing A and B, and finally c) use Monte Carlo to obtain a distribution for C (which, strictly speaking, has no clear interpretation: is it a Bayesian distribution for C, directly representing its variability/uncertainty, or the distribution of confidence intervals for hypothetical "measurement data" for C, which do not and cannot exist, since C is not directly measurable?). I see a tremendous degree of conceptual messiness and inconsistency in this latter way of proceeding ... :)
The GUM (Guide to the Expression of Uncertainty in Measurement) considers this problem, C = f(A, B), and establishes a framework for estimating the standard uncertainty (SU) and the expanded uncertainty (the half-width of a confidence interval) for C, based on the law of propagation of uncertainty (LPU). The GUM uncertainty framework is often considered frequentist. Supplement 1 (S1) to the GUM describes a Monte Carlo method (MCM) for estimating measurement uncertainty, based on the principle of propagation of distributions. It is commonly held that the MCM of GUM-S1 is based on the Bayesian view. However, "... MCM does not require the Bayesian view; it can be implemented purely based on the frequentist view. The GUM-S1 seems to generate a misconception that MCM must be interpreted based on the Bayesian statistics." For a detailed discussion, please refer to the article "Why the scaled and shifted t-distribution should not be used...".
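For comparison with the Monte Carlo sketch above, here is a first-order law-of-propagation-of-uncertainty calculation for the same illustrative model C = A * B, treating the standard deviations as standard uncertainties (the numbers are mine, not from the GUM):

```python
import numpy as np

# Illustrative estimates and standard uncertainties (same toy model C = A * B as above).
a, u_a = 10.0, 0.5
b, u_b = 4.0, 0.2

# First-order LPU for uncorrelated inputs: u_c^2 = (dC/dA)^2 * u_a^2 + (dC/dB)^2 * u_b^2,
# and for C = A * B the sensitivity coefficients are dC/dA = B and dC/dB = A.
c = a * b
u_c = np.sqrt((b * u_a) ** 2 + (a * u_b) ** 2)

print("C =", c, "with standard uncertainty", u_c)
print("expanded uncertainty (k = 2):", 2 * u_c)
```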