There are plenty of debates in the literature about which statistical practice is better. Both approaches have many advantages but also some shortcomings. Could you suggest any references describing which approach to choose, and when? Thank you for your valuable help!
There are lots of papers on this, which will inform your opinion better than a small number of brief responses. Maybe we could list examples of these. I'll start with Efron at http://statweb.stanford.edu/~ckirby/brad/papers/2005BayesFreqSci.pdf which I think provides a fairly direct answer to your question from someone whose opinions about statistics are much better to listen to than mine!
It is not very good practice to ask such questions. You will get nothing but another holy war between frequentists and Bayesians as an answer.
In fact, your question is not correct. Both approaches were developed at the beginning of the 20th century. The progress of Bayesian techniques was somewhat delayed because they usually require much more processing power than frequentist ones.
In my own opinion, the Bayesian approach is the more transparent one and covers more cases (some things, like point estimates of physical parameters and unique events, cannot be described in frequentist terms even in principle). Still, the two can mostly be used interchangeably.
At the recent ISCB meeting (International Society for Clinical Biostatistics, 2014), Dr. Thomas A. Louis presented "Bayes, why bother?" (link below; I can't find the 2014 version, so I have put an older version of this talk), which is very interesting and relevant to this discussion.
http://www.stat.ncsu.edu/events/2013_tsiatis_symposium/Louis.pdf
Thank you for posting your resources! The reason I posted this question (Alexander, my apologies for being politically incorrect) was to find references, or accounts of personal practice, on the benefits and shortcomings of each approach, in order to see which method (or both) would fit better in my research. I appreciate your input!
This file will be helpful for you.
http://www.austincc.edu/mparker/stat/nov04/talk_nov04.pdf
In my opinion, frequentist statistics applies to discrete data sets, while Bayesian statistics applies to data sets where every attribute depends on the other attributes and every attribute's existence is purely conditional. Bayesian statistics is used for data showing dependency between attributes, whereas frequentist statistics works on the number of occurrences of an attribute, with no dependency involved.
More about it can be read here:
http://www.stat.ufl.edu/archived/casella/Talks/BayesRefresher.pdf
Perhaps this link will help
http://oikosjournal.wordpress.com/2011/10/11/frequentist-vs-bayesian-statistics-resources-to-help-you-choose/
I think the answer to the question can also depend strongly on the scientific discipline you are working in, and in a very specific way. In some areas, an excess of variables, or a dependence on computational methods, pushes you toward Bayesian techniques or philosophy. In other areas, the very specialized nature of the field means that the only available technique for a specific need is frequentist or Bayesian (but not both). In some fields of biology where design of experiments is the dominant inferential family of methods, there may be little incentive to turn Bayesian when the full set of linear frequentist models fulfils their needs. In some fields, Bayesian and frequentist approaches occupy different niches with little overlap. So I think we just have to check all the available tools that can potentially solve our problem and use the ones that best get the job done; if frequentist and Bayesian tools are both available, we will probably be asked by reviewers to employ the one we didn't use (by Murphy's law).
Hi Olga,
Assuming your research does not deal with developing new methods for studying a certain relationship, my advice is to follow the methods used in previous studies in the field. Papers on the subject you study usually give a solid rationale for why they chose a certain method. In addition, in most fields you can find review articles covering all the methodologies used in your field, their robustness, and their advantages and disadvantages.
Putting aside all differences between the aforementioned schools of thought, there are some basic rules of thumb that help one determine which method is most suitable for one's study:
a) When investigating the behavior of a variable within a continuous range of outcomes, frequentist analysis serves your purpose. That is, if you are interested in differential changes in a dynamic space, this method allows the investigation of a broad array of hypotheses.
b) When investigating the behavior of a variable limited to a discrete range of outcomes, or the impact of chain reactions on a certain outcome, Bayesian statistics is probably the way to go.
Having said that, there are many cases where you will need an integrated model. For example, if you are examining the relationship between lifestyle (nutrition, sports, alcohol and substance abuse, etc.) and cancer, chances are you will need both statistical inference practices.
With large sample sizes, both paradigms invariably come to the same conclusions!
If you have actual prior information, then use Bayesian statistics. If you apply Bayesian statistics with an informationless (flat) prior distribution, then the estimation result should be more or less equal to the frequentist estimate. Note that there are different Bayesian estimators.
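To see this concretely, here is a minimal sketch (in Python, with made-up numbers, not anything from this thread): for a binomial proportion, the posterior mean under a flat Beta(1,1) prior nearly matches the frequentist MLE, and the two converge as the sample grows.

```python
# Hypothetical data: 37 successes in 100 trials.
k, n = 37, 100

mle = k / n                      # frequentist estimate (MLE)
post_mean = (k + 1) / (n + 2)    # posterior mean under a flat Beta(1,1) prior

print(f"MLE:                         {mle:.4f}")        # 0.3700
print(f"Posterior mean (flat prior): {post_mean:.4f}")  # 0.3725
```

With n = 1000 and the same proportion, the two values agree to three decimal places, which also illustrates Adrian's point above about large samples.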
From a practical standpoint, before the widespread availability of BUGS (and BUGS-like packages) and the ease of programming MCMC on desktops, Bayesian methods were used only in a limited number of circumstances, and frequentist methods (often with more assumptions) were used in more complex circumstances. Now Bayesian methods are often easier in complex situations.
From a pragmatic standpoint, having both available in your statistics toolbox is probably useful.
Before questioning the method (frequentist vs. Bayesian), you have to specify your problem: design, structure of the data, and goals such as parameters of interest, estimation vs. prediction, hypothesis testing, model comparison, etc.
Frequencies are appropriate when you can actually count something. For example, you might be able to actually count all of the cases of HCV in a population. In this case, you can get an actual fraction (prevalence).
Bayesian statistics is appropriate when you have incomplete information that may be updated after further observation or experiment. You start with a prior (belief or guess) that is updated by Bayes' law to get a posterior (improved guess). For example, how likely is it that a new therapy for HCV will change the prevalence in the population? Clinical studies can provide a prior, but epidemiology is needed to provide an update (the posterior).
There is a close connection between the frequency and Bayesian approaches, but you have to go through axiomatic set theory to get it. Try Finite Probability by Feller, or Probability Theory by Hoel, Port, and Stone.
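To illustrate the prior-to-posterior update described above, here is a minimal sketch in Python; the prevalence, sensitivity, and specificity figures are hypothetical, chosen only to show the mechanics of Bayes' law.

```python
# Made-up screening numbers for an HCV-style example.
prior = 0.02         # assumed prior prevalence P(disease)
sensitivity = 0.95   # P(test positive | disease)
specificity = 0.90   # P(test negative | no disease)

# Law of total probability: P(test positive)
p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)

# Bayes' law: the posterior P(disease | test positive)
posterior = sensitivity * prior / p_pos
print(f"P(disease | positive test) = {posterior:.3f}")  # ~0.162
```

Even with a 95%-sensitive test, the posterior stays modest because the prior prevalence is low; collecting more data (e.g. a second, independent test) would update it again.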
Bayesian approaches assume that the parameter behaves as a random variable, while frequentist statistics, such as MLE, assume that the estimated parameter is fixed.
Doctor Jordi explains the difference clearly:
http://mathforum.org/library/drmath/view/52221.html
The difference between Bayesian statistics and regular (Frequentist) statistics is essentially a different interpretation of what probability signifies, and thus a different way to make an inference about a population given that we have a sample of that population.

When I tell you, "The probability that this coin lands heads is 1/2," what do you make of it? There are a couple of ways to think about it. A Frequentist, and I imagine that you are more familiar with this interpretation, reasons as follows:

    If the probability of landing heads is 1/2, this means that if we were to repeat the experiment of tossing the coin very many times, we would expect to see approximately the same number of heads as tails. That is, the ratio of heads to tails will approach 1:1 as we toss the coin more and more times.

A Bayesian, however, would interpret that statement in a different way:

    For me, probability is a very personal opinion. What a probability of 1/2 means to me is different from what it might mean to someone else. However, if pressed to place a bet on the outcome of tossing a single coin, I would just as well guess heads or tails. More generally, if I were to bet on the roll of a die and was told that the probability of any face coming up is 1/6, and the rewards for guessing correctly on any outcome are equal, then it would make no difference to me what face of the die I bet on.

That is why the Bayesian point of view is sometimes called the Subjectivist point of view. In other words, Bayesians consider probability statements to be a measure of one's (personal) degree of belief in a certain hypothesis in the face of uncertainty - a subjective measure.

The two points of view differ widely and affect the way in which we conduct statistical inference. Allow me to elaborate. In statistics, we make an inference, a guess about a population, based on a sample we draw from it. We may, for example, want to know what the speed of light in vacuum "really" is.

[As reader Steve Dodge points out: "Since 1983, the speed of light has been a _defined quantity_, set at the integer value of 299 792 458 m/s. The meter is then defined as the distance light travels in vacuum after 1/299 792 458 s, and the second is defined in terms of an actual measurement of an atomic system, in an atomic clock." So let's assume that the following imaginary discussion takes place before 1983.]

We have a problem, however: our experiment is imperfect, and random errors will always crop up in our measurements, no matter how carefully we make them. So say we repeat our experiment five times and observe the following measurements, in meters per second:

299,792,459.2
299,792,460.0
299,792,456.3
299,792,458.1
299,792,459.5

In this example, our population is the abstract infinity of all possible measurements we could make. Our sample is the five measurements we have made. Now we wish to estimate a parameter of this population, namely the population mean, or the "true" speed of light in a vacuum. How do we deal with the random errors?

For a Frequentist, there exists a fixed, true, but unknown speed of light in vacuum. The Frequentist would assume that the random errors have a certain probability distribution (probably the normal distribution, also known as Gaussian, which looks like a bell curve) and would proceed to take the arithmetic average of the above five measurements. The resulting statistic (a statistic is a function of your sample) would be used as an estimator for the population mean. The estimator itself is a random variable, so we can say, as Frequentists:

    If we were to repeat this sequence of 5 measurements a repeated number of times, approximately this many realizations of my estimator will be this close to the true speed of light. However, on this particular occasion, where I have already calculated my statistic, I have no clue how close I actually am to the true value, but I feel comfortable that I am doing okay because of certain properties that my estimator has on repeated uses.

For a Bayesian, the above paragraph is nonsense. The Bayesian DOES have a clue how close this particular realization of her estimator is to the speed of light because, unlike the Frequentist, she can make a probability statement about this realization. The random errors have no probability distribution. They are fixed realizations; they are reality. Instead, a Bayesian claims that the speed of light is a random variable with its own probability distribution. For a Bayesian there is no "true" speed of light; there is only a certain probability distribution associated with it.

In Bayesian statistical inference, we first make a guess at the probability distribution of the parameter in question. This is called a prior distribution. Then we observe our sample. Based on our observations, we use a theorem called Bayes' theorem (hence the name Bayesian) and modify our guess about the distribution of the parameter in question. This modified guess is called a posterior distribution.

Summing it up, Bayesians and Frequentists give opposite answers to the question: "Does there exist a true, fixed, and nonrandom population parameter, even if we cannot know its value because all we can see is the realizations of SOME random variable?" Frequentists say yes; Bayesians say no.

Does this answer your question? Please write back if you have other questions or if you feel that I did not explain myself well enough.

- Doctor Jordi, The Math Forum
http://mathforum.org/dr.math/
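To make the contrast in the quoted explanation concrete, here is a minimal numeric sketch using the five measurements above. The normal prior and the assumed measurement noise are illustrative choices of mine, not part of Dr. Jordi's text.

```python
import numpy as np

data = np.array([299_792_459.2, 299_792_460.0, 299_792_456.3,
                 299_792_458.1, 299_792_459.5])

# Frequentist: the sample mean estimates the fixed, unknown true speed.
freq_estimate = data.mean()

# Bayesian conjugate-normal model with an assumed known noise sd:
# prior c ~ Normal(mu0, tau0^2); the posterior is again normal.
mu0, tau0, sigma = 299_792_457.0, 5.0, 2.0   # hypothetical prior and noise sd
n = len(data)
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + data.sum() / sigma**2)

print(f"Frequentist sample mean: {freq_estimate:.2f}")
print(f"Bayesian posterior mean: {post_mean:.2f} (posterior sd {post_var**0.5:.2f})")
```

The frequentist reports a point estimate (plus long-run properties of the procedure); the Bayesian reports an entire posterior distribution for the speed of light, from which probability statements can be read off directly.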
Olga, these days there is an interesting high-level debate on the key words of your question at http://errorstatistics.com/. It is a page on the epistemology and history of statistical methods. Best wishes in your task, emilio
I truly appreciate everyone's input and explanations. I hope that one day a deep understanding of statistics becomes the norm in my field of linguistics, and not just a tool that we learn without any clue what it does.
The Bayesian approach is used when prior information is available. There are different kinds of priors, such as conjugate, nonconjugate, improper, and Jeffreys priors. As I see it, the posterior updates the prior information using the sample information available at present. In the classical approach, we base our results on the available sample information alone, without using prior information.
Now the question is: which approach should one use? That depends on one's understanding, the available information, expertise, etc.
The choice of which approach to use is essentially the researcher's responsibility. It is first important to have a good understanding of the basic statistical concepts behind these approaches, after which the researcher can easily make an obvious choice.
For me, these sorts of questions depend on what the methods are being used for – 'horses for courses' rather than 'winner takes all'. To what end are they practically useful?
The last ten years have seen huge strides in the use of Bayesian-inspired computational tools, but they are often used in a very un-Bayesian way, in that they are based on default priors. But they do bring real practical advantages in what may be termed 'realistically complex modelling', to use the title of a Harvey Goldstein project (http://www.bristol.ac.uk/cmm/software/realcom/), and 'highly structured stochastic systems', to use the title of Peter Green's book http://www.maths.bris.ac.uk/~peter/HSSS/
To that end, I (with Jon Rasbash) constructed a table of comparisons between maximum likelihood (IGLS) estimation and Bayesian MCMC analysis for multilevel models with complex structures. I have put an introduction to Bayesian estimation for such models (from which the table comes) on RGate; most of it is about likelihood estimation, however:
https://www.researchgate.net/publication/260771330_Developing_multilevel_models_for_analysing_contextuality_heterogeneity_and_change_using_MLwiN_Volume_1_%28updated_June_2014%29?ev=prf_pub
Volume 2, also on RGate, has more applications of MCMC estimation.
While with large sample sizes the frequentist and Bayesian paradigms come to the same conclusions, as Adrian Esterman said above, with small sample sizes it is very often only the Bayesian paradigm that leads to correct conclusions. In the attached part of a chapter from my book on probability and social sciences (2012), the frequentist methods used by paleodemographers to estimate the age structure of a population from a small number of skulls lead to an implausible structure, while a true Bayesian estimation leads to a correct one. To check the quality of the various estimates, we used a population of nuns at the Maubuisson abbey (18th century), for which we have both a sample of skulls and the actual age structure of the entire population of nuns.
Another point is important for a researcher wanting to make a statistical inference, and to it the frequentist (objectivist) and Bayesian (subjectivist) approaches give different answers.
Let us take the statement that the 95% confidence interval for an unknown parameter lies between two given values. Under frequentist probability, you can only state that, if you draw many samples of identical size and build such an interval around each sample mean, then you can expect 95% of the resulting confidence intervals to contain the unknown parameter. As the analysis is often confined to a single sample, or at best a small number of samples, this answer makes little sense. In Bayesian probability, the notion of exchangeable events, introduced by de Finetti, makes it easy to provide a good answer, under certain assumptions, of course, but these can be clearly stated.
You will find a more detailed presentation of these two kinds of statistical inference in the attached document, again from my book on probability and social science (2012).
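The long-run reading of confidence intervals described above is easy to demonstrate by simulation. A minimal sketch in Python, with arbitrary true values: across many repeated samples roughly 95% of the intervals cover the true mean, yet any single interval either contains it or it does not.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, sigma, n, reps = 10.0, 2.0, 30, 10_000
half_width = 1.96 * sigma / np.sqrt(n)   # known-sigma 95% z-interval

covered = 0
for _ in range(reps):
    m = rng.normal(true_mu, sigma, n).mean()   # one sample's mean
    if m - half_width <= true_mu <= m + half_width:
        covered += 1

print(f"Empirical coverage: {covered / reps:.3f}")   # close to 0.950
```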
The objective vs. subjective distinction strikes me as being more of historical interest (e.g., compare with Efron's take on the issue). For instance, I don't see why a probability model for the prior is subjective while a probability model for the data (the likelihood) is inherently objective.
The frequency interpretation is based on a tautological (circular) argument. Therefore "frequentist statistics" actually has no sensible philosophical foundation.
See
http://plato.stanford.edu/entries/probability-interpret/
http://joelvelasco.net/teaching/3865/hajek%20-%20mises%20redux%20redux.pdf
http://philosophy.anu.edu.au/sites/default/files/documents/Philosophers%20Guide%20to%20Pr.final_.pdf
Dear Jochen,
I find your answer unusually apodictic and assertive for a scientist. Every position is disputable, and there are pros and cons to every argument, including the frequentist and Bayesian approaches. No single method is good for all seasons. I would avoid contributing to holy wars: it is useless. So I will not cite any of the valid arguments of the critics of the Bayesian approach.
Dear Franco,
I apologize for giving seemingly "apodictic and assertive" answers. They are not meant to be; they are rather meant to provoke. But they also honestly present the state of my understanding (and this, for sure, is always disputable!). I have read a lot about all this, and I have never found any sensible argument for a "frequentist interpretation" of probability. I was never able to resolve the fundamental tautological statement, and I really tried hard, because I learned statistics in the "frequentist" way, and the whole trouble began when I started to ask for the interpretation of probability. If someone can resolve this tautology, then I will surely revise my point of view.
Because the metric of probabilities (p) is the same as the metric of relative frequencies (f), everything that can be done with f can also be done with p, and when we assume particular frequency properties of the sampling procedure we can derive, by probability calculus (which is nothing but a relative-frequency calculus in new clothes), the probability distributions of whatever estimates and interpret them as frequency distributions. But again, how do we justify assuming a particular sampling distribution? What is random in "random sampling"? It means that each item has the same probability of being selected, and that in turn means that the long-run relative frequency of selecting any item is equal. How do we assert this? By rolling a "fair" die (with as many sides as there are possible items)? This will give us the indices of the items we would have to select to get a random sample. But why should the dice results then all have equal long-run relative frequencies? God-given? The theory only states that iff the p are all the same, then the long-run f are all the same (or the other way around). Again, nothing is explained; the problem is just shifted one layer down.
Read the definition carefully: $\lim_{n \to \infty} h(x)/n = P(x)$ in probability,
i.e. the relative frequency of an event converges in probability to the probability of that event. The tautology is really obvious!
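The definition can of course be simulated, though note that the simulation must already assume the probability it is meant to illustrate, which is exactly the circularity in question. A minimal sketch in Python:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.5                               # the assumed P(x) -- the circular step
for n in (10, 100, 10_000, 1_000_000):
    h = (rng.random(n) < p).sum()     # absolute frequency h(x) in n trials
    print(f"n = {n:>9}: h(x)/n = {h / n:.4f}")   # drifts toward p = 0.5
```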
Apart from this problem, as an empirical scientist I really have a severe problem accepting a measure that relates to something metaphysical.
Finally, I think that purposely ignoring correct arguments is more a religious behaviour than a scientific one...
Jochen, I find your words interesting: "What is random in "random sampling"? It means that each item has the same probability to be selected, and that again means that the long-run relative frequency to select any item is equal. How do we assert this? By rolling a "fair" die (with as many sides as there are possible items)? This will give us the indices of the items we would have to select to get a random sample. But why should then the dice results have all equal long-run relative frequencies?"
What I do not share is your concept of nature's dice as something like six-sided cubes with 1, 2, 3, 4, 5, 6 on the faces. Imagine an unbiased cube with faces 6, 6, 6, 3, 2, 1. Each face has a chance of 1/6, but the expected mean is not 3.5 as with a "normal" die (it is (6+6+6+3+2+1)/6 = 4), nor will the average tend to converge to the median if you make one million trials.
My view is that natural phenomena are frequently highly complex, and they tend to replicate both the values of variables and their frequencies over repeated trials. They may behave like the irregular die just described. Therefore, I see no problem in assigning each registered item value a frequency of 1/N as a sound premise, even with randomness present. I see it as more problematic to assign an a priori distribution function to the N measured data points.
Please explain the meaning of the function h(x) in your expression $\lim_{n \to \infty} h(x)/n = P(x)$ (in probability). My next reply depends on that meaning.
Thanks, emilio
Emilio,
There are also real dice with more or fewer than 6 sides; for instance, see here:
http://python2011.globalblogs.org/files/2011/10/dice.jpg
But this was only an example. Instead of a die, just think of a wheel of fortune with n segments. Then you can also generate "random" values of a continuous variable by taking the angle at which the wheel stops (relative to some arbitrary reference).
h(x) means the absolute frequency (count) of the event x; h(x)/n is thus the relative frequency of the event x in n trials.
Jochen,
In the same way that the values of a die do not have to be 1, 2, 3, ..., 6, the values of your wheel do not have to follow a linear series. In both cases the mean U points to the median: U = (1+6)/2 in the dice case, and (1+32)/2 if your wheel has 32 segments.
I think nature gives us open data that we can measure, organize, and interpret on its own terms, with the help of basic experiments, mathematics, and logic, without assuming hard premises about the distribution.
Thanks for explaining that "h(x) means the absolute frequency (count) of the event x; h(x)/n is thus the relative frequency of the event x in n trials". My point is that we can only study a sample of fixed size N, not events over multiple trials that imply different working samples.
Another point is that we must distinguish the elementary frequency of an event in the sample from the cumulative frequency (or probability) of a value bigger or smaller than some X, according to the data organization of the sample. We can work out the aggregated frequencies from samples and build workable Lorenz curves for analytical purposes and model interpretation. So I prefer to obtain the X values and elementary frequencies directly from the sample, without assuming distributions forced to comply with the Central Limit Theorem, and without estimating parameters such as variances, p-values, parametric families, etc.
So we may well declare that we use different methods and obtain different results for the same question about the same dataset. I have no problem working with any concrete sample made of positive values and contrasting my results with other approaches. N must be smaller than 200, because I only use Excel for my estimates.
Thanks, emilio
Dear Jochen,
I was not disputing whether the frequentist approach is theoretically well founded or not; I myself see several defects in the frequentist approach. I was disputing anyone's claim that the Bayesian approach is indisputably sound and usable in all circumstances. That, for me, is a criticism of principle, since I simply do not believe in religious-like attitudes in science.
To give my modest opinion, the Bayesian approach also looks circular to me, or at least based on non-falsifiable principles, and I agree with Popper that this makes an approach non-scientific. Bayesians (here both subjectivists and objectivists) advocate "prior information", THE pillar of their reasoning, after which I see a lot of automatic procedure (a bad symptom). Apart from the fact that the use of prior information cannot today be attributed totally and automatically to the Bayesian framework, how is this prior information generated?
It cannot be considered a hypothesis, because it would be non-verifiable; it can, however, be considered the posterior of a previous knowledge process. If a Bayesian approach was used in that process, it needed, in turn, a prior, based on even earlier information... and so on, back and back, until the prior means "no information", an apparently simple choice.
Instead, in my opinion, as I have already pointed out in another question of mine on ResearchGate, it is a very difficult choice. In any case, "no information" means no prior information, so the result of the Bayesian approach, though obtained via another route (a useful double check), should be the same as the result obtained by methods not using prior information.
On the other hand, if the prior really is basically important in influencing the posterior, it would mean that the added information is almost nil: in this case the final information depends almost entirely on the prior, which is a big responsibility.
Thank you, Franco, for your response. I will share my opinion:
I am not sure how Popper's philosophy of falsification can apply in principle when the data is not of a "logical nature" but rather of an "uncertain nature". A hypothesis like "all humans have 32 teeth" can be falsified by finding a single case without 32 teeth. But a hypothesis like "living vegan increases the expected lifetime of humans by 2 years" cannot be falsified. Any arbitrary amount of data can be collected, further and further supporting or discrediting the hypothesis, but no logical falsification is possible in the Popperian sense.
Data alone is worthless. It gets a meaning only in a model. The model must be invented, and at the same time an opinion forms about how the model should look under different scenarios. There can never be "no prior": the model itself is already part of a prior. Priors for the model parameters may be chosen in a way that the data will not too easily convince us.
And so I disagree with your statement that the data adds no information. Data is the *only* thing that adds information. Without data you have only the prior; with lots of data your prior becomes irrelevant anyway. Using little data without a well-thought-out prior is dangerous.
A primary consideration is whether or not you know the prior probabilities. I'd recommend Bulmer's text on statistics; he goes into this subject in detail.
I recently read a paper that stated, quite controversially, that Bayesian statistics should be taught to the masses and frequentist statistics should be left to expert users only.
It all depends on what you want to achieve and how you define probability (long-run frequency, subjective, or other).
In practice, researchers collect data to answer specific questions (sometimes sized according to desired type I and II error rates).
Once the data is collected, (at least) one model is applied, and researchers then want to make inferences about the population parameters.
By construction, confidence intervals (a la Neyman) tell you nothing about the parameters. This is because confidence intervals are designed to have certain frequentist properties in the long run: a 95% CI is a procedure that, when applied a large (read: infinite) number of times, is designed to cover the parameter 95% of the time. Rigorously speaking, a 95% CI tells you nothing about the parameter in a single application.
Article: Robust misinterpretation of confidence intervals
On the other hand, Bayes tried to solve Hume's problem of induction.
Article: The Rule of Succession
Very loosely speaking, we generally want to look at reality (collect data) and draw conclusions that are "probable". That is why, in data analysis, intervals constructed using Bayesian approaches have a probability interpretation, i.e. the interval is a probability statement about the parameter.
Fisher was very critical of Bayesian approaches because of the use of priors. For that reason he proposed an alternative method (fiducial inference), which has been gaining a lot of attention lately. It is a procedure that obtains a measure on a parameter space without the use of a prior.
Article: R. A. Fisher on Bayes and Bayes' theorem
Article: On Generalized Fiducial Inference
*To sum it up:*
// Confidence intervals are PRE-data intervals (they have a 95% probability before you observe the data).
// Credible intervals (Bayesian) are POST-data intervals.
// Fiducial intervals are POST-data intervals.
Sometimes researchers may be more interested in making probability statements (fiducial and Bayesian methods) than in applying procedures with certain desired long-run properties (frequentist methods); the sketch below contrasts the two kinds of interval.
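A minimal sketch of the pre-data vs. post-data distinction on the same (made-up) binomial data, assuming a flat Beta(1,1) prior for the Bayesian side:

```python
from scipy import stats

k, n = 40, 100   # hypothetical data: 40 successes in 100 trials

# Frequentist 95% CI (Wald): a procedure with 95% long-run coverage.
p_hat = k / n
se = (p_hat * (1 - p_hat) / n) ** 0.5
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Bayesian 95% credible interval under a flat Beta(1,1) prior:
# a post-data probability statement about the parameter itself.
cred = stats.beta(k + 1, n - k + 1).interval(0.95)

print(f"95% confidence interval: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% credible interval:   ({cred[0]:.3f}, {cred[1]:.3f})")
```

The two intervals are numerically close here, but they answer different questions: the first describes the procedure, the second describes the parameter.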
Olga Scrivner: Doing some research, you will probably see that some Bayesian and fiducial methods also have good coverage in the frequentist sense. So, to answer your question: if you want to make probability statements from your data, use a Bayesian or fiducial method; checking how the posterior changes under different priors is also informative. Sometimes you might also want to use methods that have certain frequentist properties.
Frequentist properties are usually assessed when methods are compared: you will find papers where simulations are done and type I and II error rates are computed.
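A minimal sketch of the prior-sensitivity check suggested above, again with made-up binomial data; the three priors are illustrative choices:

```python
from scipy import stats

k, n = 8, 20   # hypothetical data: 8 successes in 20 trials

for a, b, label in [(1, 1, "flat"), (0.5, 0.5, "Jeffreys"), (10, 10, "informative")]:
    post = stats.beta(a + k, b + n - k)    # conjugate Beta posterior
    lo, hi = post.interval(0.95)
    print(f"Beta({a},{b}) {label:>11} prior -> "
          f"posterior mean {post.mean():.3f}, 95% CrI ({lo:.3f}, {hi:.3f})")
```

If the reported interval moves substantially across reasonable priors, the data alone is not very informative and the choice of prior deserves explicit justification.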