What is the raison d'être of uninformative priors? Isn't one of the Bayesian goals to allow prior information to be systematically incorporated into inference?
# If we want inference to be driven solely from data, why do we even bother specifying a prior?
Inference necessarily goes beyond the data. Information is a quantity that exists only in the relation between data and a context: here, the current "state of conviction" (you may call it "knowledge"), the assumed models, and the known circumstances of the experiment/study/survey/data collection.
So data alone is not, and cannot be, enough to make inferences. One needs a model and a context to assess the information in the data (relevant to and related to that model and context). The model can often be described in some formal way, like a statistical model, either being "obviously reasonable" or derived from "obviously reasonable" basic assumptions. The likelihood is a function that relates the data to the model. This still is not enough to make inferences. We still need a context to interpret this function (or likelihood ratios, or p-values from likelihood-ratio tests, etc.). An example can demonstrate this:
If I play lotto (the experiment), the event of winning the lottery has a very small probability under the hypothesis (model) that the result was just a wild guess. The probability of winning would be considerably higher if I had psychic powers that let me foresee the next lotto numbers. Now I perform the experiment and I win. Wow :) The p-value of this result, P(win | guessing), is very small. But would I take this result to conclude that I have psychic powers? Certainly not. This conclusion is not driven by the data but by the context, by everything we know (or believe) about how the world works. In this context the result remains a "lucky" incident, nothing else, because there is no other data that would be better explained by assuming psychic powers, and there is a great deal of data that is much more likely if we do not assume that I have them.
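To make this concrete, here is a toy Bayes-theorem calculation (a Python sketch; the winning probability and the prior on psychic powers are made-up numbers for illustration): even though P(win | guessing) is tiny, the posterior probability of psychic powers stays astronomically small, because the prior is even tinier.

```python
# Toy Bayes-theorem calculation for the lottery example (all numbers made up).
p_win_given_guess = 1 / 14_000_000   # winning by pure guessing (roughly a 6-of-49 lottery)
p_win_given_psychic = 1.0            # a true psychic would always win
prior_psychic = 1e-20                # prior probability that psychic powers exist at all

# Posterior probability of psychic powers after observing one win:
posterior = (p_win_given_psychic * prior_psychic) / (
    p_win_given_psychic * prior_psychic + p_win_given_guess * (1 - prior_psychic)
)
print(posterior)  # about 1.4e-13: still negligible, despite the tiny "p-value"
```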
This was quite an extreme example, because we are so sure that psychic powers do not exist (which only means that we would need an awful lot of good data from adequate experiments to reconsider their existence). There are other examples where the influence of the prior knowledge (or belief) is not that drastic.
# What is the point of a so-called "uninformative prior"?
Priors are essentially arbitrary. There is no law that prohibits two people (agents, social groups, scientific communities) from having considerably different priors. Several aspects of a prior can be attributed to (or be seen as consequences of) considering particular prior information. The above example uses an extremely "informative" prior, because there is a great deal of experience of people obviously lacking psychic powers, and because we have no idea how we would integrate the existence of psychic powers into the body of our other models. But what if we could not agree on how this information is to be weighted, in other words: if we could not agree on a common wager in a bet for/against the existence of psychic powers? A hoped-for solution is to eliminate the impact of all our former experience and of the rest of our models, which would lead to a "non-informative prior".
A sensible non-informative prior is the uniform prior, which says that we consider all possible hypotheses equally likely. It makes sense in that it expresses that we see no reason to prefer any one of the hypotheses over any other (this is related to Laplace's principle of indifference). However, the hypothesis space may be transformed, and a prior that is uniform in one space is non-uniform in a transformed space. Here Jeffreys proposed priors that are invariant under such transformations.*
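As a quick numerical sketch of that transformation problem (Python/NumPy, with arbitrary simulation settings): if we draw p uniformly on (0, 1), the induced distribution of the odds o = p/(1-p) is far from uniform.

```python
import numpy as np

# A prior that is uniform in p is not uniform in the odds o = p / (1 - p).
rng = np.random.default_rng(0)
p = rng.uniform(0, 1, size=1_000_000)
odds = p / (1 - p)

# The induced density on the odds scale is 1 / (1 + o)^2: piled up near 0, long right tail.
print(np.mean(odds < 1))   # ~0.50  (all the mass with p < 0.5)
print(np.mean(odds < 10))  # ~0.91  (almost everything else is squeezed below 10)
```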
# Isn't one of the Bayesian goals to allow prior information to be systematically incorporated into inference?
No, you have turned this around a bit. The goal is not to incorporate prior information into inference; it is to incorporate the current information into our beliefs (or knowledge). So we do have some beliefs before seeing the data, and the data provides an impulse that moves our beliefs to a new state after accounting for the new data. The data does not tell us where we *are*; it just tells us how far we have to move and in which direction. You can see data as a "force" that acts on masses. Defining a force makes sense only in relation to masses: no masses, no forces. The data changes the momenta, but it does not determine what momentum any mass has to have. That depends on the momentum the mass had before the force acted on it.
The Bayesian goal is to provide an objective and systematic way to calculate the effect of applying this "force" to a given "momentum".
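A minimal conjugate-updating sketch of this "force on a momentum" picture (beta-binomial, with made-up data of 7 successes in 10 trials): the same data push every prior toward the observed rate, but where each posterior ends up still depends on where the prior started.

```python
# Conjugate beta-binomial updating: the data act like a "force" that shifts
# whatever prior "momentum" we start from toward the observed evidence.
k, n = 7, 10  # made-up data: 7 successes in 10 trials

for a, b, label in [(1.0, 1.0, "flat Beta(1, 1)"),
                    (0.5, 0.5, "Jeffreys Beta(1/2, 1/2)"),
                    (20.0, 20.0, "strong Beta(20, 20)")]:
    prior_mean = a / (a + b)
    post_mean = (a + k) / (a + b + n)   # posterior is Beta(a + k, b + n - k)
    print(f"{label:24s} prior mean {prior_mean:.2f} -> posterior mean {post_mean:.2f}")
```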
---
* You may imagine a binomial experiment to infer a proportion p. You may argue that the uniform prior for p in [0, 1] is an uninformative prior. But you can also express the proportion as odds, o = p/(1-p), and the prior chosen above won't be uniform on the odds scale. Which one is correct? Jeffreys' prior for the binomial is invariant under this transformation, so a posterior obtained via Jeffreys' prior is convertible between the proportion scale and the odds scale. In this sense "non-informative" means that the posterior won't depend on the way the data is interpreted (as a proportion or as odds). I personally doubt that there is any "really objective" non-informative prior. That would be like attempting to describe the frequency of a wave without giving any time scale ("frequency" is an entity that exists only in conjunction with "time"; as soon as I remove "time" from my model, it makes no sense, or is impossible, to talk about "frequency").
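As a small numerical check of that invariance claim (a Python/NumPy sketch, not tied to any particular data set): the Jeffreys prior pi(p) proportional to 1/sqrt(p(1-p)), pushed through the change of variables to the odds scale, coincides with the Jeffreys prior derived directly for the odds, pi(o) proportional to 1/(sqrt(o)(1+o)).

```python
import numpy as np

# Jeffreys prior for a binomial proportion: pi(p) ∝ 1 / sqrt(p (1 - p)).
# Transform it to the odds scale o = p / (1 - p) and compare with the
# Jeffreys prior derived directly for o, which is pi(o) ∝ 1 / (sqrt(o) (1 + o)).
o = np.linspace(0.1, 10, 100)
p = o / (1 + o)
dp_do = 1 / (1 + o) ** 2                                   # Jacobian of p(o)

jeffreys_p_transformed = (1 / np.sqrt(p * (1 - p))) * dp_do
jeffreys_o_direct = 1 / (np.sqrt(o) * (1 + o))

print(np.allclose(jeffreys_p_transformed, jeffreys_o_direct))  # True: the same prior on both scales
```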
There is nothing like an uninformative prior in the strict sense of the word. "Uninformative" just means that the prior will contribute minimal information in comparison with the evidence. If there is little or no evidence, there obviously is nothing that could dominate the prior.
So you use an uninformative prior if you have nothing else to use but still want to use the Bayesian framework. A Bayesian approach without a prior simply does not work.
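A rough illustration of that "minimal information in comparison with the evidence" point (again a beta-binomial sketch with made-up data): with few observations the choice of prior is visible in the posterior; with many observations two quite different priors give practically the same answer.

```python
# The same observed success rate (30%) at three sample sizes, under two
# rather different Beta priors (all numbers made up for illustration).
priors = {"uniform Beta(1, 1)": (1, 1), "skeptical Beta(2, 18)": (2, 18)}

for n in (10, 100, 10_000):
    k = int(0.3 * n)
    post_means = {name: round((a + k) / (a + b + n), 3) for name, (a, b) in priors.items()}
    print(f"n = {n:6d}: {post_means}")
# With n = 10 the posterior means differ a lot; with n = 10_000 they nearly coincide.
```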
I think that if one wants inferences based solely on the data, then that leads naturally to likelihood-based inference. Once you bother introducing a prior, there is always some information that you can bring to the problem.