Let me rephrase this.
My central question is: what is de Finetti's theorem, as opposed to the associated convergence results, and which convergence result is deemed the central 'de Finetti law of large numbers'?
I presume that de Finetti's theorem is merely the claim that an exchangeable subjective prior over sequences of outcomes, P(x1, x2, x3, ...), which is not independent, can be expressed as a probability over IID probability hypotheses, that is, hypotheses on which the outcomes are conditionally independent. It is this, and not the associated convergence results, that makes up what is called de Finetti's representation theorem.
For instance, for a particular sequence with r heads and n−r tails, P = ∫₀¹ K^r (1−K)^(n−r) dQ(K), where Q(K) is the distribution function over the parameter K (the 'probability of probabilities'). Is this the content of the representation theorem: that the prior over a sequence, or rather the joint distribution, can be given this decomposition in terms of the integral? Is this correct?
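To make the decomposition concrete, here is a minimal sketch of my own (not from any source I am citing), assuming Q is uniform on [0, 1]; it computes the mixture probabilities exactly and shows the resulting joint distribution is exchangeable but not independent:

```python
from fractions import Fraction
from math import comb

def seq_prob(r: int, n: int) -> Fraction:
    """P(one particular sequence with r heads in n trials) under the
    mixture with Q uniform on [0, 1]:
    integral_0^1 K^r (1-K)^(n-r) dK = r!(n-r)!/(n+1)! = 1/((n+1) * C(n, r))."""
    return Fraction(1, (n + 1) * comb(n, r))

print(seq_prob(1, 1))   # P(H)   = 1/2
print(seq_prob(2, 2))   # P(H,H) = 1/3, not P(H)^2 = 1/4: not independent
print(seq_prob(1, 2))   # P(H,T) = 1/6
print(seq_prob(1, 2))   # P(T,H) = 1/6: order is irrelevant, i.e. exchangeable
```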
Is the whole point of this enterprise simply that a subjective prior which is exchangeable can be expressed as a probability over objective probabilities on which the data are conditionally independent? And is this needed because, in order to use Bayes' rule, a probability over probabilities is required, not just a prior probability for the outcome (if we have just a prior probability, then we could face the problem of old evidence, or of independence, where no data alters the probability)? By allowing an exchangeable, non-independent prior that can be decomposed as a probability over probabilities, the subjective Bayesian of de Finetti's sort, who rejects the concept of probabilities over probabilities, can make use of the results of classical statistics. That is, acting as if there were such a probability over probabilities.
In the subjective Bayesian paradigm, probabilities of probabilities are not really part of the picture, yet subjective Bayesians need some way of expressing their prior so that it can learn from experience, i.e. via probabilities of probabilities in Bayes' theorem.
This, I presume, is de Finetti's strong-law-style result. The convergence-of-opinion results are often called 'de Finetti's theorem' in the literature, but in reality they should only be called corollaries of the central result (the decomposition into a probability of probabilities). Is this correct? Unfortunately, both the decomposition result and the convergence results that follow from it are often given the same name.
One such convergence result is (A): if the limiting relative frequency of the data is r/n, then the posterior distribution will converge so that Q(K = r/n) = 1. The problem here is that K is a chance hypothesis, so I presume the subjective Bayesian must read K as itself a subjective credence; one can then derive that K = r/n by the law of total probability, where K is a subjective posterior credence. (Otherwise one has to make use of David Lewis's Principal Principle, which forges a link between chances and credences, and the purely subjective Bayesian loses the ability to express events purely in terms of credences, having to make use of the notion of chance, which they typically reject.) Is this what the subjective Bayesian has in mind: that a prior probability can be expressed as a subjective credence over subjective credences, so that with subjective credence 1 the posterior subjective credence is r/n, which simply collapses to posterior subjective credence = r/n?
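As a numerical illustration of (A), here is a sketch under a setup I am assuming purely for illustration (Q a Beta prior over K, so the posterior is Beta by conjugacy, and data simulated with relative frequency about 0.7): the posterior mean tracks r/n and the posterior variance shrinks towards zero, i.e. Q concentrates on the limiting relative frequency.

```python
import random

random.seed(0)
a, b = 1.0, 1.0                       # assumed prior Q = Beta(a, b) over K
flips = [1 if random.random() < 0.7 else 0 for _ in range(100000)]

for n in (10, 100, 1000, 100000):
    r = sum(flips[:n])
    # Conjugacy: the posterior over K given r heads in n trials is Beta(a+r, b+n-r).
    pa, pb = a + r, b + n - r
    mean = pa / (pa + pb)
    var = pa * pb / ((pa + pb) ** 2 * (pa + pb + 1))
    print(f"n={n:6d}  r/n={r/n:.4f}  E[K|data]={mean:.4f}  Var[K|data]={var:.2e}")
```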
OR
Another result (B) that I see listed (see the discussion in Gillies, Philosophical Theories of Probability, starting at page 70) is that, in the limit of infinite data, one can prove directly from the exchangeable prior that P(x_{n+1} = heads) = (r+1)/(n+1) as n goes to infinity (and so P(x_{n+1} = heads) tends to r/n), assuming that the prior probabilities x = P'(any given sequence with r heads in n trials) and y = P'(any given sequence with r+1 heads in n+1 trials) are such that C = x/y = 1 as n goes to infinity.
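For what it is worth, here is a check of the (B)-style predictive computed purely as a ratio of exchangeable joint probabilities, under my own assumption that Q is uniform. (In that special case the exact value is Laplace's rule of succession, (r+1)/(n+2) rather than (r+1)/(n+1), but the large-n behaviour, convergence to the relative frequency r/n, is the same.)

```python
from fractions import Fraction
from math import comb

def seq_prob(r: int, n: int) -> Fraction:
    """P(a particular sequence with r heads in n trials), with Q uniform."""
    return Fraction(1, (n + 1) * comb(n, r))

def predictive(r: int, n: int) -> Fraction:
    """P(x_{n+1} = heads | a particular sequence with r heads in n trials),
    computed purely as a ratio of exchangeable joint probabilities; no
    probability of probabilities appears at this stage."""
    return seq_prob(r + 1, n + 1) / seq_prob(r, n)

for n, r in [(4, 3), (100, 70), (10000, 7000)]:
    p = predictive(r, n)
    assert p == Fraction(r + 1, n + 2)      # Laplace's rule of succession
    print(f"n={n:6d}  P(next = H) = {float(p):.4f}   r/n = {r/n:.4f}")
```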
Is this result (B) distinct from (A)? Does it follow from (A), or the other way around? And is there a reason that de Finetti convergence is rarely expressed in format (B): is it because of assumption C above, or because it does not follow from the representation theorem? That is, are the representation theorem and the result (A) which follows from it supposed to put on solid ground the cursory result (B), which is not as formally secure given assumptions such as C?

In other words, is (B) a different result from the first convergence result (A) above? (B) does not make use of probabilities of probabilities (the representation theorem), yet it is the one that is rarely cited. It would appear to be de Finetti's own interpretation, and perhaps (B) follows as a consequence of (A). I presume (B) would be the subjective Bayesian's preferred view (otherwise the subjective Bayesian would have to interpret the probability of probabilities as a subjective probability of subjective probabilities), since the alternative would defeat the entire point of not needing probabilities of probabilities. (B) gives the posterior probability for the outcome directly, and does so without using a probability of probabilities, relying only on the exchangeable prior (so one merely needs a prior probability for each sequence of outcomes, not a probability over probability hypotheses, let alone the extra hypothesis that these hypotheses are themselves subjective to make them mesh in Bayesian terms).

However, (B) does not phrase convergence using IID (or conditionally independent) random variables or hypotheses, and perhaps this is why it is not favoured. By this I mean that it tells us that the posterior probability P(x_{n+1} = heads) must tend to r/n as n goes to infinity, but it does not give a general result for P(x = heads). I think this is a martingale result, presumably showing that all later trials will receive the same posterior, and that might be its advantage, because otherwise it assigns a different prior probability to each outcome beforehand (for any event that has already occurred, x_{n−4} for instance: if it came up heads, then P(x_{n−4} = heads) = 1). Is this one of the problems, and the reason formulation (A) is often preferred, or is formulation (A) just a rigorous proof of result (B)?

I presume that result (B) is only to be used for x_{n+1} when the evidence concerns all and only the trials up to x_n, and that the posterior probability it gives, P(x_{n+1} | x_n, x_{n−1}, ...), equals the expected value of K under the posterior in (A) above after the first n trials; likewise, the probability it gives for P(x_3 | x_2, x_1) will be the expected posterior value of K given trials x_1, x_2 in (A) above. Is this correct? (By expected value I mean the weighted average across all K, weighted by the posterior probability of each such K given the evidence at that point.)
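Here is a minimal numerical check of that last claim, under an assumed discrete Q over three values of K (my own toy example): the (B)-style predictive, computed as a ratio of exchangeable joint probabilities, coincides exactly with the (A)-style posterior expected value of K.

```python
# Assumed for illustration: a discrete Q over three IID hypotheses K.
Ks = [0.2, 0.5, 0.8]
Q  = [0.3, 0.4, 0.3]

def joint(r: int, n: int) -> float:
    """Mixture probability of one particular sequence with r heads in n trials."""
    return sum(q * k**r * (1 - k)**(n - r) for k, q in zip(Ks, Q))

r, n = 7, 10

# (B)-style predictive: a ratio of exchangeable joint probabilities.
pred_B = joint(r + 1, n + 1) / joint(r, n)

# (A)-style: Bayes' rule over the hypotheses K, then the posterior mean of K.
weights = [q * k**r * (1 - k)**(n - r) for k, q in zip(Ks, Q)]
total = sum(weights)
posterior = [w / total for w in weights]
pred_A = sum(k * p for k, p in zip(Ks, posterior))

print(pred_B, pred_A)
assert abs(pred_B - pred_A) < 1e-12   # the predictive IS the posterior mean of K
```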
If (B) is the intended result, then why bother with the de Finetti mixture interpretation and its associated convergence result (A) to begin with, given that it invokes chances? And if it is not, then what does de Finetti's theorem prove, except that it almost defeats itself by presuming that there are objective probability hypotheses? Or is (A) supposed to be interpreted using a credence mixture of IID subjective credence distributions, and does this fill in the gaps in the proof of (B) and remove some of the problems that (B) has, including assumption (C) as mentioned, and the other problem, which is that in (B) each variable has its own distinct credence value (i.e. Cr(x_{n+1} = a) is assigned a credence possibly distinct from Cr(x_n = a) and from Cr(x_{n−1} = a), insofar as there is no general hypothesis for Cr(x = a) in approach (B))?
Please tell me whether this is correct, and what the motivation is for one representation over the other. Or rather: which of the two results is usually being referred to when people speak of 'de Finetti's convergence theorem' or 'de Finetti's law of large numbers'?