Proof of Laplace rule of succession

The Laplace rule of succession is a rule of pure observational induction. Assume you are placed on the earth during darkness, with no idea where you are and no knowledge of the mechanics of the earth's rotation and its revolution about the sun. After some time the sun comes up, and after more time it goes down and you are again in darkness. What probability can you assign that the sun will come up again? All you know (you know nothing but your observation) is that it came up once. You can only say it will or it will not, so you assign a probability of 1/2. You have 1 outcome of 2 (will, will not).

The sun comes up on the second day and then goes down. What is the probability that it will come up again? In your experience it has now come up twice. You can still only say it will or it will not, but you have 2 sunrises counting for "will" and, still, maybe it will not. Assign a probability of 2/3. The following day it becomes 3/4, then 4/5, then 5/6, and eventually 1000/1001. The rule of succession is a rule of induction: you infer that the sun will rise with a probability of observations/(observations + 1), so your certainty increases with observation. You cannot say deductively that the sun will rise with that probability, but by induction you can be very confident it will rise.

Laplace noted that had you known the mechanics of the earth's orbit and rotation you would not need any observations to be confident the sun will rise. The rule of succession applies to situations where what you know is limited to observation only.

Jacob Kopczynski

Good explanations for this are hard to come by, and Joseph's isn't a good one; the sunrise is a terrible example, as I'll explain a bit later.

There are two good intuitions I've seen for this. The first follows from the conditions where it applies: the Rule of Succession can only be applied when you know that the event might happen or might not. You know there is some probability of each of them happening. (This is almost always true, but not always. If you've never heard of an eclipse or the life cycle of stars, as far as you know there could be no chance at all that the sun will fail to rise tomorrow; considering it an immutable law of the world, which will keep going until something you've never conceived of occurs, is just good sense.)

Because you know either could happen, you set up your data collection with some invisible data: one success, one failure. Taking these into account will keep you from getting carried away; you know that either can happen, so you regress back toward the mean a bit and expect that in the end, it will land closer to 50%/50% than whatever fluky data you start with. Adding the invisible 1 success and 1 failure = 2 trials, you get (s+1)/(n+2).
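The "invisible data" idea can be written down in a few lines. This is only a minimal sketch; the function name `rule_of_succession` is made up for illustration, and `s` and `n` are the success and trial counts used above.

```python
from fractions import Fraction

def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Estimate P(next trial succeeds) after adding one invisible
    success and one invisible failure to the observed data."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)

# No data at all: only the two invisible trials count.
print(rule_of_succession(0, 0))  # 1/2
# Three successes in three trials: the raw frequency says 1,
# the rule pulls the estimate back toward 1/2.
print(rule_of_succession(3, 3))  # 4/5
```

Using exact fractions keeps the results in the same form as the s+1 over n+2 expression, so they can be checked by hand.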

The other is to think about it like you're sticking points onto a line. Take the line from 0 to 1, and pick a point at random, without looking. Where is it? Well, it's probably not at exactly 0.5, but that's your best guess, since it's just as likely to be to the left of that point as to the right. Now start picking other points, except that all you ever learn is whether they are to the left or the right of the first point. Maybe you want them to land to the left, so you track s lefts and n choices (not counting the first point that you're measuring everything by; that was special). Between the measuring point and the left edge, there will be s points you've already chosen, but there will be s+1 gaps between them. On the other side, there are n-s points and n-s+1 gaps; one to the right of each of the points, plus one extra between the measuring point and the point that's closest to it. So there are n+2 gaps total.

Now, you might point out (correctly) that those gaps won't all be the same size. You've picked all the points randomly; that's going to make some big and some small gaps. But before you've picked these points, or when all you ever hear is 'left' or 'right', you don't know anything about that. All these points were picked the same way, so there's no reason for any pair of neighbors to be closer than any other pair. So on average, you can treat them all as the same size, so the next new point you pick is equally likely to be in any of them, and the probability of the next point landing in a success spot is (s+1)/(n+2).
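The points-on-a-line picture can be checked with a quick simulation. This is a rough sketch, not a proof: the function name, the number of rounds, and the seed are arbitrary choices for illustration. It keeps only the rounds where exactly s of the n points landed to the left, then asks how often one more point lands to the left as well.

```python
import random

def simulate_next_left(n: int, s: int, rounds: int = 100_000, seed: int = 0) -> float:
    """Estimate P(next point is left | s of the first n points were left)."""
    rng = random.Random(seed)
    matches = 0    # rounds where exactly s of n points fell to the left
    next_left = 0  # of those rounds, how often the extra point was left too
    for _ in range(rounds):
        marker = rng.random()  # the special first point on the line from 0 to 1
        lefts = sum(rng.random() < marker for _ in range(n))
        if lefts != s:
            continue
        matches += 1
        if rng.random() < marker:
            next_left += 1
    return next_left / matches

estimate = simulate_next_left(n=5, s=3)
print(round(estimate, 3), "vs", round((3 + 1) / (5 + 2), 3))
```

With enough rounds the simulated frequency should settle close to (s+1)/(n+2), which is 4/7 for this choice of s and n.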

Joseph L Alvarez

The Laplace rule of succession is the probability of pure induction. It is based solely on observation. No other information is allowed. Arguments against the rule all assume there must be other information. Laplace's arguments show the difference between induction and deduction, in particular as regards probability. Laplace's argument was for successive events:

"An event having occurred successively any number of times, the probability that it will happen again the next time is equal to this number increased by unity divided by the same number, increased by two units."

His example was the probability of the sun rising with no information other than successive risings of the sun. Note: information on eclipses, positions of other heavenly bodies, mechanics, or descriptions of the sun is excluded.

Once the sun argument was demonstrated the rule for predicting the probability of succession was established. Once established it can be extended to occurrences with an average rate of occurrence.

Consider that you must decide if one of three things might occur, A, B, or C. Asked to predict if B will occur and having no prior information on its probability of occurrence, you can only state that B will occur or it will not. It does not matter that there are three choices, B will occur or it will not. That is your total information for making a prediction. The prediction is 1 chance in 2. Note: Given the information that A, B, or C will occur (instead of might occur) and that A, B, or C are equally probable, the prediction is 1 chance in 3.

Assume the typical balls-in-an-urn problem. The balls are black or white. What is the probability of drawing a black ball? Given only this knowledge, the draw is either white or black; 1 chance in 2. No other prediction can be made. Draw a ball, observe that it is white or black, and return it to the urn. What prediction can be made for the next draw? The next draw will be white or black; 1 chance in 2.

Draw, record, and return 100 balls. Say the result is 23 black and 77 white. We now have observation to aid prediction. The next draw will be white or black; 1 chance in 2. Use the observation to make a prediction of the next draw. A prediction of black is 23 + 1 divided by 100 + 2. Black was observed 23 out of 100 tries. A black on the next draw will be a total of 24 out of 101, but the prediction must include a chance for white; therefore 24 out of 102.

Continued observation, say 10,000 draws, may indicate that 1 ball in every 4 drawn is black. We have not deduced this ratio, but have induced it. All evidence is based on our definition of probability as confined to this observation: probability equals frequency of observation. The definition is circular. The rule of succession states that induction leads to prediction, with the ability to predict increasing as observation increases.

After 10,000 draws with 2,500 black, the prediction for black on the next draw is 2,500 +1 divided by 10,000 +2.
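The draw, record, and return procedure can be played out in code. This is only an illustrative sketch: the hidden black fraction of 0.25, the seed, and the names `predict_black` and `predictions` are assumptions made for the demo, and the observer in the argument above never sees that hidden fraction.

```python
import random

def predict_black(black_seen: int, draws: int) -> float:
    """Rule-of-succession prediction that the next ball drawn is black."""
    return (black_seen + 1) / (draws + 2)

rng = random.Random(1)
hidden_black_fraction = 0.25  # assumed for the demo; unknown to the observer
black_seen = 0
predictions = {}

for draw in range(1, 10_001):
    # Draw with replacement: each draw is black with the hidden probability.
    if rng.random() < hidden_black_fraction:
        black_seen += 1
    if draw in (1, 100, 10_000):
        predictions[draw] = predict_black(black_seen, draw)
        print(f"after {draw} draws: {black_seen} black, "
              f"prediction {predictions[draw]:.4f}")

# The figures from the text: 23 black in 100 draws gives 24/102.
print(predict_black(23, 100))
```

Before any draw the prediction is 1/2; as draws accumulate, the two added counts matter less and the prediction tracks the observed frequency.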

Mathematical proofs of the rule of succession are available on the internet.
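One common route, sketched here under the assumption that the unknown chance of success p is equally likely to be anywhere between 0 and 1 before any observation (the symbol p and the Beta function B are notation added for this sketch; s and n are the success and trial counts used above):

```latex
\[
P(\text{next success} \mid s \text{ successes in } n \text{ trials})
  = \frac{\int_0^1 p \cdot p^{s}(1-p)^{n-s}\,dp}{\int_0^1 p^{s}(1-p)^{n-s}\,dp}
  = \frac{B(s+2,\; n-s+1)}{B(s+1,\; n-s+1)}
\]
\[
  = \frac{(s+1)!\,(n-s)!}{(n+2)!} \cdot \frac{(n+1)!}{s!\,(n-s)!}
  = \frac{s+1}{n+2}
\]
```

For a run of n successes with no failures (s = n) this gives (n+1)/(n+2), which matches Laplace's statement quoted above.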
