Possibility does not have a distribution. Something is possible or it is not possible. Air travel was not possible before the invention of the hot air balloon. Air travel is easily possible for most persons on the earth, today. People have even traveled to the moon.
We could take the number of air flights per day, divide by the population of the earth, and call that the probability that a person is traveling by air. This number is meaningless. The probability of air travel requires additional information and assumptions. The information and assumptions allow us to select probability distributions to model the likelihood that someone is traveling. The information and assumptions are conditional and apply only to those conditions.
Possible events may be rare or have low probability. It is possible that a coin toss will result in the coin standing on edge. It is not very probable under ordinary circumstances.
I understood from your clarification that, firstly, the question of whether something is possible or not can only be answered with a yes or a no. That makes me think of zero or one.
Secondly, what is possible may have a probability of zero (or nearly so). This means probability is somehow constrained by possibility (which may sound obvious). If so, the two terms could in some way be connected expressly.
So, if probability distributions exist, why should possibility (upon which probability is based) not exist?
It is not the probability that has a distribution. Probability is a normalized measure over a sample space (with an associated sigma-algebra). The sample space is the "set of all possible outcomes", thus giving all "possibilities" that are considered measurable. The underlying algebra defines events as sets of outcomes; events that contain only a single outcome are called "elementary events" (or elementary outcomes).
The probability distribution is a feature of a random variable. A random variable is a function that returns a numeric value. In contrast to a "usual" function, a random variable does not return one simple number: it returns a whole set of numbers, one for each elementary outcome. The key is that each of these values comes together with an associated probability.
Let's look at a sample space with the elementary outcomes "male" and "female". They may be called the "possibilities", as it is possible that the attribute we observe may be male or female. We might further consider other possibilities, like "hermaphrodite", "intersex", "sexless". Deciding on this is based on subject knowledge. Here we keep it simple and focus on only two possibilities.
We can now define a random variable X that returns 0 for male and 1 for female (we could take other numeric values; it's only a convenient choice!). We must further define the probability distribution of this random variable, which associates the returned values with probabilities, like P(X=1)=0.3 and P(X=0)=0.7. For this random variable we can calculate moments like the expected value and the variance. Such a random variable is called a Bernoulli variable, and the probability distribution is called the Bernoulli distribution. This distribution is fully defined by a single value, giving the probability of X=1, because the other probability follows from the axioms.
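As a minimal sketch in Python (using the P(X=1)=0.3 from the example above), the moments of this Bernoulli variable can be computed directly from the definition and checked by simulation:

```python
import random

# Bernoulli variable from the example: X = 1 ("female") with p = 0.3,
# X = 0 ("male") with 1 - p = 0.7.
p = 0.3

# Moments follow directly from the definition of expectation:
expected_value = 1 * p + 0 * (1 - p)               # E[X] = p = 0.3
variance = (1 - p)**2 * p + (0 - p)**2 * (1 - p)   # Var[X] = p*(1-p) = 0.21

# A simulation illustrates the distribution of returned values:
random.seed(1)
draws = [1 if random.random() < p else 0 for _ in range(100_000)]
print(expected_value, variance, sum(draws) / len(draws))
```

The simulated relative frequency of X=1 comes out close to 0.3, as the distribution dictates.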
If there are many or even infinitely many possibilities, like counts of something (the outcome can be any natural number), the random variable will return as many different values (for counts it makes sense to let the random variable simply return these numbers!). Assigning probabilities individually is then cumbersome or impossible, so one assigns the probabilities as a function of the values the random variable can take. These functions are sometimes derived from simple assumptions. For counts, one can imagine the observation interval divided into an infinite number of small sub-intervals in each of which at most one event may happen, so that each sub-interval can be modelled by a Bernoulli distribution (with an infinitesimally small probability of X=1). A bit of calculus then leads to the solution for the counts, known as the Poisson distribution, which is defined by the parameter lambda. Given the value of lambda we can calculate the probability assigned to any number the random variable can take (0, 1, 2, 3, ...).
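This limiting argument can be illustrated numerically (a sketch; lambda = 4 and k = 2 are arbitrary choices for illustration): as the number n of sub-intervals grows, the binomial probabilities converge to the Poisson probability.

```python
from math import comb, exp, factorial

lam, k = 4.0, 2  # an arbitrary rate and count, for illustration

def binom_pmf(n, p, k):
    # probability of k successes in n Bernoulli sub-intervals
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    # the n -> infinity limit of binom_pmf(n, lam/n, k)
    return lam**k * exp(-lam) / factorial(k)

for n in (10, 100, 10_000):
    print(n, binom_pmf(n, lam / n, k))
print("Poisson limit:", poisson_pmf(lam, k))
```

With n = 10,000 sub-intervals the binomial value already agrees with the Poisson value to several decimal places.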
When we allow an uncountably infinite number of outcomes, like the real numbers we get from physical measurements, we face the problem that it is impossible to assign a finite positive probability value to every value the random variable can take without violating the axioms. The trick here is to define a density that integrates to 1 over the domain of the random variable. Instead of assigning a (finite positive) probability value to a point within the domain we assign a probability density f(X=x), and probabilities are then given for integrable parts of the domain. The integral of f from -Inf to x, F(X <= x), is the cumulative distribution function.
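A numerical sketch of this idea for the standard normal density (the midpoint rule stands in for the integral here; any quadrature would do): a single point gets probability zero, while intervals get positive probability.

```python
from math import exp, pi, sqrt

def f(x):
    # standard normal density; integrates to 1 over the real line
    return exp(-x**2 / 2) / sqrt(2 * pi)

def prob(a, b, steps=100_000):
    # P(a <= X <= b) as the integral of f over [a, b] (midpoint rule)
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

print(prob(1.0, 1.0))   # a single point has probability 0
print(prob(-1.0, 1.0))  # about 0.6827
print(prob(-8.0, 8.0))  # about 1 - essentially the whole density
```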
Jochen has described how one can model the distribution of possibilities. A count variable will have a distribution of possible counts. A count of 25 might be possible in a given situation. A count of 25.38 is not possible. We can model the probability of a given count. This allows quantifying the possibility of a count as its probability. Nevertheless, 25 is possible or it is not. Say the probabilities were modeled by a Poisson distribution with a rate of 10.7. Twenty-five is possible, but 200 is so unlikely as to be impossible.
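A sketch of that Poisson(10.7) contrast (computed in log space so the huge factorial for k = 200 doesn't overflow):

```python
from math import exp, lgamma, log

def poisson_pmf(lam, k):
    # lgamma(k + 1) = log(k!), so this works even for large k
    return exp(k * log(lam) - lam - lgamma(k + 1))

lam = 10.7
print(poisson_pmf(lam, 25))   # roughly 8e-5: rare, but clearly possible
print(poisson_pmf(lam, 200))  # around 1e-174: "possible", yet practically impossible
# poisson_pmf(lam, 25.38) would be meaningless: 25.38 is not a possible count.
```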
Assume you have a large group of healthy, adult males and the average height is 1.6732 m. The distribution of heights might be well modeled by a normal distribution. The model might be used to predict that 20.62 individuals fall between 1.40 m and 1.45 m. It is possible that the actual number in this range is anywhere from 15 to 25. All those numbers are possible, as are 14, 26, and zero. We can assign a probability to any number inside or outside the range.
Assume in the group of males that no one was below 93 cm. There is a probability of someone shorter according to the probability model, but there is no possibility that someone, in that group of males, is shorter.
Assume these males were selected at random from a much larger population. It is possible that there is someone shorter than 93 cm in the larger population. It is not possible for someone to be shorter than 20 cm and still be a healthy, adult male. Nevertheless, there is a probability that someone is shorter.
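A hypothetical sketch of such a model prediction: the mean is taken from the example above, while the standard deviation (0.10 m) and group size (5000) are made-up assumptions for illustration. The model predicts a fractional count, although any actual count must be an integer.

```python
from math import erf, sqrt

mu = 1.6732     # mean height from the example (m)
sigma = 0.10    # assumed standard deviation (m) - illustrative only
N = 5000        # assumed group size - illustrative only

def normal_cdf(x):
    # cumulative distribution function of Normal(mu, sigma)
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Model-predicted count in the interval 1.40-1.45 m:
p = normal_cdf(1.45) - normal_cdf(1.40)
expected_count = N * p
print(expected_count)  # a fractional prediction; the observed count is an integer
```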
Yes. You can use a probability distribution to describe a distribution of possibilities. The difference in meaning is important. Avoid use of 'distribution of possibilities' except when the meaning is clear and restricted to a specific situation. Best, do not use it.
Thank you Jochen Wilhelm. Your answer was somewhat insightful!!
I should agree that, to avoid senselessness (e.g. negative heights) when it comes to probability, we should keep things simple and operate within a domain over which the integral of the working values does not exceed 1.
So, if we move away from the limiting fact that possibility has something to do with episteme, can't the sampling space, within the confines of that "probability feasibility domain", be distributed in some way (and thus be expressly related to the probability distribution), especially for intrinsic variables?
Dear Followers of this Interesting Question, including me:)
I have got a good place for presenting my almost unacceptable point of view, which is: THERE ARE NO PROBABILITIES IN THE REAL WORLD. However, I like working with a machinery called the Calculus of Probabilities. Even more, I claim it is useful. To answer the current question is really a hard job, since it relates notions from different stories. The mathematical meaning of probability is nicely explained by Jochen. Any meaning of probability outside mathematics requires special definitions, dependent on culture in the wide sense of a particular social group (say, a nation). The same goes for possibility. For instance, many English words ending with -ability/-ibility etc. can be understood in (at least) two ways: as the ability of something to appear/perform, OR as the measure/degree of this particular ability. This is my understanding of these words.
For instance: reliability is a feature of a person, or, in maintenance theory, reliability equals the probability that a particular device is ready to work properly. In the latter case, the probability is a parameter of the admitted mathematical model. In the most popular interpretation, the reliability indicates the proportion of properly working examples of the device under the same (?) conditions. Therefore it is hard to compare the words possibility, probability, feasibility, etc. The range of their meanings in everyday language is a subject for linguists.
Another problem is the mentioned interpretation of probability. Returning to reliability theory: after accepting the probabilities of properly working parts no. 1 and no. 2 as equal to p1 and p2, respectively, and under the assumption that the parts in the engine work and are destroyed independently, we can calculate the probability that both are working properly as equal to the product p1*p2. And what does it mean? We have no money to trial, say, thousands of pairs of the elements to get, after say 100 hours, the portion of pairs working properly during this period. Instead, WE ARE HAPPY THEN WITH THE CALCULATED PROBABILITY.
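A small sketch of that calculation (the part reliabilities 0.95 and 0.90 are invented numbers); the simulation shows what the unaffordable frequency check would look like:

```python
import random

# Assumed (illustrative) per-part reliabilities over some fixed period:
p1, p2 = 0.95, 0.90

# Under independence, the two-part series system works only if both parts work:
p_system = p1 * p2  # P(A and B) = P(A) * P(B)

# A simulation stands in for the thousands of real trials we cannot afford:
random.seed(42)
trials = 100_000
working = sum(
    1 for _ in range(trials)
    if random.random() < p1 and random.random() < p2
)
print(p_system, working / trials)  # the two numbers come out close
```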
And an additional trouble. Sometimes we understand the probability as a measure of possibility. The nicest example: calculations of the probability that "there exists life on Mars". In this case, the probability should be interpreted as a measure derived from our knowledge, collecting all known details about the nature of the phenomenon called LIFE. Obviously, the life on Mars exists or does not exist. Both events are possible. In this case, however, the probability is evidently a number that does not exist.
You very correctly noted that probability is nothing that exists in the real world, and that "Any meaning of probability outside mathematics requires special definitions, [...]".
I don't think that there are different definitions of the meaning of probability. What we read in most introductory and even advanced stats books is that probability is "defined" as the limiting relative frequency of an event (the "frequentist definition"). This goes back to von Mises, and we have known for many decades that this definition is nonsense: it is either circular or plain wrong. Already von Mises got into trouble with his definition, as he needed to postulate a very complicated system of collectives whose elements must obey a "random order", where "random" eventually leads to a circular argument. Despite all these problems, a sequence of observations is not a mathematical series, and there need not be any defined limit to the relative frequency. Logically, there are only two options for an infinite series of observations: either the event happens infinitely often, in which case the ratio is Infinity/Infinity = 1, or it happens only finitely often, in which case the ratio is a finite value divided by Infinity = 0. The definition does not allow any probability value except 0 or 1.*
The only other definition of a meaning of probability is that probability is a normalized measure of a "relative degree of belief" (relative to a comprehensive set of possibilities). That is in fact "in our heads". It is an epistemological definition, allowing us to quantify how much we know about something (and not about the thing itself). More correctly, it does not directly quantify our knowledge but the change in knowledge that should be induced by a given set of data/observations.**
The "classical definition" (the probability is the ratio of "favourable cases" to "possible cases", where each of the possible cases has to be equally likely) is not a definition of the meaning of probability but rather an argument on how to rationally assign particular numeric values to probabilities, when no other information is available.
"Returning to reliability theory, [...]" -- Note that the probability of the product failure is a measure of our expectation. It is not equal to the frequency with which the products fail. However, this frequency may be used as a practical proxy to assign a numerical value to the probability of a failure. Your comment reads as if p1*p2 were "less good" than Nfailures/Ntrials. I don't agree, and I find this misleading. If p1 and p2 represent our state of knowledge about the failure of the parts, and if we have no information on how the failure of one part may influence the other part (-> "independence"), then p1*p2 correctly reflects what we know/believe about the reliability of the two-part system. That's in our heads. The observation of a relative frequency of failures is a different thing. We may use this information to refine our knowledge/beliefs, as it provides information about the system we didn't have before. Note that more data could change our knowledge/beliefs differently again.
And finally, when talking about life on Mars, you seem to clearly fall back to the frequentist definition, which says that a fact does not have a probability (only repeatable processes can have probabilities). Frequentists may argue here that one may consider collectives of parallel universes that contain a solar system similar to ours, so that the process of universe-generation has a well-defined probability of producing life on the 4th planet of this solar system - we are just not able to estimate this based on experimental data. The propensity theorist would claim that it is a physically inherent feature of the planet Mars to "tend to be animated".
---
* The propensity definition seems to be an attempt to extend this definition to "unrepeatable events", and it comes with additional problems.
** the logical definition is similar, except that it assumes that the assignment of probability values is completely rational, so that probabilities reflect the rational degree of belief. It requires that some initial rational belief (before having seen any data) comes ex nihilo.
Thanks for your kind and wide explanation of our (the human being's) possible understanding(s) of probability! I like it pretty much. The only point I would like to oppose is a possible misunderstanding of my point of view with respect to the life on Mars. The example was put forward to contradict the frequentists' interpretation. More precisely, to show that we can build a (mathematically) consistent probability measuring something like the results of our knowledge about the possibility of the existence of life on Mars. And I accept such a calculus, obviously without any thought about the (in this case) completely unnecessary frequencies :)
Possibility: The range of available answers. If I want to know the tusk diameter of elephants, the possibility is the range in diameters limited by genetics. If I want to know the possible genders of a fungus, then it is all 17 known genders plus the 7 that we have not yet found.
Probability: This is how often we will observe any of the possibilities in a random draw from the population. Any value that is not a possibility has a probability of zero. All possibilities have a probability that ranges from some value infinitely close to (but not equal to) zero up to 1.
Possibility is a list of what can be. Probability is how likely any one of the possible outcomes will be observed.
A closed population has a fixed number of possible outcomes. The probability of an outcome is exactly equal to the frequency of that possible outcome. The exact probability distribution is the histogram of the possibilities. A probability distribution (normal, binomial, Poisson, ...) might approximate the possibility histogram. The probability distribution is an approximation of the possibilities and not necessarily true.
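In code, the point about a closed population is simply that the exact probability distribution is the normalized histogram (the tiny population below is invented for illustration):

```python
from collections import Counter

# A closed population: every member and its outcome is enumerable.
population = ["a", "a", "b", "b", "b", "c"]

# The exact probability distribution IS the normalized histogram:
n = len(population)
distribution = {outcome: count / n
                for outcome, count in Counter(population).items()}
print(distribution)  # each probability equals the outcome's frequency
```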
An open population, or one that is fluid in number and/or too large to fully identify every member, can only be approximated. Limits to possibility might be imposed, but the assumed probability distribution is not the actual possibility distribution. The distribution of possible outcomes is not known.
My 2 pence to "The distribution of possible outcomes is not known" ->
the hypothetical frequency distribution of the outcomes is necessarily unknown, as it refers to some ill-defined hypothetical case (of infinite replications of some experiment that must vary somehow between replications in some undefined way),
the frequency distribution of an actual sample of observed outcomes is known, necessarily, and
the probability distribution is a model we use to assign numeric values to the plausibilities of events (observations of possible outcomes); it is neither "known" nor "unknown" - it is assumed.
---
Probability is NOT relative frequency.[1] However, relative frequency may(!) be used to calibrate probability.
Probability is NOT a hypothetical frequency either.[2] The limiting relative frequency is nothing that needs to exist. In fact, logically, the number of events in an infinite series of experiments can either be finite or infinite. If it is finite, the limit is 0, and if it is infinite, the limit is 1. This would not allow any probability value between 0 and 1. However, we see the empirical fact that relative frequencies stabilize with increasing sample size, which is the justification for using relative frequencies to calibrate probabilities.
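That stabilization is easy to see in a simulation (a sketch with an assumed underlying probability of 0.3):

```python
import random

random.seed(0)
p = 0.3  # the value against which we would calibrate the probability

# One long run of Bernoulli trials:
draws = [random.random() < p for _ in range(1_000_000)]

# Relative frequencies fluctuate for small n and settle down for large n -
# an empirical fact about data, not a definition of probability:
for n in (10, 100, 10_000, 1_000_000):
    print(n, sum(draws[:n]) / n)
```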
---
[1] That should be obvious, although famous logicians and philosophers like John Venn were of the opinion that probabilities are in fact nothing else but these observed frequencies. For more discussion see https://philpapers.org/rec/HJEMR
[2] Alan Hájek discusses more reasons in: https://philpapers.org/rec/HJEFAA-2
The hypothetical and actual frequency distributions of outcomes are unknown in an open population. Any hypothetical distribution is a model.
The frequency distribution of a sample of that population is known.
A probability distribution based on the frequency distribution of the sample is assumed to be representative of the population. Any numeric value assigned based on the model is not a datum; it is an assumed prediction of an outcome. The reliability of the prediction depends on the representativeness of the sample and on how faithfully the model reproduces the sample's frequency distribution.
Plausibility of the prediction is yet another matter. We want to know how the plausibility of the outcome is distributed based on the sample data and our knowledge of the population.