Dear friends, hello! One question has interested me for a long time, and I would really like to discuss it with you. I would be glad to hear any of your responses. Thanks a lot in advance. Regards, Sergey.
We often read: a study was performed and it was established, based on statistical data processing, that output parameter A is significantly affected by factor B. A question arises: is it possible at all (in principle!) to reliably establish the influence of a particular factor by processing statistics obtained from a passive experiment, that is, from observation alone? Suppose that the output parameter A is a person's longevity, and for simplicity let A take only two values: zero (lived less than 80 years) and one (lived more than 80 years). As the analyzed factor B, we take drinking red wine after the age of 60. For simplicity we will also treat this factor as binary, taking only two possible values: zero for "non-drinkers" and one for "drinkers".
Suppose the cohort consists of 1000 people, and after processing the collected statistics we get the following results:
a) 600 of them did not drink red wine after the age of 60, and 400 did.
b) Of the 600 who did not drink, 360 did not live to be 80 years old, and 240 lived past 80.
c) Of the 400 who drank, 160 did not live to be 80 years old, and 240 lived past that age.
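For anyone who wants to reproduce the arithmetic, here is a minimal Python sketch that arranges the counts from (a)-(c) as a 2x2 table, computes the share of each group that lived past 80, and runs a two-sided Fisher's exact test. The function and variable names are my own illustrative choices, and the test is written out by hand from the hypergeometric distribution using only the standard library, so treat it as a sketch rather than a reference implementation.

```python
from math import comb

# Counts from points (a)-(c): (did not reach 80, lived past 80)
non_drinkers = (360, 240)
drinkers = (160, 240)


def survival_share(died, lived):
    """Share of a group that lived past 80."""
    return lived / (died + lived)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


share_non_drinkers = survival_share(*non_drinkers)
share_drinkers = survival_share(*drinkers)
p_value = fisher_exact_two_sided(*non_drinkers, *drinkers)

print(f"Lived past 80, non-drinkers: {share_non_drinkers:.0%}")
print(f"Lived past 80, drinkers:     {share_drinkers:.0%}")
print(f"Fisher's exact test, two-sided p-value: {p_value:.3g}")
```

Note that the test only measures how unlikely such a table would be if wine and longevity were unrelated in this cohort; by itself it says nothing about why the two groups differ.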
It is clear that the factor "drinking red wine after the age of 60" strongly influences longevity: when its value switches from zero to one, the chance of living past 80 increases from 40% (= 240/(360+240)) to 60% (= 240/(240+160)). Applying a standard statistical method (Fisher's exact test) confirms this conclusion: p-value