The term “spurious correlation” refers to a strong correlation that is actually due to some third factor. Consider a statistical dataset in which both the input factors and the output parameter are binary. Suppose that in this dataset:
· factor A takes the value 0 in M0 records, and in N0 of those records the output parameter takes the value 1;
· factor A takes the value 1 in M1 records, and in N1 of those records the output parameter takes the value 1.
In this case the risk ratio for factor A is defined as RR = (N1/M1)/(N0/M0). Suppose that RR >> 1. How can I investigate whether the output parameter really depends on factor A, or whether the high RR is only a spurious correlation?
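To make the setup concrete, here is a minimal Python sketch of the RR definition above. The counts and the second binary factor B are made up purely for illustration (they are not from any real dataset), and `risk_ratio` is a hypothetical helper name. The numbers are chosen so that the pooled RR is well above 1 even though the RR is exactly 1 within each level of B, which is the kind of situation I mean by a spurious correlation.

```python
def risk_ratio(n1, m1, n0, m0):
    """RR = (N1/M1) / (N0/M0), using the notation from the question."""
    return (n1 / m1) / (n0 / m0)

# Hypothetical counts, split by an assumed binary third factor B.
# Each entry is (N1, M1, N0, M0) within that level of B.
strata = {
    0: (10, 100, 90, 900),
    1: (450, 900, 50, 100),
}

# Crude (pooled) RR ignores B: sum each count over both levels of B.
pooled = [sum(col) for col in zip(*strata.values())]
crude_rr = risk_ratio(*pooled)

# RR for factor A computed separately within each level of B.
stratum_rr = {b: risk_ratio(*counts) for b, counts in strata.items()}

print(f"crude RR = {crude_rr:.2f}")
for b, rr in stratum_rr.items():
    print(f"RR within B={b}: {rr:.2f}")
```

With these made-up counts, factor A looks strongly associated with the output only because A = 1 occurs mostly where B = 1, and B = 1 is where the output is frequent.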