I'm currently reading The Foundations of Modern Time Series Analysis by Terence C. Mills.
In the fourth chapter he discusses the concept of differencing data to remove a secular trend, and the history behind it.
On page 36, section 4.7, it says:
"Cave and Pearson then computed correlation coefficients for all pairs of pairs of indices at each level of differencing. We shall content ourselves with reporting correlations (± probable errors) between tobacco and savings for d = 0, 1, ..., 6, the d = 0 correlation being the focus of concern...
d    Correlation
0     0.984 ± 0.005
1     0.766 ± 0.065
2    -0.044 ± 0.182
3    -0.327 ± 0.181
4    -0.380 ± 0.188
5    -0.402 ± 0.188
6    -0.432 ± 0.204
It is clear that the large positive correlation between tobacco and savings at d=0 does appear to be spurious: by d=3 the correlation is negative and by d=6 significantly so."
According to the research quoted in the book, differencing uncovered the true relationship between the two series.
As a quantitative analyst and amateur social scientist, I find this very interesting, because I would like to apply these methods to regression analysis.
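To make the procedure concrete, here is a minimal sketch of how I understand it: take the d-th difference of both series and compute the ordinary Pearson correlation at each level. The data are synthetic (two unrelated series that share only a linear trend), not the tobacco and savings indices, and `differenced_correlations` is a name I made up for illustration. Probable errors are not computed.

```python
import numpy as np

def differenced_correlations(x, y, max_d=6):
    """Pearson correlation between the d-th differences of x and y, for d = 0..max_d."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    results = {}
    for d in range(max_d + 1):
        # np.diff with n=0 returns the series unchanged (the "levels" case).
        dx = np.diff(x, n=d)
        dy = np.diff(y, n=d)
        results[d] = np.corrcoef(dx, dy)[0, 1]
    return results

# Synthetic stand-in data (an assumption, NOT the Cave and Pearson indices):
# two series with independent noise around different linear trends.
rng = np.random.default_rng(0)
t = np.arange(40)
x = 2.0 * t + rng.normal(size=t.size)
y = 3.0 * t + rng.normal(size=t.size)

for d, r in differenced_correlations(x, y).items():
    print(f"d={d}: r={r:+.3f}")
```

With data like these, the correlation in levels (d=0) is driven almost entirely by the shared trend, which is the kind of spurious correlation the quoted passage describes.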
However, no limit is given for the number of differences that must be taken of both variables to uncover a true relationship.
Is there a limit to differencing? Can you over-difference data?