I have recently been working on yield prediction with machine learning and have been exploring which inputs predict yield best. I found only three papers that use historical yields as an input to predict yields for a new year, and this confuses me. According to their test results, doing so improves prediction accuracy substantially. But does this count as data leakage? If not, what is the rationale for doing so, and what are the limitations? (The three papers appear to be from the same team.)
Links to the three papers:
https://www.sciencedirect.com/science/article/abs/pii/S0034425721001267
https://www.mdpi.com/2072-4292/12/8/1232
https://www.frontiersin.org/articles/10.3389/fpls.2019.00809/full
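
To make the question concrete, here is a minimal sketch of what I understand by "historical yields as an input". It is not taken from the papers; the column names (`region`, `year`, `yield_t_ha`), the toy values, and the choice of `test_year` are all assumptions for illustration. The idea is that every yield-based feature for year t is built only from years before t, and the train/test split is done by year, so the target year's own yield never reaches the model as an input.

```python
import pandas as pd

# Hypothetical toy data: two regions, four years each (values are made up).
df = pd.DataFrame({
    "region": ["A"] * 4 + ["B"] * 4,
    "year": [2015, 2016, 2017, 2018] * 2,
    "yield_t_ha": [5.0, 5.2, 4.8, 5.5, 3.0, 3.1, 3.3, 2.9],
})
df = df.sort_values(["region", "year"]).reset_index(drop=True)

# Previous year's yield within the same region (NaN for the first year).
df["yield_lag1"] = df.groupby("region")["yield_t_ha"].shift(1)

# Mean of all strictly earlier years in the same region.
# shift(1) drops the current year before the expanding mean is taken.
df["yield_hist_mean"] = df.groupby("region")["yield_t_ha"].transform(
    lambda s: s.shift(1).expanding().mean()
)

# Split by year: train only on years before the test year.
test_year = 2018  # assumed hold-out year
train = df[df["year"] < test_year].dropna(subset=["yield_lag1"])
test = df[df["year"] == test_year]

print(train[["region", "year", "yield_lag1", "yield_hist_mean", "yield_t_ha"]])
print(test[["region", "year", "yield_lag1", "yield_hist_mean", "yield_t_ha"]])
```

If the features were instead computed over all years (for example a per-region mean that includes the test year), or if rows were shuffled randomly across years before splitting, the test-year yield would influence the inputs, which is the situation I would call leakage.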