I have a series of about 90 medium-range forecasts (three months, one per day) from the 'old' and 'new' versions of an atmospheric model, both started from the same initial data. How can I judge whether the change in series-mean errors and scores (say, H500 RMSE) between the two model versions is statistically significant? I assume there is a conventional method for this. Student's t-test seems inappropriate here, because the forecast errors (forecasts are run every 24 hours) are autocorrelated, and the error series from the 'old' and 'new' models are not independent of each other.
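
To make the problem concrete, here is a minimal sketch of one approach I have considered, not necessarily the conventional one. It handles the dependence between the two models by testing the paired daily differences, and it handles the autocorrelation by shrinking the sample size to an effective value based on the lag-1 autocorrelation of those differences. The AR(1) assumption, the function name `paired_diff_test`, and the synthetic data are all my own illustrative choices, not real forecast scores.

```python
import numpy as np


def paired_diff_test(score_old, score_new):
    """Paired test on daily score differences with an AR(1) effective sample size.

    Assumes the difference series is roughly AR(1); this is an assumption,
    not something the data are guaranteed to satisfy.
    """
    d = np.asarray(score_new, dtype=float) - np.asarray(score_old, dtype=float)
    n = d.size
    dm = d - d.mean()

    # Lag-1 autocorrelation of the difference series.
    r1 = np.sum(dm[:-1] * dm[1:]) / np.sum(dm * dm)
    # Clip so the correction can only reduce the sample size, never inflate it.
    r1 = min(max(r1, 0.0), 0.99)

    # Effective number of independent samples under the AR(1) assumption.
    n_eff = n * (1.0 - r1) / (1.0 + r1)

    # Standard error of the mean difference using n_eff instead of n.
    se = d.std(ddof=1) / np.sqrt(n_eff)
    t_stat = d.mean() / se
    return d.mean(), r1, n_eff, t_stat


# Synthetic example only: 90 daily scores with day-to-day persistence.
rng = np.random.default_rng(0)
n_days = 90
old = np.empty(n_days)
old[0] = 20.0
for i in range(1, n_days):
    old[i] = 20.0 + 0.6 * (old[i - 1] - 20.0) + rng.normal(scale=2.0)
new = old - 0.5 + rng.normal(scale=0.5, size=n_days)

mean_diff, r1, n_eff, t_stat = paired_diff_test(old, new)
print(f"mean diff={mean_diff:.3f}, r1={r1:.2f}, n_eff={n_eff:.1f}, t={t_stat:.2f}")
```

The resulting statistic would then be compared against a t distribution with roughly `n_eff - 1` degrees of freedom rather than `n - 1`. Whether this correction is adequate for forecast scores, or whether something like a block bootstrap is preferred, is exactly what I am unsure about.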