I am currently learning how DWT (discrete wavelet transform) decomposition is used in time series forecasting with ML models. However, I have a concern about potential data leakage during the wavelet decomposition step.

During the convolution step, the wavelet filters may use future data points to compute the coefficients at each decomposition level. This means that even if we split the data into training and testing sets first and then decompose each set separately, the test set can still "see into the future" at any given time step, unless a causal wavelet filter (e.g., db1, the Haar wavelet) is used.
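To make the concern concrete, here is a minimal sketch of the effect (assuming PyWavelets, imported as `pywt`, a db4 filter, and a purely synthetic series): the detail coefficients computed for the same time positions change once later samples are appended, because the filter and the boundary handling draw on values after the analysed point.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
x = rng.standard_normal(512)          # synthetic series, illustrative only

# Level-1 detail coefficients from the first 256 samples only
_, d_short = pywt.dwt(x[:256], 'db4')

# Same positions, but 32 "future" samples are now available to the filter
_, d_long = pywt.dwt(x[:288], 'db4')

# Coefficients near the end of the 256-sample window are no longer equal:
# originally they relied on a symmetric boundary extension, now they use
# the true future values.
n = len(d_short)
print(np.allclose(d_short, d_long[:n]))      # expected: False
print(np.max(np.abs(d_short - d_long[:n])))  # non-zero near the boundary
```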

Now, I have two main questions:

  • How can we effectively prevent data leakage during wavelet decomposition in time series forecasting? (I sketch one candidate approach after these questions.)
  • Am I missing something? I've come across many high-impact journal papers in which wavelet-ML models are used for time series forecasting. These papers typically decompose the entire dataset with non-causal wavelet filters and then feed the decomposed signals into the ML model, and they often report very high prediction accuracy (even for 2- or 3-step-ahead forecasting). But if future data influences the decomposition, are those metrics truly valid?
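For the first question, one candidate fix I am considering is a walk-forward scheme: recompute the decomposition at every forecast origin using only the samples observed up to that point, so no coefficient ever depends on data after the origin. Below is a rough sketch, again assuming PyWavelets; the helper name `causal_wavelet_features` is mine, not taken from any paper.

```python
import numpy as np
import pywt

def causal_wavelet_features(x, t, wavelet='db4', level=2):
    """Features for forecasting x[t+1], built from x[0..t] only."""
    history = x[: t + 1]
    coeffs = pywt.wavedec(history, wavelet, level=level, mode='symmetric')
    # Take the most recent coefficient of each sub-band, so nothing beyond
    # time t ever enters the feature vector.
    return np.array([band[-1] for band in coeffs])

rng = np.random.default_rng(0)
x = rng.standard_normal(400)          # synthetic series, illustrative only

# Build a training matrix with a walk-forward loop over the training span;
# x[300:] is left untouched here as a test span.
X, y = [], []
for t in range(64, 299):
    X.append(causal_wavelet_features(x, t))
    y.append(x[t + 1])
X, y = np.array(X), np.array(y)
```

The obvious downside is cost, since the transform is recomputed at every forecast origin, which is part of why I am asking whether there is a standard way to handle this.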