I am applying random forest to a time series with both fast- and slow-changing processes. I will also try time series analysis based on state-space models later, but for now I am curious about potential causes of the inconsistent results that we obtain.

We are studying the effect of an optimization algorithm that, according to knowledge-based models, yields up to 7% savings in fuel consumption. Since the data come from ocean-going vessels, our control over the experiments is limited.

Essentially, we apply a low-pass filter to the time series data and then treat each measurement as independent (that is, we assume the autocorrelation is negligible). I use the R caret package and have swept over different configurations: non-overlapping rolling-mean window sizes (the low-pass filter), different numbers of classification trees, and different numbers of data partitionings. The predictor we want to examine is the optimization of a control algorithm: does it give fuel savings or not?
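
To make the filtering step concrete, here is a minimal sketch of non-overlapping window averaging. It is written in plain Python rather than the R code actually used; the function name `block_means` and the sample values are hypothetical, and dropping a trailing partial window is an assumption.

```python
from statistics import fmean

def block_means(series, window):
    """Average consecutive, non-overlapping windows of length `window`.

    A trailing partial window is dropped (an assumption, not necessarily
    what the original analysis does).
    """
    n_full = len(series) // window
    return [fmean(series[i * window:(i + 1) * window]) for i in range(n_full)]

# Made-up fuel-rate samples, for illustration only.
fuel_rate = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 10.0]
smoothed = block_means(fuel_rate, 3)  # two full windows, last sample dropped
```

Because the windows do not overlap, each averaged value uses a disjoint set of raw samples, but slow processes can still leave neighbouring averages correlated.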

The problem is that it appears to be random whether the prediction model shows an influence from the optimization or not. My hypothesis is that this is caused by randomness combined with the optimization having only a small effect (my guess is that it is close to the measurement error). So, when sweeping over the configurations, I get some models with and some without an influence from the optimization.
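
One way to probe this hypothesis is a small simulation on synthetic data. The sketch below swaps the random forest for a plain Welch t statistic to keep it short. All numbers (effect sizes, noise level, sample size, the |t| > 2 threshold) are assumptions and are not estimated from the vessel data, and `detection_rate` is a hypothetical helper.

```python
import random
from statistics import fmean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (fmean(a) - fmean(b)) / se

def detection_rate(effect, noise_sd=1.0, n=50, runs=200, seed=0):
    """Fraction of simulated experiments in which |t| exceeds 2.

    `effect` is the assumed true saving, in the same units as `noise_sd`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        baseline = [rng.gauss(0.0, noise_sd) for _ in range(n)]
        optimized = [rng.gauss(-effect, noise_sd) for _ in range(n)]
        if abs(welch_t(baseline, optimized)) > 2.0:
            hits += 1
    return hits / runs

small = detection_rate(effect=0.3)  # effect well below the noise level
large = detection_rate(effect=3.0)  # effect well above the noise level
```

Under these assumptions the small effect is flagged in only a fraction of the repeated experiments, while the large one is flagged essentially every time. That would match getting models both with and without an influence from the optimization, although it does not rule out other causes such as autocorrelation left in the data.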

Any clues? 
