Imagine a standard machine-learning scenario:

You are confronted with a large multivariate dataset that you understand only vaguely. You need to predict some variable from the data you have. As usual, you clean the data, look at descriptive statistics, fit some models, and cross-validate them, but after several attempts, going back and forth and trying multiple models, nothing seems to work and your results are miserable. You can spend hours, days, or weeks on such a problem...

The question is: when do you stop? How do you know that your data really is hopeless, and that all the fancy models would do no better than predicting the average outcome for every case, or some other trivial solution?
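
To make the "no better than predicting the average" comparison concrete, here is a minimal sketch of what I mean. It is only an illustration under my own assumptions, not an established forecastability test: the helper names (`fit_mean`, `fit_line`, `cv_mse`, `skill`) are hypothetical, the data is synthetic, and a one-feature least-squares fit stands in for "a fancy model". It measures how much a model improves on the mean predictor under k-fold cross-validation.

```python
import random
import statistics

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Trivial baseline: always predict the mean of the training targets."""
    m = statistics.fmean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """One-feature ordinary least squares (hypothetical stand-in for a model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def cv_mse(fit, xs, ys, k=5, seed=0):
    """Mean squared error on held-out folds, averaged over the k folds."""
    scores = []
    for test in kfold_indices(len(xs), k, seed):
        held = set(test)
        train = [i for i in range(len(xs)) if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(statistics.fmean([(ys[i] - predict(xs[i])) ** 2 for i in test]))
    return statistics.fmean(scores)

def skill(fit, xs, ys, k=5, seed=0):
    """1 - MSE(model) / MSE(mean baseline); values near or below 0 mean no gain."""
    return 1.0 - cv_mse(fit, xs, ys, k, seed) / cv_mse(fit_mean, xs, ys, k, seed)

# Synthetic data (assumed for illustration): one target depends on x, one does not.
rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
y_signal = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
y_noise = [rng.gauss(0.0, 1.0) for _ in xs]

print("skill with signal:", skill(fit_line, xs, y_signal))
print("skill on pure noise:", skill(fit_line, xs, y_noise))
```

A skill score that stays around zero for every model I try is the "miserable results" situation described above. What the sketch cannot tell me is whether a different model would do better, which is exactly my question.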

Of course, this is a forecastability issue, but as far as I know, it is hard to assess the forecastability of multivariate data before trying something on it. Or am I wrong?
