Hi there,

I am trying to build an ARDL model to predict US recessions using industry productivity, the spread between 3-month and 10-year Treasury yields, the unemployment rate and the federal funds rate. The recession variable is binary (0 or 1). Is it correct to simply lag the variables and use a probit model to regress recession on them, or is that not a valid approach?

When I do just that, train the model and then forecast periods for which I have data, the model fits too well. The predicted probability sits at almost 0 most of the time (more precisely, at values such as 5.18e-18), with the occasional one-period spike, and then jumps to something like 0.9997 when there was a recession.

My code looks like this:

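```stata
* Estimate the probit on the 1985m1-2003m1 training window
probit rec l(1).rec l(4/6)d.m3 l(4/8)d.unr l(6/12).indprod l(3/5).fedr if tin(1985m1, 2003m1)

* Predicted probabilities for 2003m2-2020m2 (uses the observed values of every lag, including l(1).rec)
predict rec_hat if tin(2003m2, 2020m2), pr

* Plot actual recessions against the predicted probabilities
tsline rec rec_hat if tin(2003m2, 2020m2)
```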

Note that the lag lengths are not optimised (I don't want to ask too many questions in one post), but the time plot looks much the same across many different specifications.
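
To illustrate how much information `l(1).rec` alone carries, here is a minimal Python sketch on a hypothetical monthly 0/1 series (made up for illustration, not actual NBER recession dates). Because recession months come in long runs, the naive rule "this month's state equals last month's" is wrong only at turning points. Since `predict` evaluates the model with the observed value of `l(1).rec` in the forecast window, a lagged dependent variable like this could plausibly produce near-perfect fitted values that switch on only once a recession month has already been recorded.

```python
# Hypothetical monthly recession indicator (1 = recession), NOT real NBER data:
# 40 expansion months, an 8-month recession, 60 expansion months,
# an 18-month recession, then 30 expansion months.
rec = [0] * 40 + [1] * 8 + [0] * 60 + [1] * 18 + [0] * 30


def persistence_accuracy(series):
    """Score the naive rule rec_t = rec_{t-1}, i.e. the information in l(1).rec."""
    pairs = list(zip(series[:-1], series[1:]))  # (rec_{t-1}, rec_t)
    # Record the month index t at which rec_t differs from rec_{t-1}
    misses = [t + 1 for t, (prev, cur) in enumerate(pairs) if prev != cur]
    return len(pairs), misses


n_pairs, misses = persistence_accuracy(rec)
print(f"{n_pairs - len(misses)} of {n_pairs} months correct")  # 151 of 155 months correct
print("misses at months:", misses)  # misses at months: [40, 48, 108, 126]
```

The only misses are the four turning points, which is where a model leaning on the lagged recession dummy would be expected to lag by one period.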

I know it seems strange to complain about a model that is too good, but something must be wrong for it to forecast this accurately: in no period should there be a 0% or 100% chance of recession.
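
On the 0%/100% point: a probit maps a linear index x'b to Phi(x'b), the standard normal CDF, which is always strictly between 0 and 1. Values like 5.18e-18 therefore mean the fitted index is very large in magnitude. A minimal Python sketch (standard library only; `probit_cdf` and `index_for_prob` are hypothetical helper names) inverts Phi by bisection to show how far out the index must be.

```python
import math


def probit_cdf(z):
    """Standard normal CDF Phi(z), computed via erfc to stay accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def index_for_prob(p, lo=-37.0, hi=37.0, iters=200):
    """Find z with Phi(z) = p by bisection (Phi is strictly increasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if probit_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


z_low = index_for_prob(5.18e-18)
z_high = index_for_prob(0.9997)
print(f"{z_low:.2f}")   # about -8.57
print(f"{z_high:.2f}")  # about 3.43
```

An index near -8.6 in quiet months and above about 3.4 in recession months is consistent with a regressor that almost perfectly separates the two outcomes, rather than with the model genuinely assigning zero probability.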

Posted by Ben Stewart