I want to fit a regression model of the form Yi = α + β1 X1i + β2 X2i + ui, which is basically a cross-section model. However, because the dependent variable Y in my research is generated from its own past values, I am considering adding one or more lagged values of Y to the model as explanatory variables. The reformulated model would be Yi = α + β1 X1i + β2 X2i + Σj µj Y(i-j) + ui, where the sum runs over the lags j = 1, ..., p and p is the number of lags. The number of lags might be 1, 2, or whatever value maximizes goodness of fit. Intuitively, the model will suffer from multicollinearity, since the lagged Y and the Xs are correlated. I am wondering whether this is a valid model.
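To make the setup concrete, here is a minimal Python sketch of what I have in mind. Everything in it is my own assumption for illustration: the data are simulated, only one lag of Y is used, the observations are assumed to be ordered so that the previous observation serves as the lag, and the coefficients are obtained by plain OLS through the normal equations. The function names (solve, ols, simulate, corr) and the parameter values are hypothetical. It only shows how the lagged regressor is built; it is not a claim that OLS is the appropriate estimator.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def simulate(n, alpha=1.0, b1=0.5, b2=-0.3, mu=0.6, seed=0):
    """Simulate Y that depends on X1, X2 and its own previous value (assumed values)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.0]  # arbitrary starting value
    for i in range(1, n):
        y.append(alpha + b1 * x1[i] + b2 * x2[i] + mu * y[i - 1] + rng.gauss(0, 1))
    return x1, x2, y


def corr(u, v):
    """Sample correlation, used as a rough check for collinearity between regressors."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


x1, x2, y = simulate(2000)

# The first observation has no lag available, so it is dropped.
rows = [[1.0, x1[i], x2[i], y[i - 1]] for i in range(1, len(y))]
target = y[1:]
y_lag = [r[3] for r in rows]

coef = ols(rows, target)  # order: intercept, X1, X2, lagged Y
print("intercept, b1, b2, mu:", [round(c, 3) for c in coef])
print("corr(lagged Y, X1):", round(corr(y_lag, x1[1:]), 3))
print("corr(lagged Y, X2):", round(corr(y_lag, x2[1:]), 3))
```

In this simulation the Xs are drawn independently, so the lagged Y is only weakly correlated with them; in my real data the correlation would presumably be stronger, which is exactly the multicollinearity concern.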
Could anybody help me with this model and suggest which estimation method would be appropriate for it?
Thanks in advance.