I am trying to estimate a linear panel model (120 units, 8 years). Whether I use pooled OLS, first-difference estimation, FE, or system/difference GMM, there is always AR(1) in the residuals. How do I remove this autocorrelation?
If the panel data is unbalanced, we have a sample in which each individual i has Ti observations over time. When dealing with autocorrelation in panel data, a further complication may occur: unequally spaced time periods. Specifically, the data may contain observations on individual i at times t_ij for j = 1, 2, ..., Ni.
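For first-order autocorrelation, set $e^{*}_{i,t_{ij}} = 0$ if $t_{ij} - t_{i,j-1} > 1$, that is, whenever the preceding period is missing. The modified one-step estimator of r is then computed from these $e^{*}$ values, where $n^{*}$ is the number of nonzero elements in $e^{*}$ and $m^{*}$ is the number of consecutive pairs of nonzero $e^{*}_{it}$.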
The efficiency of the estimator of r may be improved by iterating with the Prais-Winsten or Cochrane-Orcutt method.
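The iteration can be sketched as follows for the simplest case. This is only an illustration under stated assumptions: a balanced, equally spaced panel, no individual effects, and a simple pooled estimate of r from consecutive residual pairs. It is not the modified estimator for unequally spaced data described above, and the function name `cochrane_orcutt_panel` is made up for this example.

```python
import numpy as np

def cochrane_orcutt_panel(y, X, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt for a balanced panel (illustrative sketch).

    y : (N, T) array, X : (N, T, K) array, columns are consecutive periods.
    Assumes a common AR(1) coefficient r and no individual effects.
    Returns (b, r).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    N, T, K = X.shape
    r = 0.0  # start from OLS (no correction)
    for _ in range(max_iter):
        # Quasi-difference the data, dropping the first period of each unit
        ys = y[:, 1:] - r * y[:, :-1]
        Xs = X[:, 1:, :] - r * X[:, :-1, :]
        b, *_ = np.linalg.lstsq(Xs.reshape(-1, K), ys.reshape(-1), rcond=None)
        # Residuals of the untransformed model, then a pooled estimate of r
        e = y - X @ b
        r_new = np.sum(e[:, 1:] * e[:, :-1]) / np.sum(e[:, :-1] ** 2)
        converged = abs(r_new - r) < tol
        r = r_new
        if converged:
            break
    return b, r
```

With T = 8, dropping the first period costs one eighth of the sample, which is one reason the Prais-Winsten variant (which keeps a rescaled first observation) is often preferred in short panels.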
Transforming Data
To remove the AR(1) component in the data, the following transformation is used:
$Z_{it}^{**} = Z_{it}^{*} - r Z_{i,t-1}^{*}$ for $t > 1$, and
$Z_{i1}^{**} = (1 - r^{2})^{1/2} Z_{i1}^{*}$
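To see why this works, here is a short derivation, assuming the errors follow a stationary AR(1) process with white-noise innovations $u_{it}$ of variance $\sigma_u^2$ (the symbols $u_{it}$ and $\sigma_u^2$ are introduced here for illustration only):

```latex
\begin{aligned}
e_{it}^{*} &= r\, e_{i,t-1}^{*} + u_{it}, \qquad |r| < 1, \\
e_{it}^{**} &= e_{it}^{*} - r\, e_{i,t-1}^{*} = u_{it} \quad (t > 1), \\
\operatorname{Var}(e_{i1}^{*}) &= \frac{\sigma_u^{2}}{1 - r^{2}}
  \;\Longrightarrow\;
  \operatorname{Var}\!\left((1 - r^{2})^{1/2} e_{i1}^{*}\right) = \sigma_u^{2}.
\end{aligned}
```

So the transformed errors are serially uncorrelated for t > 1, and rescaling the first observation gives it the same variance as the rest instead of discarding it.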
Denote by $Y_{it}^{**}$, $X_{it}^{**}$, and $e_{it}^{**}$ the corresponding transformations of $Y_{it}^{*}$, $X_{it}^{*}$, and $e_{it}^{*}$, respectively.
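For a balanced, equally spaced panel, the transformation above could be coded roughly like this. It is a minimal sketch: the function name `prais_winsten_transform` and the (N, T) array layout are assumptions made for the example, and r is taken as already estimated.

```python
import numpy as np

def prais_winsten_transform(Z, r):
    """Apply the AR(1) transformation to a balanced panel (illustrative sketch).

    Z : (N, T) array, one row per unit, columns are consecutive periods.
    r : estimated AR(1) coefficient, assumed to satisfy |r| < 1.
    """
    Z = np.asarray(Z, dtype=float)
    out = np.empty_like(Z)
    # First observation of each unit: rescale by sqrt(1 - r^2)
    out[:, 0] = np.sqrt(1.0 - r ** 2) * Z[:, 0]
    # Later observations: quasi-difference against the previous period
    out[:, 1:] = Z[:, 1:] - r * Z[:, :-1]
    return out
```

The same function would be applied to the dependent variable and to each regressor in turn.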
For unbalanced unequally spaced panel data, the transformation must be modified to reflect the missing data.
Fixed Effects Model
The within estimator of b can be obtained by mean deviation regression for the transformed model:
$Y_{it}^{**} = X_{it}^{**} b + e_{it}^{**}$
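The mean deviation regression itself is short to write down. The sketch below assumes a balanced panel stored as arrays of shape (N, T) and (N, T, K), already transformed as described above; the function name `within_estimator` is hypothetical.

```python
import numpy as np

def within_estimator(Y, X):
    """Within (mean deviation) estimator of b (illustrative sketch).

    Y : (N, T) array of the (transformed) dependent variable.
    X : (N, T, K) array of the (transformed) regressors, without a constant.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    N, T, K = X.shape
    # Subtract each unit's time mean from its own observations
    Yd = Y - Y.mean(axis=1, keepdims=True)
    Xd = X - X.mean(axis=1, keepdims=True)
    # OLS on the demeaned, stacked data
    b, *_ = np.linalg.lstsq(Xd.reshape(N * T, K), Yd.reshape(N * T), rcond=None)
    return b
```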
Random Effects Model (Baltagi-Wu GLS)
Similar to the weighted mean deviation regression for estimating the classical random effects model, Baltagi and Wu (1999) developed a GLS method for unequally spaced panel data with AR(1) autocorrelation.
Hypothesis Testing: r = 0
The locally best invariant (LBI) test statistic for this hypothesis is described in Baltagi and Wu (1999).
As in all regression modeling, reacting to autocorrelation in the residuals with GLS-type procedures is usually inefficient, because it implicitly imposes identical dynamics on all covariates and on the response. Dynamic modeling, in this case dynamic panel regression, cannot really be avoided. It is challenging, because of severe bias effects (Nickell bias) and because of the complexity of the required procedures (such as Arellano-Bond), but it is worth the effort.
One way out is to use GMM-SYS (better than GMM-DIF when persistence is particularly pronounced) with instruments lagged more than 2 periods, until the LM test no longer rejects the null of no AR(3). Note, however, that a significant AR(1) test is not the problem, since first-order autocorrelation is expected in the first-differenced residuals; the problem is a significant AR(2) test.
I agree with Robert that most likely you should include a lagged dependent variable... alternatives to GMM estimation are either bias correction (see e.g. Hahn/Kuersteiner 2002, Bun/Carree 2005) or a factor model (Lee 2013).
In the unlikely case that your problem is only autocorrelation in the residuals (and not an omitted lagged dependent variable), you could use HAC estimates for the standard errors or rely on bootstrap inference.
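As a rough illustration of the HAC route, here is a sketch of pooled OLS with Newey-West-type standard errors, where the Bartlett-weighted autocovariances are computed within each unit only. It assumes a balanced, equally spaced panel; the function name `panel_newey_west` and the choice of `max_lag` are assumptions for the example, not a recommendation.

```python
import numpy as np

def panel_newey_west(y, X, max_lag):
    """Pooled OLS with within-unit Newey-West standard errors (sketch).

    y : (N, T) array, X : (N, T, K) array (include a column of ones
    for an intercept). max_lag is the Bartlett kernel truncation lag.
    Returns (beta, se).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    N, T, K = X.shape
    Xf = X.reshape(N * T, K)
    yf = y.reshape(N * T)
    beta, *_ = np.linalg.lstsq(Xf, yf, rcond=None)
    resid = (yf - Xf @ beta).reshape(N, T)
    # Score contributions x_it * e_it, kept in panel layout
    g = X * resid[:, :, None]
    # Lag-0 term (heteroskedasticity-robust "meat")
    S = np.einsum("itk,itl->kl", g, g)
    for lag in range(1, max_lag + 1):
        w = 1.0 - lag / (max_lag + 1.0)  # Bartlett weight
        # Cross-products of scores `lag` periods apart, same unit only
        G = np.einsum("itk,itl->kl", g[:, lag:, :], g[:, :-lag, :])
        S += w * (G + G.T)
    bread = np.linalg.inv(Xf.T @ Xf)
    cov = bread @ S @ bread
    return beta, np.sqrt(np.diag(cov))
```

With `max_lag=0` this reduces to the usual heteroskedasticity-robust (White) standard errors. With only 8 periods per unit, a small `max_lag` is about all the data can support.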
See Donggyu Sul, "X-Differencing and Dynamic Panel Model Estimation," Econometric Theory, 2014, 30(1), 201-251 (with Chirok Han and P.C.B. Phillips) for a very recent approach. Sul posts a copy on his website http://www.utdallas.edu/~dxs093000/papers/Recent%20Working%20Papers1.htm .
It would not be a surprise, if you are observing the same agent over time in a panel series, to have autocorrelation. In fact, it should be expected. Your response, however, should not be to fall back on ready-made solutions. It does not appear to me, at a substantive level, that we are advancing things by introducing lagged endogenous variables. Given the inertia that characterizes human actions, that will obscure everything else.
If T, the number of time observations, is finite and small, while N, the number of agents, is large, why not treat the problem as one of multiple regressions (using centered data) rather than using the clumsy difference method? This allows for unrestricted autocorrelation, which is estimable and thus gives efficient results using seemingly unrelated regressions (SUR) methods. But don't get involved with GMM, IV and other terms, which will only confuse the situation.
Actually, in Stata you can use the command xtgls depvar indepvars, corr(ar1), replacing depvar and indepvars with your dependent and independent variables. It accounts for AR(1) with the same autocorrelation parameter in each panel.
I would suggest checking whether the model coefficients remain significant once violations of the regression assumptions (no autocorrelation, homoskedasticity, etc.) are accounted for. You can estimate the model using Newey-West (NW) robust standard errors. This can be done in R using the following function: