How do you resolve AR(1) problem in your panel regression?

16 February 2014

I am trying to estimate a linear panel model (120 units, 8 years). Whether I use pooled OLS, first-difference estimation, FE, or system/difference GMM, there is always AR(1) in the residuals. How do I remove this autocorrelation?

Elias Rebuge

Dear Miga Bold,

See the instructions below:

Autocorrelation in Panel Data

Consider the panel data model with AR(1) disturbances,

$Y_{it} = X_{it} b + u_i + e_{it}$

$e_{it} = r e_{i,t-1} + v_{it}$

where $|r| < 1$ and $v_{it} \sim \mathrm{iid}(0, \sigma_v^2)$. Both the fixed effects model and the random effects model are possible.

Model Estimation

The model estimation involves the following steps:

Estimating r

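Remove the individual effect $u_i$ with a mean deviation regression:

$Y_{it} - Y_{im} = (X_{it} - X_{im}) b + (e_{it} - e_{im})$

where $Y_{im} = \frac{1}{T}\sum_{t=1}^{T} Y_{it}$, $X_{im} = \frac{1}{T}\sum_{t=1}^{T} X_{it}$, and $e_{im} = \frac{1}{T}\sum_{t=1}^{T} e_{it}$ are the time means for individual i.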

We write,

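$Y^*_{it} = X^*_{it} b + e^*_{it}$

where $Y^*_{it} = Y_{it} - Y_{im}$, $X^*_{it} = X_{it} - X_{im}$, and $e^*_{it} = e_{it} - e_{im}$.

The one-step estimator of r is obtained by

$\hat{r} = \dfrac{\sum_{i=1}^{N}\sum_{t=2}^{T} e^*_{it}\, e^*_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=1}^{T} e^{*2}_{it}}$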
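
A minimal sketch of this one-step estimator, assuming the within residuals are already available as one list per unit with no gaps in time. The function name `one_step_r` and the residual values are made up for illustration.

```python
def one_step_r(resid):
    """One-step estimate of r from within (mean-deviation) residuals.

    resid[i][t] holds e*_it for unit i; units need not have equal length,
    but each unit's series is assumed to have no gaps in time.
    """
    # Numerator: each residual times its own first lag, t = 2..T.
    num = sum(e[t] * e[t - 1] for e in resid for t in range(1, len(e)))
    # Denominator: sum of squared residuals, t = 1..T.
    den = sum(v * v for e in resid for v in e)
    return num / den


# Made-up residuals for two units observed over three periods.
r_hat = one_step_r([[-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]])
print(r_hat)  # -0.25
```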

If the panel data are unbalanced, each individual i has $T_i$ observations over time. A further complication in dealing with autocorrelation is that the time periods may be unequally spaced. Specifically, the data may contain observations on individual i at times $t_{ij}$ for $j = 1, 2, \ldots, N_i$.

For first-order autocorrelation, set $e^*_{i t_{ij}} = 0$ if $t_{ij} - t_{i,j-1} > 1$, that is, if the preceding observation is more than one period earlier. The modified one-step estimator of r is then

$\hat{r} = \dfrac{\left(\sum_{i=1}^{N}\sum_{t=2}^{T} e^*_{it}\, e^*_{i,t-1}\right) / m^*}{\left(\sum_{i=1}^{N}\sum_{t=1}^{T} e^{*2}_{it}\right) / n^*}$

where $n^*$ is the number of nonzero elements in $e^*$ and $m^*$ is the number of consecutive pairs of nonzero $e^*_{it}$.
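
The modified estimator can be sketched along the same lines. This assumes the zeroing rule applies when consecutive observations are more than one period apart, and that each unit is passed as a time-sorted list of (time, residual) pairs; the function name `modified_r` and the data are hypothetical.

```python
def modified_r(units):
    """Modified one-step estimate of r for unequally spaced panels.

    units: one list per individual of (time, residual) pairs sorted by time.
    """
    num = den = 0.0
    m_star = n_star = 0
    for obs in units:
        e = [res for _, res in obs]
        # Zero a residual whose predecessor is more than one period back.
        for j in range(1, len(obs)):
            if obs[j][0] - obs[j - 1][0] > 1:
                e[j] = 0.0
        for j, v in enumerate(e):
            if v != 0.0:
                n_star += 1              # nonzero elements of e*
                den += v * v
                if j > 0 and e[j - 1] != 0.0:
                    m_star += 1          # consecutive pairs of nonzeros
                    num += v * e[j - 1]
    return (num / m_star) / (den / n_star)


# Unit 1 skips period 3, so its last residual is zeroed out.
units = [
    [(1, 1.0), (2, 2.0), (4, 3.0)],
    [(1, -1.0), (2, 1.0), (3, 2.0)],
]
r_mod = modified_r(units)
```

The sketch raises a ZeroDivisionError if no consecutive nonzero pair exists, which is left unhandled here on purpose.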

The efficiency of the estimator of r may be improved by iterating with the Prais-Winsten or Cochrane-Orcutt method.

Transforming Data

To remove the AR(1) component in the data, the following transformation is used:

$Z^{**}_{it} = Z^*_{it} - r Z^*_{i,t-1}$ for $t > 1$, and

$Z^{**}_{i1} = (1 - r^2)^{1/2} Z^*_{i1}$

Denote by $Y^{**}_{it}$, $X^{**}_{it}$, and $e^{**}_{it}$ the transformations of $Y^*_{it}$, $X^*_{it}$, and $e^*_{it}$, respectively.
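
As an illustration, the transformation for a single unit's series might look like the sketch below, assuming a balanced, equally spaced series and an r already estimated. The name `ar1_transform` is made up.

```python
import math


def ar1_transform(z, r):
    """Apply the AR(1) transformation to one unit's series z*_i1..z*_iT."""
    # First observation is rescaled by (1 - r^2)^(1/2).
    first = math.sqrt(1.0 - r * r) * z[0]
    # Later observations are quasi-differenced with the previous value.
    rest = [z[t] - r * z[t - 1] for t in range(1, len(z))]
    return [first] + rest


z_tt = ar1_transform([1.0, 2.0, 3.0], 0.6)  # roughly [0.8, 1.4, 1.8]
```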

For unbalanced unequally spaced panel data, the transformation must be modified to reflect the missing data.

Fixed Effects Model

The within estimator of b can be obtained by mean deviation regression for the transformed model:

$Y^{**}_{it} = X^{**}_{it} b + e^{**}_{it}$
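
Putting the steps together, a rough end-to-end sketch for a balanced panel with a single regressor could look as follows. It is not the implementation of any particular package: the function names, the one-regressor restriction, and the toy data are all assumptions made for illustration.

```python
import math


def demean(panel):
    """Subtract each unit's own time mean from its series."""
    out = []
    for s in panel:
        m = sum(s) / len(s)
        out.append([v - m for v in s])
    return out


def slope(y, x):
    """No-intercept OLS slope for one regressor, pooled over all units."""
    num = sum(a * c for ys, xs in zip(y, x) for a, c in zip(ys, xs))
    den = sum(c * c for xs in x for c in xs)
    return num / den


def ar1_transform(z, r):
    """Rescale the first observation and quasi-difference the rest."""
    return [math.sqrt(1.0 - r * r) * z[0]] + [
        z[t] - r * z[t - 1] for t in range(1, len(z))
    ]


def fe_ar1(y, x):
    """Return (b, r) for a balanced, gap-free panel with one regressor."""
    y_s, x_s = demean(y), demean(x)          # Y*, X*
    b0 = slope(y_s, x_s)                     # first-pass within estimate
    e = [[a - b0 * c for a, c in zip(ys, xs)] for ys, xs in zip(y_s, x_s)]
    r = sum(u[t] * u[t - 1] for u in e for t in range(1, len(u))) / sum(
        v * v for u in e for v in u
    )
    # Transform, then run the mean deviation regression on Y**, X**.
    y_t = demean([ar1_transform(s, r) for s in y_s])
    x_t = demean([ar1_transform(s, r) for s in x_s])
    return slope(y_t, x_t), r


# Toy data: two units, four periods.
y = [[1.0, 2.5, 2.0, 4.0], [0.5, 0.0, 1.5, 1.0]]
x = [[0.0, 1.0, 1.5, 2.0], [1.0, 0.5, 2.0, 1.0]]
b_hat, r_hat = fe_ar1(y, x)
```

Because the first step removes unit means, adding a different constant to each unit's y leaves both estimates unchanged, which is a quick sanity check on the sketch.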

Random Effects Model (Baltagi-Wu GLS)

Similar to the weighted mean deviation regression for estimating the classical random effects model, Baltagi and Wu (1999) developed a GLS method for unequally spaced panel data with AR(1) autocorrelation.

Hypothesis Testing: r = 0

The locally best invariant (LBI) test statistic is described in Baltagi and Wu (1999).

Robert Kunst

As in all regression modeling, reacting to autocorrelation in residuals with GLS-type procedures is usually inefficient, as it implicitly imposes identical dynamics on all covariates and on the response. Dynamic modeling, in this case dynamic panel regression, cannot really be avoided. It is challenging because of severe bias effects (Nickell bias) and because of the complexity of the required procedures (such as Arellano-Bond), but it is worth the effort.
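
For a sense of scale, a commonly quoted large-N approximation to the Nickell bias can be written as below. The notation is an assumption introduced here, not taken from the post: $\rho$ is the coefficient on the lagged dependent variable in $y_{it} = \rho y_{i,t-1} + u_i + e_{it}$, $\hat\rho_{FE}$ is its within estimate, and T is the number of usable time periods.

```latex
\operatorname*{plim}_{N \to \infty} \left( \hat\rho_{FE} - \rho \right)
  \approx -\frac{1 + \rho}{T - 1}
% Example: \rho = 0.5 and T = 7 give -1.5 / 6 = -0.25
```

With an 8-year panel, a bias of this order is far from negligible, which is consistent with the warning above.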

Marco Vivarelli

One way out is to use GMM-SYS (better than GMM-DIF when persistence is particularly pronounced) and use instruments lagged more than 2 periods, until the LM test no longer rejects the null of no AR(3). Note, however, that a significant AR(1) test is not the problem; a significant AR(2) test is.
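
The reason first-order correlation is expected can be seen from a short derivation. It assumes, for illustration only, that the tests are applied to first-differenced residuals and that the level errors $e_{it}$ are serially uncorrelated with constant variance $\sigma^2$.

```latex
\begin{align*}
\Delta e_{it} &= e_{it} - e_{i,t-1} \\
\operatorname{Cov}(\Delta e_{it}, \Delta e_{i,t-1})
  &= \operatorname{Cov}(e_{it} - e_{i,t-1},\; e_{i,t-1} - e_{i,t-2})
   = -\sigma^2 \\
\operatorname{Corr}(\Delta e_{it}, \Delta e_{i,t-1})
  &= \frac{-\sigma^2}{2\sigma^2} = -\frac{1}{2} \\
\operatorname{Cov}(\Delta e_{it}, \Delta e_{i,t-2}) &= 0
\end{align*}
```

Differencing therefore induces first-order correlation mechanically, while second-order correlation in the differences appears only if the level errors are themselves serially correlated.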

Norbert Schanne

I agree with Robert that most likely you should include a lagged dependent variable... alternatives to GMM estimation are either bias correction (see e.g. Hahn/Kuersteiner 2002, Bun/Carree 2005) or a factor model (Lee 2013).

In the unlikely case that your problem is only autocorrelation in the residuals (and not an omitted lagged dependent variable), you could use HAC estimates for the standard errors or rely on bootstrap inference.
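
As a rough illustration of robust inference, the sketch below computes a standard error clustered by unit for a single demeaned regressor. Note that this is a cluster-robust (sandwich) estimator rather than a kernel-based HAC estimator such as Newey-West, it applies no small-sample correction, and the function name and data are made up.

```python
import math


def cluster_robust_se(x, e):
    """Unit-clustered standard error of a one-regressor within slope.

    x, e: per-unit lists of demeaned regressor values and residuals.
    """
    sxx = sum(v * v for xs in x for v in xs)
    # Sum the scores x_it * e_it within each unit before squaring, so
    # arbitrary serial correlation inside a unit is retained.
    meat = sum(
        sum(a * c for a, c in zip(xs, es)) ** 2 for xs, es in zip(x, e)
    )
    return math.sqrt(meat) / sxx


# Made-up demeaned regressor and residuals for two units.
x_dm = [[-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
resid = [[0.5, -1.0, 0.5], [1.0, -1.0, 0.0]]
se = cluster_robust_se(x_dm, resid)
print(se)  # 0.5
```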

Herve Alexandre

B. H. Baltagi and P. X. Wu, "Unequally Spaced Panel Data Regression with AR(1) Disturbances," Econometric Theory 15, 814-823, 1999.

T. Daniel Coggin

See Donggyu Sul, "X-Differencing and Dynamic Panel Model Estimation," Econometric Theory, 2014, 30(1), 201-251 (with Chirok Han and P.C.B. Phillips) for a very recent approach. Sul posts a copy on his website http://www.utdallas.edu/~dxs093000/papers/Recent%20Working%20Papers1.htm .

Phoebus Dhrymes

It would not be a surprise, if you are observing the same agent over time in a panel series, to have autocorrelation. In fact, it should be expected. Your response, however, should not be to fall back on ready-made solutions. It does not appear to me, at a substantive level, that we are advancing things by introducing lagged endogenous variables. Given the inertia that characterizes human actions, that will obscure everything else.

If T, the number of time observations, is finite and small, while N, the number of agents, is large, why not treat the problem as one of multiple regressions (using centered data) rather than the clumsy difference method? This allows for unrestricted autocorrelation, which is estimable and thus allows efficient results using seemingly unrelated regressions (SUR) methods. But don't get involved with GMM, IV and other terms, which will only confuse the situation.

This would not work if N is small and T is large.

John Pitsopoulos

Actually, in Stata you can use the command xtgls (dependent variable) (independent variables), corr(ar1), which accounts for AR(1) with the same autocorrelation parameter in each panel.

Satriyo Budi Cahyono

You may try to remove outliers by winsorizing or trimming your data set. I have tried this; it worked in some cases, but of course not in all.

Aneta Błażejowska

John Pitsopoulos, how do I know when to use corr(ar1) or corr(psar1)?

Ali Maâlej

Hello,

I have a problem choosing between a common AR(1) and a panel-specific AR(1):

xtgls var-dep var-exp, panels(correlated) corr(ar1)

xtgls var-dep var-exp, panels(correlated) corr(psar1)

How can we choose between the two estimates?

Cordially, yours.

Swapnil Soni

I would suggest testing whether the model coefficients remain significant even when the regression assumptions (no autocorrelation, homoskedasticity, etc.) are violated. You can estimate the model using Newey-West (NW) robust standard errors. This can be done in R using the following function:

plmmod
