The vast majority of economic relationships are dynamic in nature: the present value of a variable depends strongly on its own lagged values.
Dynamic panel models therefore include the dependent variable with one or more lags, in accordance with this characteristic ...
There is no difference between static panel data and dynamic panel data. However, there is a fundamental difference between static and dynamic models used to analyse panel data.
Mehmed's answer gives the main point. To elaborate a little. The lags of the dependent variable contain the entire time path of the independent variables. To satisfy yourself of this statement in an intuitive way:
(i) write the simplest dynamic model (you can ignore the constant, the error term and the group or cross-section subscript; these make no difference to the time-series logic of the argument): Yt = a*Yt-1 + b*Xt (Eq.1);
(ii) substitute Yt-1 = a*Yt-2 + b*Xt-1 into (Eq.1);
(iii) substitute for Yt-2 ... and so on.
After a couple of substitutions you will see that the influence of the lagged dependent variable tends towards zero but that the influence of the independent variable accumulates by way of an increasingly complex compound coefficient. The economic intuition corresponding to the algebra is that "history matters": i.e. the dependent variable (Y) is influenced not only by the current value of the independent variable (Xt) but also by values of the independent variable in the past (Xt-1, Xt-2, ... and so on). Consequently, in a dynamic model, the estimated coefficient on the current value of the independent variable (Xt) measures only the short-run or impact effect of X on Y. The long-run effect is larger, because it takes account of both the current and the lagged effects of X on Y. In general, the long-run effect of X on Y may be calculated after estimating Eq.1 as b/(1-a) (in Eq.1, drop the time subscripts, gather the Y terms on the left, factorise and divide through by (1-a)).
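To make the arithmetic concrete, here is a purely illustrative example (the numbers are hypothetical, not taken from any study discussed here). Suppose Eq.1 is estimated as Yt = 0.5*Yt-1 + 2*Xt. Two substitutions give Yt = 2*Xt + (0.5)*2*Xt-1 + (0.5)^2*2*Xt-2 + (0.5)^3*Yt-3 = 2*Xt + 1*Xt-1 + 0.5*Xt-2 + 0.125*Yt-3: the weight on the lagged dependent variable shrinks towards zero while the past values of X accumulate. The short-run (impact) effect of X on Y is b = 2, whereas the long-run effect is b/(1-a) = 2/(1-0.5) = 4.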
Why does this matter for panel analysis? There are both statistical and economic reasons.
1) Static models are (almost) always misspecified, because the within-group error terms are serially correlated, thereby invalidating both point estimates and statistical inference. Conversely, dynamic models tend to be correctly specified, because the dynamics are in the estimated part of the model rather than displaced into the error terms (where they invalidate static FE/RE estimation).
2) Dynamic models are much richer in economic content by virtue of being able to distinguish short-run and long-run effects of independent variables on dependent variables (this is the key to - for example - error-correction modelling).
First of all, you may justify the choice by the nature of the research question to be analysed: is it dynamic or not? That is, is the present value of your dependent variable affected by its past values? The justification has to come from a theoretical standpoint [though there is a common misunderstanding on this point]. If so, you apply a dynamic panel approach.
Otherwise, a static panel approach may be applied.
So, in a dynamic panel we use lagged values of the dependent variable, in contrast to the static panel. The dynamic panel is the more advanced approach; however, there are steps and criteria to follow in employing a dynamic panel model.
The following papers have utilised both approaches; please refer to them as examples:
I read all the discussions above, but I could not understand when we should use dynamic models instead of static models. And if one or two independent variables have to be converted into first differences to make those series stationary, is it a problem to run static models?
When analysing panel data, I suggest that you estimate a dynamic model by difference or system GMM:
(1) when the time series dimension is short (say 15 or fewer periods - but it is difficult to be precise) and the cross-section dimension wide (say 20 groups or more - but, again, it is difficult to be precise);
(2) when it is useful to distinguish between the short-run (impact) and the long-run effects of changes in your independent variables (i.e. when you want to analyse rather than ignore the dynamics in your data); and, above all,
(3) when diagnostic testing of the within-group errors (residuals) from a static model reveals serial correlation, which means biased and inconsistent estimates and invalid inference (in Stata, the user-written programme xtserial implements this test for fixed effects models).
Above all, be guided by diagnostic testing. Static models estimated on the levels of variables can be expected to display serial correlation in the errors. As with any diagnostic failure, this is not a disaster. It is an instruction to respecify your model; in this case, as a dynamic model (i.e. one including one or more lags of the dependent variable among the regressors to capture the dynamics - relationships across time - in your data).
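If it helps, here is a minimal Stata sketch of this test-then-respecify sequence, assuming hypothetical variable names (y, x1, x2) and panel identifiers (id, year); your own specification and instrument choices will of course differ:

    xtset id year                  // declare the panel structure
    xtserial y x1 x2               // Wooldridge test: H0 = no first-order serial correlation in the idiosyncratic errors
    * if H0 is rejected, respecify as a dynamic model, e.g. system GMM via the user-written xtabond2:
    xtabond2 y L.y x1 x2, gmm(L.y, lag(2 4) collapse) iv(x1 x2) twostep robust small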
As for stationarity, with a short time-series dimension in your panel data, this is not something to worry about. The system GMM estimator is robust to non-stationarity (but not to serial correlation - remember to test for serial correlation in the errors). In any case, standard unit root tests cannot yield valid results from short time series. However, if your panel data have long time series (say, 25 periods or more) then stationarity does matter, but in this case you must use a different type of estimator (i.e. one suitable for panel time-series data, such as the mean group or pooled mean group estimator). Remember, there is no generally valid estimator: for panels with short time series (T) and wide cross sections (N), difference and system GMM estimation of dynamic models is generally appropriate; however, these estimators are not appropriate for long-T, narrow-N panels (i.e. panels in which stationarity/unit roots are likely to be an issue).
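For the long-T case, a hedged Stata sketch of what this might look like (hypothetical variables y and x; the lag length is illustrative only):

    xtunitroot ips y, lags(1)      // Im-Pesaran-Shin panel unit root test on the dependent variable
    xtunitroot ips x, lags(1)
    * a pooled mean group (error-correction) model via the user-written xtpmg:
    xtpmg d.y d.x, lr(l.y x) ec(ec) replace pmg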
Manila asks: "Can the GMM estimation method be used in static panel data? In my dataset, I do not have any lagged variables." My reply: yes, as far as I know. As long as you have sufficient time-series depth in your panel, you can use - for example - either difference or system GMM estimation. The difference estimator instruments first differences of potentially endogenous variables by lagged levels of variables in your model; while the system estimator combines a model estimated in differences (with potentially endogenous differenced variables instrumented by lagged levels, as in difference estimation) with a model estimated in levels (potentially endogenous variables in levels instrumented by lagged differences). These are particularly useful approaches for the estimation of dynamic models, because the lagged dependent variable - which is the defining feature of dynamic models - is endogenous by construction. However, GMM estimation can just as well be used to estimate static panel models where one (or more) independent variables is (are) potentially endogenous. What matters is the availability of "internal" instruments (i.e. instruments formed from lags of the variables you have in the data set). Of course, internal instruments can also be combined with external instruments (if you are lucky enough to have such instruments available).
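A minimal sketch of this in Stata with xtabond2, assuming a static model in which x1 is potentially endogenous and x2 exogenous (hypothetical names; the lag range is illustrative only):

    * static model (no lagged dependent variable) estimated by system GMM,
    * instrumenting the potentially endogenous x1 with its own lags ("internal" instruments)
    xtabond2 y x1 x2, gmm(x1, lag(2 4) collapse) iv(x2) twostep robust small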
I hesitate to lay down rules about minimum sample sizes, if only because the variability in the data might to some extent compensate for small sample size. However, in this case, I think the safest answer to your question is no! Dynamic panel analysis is very data hungry; put more formally, all the properties of these estimators are asymptotic (i.e. worked out for infinite-size samples, which - in practice - means that even samples with many hundreds of observations may be regarded as small). From memory (please check this), such studies as there are of the small-sample properties of dynamic estimators do not reach down as small as T=15 and N=6. Moreover, dynamic panel estimators are designed for wide-N and shallow-T panels, not for an N as narrow as 6. Remember also the problem of "too many instruments" and the loss of observations from lagging and differencing. I suggest a much simpler estimation strategy. Difference your data to remove the dynamics (check this by testing for serial correlation in the group residuals - you can use the Stata user-written programme xtserial for this). Then estimate a static model with fixed effects. If the fixed effects are jointly significant, then this is your model (a static FE model). If the fixed effects are not jointly significant - as is likely after first differencing - then estimate a pooled model by OLS. OLS has known and good small-sample properties.
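A rough Stata sketch of this simpler strategy, assuming hypothetical variables (y, x1, x2) and identifiers (id, year):

    xtset id year
    gen dy  = D.y                  // first differences (D. requires the panel to be xtset)
    gen dx1 = D.x1
    gen dx2 = D.x2
    xtserial dy dx1 dx2            // has differencing removed the serial correlation?
    xtreg dy dx1 dx2, fe           // the F test of the fixed effects is reported at the foot of the output
    regress dy dx1 dx2             // pooled OLS fallback if the fixed effects are not jointly significant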
Geoffrey Thomas Pugh Dear Geoffrey, first, thank you for taking the time to reply to all the questions and for being so helpful. I have a question I hope you will answer. I am dealing with panel data with N = 24 and T = 14. I ran a pooled regression to check, and my Durbin-Watson statistic is below 2, which strongly indicates serial correlation; after adding a lagged dependent variable, the problem appears to be solved. Next, I ran the Hsiao test to check for homogeneity, and it shows that my data are heterogeneous. I checked some papers, and it appears that a lot of them ignore this step on the assumption that the data come from the same population (industrial firms' data). I also read one post on ResearchGate that says: check for the suitable model with the Hausman test, check which of the models (random or fixed) is significant, and then follow a GMM approach. Would you recommend that? I have also read that heterogeneity means we cannot just run pooled OLS, and that fixed effects and random effects deal with heterogeneity fine. Is that valid information? I hope you can answer and give me a recommendation to clear up my confusion.
You are using a good range of diagnostic tests to identify the approach to estimation that best suits your data. That is good practice. Specifying your model with a lagged dependent variable defines it as a dynamic model. In the presence of slope heterogeneity, you need a dynamic estimator to address the problem of the endogeneity of the lagged dependent variable, which exists by construction (so you do not have to test for it). Once you have got this far, you will probably estimate your model by either difference or system GMM (and the best way to do this is by using David Roodman's wonderful user-written programme for Stata, xtabond2, which is free to download). In this case, the Hausman test is no longer applicable. Once you have specified your model with a lagged dependent variable and have tested for and found - or just assumed - slope heterogeneity, you will be using an estimator - difference or system GMM - that is a species of random effects (RE) model (i.e. the group-specific or fixed effects are a component of the error term, not separately estimated group dummy variables). To sum up. If your diagnostic testing reveals that you need to specify a dynamic model, then there is no decision to make regarding fixed or random effects estimation. The difference and system GMM models are both specified with random effects. If, on the other hand, your diagnostic testing reveals that you do not need to specify a dynamic model, then there is a choice to be made between FE and RE estimation of a static model; and here the Hausman test is useful.
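For completeness, a minimal sketch of the FE-versus-RE comparison for that static case (hypothetical variable names):

    xtreg y x1 x2, fe
    estimates store fe
    xtreg y x1 x2, re
    estimates store re
    hausman fe re                  // H0: RE is consistent and efficient; rejection favours FE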
I am trying to analyse panel data, but before that I have run various diagnostic tests such as normality (sktest), linearity (Ramsey test), homoskedasticity (hettest) and serial correlation (xtserial). The results show that my data are not normally distributed, not linear, heteroskedastic, and subject to serial correlation. The problem now is that I do not know how to deal with the serial correlation issue in a panel data setting. In addition, I cannot apply GMM as the data set is small (only 900 observations and lacking time-series depth). Similarly, I cannot run 2SLS as I do not know of any instrumental variables. Therefore, I was just wondering if anyone can suggest a solution. Some have suggested and applied FGLS, but I understand that FGLS is applicable only if the data have large T and small N, so I cannot use FGLS. Some people on the Stata forum have suggested panel analysis (FE or RE) with cluster-robust standard errors. But I am still not sure, so could you please let me know what the best solution is?
This answer to Sanjib and Radhouene (above) focuses on diagnostic testing of panel models. It considers alternatives to estimating a dynamic model should you reject the null hypothesis of no within-group residual serial correlation (i.e. should you find evidence of serial correlation). In particular, it comments on the possibility of estimating a static model either with
(i) heteroskedasticity and autocorrelation (HAC) robust standard errors or with (ii) cluster-robust standard errors.
I take it that your diagnostic testing was conducted by simply pooling your data, thereby treating it as one big cross section. The following relates to panel testing and estimation.
Testing for linearity in a panel context is complicated by having to take into account both “within group” and “between group” variation. There is some work at the frontiers on testing panel data models for linearity, but – as far as I know – nothing in general use. The following offers some useful practical advice:
Regarding non-normality, the same variable transformations (especially into natural logarithms) that can help to induce linearity may also help to induce normality. In a large dataset (at least several hundred observations), non-normality should not impair either estimation or inference.
Heteroskedasticity is not much of a problem, as long as you estimate with either (i) heteroskedasticity robust standard errors or (ii) (better) with cluster-robust standard errors. The latter are available for all of Stata’s panel estimators.
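For instance (hypothetical names), cluster-robust standard errors are obtained simply by adding the vce(cluster ...) option to the panel estimator:

    xtreg y x1 x2, fe vce(cluster id)   // FE with standard errors clustered on the panel identifier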
This leaves serial correlation. This is easily tested in a fixed effects framework using the Stata user-written programme xtserial. Usually, you will find serial correlation (I would be surprised if you did not). Having rejected the null of no serial correlation, you must make a major strategic choice on how to develop your model.
1. Accept that evidence of serial correlation is evidence that a static model is misspecified, because it suffers from omitted dynamics. In other words, there are dynamics – i.e. relationships of dependency over time – in your error term, which means that the dynamics are unobserved and, worse, a source of bias and inconsistency in both estimation and inference. In this case, the solution is to specify a dynamic model – i.e. a model with one or more lagged values of the dependent variable among your right-hand side regressors – and to choose an appropriate estimator. (By the way, 900 observations should be sufficient for GMM estimation; in any case, it is the variation in the data that matters not only the number of observations.) A dynamic model may also yield additional economic information of interest (e.g. a measure of persistence in your dependent variable, which also enables you to distinguish between the short-run impact effects and the long-run effects of your continuous variables). Please see my contributions above for further discussion of this strategy.
2. Transform your variables into first-differences. If your variables are in natural logarithms, then the first differenced variables are the (percentage) growth rates (and so economically meaningful). First differencing may also remove fixed effects – whether group-specific dummy variables (fixed effects) or group-specific fixed components in the error terms (random effects) – which greatly simplifies estimation (in this case, you can use OLS regression).
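A small sketch of this transformation in Stata (hypothetical variables; this assumes strictly positive values so that the logs are defined):

    xtset id year
    gen lny = ln(y)
    gen lnx = ln(x1)
    gen g_y = D.lny                // approximate growth rate of y
    gen g_x = D.lnx                // approximate growth rate of x1
    regress g_y g_x                // differencing has removed the group effects, so pooled OLS may suffice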
3. Accept the evidence of serial correlation in the residuals (never disregard evidence!) but decide that the unobserved serial correlation in the residuals is just a nuisance. In this case, you are deciding that the omitted dynamics are of no potential interest but, rather, just present a problem to be “fixed up”. And the fix is – as others have suggested – to estimate either with (i) heteroskedasticity and autocorrelation (HAC) robust standard errors or with (ii) cluster-robust standard errors. In general, I do not like such an approach. However, it may be justified if you believe on reasonable grounds that the dynamics are not interesting from an economic point of view, so that they might as well remain unobserved in the error term. This strategy is possibly also acceptable if your data lack time-series depth and you cannot afford to lose degrees of freedom (each lag of the dependent variable reduces the data available for estimation by one period). It might also be the case that your variables have gone through one or more transformations (e.g. being transformed into ratios, being weighted, and/or being subtracted from other variables to create differences) so that the additional complexity of dynamic modelling adds more costs than benefits. I have indicated the “benefits” of dynamic modelling compared to static modelling above. However, the “costs” include: (i) the substantial complication of adding an endogenous variable to your model (even in fairly large datasets – e.g. 900 observations – dynamic estimates can be notoriously unstable with respect to both specification and instrumentation); and (ii) in principle, although dynamic modelling is defined by the inclusion of the lagged dependent variable among the regressors, the origins of dynamic modelling in the time-series literature suggest that a fully-specified dynamic linear regression model should include one or more lags of each (continuous) independent variable (see Spanos, A., 1986, Statistical Foundations of Econometric Modelling, Cambridge, Cambridge University Press, p.530-31). In practice, such a model would probably be hopelessly over specified for most dynamic panel models, especially if the number of observations is limited.
Finally, you mention 2SLS. In a panel context, potential endogeneity of one or more regressors should not be too difficult to deal with. Unlike cross-section models, you will not need external instruments (although external instruments can be used if available). Using GMM, you can estimate either a static or a dynamic model with potentially endogenous variables using lagged levels and/or lagged differences as instruments.
Thank you so much Professor Geoffrey Thomas Pugh .
Please, I have another circumstance where I am dealing with unbalanced panel data with unequally spaced measurements of the time variable. In the literature, it is recommended to use the "antedependence model", an extension of the lagged-response model, in which the coefficient associated with a lagged response is occasion-specific, γi (accommodating unequal spacing in time).
Unbalanced panel data should not be a problem for most panel estimators and tests. Even when lack of balance can be a problem, there are sometimes alternatives that can be used with an unbalanced panel (for example, panel unit root tests - some require a balanced panel, some do not). The problem of unequal spacing is taken care of by Stata's xtset command. When using panel data, always first xtset your data. In some circumstances, gaps can be a problem but, again, there are sometimes alternative approaches (e.g. in estimating a system GMM model, instead of first differencing, the so-called "forward orthogonal deviations" transform can be used). I am not familiar with the "antedependence model", so make no comment on this.
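As an illustration (hypothetical variable names; the lag range is arbitrary), the relevant pieces in Stata are:

    xtset id year                  // declares the panel; gaps in year are recorded automatically
    xtabond2 y L.y x1 x2, gmm(L.y x1, lag(2 4) collapse) iv(x2) orthogonal twostep robust small
    * the orthogonal option replaces first differencing with forward orthogonal deviations,
    * which preserves more observations when the panel contains gaps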
Geoffrey Thomas Pugh Thank you so much. I've tried system GMM through the xtabond2 command (with forward orthogonal deviations). However, there is again an issue: the command does not work unless the time variable is recoded 1, 2, 3, ... This assumes that the measurements are equally spaced in time, which is not the case. The need for recoding the time variable also arises with other commands (e.g. xtdpdml, xtdpdqml). I have done that; however, the coefficients might be biased (usually with inflated standard errors).
Thanks for the reply. I have tried PCSE, but it seems that PCSE does not work if the time periods are not common to all panels. The following error appeared when I ran xtpcse DV IVs:
Number of gaps in sample: 106
no time periods are common to all panels, cannot estimate disturbance
covariance matrix using casewise inclusion
In addition, I have also tried xtscc, which addresses the cross-sectional dependence issue, but it still does not work on my data set; I am not sure why. The following error appeared.
Geoffrey Thomas Pugh I found your answers very informative and would appreciate your advice on the challenge I am facing. I have a dataset where N > T, but the time-series of the dependent variable is 4 years and that of some independent variables is 9 years. I also have an instrument variable in my dataset to address the potential endogeneity between dependent variable and a key independent variable.
1. Would a dynamic panel model be appropriate to use?
2. How to identify serial correlation both in the dependent variable and independent variable(s)?
3. Which dynamic model estimator would be appropriate to use?
The issue raised by bg Gulati is a common one - i.e. what to do when all the data available yields only a (very) small panel. Of course, if you do not have all the data available, then get it! But I'm sure that you will have already thought of that. Thereafter, the first step is to recognise the limitations of your data. In particular, recognise that it is only least squares estimators that have known and good small sample properties (i.e. that can yield unbiased, consistent and efficient - i.e. reasonably precise - estimates). In a panel context, this limits you to either pooled OLS (in effect treating your panel as a single cross section) or to fixed effects (FE) estimation. In practice, the former is not recommended, because it assumes that all of the cross-section groups are homogeneous, which is unlikely. FE estimation, however, allows for group heterogeneity by including a dummy variable for each group in the model. Strictly speaking, FE estimation should be preceded by diagnostic/misspecification testing. However, the value of such tests is probably limited in small samples. In this case, (i) transform your variables into logs (possible as long as you do not have zero or negative values), which may induce improved statistical characteristics with respect to the normality and homoskedasticity of the residuals, and (ii) difference the data (the first difference of a log series gives the approximate growth rate of the variable, which is often a sensible or even preferred economic measure), which should take care of serial correlation in the group-level residuals (you can assume the presence of serial correlation among the residuals from an equation specified in levels). (Moreover, estimating a model in growth rates might attenuate problems of endogeneity.) (The standard test for serial correlation in FE estimation is Wooldridge's xtserial, which is available as a user-written programme in Stata.) This first-differencing strategy may yield modest but useful results. In contrast, you cannot obtain sensible results from a small dataset by applying estimators whose characteristics with respect to unbiasedness, consistency and efficiency depend on asymptotic results - i.e. on large samples, which in practical terms means a minimum of many hundred observations and preferably many thousand or more. Unfortunately, therefore, the small size of your dataset rules out the use of all dynamic and instrumental variables estimators. No dynamic or IV approach is applicable to a panel comprising 36 observations.
Dear Geoffrey Thomas Pugh, is it appropriate to perform panel unit root tests for series under 20 years in studies where GMM methods are applied? As far as I can see, in some of the studies using GMM models, although the number of instruments is higher than the number of countries, the model is considered valid if the number of observations is large. Do you think this is the right approach?
Dear Hikmet. With 20 years of annual data, I would certainly perform panel unit root tests. This is for at least two reasons. (1) To get an idea of the statistical generating mechanism of your data - i.e. its dynamics (whether it is stationary, containing a unit root, a unit root in the presence of drift and/or a deterministic trend, etc.) - it is not just the number of observations that is important but also (and more important) the span of the data, so that panel unit root tests applied to 20 annual observations will be more informative than if applied to, say, 20 quarterly or weekly (etc.) observations. (2) 20 years should be sufficient to reveal long-run characteristics of the series, although not necessarily sufficient to take account of major structural breaks if these are close to either the beginning or the end of the series. As for the next part of your question - "in some of the studies using GMM models, the number of instruments is higher than the number of countries" - I do not think that either difference or system GMM modelling is appropriate, because these are "wide-N" estimators (i.e. they are appropriate for datasets with very large numbers of cross-section units). If the number of cross-section units is small, then consider another class of model - i.e. panel time-series estimators such as the mean group and pooled mean group approaches.
Dear Geoffrey Thomas Pugh, thank you very much for your reply. In models where the number of instruments exceeds the number of countries, the probability values of the Sargan tests are usually 1.000. This situation is criticised by many referees.
A Sargan (or Hansen) test yielding a p-value of 1.00 is not - repeat not - evidence of non-rejection of the null of (overidentifying) instrument validity. Rather, it is a sign that the test is so weak (perhaps because of insufficient observations) that it can never reject the null. In other words, p = 1.00 is telling you that the test is useless.
Dear Geoffrey Thomas Pugh, if I understand you well, in my case with dynamic model explaining inflation by weather variables with n less than 15 and t=19 it would be preferable to use the mean group and pooled mean group approaches. I'm currently applying GMM model (xtabond)
I do not think that a definitive answer is possible. So much depends on the characteristics of your data. For example, is there a significant degree of autocorrelation in your dependent variable? If so, then any static model is misspecified (dynamics are omitted by definition) and thus likely to yield biased estimates. So I would start by testing a fixed effects model for serial correlation (autocorrelation) of the within-group errors. (The Stata user-written programme xtserial will do this for you.) If you cannot reject the null of no serial correlation in the residuals, then estimate a static fixed effects model. However, if, as is likely, the null is rejected, then you can either attempt to eliminate the dynamics by first-differencing your model (but continue to check for error autocorrelation) or specify a dynamic model. The problem with first-differencing is loss of information. However, if you first transform your variables into natural logs and then take first differences, you will have a model in growth rates. It is up to you to decide whether or not this is economically sensible. If you specify a dynamic model, then probably your best choice will be to estimate by system GMM (I would recommend the excellent user-written programme for Stata, xtabond2). However, this is a "wide-N" approach and you have very few cross-section units (countries). If you have a high level of variation in both the time-series ("within") and the cross-section ("between") dimensions then it may be possible to estimate a dynamic model. However, you must pay careful attention to (i) the model diagnostics and (ii) be aware that difference and system GMM estimates can be wildly sensitive to even small changes in instrumentation and other model specification choices. With your relatively small dataset, I suggest doing everything you can to limit the number of instruments used in estimation (in xtabond2, you can do this by limiting the number of lags used to form instruments, by using the "collapse" option - this is especially useful - and by factorising instruments using the "pca" option). Unfortunately, there is (to the best of my knowledge) no reliable automated way to optimise the instrument set; so you will have to do a lot of trial and error to find a valid instrument set. If the instruments are not valid, then the model estimates are likewise invalid.
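By way of illustration only (hypothetical variable names; the particular lag limits are arbitrary), instrument reduction in xtabond2 might look like this:

    * limit the lags used to build GMM-style instruments and collapse the instrument matrix
    xtabond2 y L.y x1 x2, gmm(L.y x1, lag(2 3) collapse) iv(x2) twostep robust small
    * compare the reported instrument count with the number of cross-section units,
    * and check the Hansen test and the AR(1)/AR(2) tests of the differenced residuals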
@Geoffrey Thomas sir, I have panel data on the BRICS countries from 1996-2020 and I found the presence of cross-sectional dependence. Also, the variables are I(1) or I(0). To the best of my knowledge, the suitable technique for these data is CS-ARDL, but the main concern is the small N. How should I proceed? Please suggest.
Any econometric model exists at two levels: (i) as a statistical model (or generating mechanism) that has to make assumptions about the data (e.g. regarding the exogeneity/endogeneity of the independent variables, the distribution of the errors, dependence/independence of the error terms, and so on); and (ii) as an economic model in which estimated parameters can be interpreted for their economic meaning. In any econometric model, the statistical level is fundamental, because if the statistical assumptions of the model do not hold in the data then any economic interpretation must be more or less invalid. Hence, you must use whatever diagnostic tests and checks are available to investigate the validity - or otherwise - of the statistical assumptions of your model. If one or more tests indicate that one or more assumptions are not justified, then you should respecify your model. I gave an example in my previous post. If you estimate a static fixed effects model then you should always test for serial correlation among the within-group errors. If the test rejects the null of no serial correlation then you need to respecify your model to account for the omitted dynamics (i.e. the dynamics that by definition are excluded by a static specification).
Unfortunately, to the best of my knowledge, there is no single textbook treatment that covers the whole range of diagnostic tests that you will need to specify a panel model with good statistical characteristics. However, I attach three PowerPoint lectures from one of my own courses on panel modelling together with an accompanying Word document on dynamic panel modelling. These cover much of the ground in a (more or less) non-technical manner. The first lecture introduces panel modelling; the second focuses on diagnostic tests for fixed and random effects models (including xtserial); and the third introduces dynamic panel modelling by difference and system GMM. Finally, the Word document attempts to give a connected account of dynamic panel modelling, including the available diagnostic tests. Beyond these teaching materials, use the references in the Word document. Any of the textbooks should be fine for static panel modelling (fixed and random effects models). For dynamic panel modelling, I recommend two brilliant articles by David Roodman, who developed the Stata user-written programme xtabond2, which implements everything discussed in these two papers:
Roodman, D. (2009a). How to do xtabond2: An introduction to difference and system GMM in Stata. The Stata Journal, 9(1), pp. 86-136.
Roodman, D. (2009b). A note on the theme of too many instruments. Oxford Bulletin of Economics and Statistics, 71(1), pp. 135-158.