I have already used the Arellano-Bond estimator; however, as far as I know this approach does not work well with small samples. Recently, I read that the Blundell-Bond estimator is better suited to this kind of panel.
I agree with Daniel; however, it depends on whether you want to specify a dynamic model or not.
In the static case, you can use a standard within estimator to capture the fixed effects, which will provide consistent and unbiased estimates of your regressors of interest. Note that if you specify a Least Squares Dummy Variable (LSDV) estimator, the estimates of the individual fixed effects themselves will be inconsistent in a panel this small, so you cannot draw inference from them.
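For concreteness, a minimal Stata sketch of the two equivalent ways of estimating the static fixed-effects model (y, x1, x2, id and year are placeholder names, not from the original question):

* declare the panel structure
xtset id year
* within (fixed-effects) estimator
xtreg y x1 x2, fe
* equivalent LSDV estimator: the slope estimates are identical,
* but do not interpret the individual dummy coefficients
regress y x1 x2 i.id

Both commands give the same slope estimates; only the treatment of the individual effects differs.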
If you wish to specify a dynamic panel data model, however, then typically you would turn to the difference/system GMM estimators to deal with the substantial Nickell bias that arises in short panels. The problem you have, however, is that the number of instruments used by these estimators explodes with T. If N is not much larger than T, then your tests of instrument validity will be uninformative (a Hansen J-test p-value of 1.000) and you may have overfitted your endogenous lagged dependent variable, thereby removing its endogenous component.
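If you do nevertheless go down the GMM route, the instrument count can at least be restricted. A rough sketch using David Roodman's user-written xtabond2 command for Stata (placeholder variable names; the lag() and collapse options limit the proliferation of instruments, and the reported instrument count should always be compared with N):

* system GMM with a restricted, collapsed instrument set
xtabond2 y L.y x1 x2, gmm(L.y, lag(2 4) collapse) iv(x1 x2) twostep robust small
* add the noleveleq option to obtain difference (rather than system) GMM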
As a result, in the dynamic case your choice is either to pool the data, as Daniel outlines, or to use a standard fixed-effects estimator, which may introduce some bias but may still be preferable to omitting the fixed effects.
These are both very good answers. The dataset is too small in either dimension for any approach other than OLS, which remains the only estimator with known small-sample properties. So, try OLS with (i) pooled data, (ii) pooled data with group fixed effects (within estimation) and, if you must, (iii) both of these approaches with a lagged dependent variable. The latter will yield estimates that are biased and inconsistent, but that will still be more useful than anything from an estimator whose properties are known only asymptotically. One more point to bear in mind: in such a small sample, default standard errors will be more valid than cluster-robust standard errors, which depend on a large number of clusters. And do not over-interpret your results!
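A minimal Stata sketch of these three specifications, assuming placeholder names y, x1, x2, id and year (default rather than cluster-robust standard errors are reported, as recommended above):

* (i) pooled OLS
regress y x1 x2
* (ii) pooled data with group fixed effects (within estimation)
xtset id year
xtreg y x1 x2, fe
* (iii) the same two specifications with a lagged dependent variable
regress y L.y x1 x2
xtreg y L.y x1 x2, fe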
You can apply the difference GMM/system GMM technique to estimate the model for panel data with small T and large N, for example in EViews. Open the dataset in EViews, select Quick, then Estimate Equation, choose the GMM method, click the Dynamic Panel Wizard, select the dependent variable and then enter the regressors. Click Next, choose the difference option, and click Next again. If you are interested in difference GMM, put the variables in the Transform (difference) box. For system GMM, put only the variables that are to be transformed in the Transform (difference) box and the untransformed variables in the No Transformation box.
I would take issue with Sharif. GMM is a large sample approach; hence, unknown - and probably bad - small sample properties. The issue is not one of software but of the assumptions made in deriving the estimation method. With such a small dataset, as I argue above, it is OLS or nothing.
I agree with all the colleagues above. N=8 is too small to work with system GMM (I assume you need to estimate a dynamic model). I am afraid it is, as Prof. Pugh points out, OLS or nothing!
I agree with the first post (Daniel's). The dimensions of your panel justify an OLS approach by themselves. There is nothing more to do. If an extended justification is needed, I would say that a fixed-effects estimator would need degrees of freedom that you do not have.
Hi Alberto, as suggested, the OLS estimator is your 'optimal' approach... because both N and T are small. If you apply the GMM estimator (which is designed for large N and small T), whatever estimates you obtain will be grossly inconsistent!
Sorry for replying to you all so late, but I really appreciate your valuable and useful explanations. I will apply pooled OLS as the best methodology for my research.
Henk has a point. In principle, strong prior information may compensate for a small sample size. However, the key word here is "strong". Without very solid priors, econometric basics continue to hold. In particular, among the range of econometric approaches available to applied researchers, it is only pooled OLS and fixed-effects models estimated by OLS that have known and favourable small-sample properties. In turn, these depend on a range of assumptions that may themselves be tested by standard diagnostics (e.g. error terms that are normally distributed, homoskedastic and free from serial correlation, and a genuinely linear relationship in the data between the dependent and independent variables). Beyond OLS, I would need a lot of convincing that choice of technique can substitute for data. In the case of a very small dataset - such as the one mentioned in the question initiating this discussion - I would follow the advice of those contributions suggesting estimation by OLS.
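For example, after a pooled OLS regression in Stata, a minimal diagnostic sketch along these lines might be used (placeholder names; xtserial is a user-written command and requires the data to be xtset; this is illustrative, not an exhaustive battery):

* pooled OLS
regress y x1 x2
* heteroskedasticity (Breusch-Pagan / Cook-Weisberg)
estat hettest
* functional form / linearity (Ramsey RESET)
estat ovtest
* normality of the residuals
predict res_hat, residuals
sktest res_hat
* serial correlation in the panel (Wooldridge test)
xtserial y x1 x2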
Henk Folmer Hi Prof. Folmer,I would like try the Bayesian approche to analyse panel data. I am wondering whether you can recomemd some textbooks, papers and there are some R-packages to do it?
Jaya IGNM, and Folmer H (2019) Bayesian spatiotemporal mapping of relative dengue disease risk in Bandung Indonesia, Journal of Geograpical Systems (on line) and the references therein
In a random effects model, covariates must not be correlated with any potentially omitted variables. Unless this possibility is tested and addressed (e.g. by appropriate instrumentation), estimates are likely to be biased and inconsistent and inference invalid. So, be careful!
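One common way to probe this assumption, sketched here in Stata with placeholder names, is a Hausman test comparing the fixed- and random-effects estimates (bearing in mind that in a very small sample the test itself will have little power):

* fixed effects: consistent whether or not the effects are correlated with the covariates
xtreg y x1 x2, fe
estimates store fe
* random effects: consistent and efficient only if they are not
xtreg y x1 x2, re
estimates store re
* Hausman test of the difference between the two sets of estimates
hausman fe re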
In reply to Yuheng Ling, yes - a fixed-effects model might well be appropriate. As I said in my reply of September 8th (above), "among the range of econometric approaches available to applied researchers, it is only pooled OLS and fixed-effects models estimated by OLS that have known and favourable small-sample properties". In contrast, random-effects estimators are most unlikely to be appropriate in a small sample.
In reply to Sau Mai: here, the standard dynamic panel estimators (estimated by difference and system GMM) may not be appropriate for at least two reasons: (i) T is relatively long, giving rise to a proliferation of weak instruments (although their number can be reduced); and (ii), mainly, because N is small. Consequently, you might usefully investigate panel time-series estimators, such as the "mean group" and "pooled mean group" approaches, both of which can reasonably be applied to a dataset of your dimensions.
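If you explore this route in Stata, the user-written xtpmg command (Blackburne and Frank's implementation) covers both estimators. A rough sketch, assuming a single regressor x and placeholder names - check the help file for your own specification:

* pooled mean group (PMG) estimator
xtpmg d.y d.x, lr(l.y x) ec(ec) replace pmg
* mean group (MG) estimator
xtpmg d.y d.x, lr(l.y x) ec(ec) replace mg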
Geoffrey Thomas Pugh I have read your replies and they are very well explained and detailed. I have another question. I'm using panel data with N = 65 and T = 15. I intended to deploy one or another GMM-type approach, but then I had to split the entities into subgroups in order to examine their behaviour, which leaves each subgroup with a small N.
So, as far as I'm concerned, I can only use the approaches you mentioned (i.e., pooled OLS or fixed effects) to work with these subgroups. Is there any other option?
Regarding panel data, I recommend the following https://cran.r-project.org/web/packages/plm/vignettes/plmPackage.html. Though designed for R, it gives helpful insights into panel data econometrics. Hope this helps. Good luck!
In reply to Sang Nguyen, N=65 and T=15 should give you a good-sized dataset for GMM estimation; in particular, difference- and/or system-GMM estimation of dynamic panel models. Several of my PhD students have achieved worthwhile estimates from datasets smaller than this one. However, as you note, once you split your sample, especially by cross-section groups, N falls and GMM approaches become problematic (insufficient groups and too many instruments). One approach is to reduce the number of instruments. David Roodman's (brilliant) xtabond2 (a user-written programme for Stata) offers several ways to do this. However, be aware that GMM estimation can become very fragile as the ratio of observations to instruments falls: i.e. restrict the instrument set one way and you obtain one set of results and diagnostics; restrict it another way and you are likely to obtain quite different diagnostics and estimates.

Accordingly, I suggest the following approach. Estimate using all of your data and use interaction terms to distinguish different groups of countries. For example, if you have two groups of countries - Group A and Group B - do not divide your sample and run separate regressions for each group. Instead, define a dummy variable for Group B observations (=1; so Group A observations = 0). Then interact this dummy variable with your variables of interest (or, indeed, with all of your independent variables). Interaction just means multiplying them together, so that the values of the interacted variables for Group B are just the original values of the independent variable (i.e. the original values multiplied by one), and the values for Group A are all zero (i.e. the original values multiplied by zero). Then estimate your model with all the original variables (including the constant) plus your dummy variable for Group B plus all of the interaction variables that you have defined.

Your estimates will then give you directly estimated effects for Group A and derived estimates for Group B. (i) The effects for Group A are directly estimated, given by the original constant and the estimated coefficients on the original independent variables. (ii) In contrast, the effects for Group B have to be derived: the constant for Group B is the sum of the original constant and the coefficient on the Group B dummy; and, for each variable, the Group B point estimate is the sum of the estimated coefficient on the original variable (i.e. the Group A estimate) and the estimated coefficient on the interacted variable (which estimates the difference between the Group A and Group B effects). The computation of the standard errors on the derived Group B effects is complicated, but it is easily handled by Stata's nlcom postestimation command (I'm sure this can also be accomplished in any modern regression package).

This approach is essentially a neat way to do two or more regressions using all of your data. If you estimate with a complete set of interaction terms, then you will obtain the same estimates as you would obtain from dividing the sample. However, your estimates will be more efficient (i.e. more precise).
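A minimal Stata sketch of this interaction approach, with hypothetical names (groupB is the Group B dummy, x a variable of interest):

* build the interaction term
generate groupB_x = groupB * x
* one regression on the full, pooled sample
regress y x groupB groupB_x
* Group A effect of x: read directly from the coefficient on x
display _b[x]
* Group B effect of x: the sum of the two coefficients; nlcom supplies its standard error
nlcom _b[x] + _b[groupB_x]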
I broadly agree with Pugh's suggestion, but the approach could become more complicated if you have more than two groups of countries and many regressors. Then even the degrees of freedom in such regressions may become small.
Thank you kindly for your help, Professor Geoffrey Thomas Pugh. I will redo the analysis with the "MG" and/or "PMG" estimators to check the results. I hope I may continue to ask for your guidance if I face any difficulty.
Sau Mai: HTH! Generally, it's important to start with exploratory data analysis (EDA) first, so let me recommend these works showing some basics (they relate specifically to panel-data EDA and data transformation - simple yet effective approaches):
Poster A Visual Framework for Longitudinal and Panel Studies (with ...
Poster Is Cash Dividend an Everlasting Stimulus? Impact of Cash Div...
I have a panel of 12 cross-sections (N = 12) and 10 time periods (T = 10). I understand that using system GMM could be inappropriate in this case. My question is: can I break the time horizon into two sub-samples of (N = 12, T = 5) and estimate two separate models?
@Henk Folmer thanks for the response. If I may ask, how will the introduction of a dummy enable me to estimate a single model with GMM (using Arellano-Bond/Blundell-Bond)? Will this solve the problem of biased and inefficient estimators? Thanks
I agree with Henk's suggestion to use a dummy variable. However, the dummy will need to be interacted with your variables of interest (this is explained in more detail in my post of January 23rd, above, in this thread).
I'm in exactly the same situation as Sang Nguyen. My sample has N = 32 and T = 19. In the overall analysis, the proliferation of instruments is due to the number of explanatory variables, but it can be managed using xtabond2. In the subgroup analysis, once I split the sample, N falls below 14 while T remains 19. Interacting with subgroup dummies increases the number of explanatory variables and hence the number of instruments, and the use of xtabond2 becomes problematic. In my case, what could be an appropriate approach for the subgroup analysis?
Regarding Prithu Sharma's question, if both variables in the interaction term are exogenous, then we classify the whole interaction term as a standard instrument in xtabond2, i.e. include it in the iv() option.
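For instance, a sketch with hypothetical names (x exogenous, groupB an exogenous group dummy), along the lines of the xtabond2 syntax used earlier in this thread:

* exogenous interaction term entered as a standard (iv-style) instrument
generate groupB_x = groupB * x
xtabond2 y L.y x groupB groupB_x, gmm(L.y, lag(2 4) collapse) iv(x groupB groupB_x) twostep robust small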
One of the major problems would be the assumption of strict exogeneity of the excluded/included instruments, and the associated validity checks. These involve questions about identification of the model.