I have already used the Arellano-Bond estimator; however, as far as I know this approach does not work well with small samples. Recently, I read that the Blundell-Bond estimator is better suited to this kind of panel.
I agree with Daniel; however, it depends on whether you want to specify a dynamic model or not.
In the static case, you can use a standard within estimator to capture the fixed effects, which will provide consistent and unbiased estimates of your regressors of interest. Note that if you specify a Least Squares Dummy Variable (LSDV) estimator, the estimates of the individual fixed effects themselves will be inconsistent in a panel this small, so you cannot draw inference from them.
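For concreteness, a minimal Stata sketch of the two equivalent ways of estimating the static fixed-effects model (y, x1, x2, id and year are placeholder names, not from the original question):

* declare the panel structure
xtset id year
* within (fixed-effects) estimator
xtreg y x1 x2, fe
* equivalent LSDV estimator: the slope estimates are identical,
* but do not interpret the individual dummy coefficients
regress y x1 x2 i.id

Both commands give the same slope estimates; only the treatment of the individual effects differs.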
If you wish to specify a dynamic panel data model, however, then typically you would turn to the difference/system GMM estimators to deal with the substantial Nickell bias that arises in short panels. The problem you have, however, is that the number of instruments used by these estimators explodes with T. If N is not much larger than T, then your tests of instrument validity will be uninformative (a Hansen J-test p-value of 1.000) and you may have overfitted your endogenous lagged dependent variable, thereby removing its endogenous component.
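If you do nevertheless go down the GMM route, the instrument count can at least be restricted. A rough sketch using David Roodman's user-written xtabond2 command for Stata (placeholder variable names; the lag() and collapse options limit the proliferation of instruments, and the reported instrument count should always be compared with N):

* system GMM with a restricted, collapsed instrument set
xtabond2 y L.y x1 x2, gmm(L.y, lag(2 4) collapse) iv(x1 x2) twostep robust small
* add the noleveleq option to obtain difference (rather than system) GMM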
As a result, in the dynamic case your choice is either to pool the data, as Daniel outlines, or to use a standard fixed-effects estimator, which may introduce some bias but may still be preferable to omitting the fixed effects.
These are both very good answers. The dataset is too small in either dimension for any approach other than OLS, which remains the only estimator with known small-sample properties. So, try OLS with (i) pooled data, (ii) pooled data with group fixed effects (within estimation) and, if you must, (iii) both of these approaches with a lagged dependent variable. The latter will yield estimates that are biased and inconsistent, but that will still be more useful than anything from an estimator whose properties are known only asymptotically. One more point to bear in mind: in such a small sample, default standard errors will be more valid than cluster-robust standard errors, which depend on a large number of clusters. And do not over-interpret your results!
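A minimal Stata sketch of these three specifications, assuming placeholder names y, x1, x2, id and year (default rather than cluster-robust standard errors are reported, as recommended above):

* (i) pooled OLS
regress y x1 x2
* (ii) pooled data with group fixed effects (within estimation)
xtset id year
xtreg y x1 x2, fe
* (iii) the same two specifications with a lagged dependent variable
regress y L.y x1 x2
xtreg y L.y x1 x2, fe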
You can apply the difference GMM/system GMM technique to estimate the model for panel data with small T and large N, for example in EViews. Open the dataset in EViews, select Quick, then Estimate Equation, choose the GMM method, click the Dynamic Panel Wizard, select the dependent variable and then enter the regressors. Click Next, choose the difference option, and click Next again. If you are interested in difference GMM, put the variables in the Transform (difference) box. For system GMM, put only the variables that are to be transformed in the Transform (difference) box and the untransformed variables in the No Transformation box.
I would take issue with Sharif. GMM is a large sample approach; hence, unknown - and probably bad - small sample properties. The issue is not one of software but of the assumptions made in deriving the estimation method. With such a small dataset, as I argue above, it is OLS or nothing.
I agree with all the colleagues above. N=8 is too small to work with system GMM (I assume you need to estimate a dynamic model). I am afraid it is, as Prof. Pugh points out, OLS or nothing!
I agree with the first post (Daniel's). The dimensions of your panel justify an OLS approach by themselves. There is nothing more to do. If an extended justification is needed, I would say that a fixed-effects estimator would need degrees of freedom that you do not have.
Hi Alberto, as suggested, the OLS estimator is your 'optimal' approach... because both N and T are small. If you apply the GMM estimator (which is designed for large N and small T), whatever estimates you obtain will be grossly inconsistent!
Sorry for replying to you all so late, but I really appreciate your valuable and useful explanations. I will apply pooled OLS as the best methodology for my research.
Henk has a point. In principle, strong prior information may compensate for a small sample size. However, the key word here is "strong". Without very solid priors, econometric basics continue to hold. In particular, among the range of econometric approaches available to applied researchers, it is only pooled OLS and fixed-effects models estimated by OLS that have known and favourable small-sample properties. In turn, these depend on a range of assumptions that may themselves be tested by standard diagnostics (e.g. error terms that are normally distributed, homoskedastic and free from serial correlation, and a genuinely linear relationship in the data between the dependent and independent variables). Beyond OLS, I would need a lot of convincing that choice of technique can substitute for data. In the case of a very small dataset - such as the one mentioned in the question initiating this discussion - I would follow the advice of those contributions suggesting estimation by OLS.
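For example, after a pooled OLS regression in Stata, a minimal diagnostic sketch along these lines might be used (placeholder names; xtserial is a user-written command and requires the data to be xtset; this is illustrative, not an exhaustive battery):

* pooled OLS
regress y x1 x2
* heteroskedasticity (Breusch-Pagan / Cook-Weisberg)
estat hettest
* functional form / linearity (Ramsey RESET)
estat ovtest
* normality of the residuals
predict res_hat, residuals
sktest res_hat
* serial correlation in the panel (Wooldridge test)
xtserial y x1 x2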
Henk Folmer Hi Prof. Folmer,I would like try the Bayesian approche to analyse panel data. I am wondering whether you can recomemd some textbooks, papers and there are some R-packages to do it?
Jaya IGNM, and Folmer H (2019) Bayesian spatiotemporal mapping of relative dengue disease risk in Bandung Indonesia, Journal of Geograpical Systems (on line) and the references therein
In a random effects model, covariates must not be correlated with any potentially omitted variables. Unless this possibility is tested and addressed (e.g. by appropriate instrumentation), estimates are likely to be biased and inconsistent and inference invalid. So, be careful!
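One common way to probe this assumption, sketched here in Stata with placeholder names, is a Hausman test comparing the fixed- and random-effects estimates (bearing in mind that in a very small sample the test itself will have little power):

* fixed effects: consistent whether or not the effects are correlated with the covariates
xtreg y x1 x2, fe
estimates store fe
* random effects: consistent and efficient only if they are not
xtreg y x1 x2, re
estimates store re
* Hausman test of the difference between the two sets of estimates
hausman fe re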
In reply to Yuheng Ling, yes - a fixed-effects model might well be appropriate. As I said in my reply of September 8th (above), "among the range of econometric approaches available to applied researchers, it is only pooled OLS and fixed-effects models estimated by OLS that have known and favourable small-sample properties". In contrast, random-effects estimators are most unlikely to be appropriate in a small sample.
In reply to Sau Mai: here, the standard dynamic panel estimators (estimated by difference and system GMM) may not be appropriate for at least two reasons: (i) T is relatively long, giving rise to a proliferation of weak instruments (although their number can be reduced); and (ii), mainly, because N is small. Consequently, you might usefully investigate panel time-series estimators, such as the "mean group" and "pooled mean group" approaches, both of which can reasonably be applied to a dataset of your dimensions.
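If you explore this route in Stata, the user-written xtpmg command (Blackburne and Frank's implementation) covers both estimators. A rough sketch, assuming a single regressor x and placeholder names - check the help file for your own specification:

* pooled mean group (PMG) estimator
xtpmg d.y d.x, lr(l.y x) ec(ec) replace pmg
* mean group (MG) estimator
xtpmg d.y d.x, lr(l.y x) ec(ec) replace mg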
Geoffrey Thomas Pugh I have read your replies and they are very well explained and detailed. I have another question. I'm using panel data with N = 65 and T = 15. I intended to deploy one or another GMM-type approach, but then I had to split the entities into subgroups in order to examine their behaviour, which leaves each subgroup with a small N.
So, as far as I'm concerned, I can only use the approaches you mentioned (i.e., pooled OLS or fixed effects) to work with these subgroups. Is there any other option?
Regarding panel data, I recommend the following https://cran.r-project.org/web/packages/plm/vignettes/plmPackage.html. Though designed for R, it gives helpful insights into panel data econometrics. Hope this helps. Good luck!
In reply to Sang Nguyen, N=65 and T=15 should give you a good-sized dataset for GMM estimation; in particular, difference- and/or system-GMM estimation of dynamic panel models. Several of my PhD students have achieved worthwhile estimates from datasets smaller than this one. However, as you note, once you split your sample, especially by cross-section groups, N falls and GMM approaches become problematic (insufficient groups and too many instruments). One approach is to reduce the number of instruments. David Roodman's (brilliant) xtabond2 (a user-written programme for Stata) offers several ways to do this. However, be aware that GMM estimation can become very fragile as the ratio of observations to instruments falls: i.e. restrict the instrument set one way and you obtain one set of results and diagnostics; restrict it another way and you are likely to obtain quite different diagnostics and estimates.

Accordingly, I suggest the following approach. Estimate using all of your data and use interaction terms to distinguish different groups of countries. For example, if you have two groups of countries - Group A and Group B - do not divide your sample and run separate regressions for each group. Instead, define a dummy variable for Group B observations (=1; so Group A observations = 0). Then interact this dummy variable with your variables of interest (or, indeed, with all of your independent variables). Interaction just means multiplying them together, so that the values of the interacted variables for Group B are just the original values of the independent variable (i.e. the original values multiplied by one), and the values for Group A are all zero (i.e. the original values multiplied by zero). Then estimate your model with all the original variables (including the constant) plus your dummy variable for Group B plus all of the interaction variables that you have defined.

Your estimates will then give you directly estimated effects for Group A and derived estimates for Group B. (i) The effects for Group A are directly estimated, given by the original constant and the estimated coefficients on the original independent variables. (ii) In contrast, the effects for Group B have to be derived: the constant for Group B is the sum of the original constant and the coefficient on the Group B dummy; and, for each variable, the Group B point estimate is the sum of the estimated coefficient on the original variable (i.e. the Group A estimate) and the estimated coefficient on the interacted variable (which estimates the difference between the Group A and Group B effects). The computation of the standard errors on the derived Group B effects is complicated, but it is easily handled by Stata's nlcom postestimation command (I'm sure this can also be accomplished in any modern regression package).

This approach is essentially a neat way to do two or more regressions using all of your data. If you estimate with a complete set of interaction terms, then you will obtain the same estimates as you would obtain from dividing the sample. However, your estimates will be more efficient (i.e. more precise).
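A minimal Stata sketch of this interaction approach, with hypothetical names (groupB is the Group B dummy, x a variable of interest):

* build the interaction term
generate groupB_x = groupB * x
* one regression on the full, pooled sample
regress y x groupB groupB_x
* Group A effect of x: read directly from the coefficient on x
display _b[x]
* Group B effect of x: the sum of the two coefficients; nlcom supplies its standard error
nlcom _b[x] + _b[groupB_x]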
I broadly agree with Pugh's suggestion, but the approach could become more complicated if you have more than two groups of countries and many regressors. Then even the degrees of freedom in such regressions may become small.
Thank you kindly for your help, Professor Geoffrey Thomas Pugh. I will redo the analysis with the "MG" and/or "PMG" estimators to check the results. I hope I may continue to ask for your guidance if I face any difficulty.
Sau Mai: HTH! Generally, it's important to start with exploratory data analysis (EDA) first, so let me recommend these works showing some basics (they relate specifically to panel-data EDA and data transformation - simple yet effective approaches):
Poster A Visual Framework for Longitudinal and Panel Studies (with ...
Poster Is Cash Dividend an Everlasting Stimulus? Impact of Cash Div...
I have a panel of 12 cross-sections (N = 12) and 10 time periods (T = 10). I understand that using system GMM could be inappropriate in this case. My question is: can I break the time horizon into two sub-samples of (N = 12, T = 5) and estimate two separate models?
@Henk Folmer thanks for the response. If I may ask, how will the introduction of a dummy enable me to estimate a single model with GMM (using Arellano-Bond/Blundell-Bond)? Will this solve the problem of biased and inefficient estimators? Thanks
I agree with Henk's suggestion to use a dummy variable. However, the dummy will need to be interacted with your variables of interest (this is explained in more detail in my post of January 23rd, above, in this thread).
I'm in exactly the same situation as Sang Nguyen. My sample has N = 32 and T = 19. In the overall analysis, the proliferation of instruments is due to the number of explanatory variables, but it can be managed using xtabond2. In the subgroup analysis, once I split the sample, N falls below 14 while T remains 19. Interacting with subgroup dummies increases the number of explanatory variables and hence the number of instruments, and the use of xtabond2 becomes problematic. In my case, what could be an appropriate approach for the subgroup analysis?
Regarding Prithu Sharma's question, if both variables in the interaction term are exogenous, then we classify the whole interaction term as a standard instrument in xtabond2, i.e. include it in the iv() option.
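For instance, a sketch with hypothetical names (x exogenous, groupB an exogenous group dummy), along the lines of the xtabond2 syntax used earlier in this thread:

* exogenous interaction term entered as a standard (iv-style) instrument
generate groupB_x = groupB * x
xtabond2 y L.y x groupB groupB_x, gmm(L.y, lag(2 4) collapse) iv(x groupB groupB_x) twostep robust small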
One of the major problems would be the assumption of strict exogeneity of the excluded/included instruments, and the associated validity checks. These involve questions about identification of the model.