I want to examine the simultaneous relationship between the two using microeconomic (household) data. I am using consumption as a proxy variable for income. I am unsure how to model this.
This relationship should be modelled at some aggregate level (as inequality involves looking at the dispersion of income among several agents).
You should first decide whether you want to look at the correlation between the two variables or at the causal effect of one on the other. In the latter case, you need to think about the possible endogeneity of the right-hand-side variable. A possible strategy is then instrumental variables (IV), of which 2SLS is one implementation. Once you have decided which case you are in, you should carefully read the literature on the effect of inequality on growth (or the reverse) to see which instruments it uses.
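To make the IV idea concrete, here is a minimal sketch of 2SLS in the simplest case (one endogenous regressor, one instrument), run on simulated data. Everything in it is an assumption for illustration only: the variables `z` (instrument), `x` (endogenous regressor), `y` (outcome), the unobserved confounder `u`, the true coefficient of 0.5, and the helpers `simple_ols` and `two_stage_least_squares` are hypothetical and not taken from any of the studies discussed here.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """2SLS with one instrument z for one endogenous regressor x."""
    # First stage: project the endogenous regressor on the instrument.
    a0, a1 = simple_ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Second stage: regress the outcome on the fitted values.
    return simple_ols(x_hat, y)


# Simulated data (assumption): u drives both x and y, so OLS of y on x is biased.
random.seed(0)
n = 5000
TRUE_BETA = 0.5
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [TRUE_BETA * xi + 2 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

_, beta_ols = simple_ols(x, y)
_, beta_iv = two_stage_least_squares(z, x, y)
print(f"OLS slope:  {beta_ols:.3f}")
print(f"2SLS slope: {beta_iv:.3f} (true value {TRUE_BETA})")
```

Note that the standard errors from a hand-rolled second stage like this are not valid, so for real work a dedicated IV routine would be the safer choice. The sketch only shows why the instrument matters: it must affect `x` but influence `y` only through `x`.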
How do you measure education? I assume it is household spending on education (as part of total private consumption?) and that you want to analyse whether and how income influences education (not the causal relation from education to later income, because that can hardly be analysed with household data). Before you consider any econometric estimation, it is necessary to analyse the education system (mainly schooling, high school and university studies), because spending depends strongly on how the state supports education and on whether that support depends on income.
Another problem is that consumption may not be a good measure of income, because other household characteristics (age, children) will have an important influence. But even if you had data on household income, these characteristics would likely play an important role.
After a careful analysis of these points, you can hopefully find a good specification of the model. You should try the simplest statistical methods first. You may then discover that you need a more elaborate one.
José-Ignacio Antón Thank you for your response. I am going to look only at the causal relationship between them. For the endogeneity issue, I will look for possible variables to use as IVs. The problem I am having is that no study has used microeconometric data, such as household-level data, to study the interrelationship between these inequalities. Studies have been done at the country level, such as Checchi (2001), Gregorio and Lee (2002) and Yang (2009). I really appreciate your answer; thanks once again.
Anton Rainer Thank you so much for answering my query. I am measuring education through levels of educational attainment, as the literature suggests. I will take into account how the education system works in the country and analyse it.
I know consumption is not a good proxy for income, but in India there is no secondary data on incomes.
Can I also include some macro variables, such as government expenditure on education, in the model while using household-level data? Education will be influenced by government expenditure at the different levels of education: primary, secondary or tertiary.
As your education data do not refer to education spending itself but to the educational level (which was gained through past education), it is clear that you are trying to analyse the effect of this level on income. Therefore, you have to find a specification of an income equation with the educational level as one of the explanatory variables. You must also include family characteristics, because you use consumption as a proxy for income, and consumption will of course depend on the number, age, etc. of the household members. These characteristics are likely to influence income too. This influence will, of course, be weaker, and the two can hardly be separated. Before any econometric estimation, you should look at plots of the chosen variables and calculate correlations between them (also among the explanatory ones). If you have enough data, I recommend choosing household types and analysing the relations for each type separately. If your data include time series, you can, of course, use aggregate (National Accounts) data as explanatory variables.
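The preliminary correlation check can be sketched in a few lines. The variable names (`log_consumption`, `head_education_years`, `household_size`) and all numbers below are invented for illustration and do not come from any survey; `pearson` is a hypothetical helper computing the ordinary Pearson correlation coefficient.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


# Made-up household records (assumption), one entry per household.
households = {
    "log_consumption": [6.1, 6.8, 7.4, 6.5, 7.9, 7.0],
    "head_education_years": [2, 8, 12, 5, 15, 10],
    "household_size": [7, 5, 4, 6, 3, 4],
}

# Correlate every pair of variables, including pairs of explanatory ones.
pairwise = {
    (a, b): pearson(households[a], households[b])
    for a, b in combinations(households, 2)
}
for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

A high correlation between two explanatory variables (here, education and household size) is a warning that their separate effects will be hard to disentangle in the income equation.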
For analysing the influence of income on the education level, you would need very long time series for a fixed set of households, and even then it would likely not be a promising project.
Two questions: The education level is an individual characteristic; how is it transformed into a household-level variable (the level of the main earner, or some average over the (earning) members)?
To José-Ignacio: What is the reason to include a link between distribution and growth in this study?
Anton Rainer Thanks for the explanation. I would like to clarify some points here. I will look at the effect of income (here, the household's monthly per capita consumption expenditure) on the educational level and, simultaneously, the impact of education on income. For this, I will include household characteristics as explanatory variables and also use education in the model. One issue is that I have data for four different periods (1983, 1993-94, 2004-05 and 2011-12) and the data set is not a panel. Thanks for your suggestion to use household types to analyse the relations.
For your question:
I will consider the education level of the household head and other household characteristics, with the Gini of per capita monthly consumption expenditure as my dependent variable.
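For reference, a Gini coefficient of per capita consumption can be computed per survey round roughly as follows. This is only a sketch under stated assumptions: the expenditure figures are invented, `gini` and `mpce_by_round` are hypothetical names, and survey weights are ignored, which a real calculation on household survey data would need to account for.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values (0 = full equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    rank_weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * rank_weighted / (n * total) - (n + 1) / n


# Made-up monthly per capita consumption expenditure by survey round (assumption).
mpce_by_round = {
    "1983": [90, 110, 120, 150, 200],
    "1993-94": [250, 300, 340, 420, 700],
    "2004-05": [500, 620, 700, 900, 1800],
    "2011-12": [1100, 1300, 1500, 2100, 4500],
}

gini_by_round = {rnd: gini(values) for rnd, values in mpce_by_round.items()}
for rnd, g in gini_by_round.items():
    print(f"{rnd}: Gini = {g:.3f}")
```

Because the Gini is defined over a group, using it as a dependent variable means the unit of observation becomes the group (for example a state or district within a round), not the individual household.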
I think it is against logic to derive both "causal" relations, education-->income and income-->education, if education is measured by a single variable (in your case: a level, in general the one reached at the end of formal education). As a thought experiment, take the possible careers of two people. A and B were both born in 1980 into poor families. A got a university degree in 2005 (because he received state support); B only finished primary school. In 2011-12, A earned much more than B. Therefore, statistics will likely show that a higher level of education leads to higher income. But in 1983, 1993-94 and 2004-05 they lived in poor households (their parents' incomes were low), with B's household even having a higher income, because he was already in a job. How would this show up in your data? I think it is not possible to follow A and B or their households in these data sets, because households change.
Income nowadays is likely dependent on the education level, but today's level depends on the education activities (schools, universities) of past years, where the lags differ and may be very long. This past education will likely depend on household income, i.e. the income of the parents, but meanwhile the children will have founded their own households, and one cannot find any such relation in the statistics. For the analysis of the link income-->education level, you need statistics on those who reached certain levels together with the income of their parents (more exactly: the income of the household in which they lived until they reached that level).
Anton Rainer Thanks for the suggestion. The dataset has many problems of this kind: it is cross-sectional, so over the years it amounts to repeated cross-sectional data. I could look at the change in educational and income inequality.
Yes, you are right that there will be a lag, and it could be used as an instrumental variable, but it would be difficult to get lagged variables for education and income since the data are cross-sectional and not a time series. Am I right?
In that sense, there would be a one-directional causal relationship rather than a two-way causal relationship between them.
Your data are not time series, because they contain, I think, different households every year. Maybe for the rounds that span two years the households are the same. Then it would, in principle, be possible to analyse effects with a one-year lag, but never students' careers up to their highest degrees (= levels). Therefore, you should restrict your analysis to the question of how the education level influences income. Even that is difficult enough.
Anton Rainer Yes, you are right, it is not time series data, as every year they survey different households. I have to analyse the effect of education and its inequality on income inequality. Thank you for your help; I really appreciate it.