"measure the impact of one variable on the other" **IS NOT** related to "Testing hypothesis"!!
Your example (I know, it is just an example) leaves a lot of questions open before any thought should be wasted on the statistical analysis. I am afraid that this is not only the case in your example but also a common problem in many "studies".
For your example, some of the questions are:
- to what "population" should the answer refer?
(the whole world? industrial nations? a specific country? a specific religion?...)
- what cofactors will you consider?
(influence of other incomes of family members? jobs (only of the women, of the men, or of all family members)? size of the family? ...)
- will you look at the absolute or relative impacts?
and so on.
Finally, it is a question of how you want to model the family income to get a model estimate for the contribution of women's labour, depending on cofactors and possibly on interactions between the factors. This is scientific thinking! And it leads almost automatically to the way the model can be analyzed statistically. Statistics will not and cannot substitute for scientific thinking. Statistics is a language and a toolbox that maps to scientific thinking, so to say.
If you have a good model, then it is also clear how to test hypotheses about coefficients in this model. If this is required at all...
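As a sketch of what writing down such a model could look like, assume (hypothetically) that daily family income depends linearly on the woman's income and on family size, with an interaction between the two. The variable names, coefficients and simulated data are invented for illustration; the point is only that once the model is stated, its coefficients (and any hypotheses about them) follow directly from it.

```python
import numpy as np

# Hypothetical simulated data; names and "true" coefficients are illustrative only
rng = np.random.default_rng(42)
n = 500
women_income = rng.uniform(50, 300, n)   # woman's daily income (Rs), assumed
family_size = rng.integers(2, 9, n)      # number of family members (2..8), assumed
noise = rng.normal(0, 10, n)

# Assumed model: main effects plus an interaction between woman's income and family size
family_income = (200 + 1.5 * women_income + 80 * family_size
                 - 0.05 * women_income * family_size + noise)

# Design matrix: intercept, main effects, interaction term
X = np.column_stack([
    np.ones(n),
    women_income,
    family_size,
    women_income * family_size,
])
coef, *_ = np.linalg.lstsq(X, family_income, rcond=None)
for name, b in zip(["intercept", "women_income", "family_size", "women_x_size"], coef):
    print(f"{name:>13}: {b:8.3f}")

# With an interaction, the effect of +1 Rs of the woman's income depends on family size
for size in (3, 6):
    print(f"effect of +1 Rs women_income at size {size}: {coef[1] + coef[3] * size:.3f}")
```

Hypotheses such as "the interaction is zero" are then statements about specific coefficients (here `coef[3]`), not about the data in general.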
A note on Xiaoliang's answer: it is not clear how "income" is measured. If it is a currency, you should treat it as such (i.e., as a quantitative variable) and not categorize it (this throws away information). However, one should think about whether currency is an adequate measure for the intended purpose. Maybe the real purchasing power would be a better measure. Or, to play around with something different, a more direct measure of satisfaction, stress level, effective luxury affordances, and so on. For each of these variables it is important to think hard about what a reasonable definition and operationalization should look like.
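The information loss from categorizing a quantitative income can be illustrated with simulated data, assuming (for illustration only) a purely linear relationship. Income is used once as a number and once replaced by four quartile groups; the categorized version explains less of the variance even though it spends more parameters (four group means instead of an intercept and a slope).

```python
import numpy as np

def r_squared(y, fitted):
    """Share of the total variance of y explained by the fitted values."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: the outcome depends linearly on an income measured in currency
rng = np.random.default_rng(1)
n = 500
income = rng.uniform(0, 1, n)            # income rescaled to [0, 1] for simplicity
y = 2.0 * income + rng.normal(0, 0.2, n)

# (a) Income used as a quantitative predictor: simple linear fit
slope, intercept = np.polyfit(income, y, 1)
r2_continuous = r_squared(y, intercept + slope * income)

# (b) Income cut into four categories (quartiles); each category gets its own mean
edges = np.quantile(income, [0.25, 0.5, 0.75])
group = np.digitize(income, edges)       # group labels 0..3
group_means = np.array([y[group == g].mean() for g in range(4)])
r2_categorized = r_squared(y, group_means[group])

print(f"R^2, income as a number:     {r2_continuous:.3f}")
print(f"R^2, income in 4 categories: {r2_categorized:.3f}")
```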
I guess one way is to divide the women's income into three or four categories relative to the whole family income, and then define criteria to judge whether the first group has the lowest contribution and the fourth group the highest. To define the criteria, we need more information. Hopefully this is helpful. Good luck with your study!
Actually, I want to know whether there is any significant contribution of women labourers to the family income. By women labourers I mean women who do handicraft work at home for at least six hours a day. I assume all the women belong to a particular area and that their family members are also involved in the same type of work. I have collected data on the total household income of 500 families and on the income of the female labourers aged 18-40 years working in these families; from each family, the income of only one woman labourer has been taken into consideration. Now I want to know whether women labourers make a positive, negative, or neutral contribution to the total family income. I hope you will understand my problem.
You write: "Now I want to know whether there is any positive or negative contribution of women labourers to the total family income or whether it is neutral."
This question can NEVER be answered by statistics on the basis of variable data. *YOU* can define an answer; *YOU* can decide whether or not you claim that there is a contribution. The big question is: on what exactly should your decision be based? Surely on the data (and statistics provides the tools to summarize the data into the information relevant to your question), but this is not enough! Your decision can be right or wrong, and neither your data nor any statistic will give you a hint!(*) So you must think about what you can win with a correct answer and what you can lose with a wrong answer. Then you can use this and the data to optimally balance the risks and the hopes. Here again, statistics provides the tools to get the optimal strategy (! not the optimal result; that's something different!) for the decision (possibly hypothesis tests), but it requires that you explicitly state the relative relevance of the risks (losses) and the hopes (benefits). I doubt that this will be possible for you in any reasonable way. Therefore I doubt that statistics can help you make a decision.
I would strongly suggest NOT deciding something, but rather using the data to ESTIMATE a reasonable impact (whatever size it may have). The interpretation of how relevant this impact might be is a matter of your expertise and not of statistics.
(*) Being able to reject a null hypothesis gives no information about the null hypothesis (or the alternative hypothesis), and also no information about how likely your decision is to be right or wrong!
Sir, when I put my data in a simple Excel sheet, with the total family income in one column and the percentage contribution (woman's income / total income × 100) in another, a simple graph showed a downward-sloping curve: the contribution of women's labour decreases as family income increases. In other words, women contribute more to low-income families than to high-income families. But on the basis of the graph alone I cannot say that there is a negative relation between family income and women's contribution; for that I have to develop a model that lets me state the result at a certain confidence level. So I would be very thankful if you could suggest a model for my problem. I am not sure whether I should use a Cobb-Douglas (CD) function, a CES (constant elasticity of substitution) function, or some other function/model.
A family's income is derived from a handicraft that is practiced in the home.
A woman member of the family works at the handicraft six hours a day.
What is the net family income produced by the woman?
You have the total family income data from 500 households.
You have the individual income from a woman in each of the households.
You have your answer. The data are all you need. No statistical test or description is necessary.
Is there more?
Your question was paraphrased from information you provided. It can be further paraphrased into logical statements for a straightforward If-Then argument. If you plan to use this data in a more extended logical argument, you should consider the proper structure of the argument and the data. Jochen Wilhelm discussed considerations you may need to include in a more extended logical argument.
You state that the fractional income of a woman decreases as the total family income increases. That should be expected if you are plotting a constant A divided by an increasing B. Does a woman's income per hour worked decrease with increasing total family income?
Fractional income does not appear to be the best metric.
Total income can increase because more family members are working, because work routines become more efficient, because better equipment and supplies were acquired, or because of other or a combination of factors.
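The ratio effect is easy to see with made-up numbers (not the study's data): if the woman's income is held fixed and the total family income rises, her percentage share must fall. So a downward-sloping share curve, on its own, says nothing about how her income itself relates to family income.

```python
# Hypothetical numbers: the woman's daily income is held constant
women_income = 100.0                                    # Rs per day (assumed)
family_incomes = [200.0, 400.0, 600.0, 800.0, 1000.0]   # Rs per day (assumed)

# Percentage contribution, computed as in the question: woman's income / total * 100
shares = [100.0 * women_income / total for total in family_incomes]
for total, share in zip(family_incomes, shares):
    print(f"family income Rs {total:6.0f}: woman's share {share:5.1f} %")
```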
Well, as I said, the major problem is subject-specific and can only be tackled by specialists in the field. It is not a statistical problem. A lot of thought has to be invested here; the question of how to measure or even test comes at the very end of this to-do list, and the answer will be found easily once all the other questions have been addressed completely and thoroughly.
At the outset, I am very thankful for your response and valuable comments. As you may know, in any model where we have to find a causal relationship we assume that other things remain constant, mostly in the social sciences. It is not a question of efficiency or of more family members here. I am not saying that as women work more their contribution decreases; that is not the case. I am considering a particular time period, only one day consisting of six hours of work, so efficiency, or what we call the learning-by-doing model in development economics, does not apply here. I had to find out the contribution of women's labour in different household income groups, although all of them are engaged in the same occupation, handicrafts. The reason I found a negative relation is that in low-income families the percentage of women's earnings in the total income is higher than in high-income families. For example, in a family whose daily income was Rs 1000, 20 percent was contributed by the woman, whereas in families whose income was Rs 200, only 8 percent was contributed by women's labour.
It is difficult to suggest a model without access to the data and conditions. Most important is not knowing the premises of the argument.
Yes, subtracting the woman's income from the total MIGHT be appropriate. Other information is necessary. The lack of information was why effects of efficiency or equipment were suggested. Further methods may be suggested by answering questions about the work and the work structure.
You are relating income. Does income relate to units produced? Does one person, working alone, produce each unit? It might be more appropriate to consider average number of units produced per person instead of total units or total income.
Structure your argument to best answer the question you are asking. First list the given information in simple terms. For example:
Unit produced, cost per unit, profit per unit, number of persons necessary per unit, and/or other information necessary to define income.
State the premises of your argument in simple terms. For example:
If total family income increases the average units produced per person increases.
A woman is a member of the family.
Therefore, a woman's income increases.
I do not presume to understand your particular problem. The above was suggested to help convey information, but in no way can it be your circumstances. As Jochen stated, your problem is subject specific. Specific information and data are necessary to form an argument and construct a model.
Sir, many thanks for making my problem a bit simpler.
Let me explain the whole procedure I have followed and how I obtained the income.
First of all, I collected the total monthly income of all the family members and deducted the total expenditure from it; the net income was then divided by 30 days to get the daily income of the family. The daily income of the female workers was calculated in the same way. So now I have the net income of each family and the net income of the women workers. Other things constant, I want to see whether the income of the female workers has any significant impact on the total income of the family. When I run the regression (net family income excluding the woman's income as the dependent variable, and the net income of the woman worker as the independent variable), I get these results:
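- Number of obs = 500
- F(1, 500) = 49.47
- Prob > F = 0.0000
- R-squared = 0.3362
- Root MSE = 893.17
- Coefficient table (robust standard errors) for the dependent variable "family income", with columns Coef., Robust Std. Err., t, P>|t| and [95% Conf. Interval]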
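The regression described above (net family income excluding the woman's income, regressed on her net income, with robust standard errors) can be sketched in a few lines of NumPy on simulated numbers. This assumes that the "Robust" standard errors are of the HC1 type, which as far as I know is what Stata's `regress, vce(robust)` reports; the 95% interval below uses the normal approximation (1.96) rather than the t distribution. Reporting the slope with its interval gives an estimate of the impact, rather than only a p-value.

```python
import numpy as np

def ols_hc1(x, y):
    """OLS of y on x (with intercept) and HC1 heteroskedasticity-robust standard errors.

    Returns coefficients, standard errors, approximate 95% intervals
    (normal approximation, 1.96) and R^2.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    k = X.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * (resid ** 2)[:, None])        # sum of e_i^2 * x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)    # HC1 small-sample factor n/(n-k)
    se = np.sqrt(np.diag(cov))
    ci = np.column_stack([beta - 1.96 * se, beta + 1.96 * se])
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, ci, r2

# Hypothetical daily incomes in Rs (simulated, not the study's data);
# the error spread grows with the woman's income, so robust SEs matter
rng = np.random.default_rng(2)
n = 500
women_income = rng.uniform(50, 300, n)
other_income = 300 + 2.0 * women_income + rng.normal(0, 1, n) * 0.5 * women_income

beta, se, ci, r2 = ols_hc1(women_income, other_income)
print(f"slope {beta[1]:.2f}, robust SE {se[1]:.3f}, "
      f"95% CI [{ci[1, 0]:.2f}, {ci[1, 1]:.2f}], R^2 {r2:.3f}")
```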
I would not consider this significant. If your model were perfect, the R-squared indicates that a woman's income accounts for 33% of the variance. This is not good support for your model.
Gross income minus cost equals net income. Correct.
Is this the net income from the handicraft, alone?
How do you assign a person's income? Units produced?
What is the average net income per person in the family?
What is the average net income per male in the family?
What is the average net income per female in the family?
What is the average net income by age in the family?
How does family size affect the per person net income?
Family income compared to a woman's income is too broad.
Asaif, with n = 500 it is not a big deal to get significant results in a simple model such as yours. The p-value and the F-value do not convey any useful information for judging whether the model is good or whether you found something worth noticing. The R-squared value tells us that "only" 33 percent of the total variance in your data is explained by the predictor, which might or might not be OK. Also, the coefficient of +8, taken "out of the blue", does not help to discuss the reasonableness of these results. These summaries do not show whether the model fits the data (i.e., whether its structure is adequate).

One would first look at plots. Visualize the data, show it. Are there patterns? Is the relationship linear or curved? What do the residuals look like? One really has to *explore* whether the model can be adequate for the data (which does not mean that the whole analysis is meaningful; that is still a subject-specific question to be answered by experts, not by statisticians). Then you can come up with a model, as simple as possible, that describes the data as well as possible.

But when you explore what kind of model could suit your data, p-values and tests lose their meaning. So in your situation you should not look at p-values at all, and you should not argue on the basis of p-values (or of other statistics you can calculate, like F, t, R² and so on). In an exploratory analysis these statistics are "only" (more or less useful) summaries; they do not tell you anything beyond the data itself (which is what generalizations aim at and what hypothesis tests require). Well, surely they tell you something more, and some generalization is possible, but there is no way to strictly quantify or control any error probabilities. So you can make a best guess, and others may follow you in thinking that this might be a good guess, but no one can tell precisely how good this guess actually is.
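The "look at the residuals" step can be sketched without a plotting library: fit a straight line, then average the residuals within bins of the predictor. The data here are simulated with a deliberately curved relationship (an assumption for illustration). A systematic pattern in the binned means, positive at both ends and negative in the middle, is the numeric counterpart of a curved residual plot; summaries such as F, t or p do not reveal it.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(0, 1, n)                 # hypothetical predictor (rescaled)
y = x ** 2 + rng.normal(0, 0.05, n)      # hypothetical curved relationship

# Fit a straight line, as a simple linear regression would
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

# Text stand-in for a residual plot: mean residual within five bins of x
edges = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
bins = np.digitize(x, edges)             # bin labels 0..4
bin_means = np.array([resid[bins == b].mean() for b in range(5)])
print("mean residual per bin of x:", np.round(bin_means, 3))
# If the straight line were adequate, these means would scatter around 0;
# here they are positive at both ends and negative in the middle (a U shape).
```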
A person's income is based on piece wages, that is, on how many units the person completes; the wage rate is fixed, i.e., the same rate per unit.
If I calculate the average income of all the family members and also consider the size of the family, does it help me to find the impact of women's earnings on the total income of the family? If so, how?
Respected Jochen
Sir, when I draw the normal probability plot of the residuals, I get a straight line.
Sir, no doubt the 33 percent and the p-value might not be useful for me.
I am confused now about what I should do, because I have asked the same question to subject experts in my department, but nobody helped me to solve my problem.
My boss told me I have to defend myself on the basis of some statistical model; that is why I am so worried about my problem. I also know there are researchers like you who, despite their busy schedules, are ready to help simple research students like me.
Anyway, thanks to both of you for taking an interest in my problem.
Sir, thanks, I have my answer now: in social science we can use the model I have used. One of my friends has just explained to me how social science models are different from those of the applied sciences.
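A straight normal probability plot indicates that the residuals are roughly normally distributed; it does not show whether the straight-line structure of the model fits the data (curvature, or a spread that changes with the predictor, shows up when residuals are plotted against the fitted values or the predictor). As a sketch, the plot's coordinates can be computed with the standard library's `statistics.NormalDist`; Blom's plotting positions and the correlation used as a "straightness" summary are choices made here, not something specified in the thread.

```python
from statistics import NormalDist

import numpy as np

def normal_probability_points(residuals):
    """Return (theoretical normal quantiles, sorted residuals) for a normal probability plot.

    Uses Blom's plotting positions (i - 0.375) / (n + 0.25).
    """
    r = np.sort(np.asarray(residuals, dtype=float))
    n = r.size
    positions = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    q = np.array([NormalDist().inv_cdf(p) for p in positions])
    return q, r

rng = np.random.default_rng(3)
resid = rng.normal(0, 1, 500)            # hypothetical residuals
q, r = normal_probability_points(resid)
straightness = np.corrcoef(q, r)[0, 1]   # close to 1 when the points lie on a line
print(f"correlation of the probability plot: {straightness:.4f}")
```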
I respectfully disagree with your friend. Social science models are not different from applied science models. There is a mistaken belief among some social scientists that because there are so many variables in social science, criteria can be relaxed or special groupings may be employed with lax definitions. Further, there is pressure in all sciences to find a positive result in order to publish, i.e., the statement "Found no difference between ..." is to be avoided.
All scientific data analyses start with a valid argument to reach a conclusion.
I gave an example argument:
If total family income increases the average units produced per person increases.
A woman is a member of the family.
Therefore, a woman's income increases.
I do not know what your argument is, but following the example if a woman's income does not increase, the conclusion is false or the premises are false. It may be necessary to examine the premises and/or accept that a woman's income does not increase.
You asked:
If I calculate the average income of all the family members and also consider the size of the family, does it help me to find the impact of women's earnings on the total income of the family? If so, how?
You could make two observations: does a woman's income follow the average income as the family income increases, and does a woman's income follow the average income as the size of the family increases? You should also show how the average income follows the family size.
You may find interesting relationships. You may find as Jochen suggested that the sample size is inadequate.
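The comparisons suggested above can be sketched with per-family arrays. The six families below are invented for illustration (not the study's data), and "family size" is taken, as an assumption, to mean the number of earning members; the woman's income is compared with the family's average income per member.

```python
import numpy as np

# Hypothetical records, one entry per family (invented numbers):
family_income = np.array([200.0, 300.0, 450.0, 600.0, 800.0, 1000.0])  # total daily income (Rs)
family_size   = np.array([2,     3,     3,     4,     4,     5])       # earning members (assumed meaning)
woman_income  = np.array([60.0,  90.0,  100.0, 120.0, 150.0, 160.0])   # woman's daily income (Rs)

avg_income = family_income / family_size   # average income per earning member
relative = woman_income / avg_income       # 1.0 means the woman earns exactly the average

# (1) Does the woman's income follow the average as family income rises?
print(np.column_stack([family_income, avg_income, woman_income, relative.round(2)]))

# (2) and (3): by family size, the mean average income and the mean woman/average ratio
for size in np.unique(family_size):
    mask = family_size == size
    print(f"size {size}: mean avg income {avg_income[mask].mean():7.1f}, "
          f"mean woman/avg {relative[mask].mean():.2f}")
```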
I did not mean to say that the sample size is inadequate. I meant to say that the interpretation of "significant" findings is of no use. Such interpretation is generally of little to no use in research (although most people think the opposite); this only becomes evident when the sample size is so large that any irrelevant tiny effect will be "significant". Do not interpret the significance; instead, interpret the effect size.
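This point can be made concrete with the usual t statistic for a correlation r (equivalently, a simple regression slope): t = r * sqrt((n - 2) / (1 - r^2)). Keeping the effect size fixed at a tiny r = 0.01 (R^2 = 0.0001) and changing only n moves the result from clearly "not significant" to "significant". The p-value here uses the normal approximation to the t distribution, an assumption that is fine for large n.

```python
import math

def slope_t_and_p(r, n):
    """t statistic and two-sided p-value for a correlation r with n observations.

    The p-value uses the normal approximation to the t distribution.
    """
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    p = math.erfc(abs(t) / math.sqrt(2))   # equals 2 * (1 - Phi(|t|))
    return t, p

r = 0.01  # tiny effect: R^2 = r^2 = 0.0001, i.e. 0.01 % of the variance
for n in (500, 100_000):
    t, p = slope_t_and_p(r, n)
    print(f"n = {n:>7}: R^2 = {r ** 2:.4f}, t = {t:5.2f}, p = {p:.4f}")
```

The effect size is identical in both lines; only the sample size differs.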
Totally agree. The reference to sample size was cited only as an example. Effect size is more important than significance. Perhaps I should have said: please read the suggestions supplied by Jochen.
All the suggestions made should be considered. Data and premises about the data should be examined before settling on a final model and accepting it because of significance. An effect may be stronger if viewed another way or it may disappear. Either will be informative.
The best critic for a model is the modeler. Try to tear it apart before someone else does. You may learn a lot.
I will try to follow your beautiful and experienced suggestions and also the suggestions of Jochen.
I have to think again to develop a model that will suit my research problem. I agree with both of you, sir: effect size is more important than significance.
However, in social science there is an important assumption, "other things remain constant"; that is why I was assuming the family size and other variables to be constant in my model.
Anyway, many thanks to both of you for helping me.
Sir, I have one more problem. I am working on the participation rate of women in the labour market, for which I have used a probit model with six variables. When I run the model, all my variables are significant, but as you know we cannot interpret the coefficients of a probit model directly; we have to work out marginal effects. When I compute the marginal effects in Stata, one of my variables (x5) has a very low marginal effect, close to zero (-1.29e-07), with a p-value of 0.0000. No doubt this means that, other things remaining constant, the variable is statistically significant, but it also means that a 1-unit increase in this variable will reduce the dependent variable by 1.29e-07 units, which is essentially zero.
Sir, please help me: is there a problem in my model, or do I have to interpret it differently?
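One common reason for a marginal effect like -1.29e-07 is the unit of the variable: for a continuous regressor in a probit model, the marginal effect is phi(x'b) * b_k per one unit of x_k, so if x5 is measured in small units (for example rupees; the question does not say) a one-unit change is negligible. The sketch below uses invented coefficients and data to show that re-expressing the same variable in units of 100,000 rupees multiplies the average marginal effect by 100,000 while the fitted probabilities stay the same. Whether the rescaled effect is relevant remains a subject-matter judgement.

```python
import numpy as np

def probit_ame(beta, X, k):
    """Average marginal effect of continuous column k in a probit model.

    dP(y=1|x)/dx_k = phi(x'beta) * beta_k, averaged over the rows of X.
    """
    z = X @ beta
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)   # standard normal density
    return float(np.mean(phi) * beta[k])

# Hypothetical regressor measured in rupees (invented data)
rng = np.random.default_rng(0)
income_rs = rng.uniform(50_000, 500_000, size=1_000)

# Invented probit coefficients when the variable enters in rupees
X_rs = np.column_stack([np.ones_like(income_rs), income_rs])
beta_rs = np.array([0.5, -2.0e-6])

# Same model with the variable in units of 100,000 rupees:
# the coefficient is 100,000 times larger, the index x'beta is unchanged
X_100k = np.column_stack([np.ones_like(income_rs), income_rs / 1e5])
beta_100k = np.array([0.5, -2.0e-6 * 1e5])

ame_per_rupee = probit_ame(beta_rs, X_rs, 1)
ame_per_100k = probit_ame(beta_100k, X_100k, 1)
print(f"AME per rupee:          {ame_per_rupee:.2e}")
print(f"AME per 100,000 rupees: {ame_per_100k:.3f}")
```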
All sciences assume "other things remain constant" if there is no way of knowing what the other things are. In the case of family size, you should have this information if you can isolate a woman's contribution.
You are comparing a woman's income to increasing family income. If you have no way of knowing why the income increased, you have no rational basis for concluding a woman's proportionate decrease. All you can do is report a possible curiosity.
I cannot offer any further criticism to your model without access to the data, the manner of collection, and the structure of the model. I do not know the premises of your argument and how the data and data collection relate to the premises. (Programs like STATA cannot evaluate your argument or your methods, so metrics produced by them can only have meaning to you with respect to your premises.) Jochen and I have offered suggestions based on your statements. I can only repeat the first two paragraphs above.
Thanks, sir, for your suggestions; I will try to follow them and build my model accordingly. Yes, I will also try to find out the reasons, and since I have data on all the earning members of the family, I can also find out the impact of family size. I also have to think again about other variables that I should include in my model.
This depends on whether the dependent variable is interval or categorical/binary type of data. If it is interval type of data (e.g., weight, production, obesity, etc), we apply linear regression model (i.e., Y = alpha + beta * X + error).
However, if it is categorical/binary (e.g., recovery status (recovered/not recovered), mental disorder (healthy/disordered), success/failure), then we apply logistic regression models (i.e., logit[prob(Y=1)] = alpha + beta * X).
I also recommend using several predictor variables and testing the hypothesis with a multivariate method chosen according to the argument of your research or the nature of the dependent variables. Please read Hair et al. (2013), Multivariate Data Analysis.
It depends on whether you have any covariates and whether they are continuous or categorical. An independent t-test and a regression would be my suggestion.
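For the binary case, the logistic model logit[prob(Y=1)] = alpha + beta * X is fitted by maximum likelihood. A minimal sketch using Newton-Raphson in NumPy on simulated data (true alpha = -0.5 and beta = 1.2, both invented); in practice a statistics package would do this, and the sketch only shows what such a fit computes.

```python
import numpy as np

def fit_logit(x, y, iterations=25):
    """Fit logit[P(Y=1)] = alpha + beta * x by Newton-Raphson (maximum likelihood)."""
    X = np.column_stack([np.ones(len(x)), x])
    params = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-(X @ params)))         # predicted probabilities
        score = X.T @ (y - p)                            # gradient of the log-likelihood
        information = X.T @ (X * (p * (1 - p))[:, None]) # negative Hessian
        params = params + np.linalg.solve(information, score)
    return params  # [alpha, beta]

# Simulated binary outcome (e.g. recovered / not recovered)
rng = np.random.default_rng(11)
x = rng.normal(0, 1, 5000)
true_alpha, true_beta = -0.5, 1.2
prob = 1 / (1 + np.exp(-(true_alpha + true_beta * x)))
y = (rng.uniform(size=x.size) < prob).astype(float)

alpha_hat, beta_hat = fit_logit(x, y)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")
```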
There are many methods of measuring the association between two or more variables, e.g., regression, correlation, ANOVA, etc., but in the social sciences, because controlled experiments often cannot be performed, causal interpretation is difficult due to endogeneity issues. There is ongoing research on this issue; the book *Mostly Harmless Econometrics* discusses some of these issues. Among the methods found useful, instrumental-variable methods are used for cross-sectional data. For panel data, careful use of fixed-effects models helps causal interpretation, as they control for many of the individual- or time-specific unobserved variables that may be correlated with the variable of interest. This unobserved heterogeneity hinders the causal interpretation of the effect of the variable of interest on the dependent variable.
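The fixed-effects idea for panel data can be sketched with simulated data in which an unobserved, time-constant unit effect is correlated with x. Pooled OLS then mixes that effect into the slope, while demeaning x and y within each unit (the "within" transformation) removes it. The setup (200 units, 5 periods, true slope 0.5) is invented for illustration, and the sketch assumes the unobserved effect really is constant over time.

```python
import numpy as np

rng = np.random.default_rng(5)
n_units, n_periods = 200, 5
unit = np.repeat(np.arange(n_units), n_periods)   # unit id for each observation

# Unobserved unit-specific effect that is correlated with x (the source of bias)
a = rng.normal(0, 1, n_units)[unit]
x = a + rng.normal(0, 1, unit.size)
true_beta = 0.5
y = true_beta * x + 2.0 * a + rng.normal(0, 0.5, unit.size)

def slope(xv, yv):
    """Least-squares slope of yv on xv (with intercept)."""
    xc, yc = xv - xv.mean(), yv - yv.mean()
    return float(xc @ yc / (xc @ xc))

def demean_by(v, groups):
    """Subtract each unit's mean (the 'within' transformation)."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

beta_pooled = slope(x, y)                                    # biased by the unit effect
beta_fe = slope(demean_by(x, unit), demean_by(y, unit))      # fixed-effects estimate
print(f"pooled OLS slope: {beta_pooled:.3f}, fixed-effects slope: {beta_fe:.3f}")
```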