I want each regression coefficient to represent (each country's) deviation from the grand mean, instead of deviation from one category (country), before and after introducing control variables. Dependent variable is life satisfaction.
I am not entirely sure this answers your question, but in Stata you have the option to run a regression excluding the intercept, which then allows you to use all levels of the predictor variable.
Demean the dependent variable, i.e. subtract its grand mean, so that the demeaned values sum to 0.
Then run your regression without an intercept and with a dummy for each country. Now the coefficients measure each country's difference from the grand mean.
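For what it is worth, here is a minimal sketch of the demean-then-regress idea in plain Python. The data are hypothetical toy values, and there are no control variables. With only country dummies and no intercept, the normal equations reduce to group sums divided by group sizes, so no regression library is needed here. With controls added you would need a real regression routine, and the coefficients would become adjusted deviations.

```python
from collections import defaultdict

# Hypothetical toy data: (country, life satisfaction score).
data = [
    ("A", 6.0), ("A", 8.0),
    ("B", 5.0), ("B", 5.0), ("B", 8.0),
    ("C", 9.0),
]

# Step 1: demean the dependent variable with the observation-level grand mean.
grand_mean = sum(y for _, y in data) / len(data)
demeaned = [(c, y - grand_mean) for c, y in data]

# Step 2: OLS with one dummy per country and no intercept.
# The dummy columns are mutually orthogonal, so X'X is diagonal (group sizes)
# and X'y holds the group sums; each coefficient is group sum / group size.
counts = defaultdict(int)
sums = defaultdict(float)
for c, y in demeaned:
    counts[c] += 1
    sums[c] += y
coef = {c: sums[c] / counts[c] for c in counts}

for c in sorted(coef):
    print(c, round(coef[c], 4))  # country mean minus grand mean
```

Note that the grand mean used here gives every observation the same weight, so countries with larger samples pull it towards themselves.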
Thomas, I imagine that you have population data for each country. This gives you 15 subgroups, from which you get their means and the grand mean. First work with your main variable so you can study its distribution among all the people of the 15 nations. Then do the same for each nation. You can then compare each of them with the grand mean value. emilio
Hi. I differ from Harald. We only need to know each country's fraction of the population, and it does not matter if each one built its data from a data series of a different size N. We only need to weight each nation's mean by its fraction of the population of the 15 countries. Thanks, emilio
Thanks for helpful advice! The best option seems to be to demean the dependent variable. However, then I need to weight for the large differences in sample size, otherwise the grand mean will be too heavily influenced by a few countries. I am not sure how to do this weighting in SPSS; but I should be able to find it out :) Cheers guys!
I agree with Emilio. What I was trying to say was that if you restrict the coefficients to "sum to zero" (as is often done, a.f.a.i.k.), then this is not equivalent to comparing with the grand mean unless you have an equal number of observations for each country; rather, you will compare with a weighted mean, the weights being proportional to the number of observations.
On second thought -- the right answer on the issue discussed by Emilio and me depends on the definition of "grand mean". Does it mean that each country gets the same weight, or that each observation gets the same weight?
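To make the two definitions concrete, here is a small sketch with hypothetical samples of unequal size. It is only meant to show that the two "grand means" differ as soon as the sample sizes differ.

```python
# Hypothetical country samples with unequal sizes.
samples = {
    "A": [7.0, 7.0, 7.0, 7.0],  # n = 4, mean 7
    "B": [5.0, 5.0],            # n = 2, mean 5
    "C": [3.0, 3.0],            # n = 2, mean 3
}

country_means = {c: sum(v) / len(v) for c, v in samples.items()}

# Definition 1: each country gets the same weight.
mean_equal_country = sum(country_means.values()) / len(country_means)

# Definition 2: each observation gets the same weight.
all_obs = [y for v in samples.values() for y in v]
mean_equal_obs = sum(all_obs) / len(all_obs)

print(mean_equal_country, mean_equal_obs)
```

Country A has twice as many observations as the others, so the second definition is pulled towards A's mean.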
I have a somewhat different opinion. In statistics, when we compare two or more means from different groups, those groups should be mutually exclusive: no subject should belong to two different groups. When you compare the mean of one country with the mean of all countries (the grand mean), the subjects in that country are a subset of the whole population, so the two groups are no longer independent.
If you really want to do so, you should treat all countries together as the whole population and each country as a sample. The grand mean is then a known (true) parameter with no standard error. Then use a one-sample t-test or a normal (z) test to test whether the mean of a given country equals the population mean (the grand mean).
You can use the means option or lsmean (adjusted mean) to get an output data set with the mean, standard deviation, sample size or standard error. Then a few simple statements suffice to calculate the t or z and p values.
In addition, pairwise comparisons (say, using means/) are more meaningful than comparisons with the grand mean.
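As a rough illustration of the "simple statements" step, here is a sketch of the z version in plain Python. The sample values and the grand mean are hypothetical, and the function name is made up for this example. It uses the normal approximation, which seems reasonable for country samples of several hundred. For small samples the p value should come from a t distribution instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_test_against_known_mean(sample, mu0):
    """Test H0: the sample mean equals mu0, treating mu0 as known (no standard error)."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)            # standard error of the country mean
    z = (mean(sample) - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p value, normal approximation
    return z, p

# Hypothetical scores for one country and a hypothetical known grand mean.
country_sample = [6.0, 7.0, 8.0, 7.0, 9.0, 5.0, 7.0, 8.0]
grand_mean = 6.5

z, p = z_test_against_known_mean(country_sample, grand_mean)
print(round(z, 3), round(p, 3))
```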
Harald, I understand "grand mean" as the weighted average of all nations with respect to their populations. If one nation has 2 million people and the 15 nations together have 100 million, then its frequency is 0.02. Thomas is concerned that the "grand mean will be too heavily influenced by a few countries", but that is normal when some nations have a very high mean of the variable and a high fraction of the total population. I imagine that the samples, though different in size, are representative of each nation, so each national mean should be well estimated. Thanks, emilio
The paper does not specify what is meant by "grand mean", but I speculate that it simply refers to the mean level of loneliness across all 12,248 individuals. The paper includes data from 14 countries, N ranging from about 300 to 1100.
If this is the grand mean for all 12248, then when comparing with it, some countries should have a negative difference and some a positive one. All differences in Table 3 are positive for the 14 countries.
If using SAS/Logistic regression, the default design matrix uses the 1, -1 (or (1, 0, -1)) coding system. It will show 13 parameters; the 14th equals (sum of the 13 parameters) * -1, i.e. the sum of all 14 is zero.
If country is the only predictor, each parameter compares almost, but not exactly, with the grand mean of all 12248. In multiple logistic regression it does not; it compares with the adjusted grand mean, exp(intercept + mean effect of continuous factor + 0 of categorical).
Again, we do not usually report exp(parameter of a country) as the OR of that country against the grand mean. We need to report exp(parameter of country i - parameter of country j), etc.
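The coding points above can be sketched in plain Python for the simplest case, where country is the only predictor. The counts are hypothetical. The sketch relies on the fact that such a model is saturated, so the maximum-likelihood fit reproduces the observed proportions and the sum-to-zero parameters can be written down directly without fitting anything. It does not call SAS and does not cover the case with covariates.

```python
from math import exp, log

# Hypothetical counts: country -> (number with the outcome, sample size).
counts = {"A": (30, 100), "B": (50, 100), "C": (100, 400)}

def logit(p):
    return log(p / (1 - p))

logits = {c: logit(k / n) for c, (k, n) in counts.items()}

# Sum-to-zero coding: the intercept is the unweighted mean of the country
# logits, and each country parameter is its logit minus that mean.
intercept = sum(logits.values()) / len(logits)
effects = {c: logits[c] - intercept for c in logits}

# Only the first parameters are printed; the last is minus their sum.
implied_last = -(effects["A"] + effects["B"])

# The reference point is not the pooled ("grand mean") proportion.
pooled_logit = logit(sum(k for k, _ in counts.values()) /
                     sum(n for _, n in counts.values()))

# Odds ratio of country A versus country B.
or_a_vs_b = exp(effects["A"] - effects["B"])

print(round(intercept, 4), round(pooled_logit, 4), round(or_a_vs_b, 4))
```

The intercept and the pooled logit differ here, which is the "almost but not exactly" point, while the pairwise odds ratio does not depend on the reference at all.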