I am confused: does anyone know whether random effects can be applied to continuous variables, or only to categorical variables? Thank you so much in advance for your help!
No, I think modelling a continuous variable as a random effect does not make sense. You will have as many different values as observations, so it is not possible to estimate a random variation of your response depending on the value of this continuous predictor (the model is unidentifiable).
I think a continuous predictor will enter the model as a fixed effect to get a multiple regression model. This models "random intercepts", and if you include the interaction it will also model "random slopes".
It is perfectly possible to have random effects that are allowed to vary for both continuous and categorical variables. The effects at level 2 associated with a constant are often called random intercepts; those at level 2 associated with a continuous predictor are random slopes; for a categorical predictor, the random slopes become random differentials that are allowed to vary.
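To make the terminology concrete, here is a minimal sketch in Python (NumPy only) with simulated data. Everything in it is an assumption for illustration: the panel size, the parameter values, and the variable names. It is not a mixed model fit; it is a crude two-stage illustration (one OLS line per group) showing that group-specific intercepts and slopes on a continuous predictor can be estimated once each group has repeated measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_occasions = 50, 23          # hypothetical panel: 50 units, 23 occasions each

# Assumed "true" values, used only to generate the simulated data
beta0, beta1 = 2.0, 0.5                 # average intercept and average slope
sd_u0, sd_u1, sd_e = 1.0, 0.3, 0.5      # sd of random intercepts, random slopes, level-1 noise

x = rng.normal(size=(n_groups, n_occasions))       # continuous predictor
u0 = rng.normal(scale=sd_u0, size=(n_groups, 1))   # group-specific intercept departures
u1 = rng.normal(scale=sd_u1, size=(n_groups, 1))   # group-specific slope departures
y = (beta0 + u0) + (beta1 + u1) * x + rng.normal(scale=sd_e, size=x.shape)

# Two-stage illustration: fit a separate straight line to each group
coefs = np.array([np.polyfit(x[g], y[g], deg=1) for g in range(n_groups)])
slopes, intercepts = coefs[:, 0], coefs[:, 1]

print("mean intercept:", intercepts.mean(), "sd of intercepts:", intercepts.std(ddof=1))
print("mean slope:    ", slopes.mean(), "sd of slopes:    ", slopes.std(ddof=1))
```

The spread of the per-group slopes includes sampling noise as well as true slope variation, which is one reason a proper random slopes model is preferred over this two-stage shortcut.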
Thank you so much, Uwe, Kelvyn and Jochen. Your answers are very helpful to me. Honestly, this is my first time using random effects, so I am not very familiar with them.
Maybe I should explain more about my data so that I can understand this better:
I have carbon emissions/sequestration for 50 counties from 1990 to 2012 as the dependent variable, and I transform emissions to log values because I have both negative (sequestration) and positive (emission) values. I also have the level of precipitation and temperature for each county from 1990 to 2012 as independent variables. Do I have to change precipitation and temperature into categorical variables before I run random effects? I have already checked that random effects suit my data better than fixed effects. Could you please help me understand how random effects apply to my data? Thank you so much in advance for helping me.
No, you do not have to change the variables to be categorical. That is definitely not what to do, as you would lose a lot of information and lose power to detect effects.
You need to read up on panel data analysis, where countries are at level 2 and year is the occasion at level 1; you have a panel study where the country is the repeated entity. The advantage of the random effects specification is that you can model time-varying variables as well as time-invariant variables (or cross-sectional enduring effects).
In this paper, people rather than countries are the level 2 entity, but the basic ideas apply. You will see that it is continuous time (year) that is allowed to have a random effect (as well as the intercept, which is also allowed to vary). You model the country trajectories and then try to account for them with time-constant effects (e.g. the average temperature through time) and time-varying effects (becoming colder or hotter than average).
This is a paper that looks at modeling county differences over time and argues the case for random effects; it has Stata commands that go with it.
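The split into a time-constant part (the average through time) and a time-varying part (the departure from that average) can be sketched in a few lines of Python. This is only an illustration with simulated temperatures; the array names and the numbers are hypothetical, not taken from the data described in this thread.

```python
import numpy as np

rng = np.random.default_rng(1)
n_counties, n_years = 50, 23            # hypothetical panel dimensions

# Simulated temperature: a county-specific baseline plus year-to-year fluctuation
temp = rng.normal(15.0, 3.0, size=(n_counties, 1)) + rng.normal(0.0, 1.0, size=(n_counties, n_years))

# Time-constant ("between") part: each county's own average through time
temp_between = temp.mean(axis=1, keepdims=True)

# Time-varying ("within") part: how much hotter or colder than its own average
temp_within = temp - temp_between

print(temp_between[:3].ravel())
print(temp_within[:3, :5])
```

Both pieces then enter the model as separate continuous predictors, so nothing has to be recoded into categories.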
To me your problem really looks like a multiple regression problem.
Your response can be modelled as a function of precipitation and temperature. These are just two continuous predictors.
If you have several measurements for each country, these measurements will (likely) be correlated, and there you could (you should) use "country" as a random factor to account for this correlation.
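The correlation among repeated measurements on the same unit can be illustrated with a small simulation. The sketch below (Python, NumPy only) assumes a simple random intercept structure with made-up variances and uses the classical one-way ANOVA estimator of the intraclass correlation; it is an illustration, not the estimator a mixed model routine would necessarily use.

```python
import numpy as np

rng = np.random.default_rng(6)
n_groups, n_rep = 50, 23                # hypothetical: 50 units, 23 measurements each
sd_u, sd_e = 1.0, 1.0                   # assumed sd of the unit effect and of the noise

u = rng.normal(scale=sd_u, size=(n_groups, 1))          # shared unit effect
y = u + rng.normal(scale=sd_e, size=(n_groups, n_rep))  # repeated measurements

group_means = y.mean(axis=1)
ms_between = n_rep * group_means.var(ddof=1)            # between-unit mean square
ms_within = ((y - group_means[:, None]) ** 2).sum() / (n_groups * (n_rep - 1))

var_u_hat = (ms_between - ms_within) / n_rep            # estimated unit-level variance
icc_hat = var_u_hat / (var_u_hat + ms_within)           # share of variance due to units

print("estimated intraclass correlation:", icc_hat)
```

With equal variances at both levels, the correlation between two measurements on the same unit is 0.5 in this simulated setup, which is the dependence the random factor is meant to absorb.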
However, I think that some relevant predictors are missing that should be available: economic status of the countries, population, wealth, development, ...
I am sorry, but the similarity to a multiple regression is superficial. The whole point of a panel analysis is to use a country as its own control, and that is what you are trying to achieve with the within-between and Mundlak specifications of the random effects model (or with a fixed effects model with country dummies). You are trying to isolate the effect of a change in X, partialling out, by design and analysis, other X's that you have not measured. Of course you have to be careful and examine the plausibility of the assumptions, but that is the spirit of the within estimate.
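A small simulation may help show what "using a unit as its own control" buys you. The sketch below is in Python with NumPy, and all values and names are assumptions for illustration. An unmeasured, time-constant characteristic z affects both X and Y; pooled regression is biased by it, while the within estimate is not. The last step uses plain OLS with the unit mean of X added (a Mundlak-style term), not a full random effects fit, and in that setting the coefficient on X coincides with the within estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_times = 50, 23
true_effect = 1.0                                     # assumed effect of a change in X

z = rng.normal(size=(n_units, 1))                     # unmeasured, time-constant characteristic
x = 0.8 * z + rng.normal(size=(n_units, n_times))     # X is correlated with z
y = true_effect * x + 2.0 * z + rng.normal(scale=0.5, size=(n_units, n_times))

def ols_slope(a, b):
    """Slope from a simple regression of b on a (both centred first)."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (a @ a))

# Pooled regression ignores the panel structure and absorbs the effect of z
pooled = ols_slope(x, y)

# Within estimate: deviations from each unit's own mean remove anything time-constant
x_mean = x.mean(axis=1, keepdims=True)
y_mean = y.mean(axis=1, keepdims=True)
within = ols_slope(x - x_mean, y - y_mean)

# Mundlak-style regression: y on X and the unit mean of X (plain OLS here)
X = np.column_stack([np.ones(x.size), x.ravel(), np.repeat(x_mean.ravel(), n_times)])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
mundlak_within = float(coef[1])

print("pooled:", pooled, "within:", within, "Mundlak coefficient on X:", mundlak_within)
```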
Kelvyn, I don't see how single values x of an X can be used to fit X as a random factor. How can the variance of this x be estimated?
"county" should be a random factor, but this too works only when there are several measurements for a country.
Leave country aside and consider measurements of a response Y for which two continuous predictors X1 and X2 are known. Suppose you are actually interested in Y = b0 + b1*X1 and assume that X2 will have some impact: how would you use X2 as a random factor? How do you do this in practice?
My understanding is that this would require introducing an individual parameter into the model for each value of X2 (ignoring any interaction with b0/b1, so this is just a "random intercepts" model):
Yi = b0 + b1*X1i + ai
where the coefficients ai (i = 1...n) are modelled as ai ~ N( E(ai), VAR(ai) ).
Here, X2 is not modelled as a continuous variable but as a categorical variable with as many levels as there are different values. This will use up all degrees of freedom, so the model is overspecified. Also, VAR(ai) cannot be estimated when each ai is a single value.
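The degrees-of-freedom concern raised here can be checked directly. The following sketch (Python, NumPy only, simulated data with hypothetical names) treats every distinct value of a continuous X2 as its own level, builds one dummy column per level, and shows that the fit becomes saturated: the design matrix has as many columns as observations and the residuals vanish.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x2 = rng.normal(size=n)                      # continuous predictor: every value is distinct
y = 1.0 + 0.5 * x2 + rng.normal(size=n)      # simulated response

# Treat each distinct value of x2 as a separate category
levels, idx = np.unique(x2, return_inverse=True)
D = np.eye(len(levels))[idx]                 # one dummy column per distinct value

coef, *_ = np.linalg.lstsq(D, y, rcond=None)
resid = y - D @ coef

print("observations:", n, "levels:", len(levels), "rank:", np.linalg.matrix_rank(D))
print("largest absolute residual:", np.abs(resid).max())
```

This is the situation in which a variance for the ai cannot be separated from the residual variance; it is a different setup from a random slope on a continuous predictor within repeated units.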
It seems you have a completely different idea in mind, but I don't get it.
Pattarawan is asking about a repeated measures design, so there are two levels (country and occasion), and a random intercepts and slopes model is a standard, widely applied approach to modelling trajectories.
But it is also possible to have random effects with a single-level model, e.g.
Yi = B0*X0i + B1*X1i + (e0i*X0i + e1i*X1i)
where there are two residuals: e0i, associated with the constant X0i, and e1i, associated with a continuous predictor X1i. These two residuals can be assumed to come from a multivariate Normal distribution with mean zero, variances sigma2_e0 and sigma2_e1, and covariance sigma_e0e1, such that
Var(e0i*X0i + e1i*X1i) = sigma2_e0*X0i^2 + 2*sigma_e0e1*X0i*X1i + sigma2_e1*X1i^2
so this is direct modelling of heterogeneity. Such models can readily be estimated with Goldstein's (1986) Iterative Generalised Least Squares algorithm.
If the variable X1i is a categorical dummy, however, it is not possible to estimate the full quadratic variance function given above (for a 0/1 dummy, X1i^2 = X1i, so the last two terms cannot be separated) but only a 'linear' version, so that this equation is estimated:
Var(e0i*X0i + e1i*X1i) = sigma2_e0*X0i^2 + 2*sigma_e0e1*X0i*X1i
The way this works is that you estimate the variance function first; then, if you want to, you can estimate the specific (posterior) residuals.
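To illustrate the idea of a variance function at a single level, here is a minimal simulation sketch in Python (NumPy only). It assumes X0 is the constant 1 and uses made-up values for the variances and covariance of the two residuals. The second step is a crude moment-based stand-in (regressing squared OLS residuals on 1, 2*X1 and X1^2); it is not Goldstein's IGLS algorithm, only a way to see that the quadratic variance function is recoverable from data.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Assumed variances and covariance of the two residuals (illustrative values)
var_e0, var_e1, cov_e01 = 1.0, 0.25, 0.2
omega = np.array([[var_e0, cov_e01], [cov_e01, var_e1]])

x1 = rng.uniform(-2.0, 3.0, size=n)                       # continuous predictor
e = rng.multivariate_normal([0.0, 0.0], omega, size=n)    # columns: e0i, e1i
y = 1.0 + 0.5 * x1 + (e[:, 0] + e[:, 1] * x1)             # X0 is the constant 1

# Step 1: estimate the fixed part by ordinary least squares
X = np.column_stack([np.ones(n), x1])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2: crude moment estimate of the variance function,
# using E[resid^2] ~ var_e0 + 2*cov_e01*x1 + var_e1*x1^2
Z = np.column_stack([np.ones(n), 2.0 * x1, x1 ** 2])
theta, *_ = np.linalg.lstsq(Z, resid ** 2, rcond=None)
var_e0_hat, cov_e01_hat, var_e1_hat = theta

def variance_function(x, v0, c01, v1):
    """Residual variance implied at a given value x of X1."""
    return v0 + 2.0 * c01 * x + v1 * x ** 2

for x in (-2.0, 0.0, 3.0):
    print(x, variance_function(x, var_e0_hat, cov_e01_hat, var_e1_hat),
          variance_function(x, var_e0, cov_e01, var_e1))
```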
I just wonder if this is called a random (or mixed) effects model. To my understanding this would require that the coefficients (B0 or B1) would be modelled as "random" (i.e. using a probability distribution rather than assuming a fixed value). Is this the same? Still not sure if I understood everything correctly. I will think about it (but don't stop giving me further hints & lessons, if you feel like!).
Yes, these are mixed or random effects models, where the 'effects' are assumed to come from an underlying distribution. The beauty of Goldstein's work is that the modelling of heterogeneity can be done at any level, including in a single-level model.
This can be very important. Imagine a drug treatment: two drugs could produce the same average reduction in blood pressure, but one is more consistent in doing so. That is explicit modelling of impact heterogeneity; we have explicit modelling of the variance as well as the means.
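The drug example can be put into numbers with a short simulation. The sketch below (Python, NumPy only) uses entirely hypothetical reductions in blood pressure: both drugs share the same assumed mean reduction but differ in consistency. With a 0/1 dummy X1 for the drug, a variance function of the form Var = sigma2_e0 + 2*sigma_e0e1*X1 is assumed, so the two group variances map directly onto its two terms.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000                                                   # hypothetical patients per drug

# Same assumed average reduction (10 units), different consistency
reduction_a = rng.normal(loc=10.0, scale=2.0, size=n)      # drug A: consistent
reduction_b = rng.normal(loc=10.0, scale=8.0, size=n)      # drug B: erratic

mean_a, mean_b = reduction_a.mean(), reduction_b.mean()
var_a, var_b = reduction_a.var(ddof=1), reduction_b.var(ddof=1)

# Variance function with a dummy X1 (0 = drug A, 1 = drug B):
#   X1 = 0 gives sigma2_e0, X1 = 1 gives sigma2_e0 + 2*sigma_e0e1
sigma2_e0_hat = var_a
two_sigma_e0e1_hat = var_b - var_a

print("means:", mean_a, mean_b)
print("variances:", var_a, var_b)
print("variance function terms:", sigma2_e0_hat, two_sigma_e0e1_hat)
```

A comparison of means alone would call these two drugs equivalent; the variance terms are what separate them.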