Post hoc tests are just different ways to adjust the p-value for the number of comparisons performed. So, if you have two factors and only one is significant (I assume there is no significant interaction either), and that significant factor has four levels, you actually have four groups. You can now calculate the means of your four groups (each group is formed by all individuals at that level, regardless of the other factor).
Now, calculate all the differences between group means (there are six of them). Which ones are significant? Any difference greater than the Honestly Significant Difference (HSD), which can be calculated as:
q(a) * sqrt(MSR / n),
where q(a) is the upper quantile of the studentized range (Tukey) distribution with "m" means and "df" degrees of freedom, MSR is the residual mean square from the ANOVA table, and n is the common group size. How to do that in R?
qtukey(0.95, nmeans = m, df = df) * sqrt(MSR / n)
Replace m, df, MSR and n properly...
If you have different group sizes, replace "n" by 2*n1*n2 / (n1+n2), where n1 and n2 are the sizes of the two groups being compared. This is the Tukey-Kramer procedure, suitable for comparing groups of different sizes. In R, the same qtukey() call applies with this replacement for n.
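The HSD and Tukey-Kramer computations above can be sketched in base R. All numbers here are hypothetical, chosen only to make the snippet runnable:

```r
# Hypothetical values: m = 4 group means, 36 residual degrees of freedom,
# MSR = 2.5 (residual mean square), n = 10 observations per group
m   <- 4
df  <- 36
MSR <- 2.5
n   <- 10

# Honestly Significant Difference at the 5% level
HSD <- qtukey(0.95, nmeans = m, df = df) * sqrt(MSR / n)

# Tukey-Kramer variant for two groups of unequal sizes n1 and n2:
# replace n by 2*n1*n2 / (n1 + n2)
n1 <- 8; n2 <- 12
HSD_TK <- qtukey(0.95, nmeans = m, df = df) *
  sqrt(MSR / (2 * n1 * n2 / (n1 + n2)))
```

Any pair of group means whose absolute difference exceeds `HSD` (or `HSD_TK` for that unbalanced pair) differs significantly at the chosen level.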
I agree the easiest way is using TukeyHSD or even the pairwise.t.test function. However, I get stuck in R when I want to extract the LSMEANS for the fixed effects. Does anyone have an idea how to get these? Thanks
Normally I solve it with glht and a little trick for interactions. For example, with both solutions you can compute the differences in the following way.
But, on the other hand, it is not easy to find a good tutorial about this kind of statistics. In my view, you should interpret these results carefully.
Kind regards, and I hope you enjoy the example.
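On the LSMEANS extraction question above, here is a minimal sketch with lme4 and the lsmeans package (lsmeans has since been superseded by emmeans, which works the same way). The data and variable names are simulated, purely to make the code run:

```r
library(lme4)
library(lsmeans)  # superseded by emmeans; emmeans() is the modern equivalent

# Simulated example data: two fixed factors, one random blocking factor
set.seed(1)
d <- expand.grid(f1 = factor(1:2), f2 = factor(1:3), block = factor(1:8))
d$y <- rnorm(nrow(d)) + as.numeric(d$f1)

mod <- lmer(y ~ f1 * f2 + (1 | block), data = d)

lsmeans(mod, ~ f1)       # LSMEANS for f1, averaged over f2
lsmeans(mod, ~ f1 * f2)  # LSMEANS for each cell of the interaction
```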
Are you sure that using lsmeans to compute post-hoc tests for an lmer model takes the random effect into account? If not, I presume it is unnecessary to use lmer, and one should instead use lm...?
lsmeans does account for the random effect in the multiple comparisons (however, you have to be sure your model is fitted using ML and not REML, which would give misleading results). Plus, if I may say so, it is the nature of your sampling design (or your data) that determines whether you use lm or lmer (e.g. if your sampling design is nested), not whether post-hoc tests are possible.
Another solution is to bootstrap or use mcmcsamp to estimate the SEs, but this will not give you p-values, and the decision for marginal results will be all yours...
Thank you for your message. Could I ask if you have a reference for using ML with lsmeans? It would be useful for my writing; I cannot find it anywhere.
Thank you so much. Indeed, I used glmer for my data. I'm currently searching how to do contrast comparisons. I used mcp() (multcomp package) but unfortunately I have an interaction between a continuous and a categorical variable. I'm not sure that contrasts are relevant when interactions are present (I get a nice message:
"1: covariate interactions found -- default contrast might be inappropriate
2: In RET$pfunction("adjusted", ...) : Completion with error > abseps
3: In RET$pfunction("adjusted", ...) : Completion with error > abseps").
I have read a lot of information about contrast possibilities but actually have no idea how to perform them correctly given my design (repeated measures with crossed effects).
Carole, if you have such an interaction (factor vs. continuous variable), it does not make sense to do a multiple comparison: what you have is actually n equations (one for each level of the factor), and the significance of the interaction already tells you whether the slope is significantly influenced by your factor or not.
Federico, I do not have a reference; I might have read it in the lsmeans help but could not find it again (it's Monday morning, though...). There are some elements (but dealing with AIC model comparisons) in the MuMIn help (http://cran.r-project.org/web/packages/MuMIn/MuMIn.pdf, p. 5 for example) and in the AICcmodavg help (http://cran.r-project.org/web/packages/AICcmodavg/AICcmodavg.pdf, p. 10).
Thank you, I agree with that. Does it mean that I cannot go any further? I also thought of LSMEANS because pairwise comparisons are important for my hypothesis.
For me, it means that you have all the answers you need in the summary of your model (except maybe if your factor comprises more than 2 levels). In that case, you should try to plot your equations to see the trends and comment on them, or fix your continuous variable at chosen values and compare means and SEs there.
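Both suggestions above (comparing slopes per factor level, or fixing the continuous variable and comparing adjusted means) can be sketched with the lsmeans package. The data and names here are simulated assumptions, not from the thread:

```r
library(lsmeans)  # lstrends() compares slopes; superseded by emmeans::emtrends()

# Simulated data: factor f with 3 levels, continuous covariate x
set.seed(1)
d <- data.frame(f = factor(rep(1:3, each = 30)), x = runif(90, 0, 10))
d$y <- 2 + as.numeric(d$f) * d$x + rnorm(90)

mod <- lm(y ~ f * x, data = d)

# Pairwise comparison of the slopes of x between factor levels
pairs(lstrends(mod, ~ f, var = "x"))

# Or fix the covariate at chosen values and compare adjusted means there
lsmeans(mod, pairwise ~ f, at = list(x = c(2, 5, 8)))
```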
I think you don't need a specific reference. You can just cite lsmeans, as Yoan suggested. I agree with Yoan that it is a good approach to represent the levels of the categorical variable (i.e., one slope per level) against the continuous variable.
lsmeans is a "standard"; you don't need a specific reference. But does anyone know a way to compute the effect size for this test? A referee asked me for it and I don't know how to compute it.
The lsmeans output provides the mean and SE. In my study, I use Cohen's d effect size, which requires the mean, SD, and number of subjects in each group. So I convert SE to SD in Excel, then compute the effect size for each pairwise comparison based on the LMM.
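The SE-to-SD conversion described above (SD = SE * sqrt(n)) can be done directly in R rather than Excel. A small sketch with made-up numbers; note that the SE reported by lsmeans for a mixed model is a model-based SE, so this conversion is only an approximation of the raw group SD:

```r
# Cohen's d from two group means, their standard errors and group sizes.
# SDs are recovered as SE * sqrt(n); the pooled SD is the usual one
# for Cohen's d.
cohens_d <- function(m1, m2, se1, se2, n1, n2) {
  sd1 <- se1 * sqrt(n1)
  sd2 <- se2 * sqrt(n2)
  sp  <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  (m1 - m2) / sp
}

# Hypothetical values, as they might appear in an lsmeans table
cohens_d(10.4, 8.1, 0.52, 0.61, 20, 20)
```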
I am also analysing data from mixed modelling using the lme function, and I have encountered a problem: when I use method='ML' versus method='REML', the pairwise comparisons from the lsmeans function give different significance for the treatments. Why this difference, and which method is best? Can anyone advise me?
I read somewhere (it should be in the lsmeans help?) that for multiple comparisons you should use ML. I have in mind that with REML there are uncertainties about the number of degrees of freedom, but I cannot assert that for sure.
Otherwise, to know how to cite a package, try citation("package") ! ^^
The book by Pinheiro and Bates (Mixed-Effects Models in S and S-PLUS) discusses multiple comparisons in the LME model. I don't have my copy handy, but Yoan Paillet's comment that the model should be fitted using ML if you want to perform multiple comparisons is correct, if I am remembering Pinheiro and Bates correctly.
Could you develop your answer a bit, as it is not fully clear to me: do you mean generalized (not mixed) models should use ML (which makes sense if you do not have a random component), but that lsmeans uses REML? And for which kind of model? Fixed effects only?
That does not make sense to me, as in this case lsmeans only performs a pure Tukey test (or another type of multiple comparison test), and your glm is equivalent to an ANOVA or an lm if your error distribution is Gaussian.
My clarification was only for the linear mixed effects model, where you use the 'lme' function from the 'nlme' package, not for GLM and ML. However, I can say that 'ML' is biased for the estimation of variance components, although with larger sample sizes the bias gets smaller.
I am quite confused by your answers about using ML or REML. I used lmer fitted by REML, like: mymodel = lmer(A ~ B*C + (1|D)), where D is a random factor. Then I used lsmeans to compare, for each factor and their interaction, which levels differ significantly, e.g.: lsmeans(mymodel, pairwise ~ B*C, adjust = "tukey"). So, are the results given by lsmeans (using the lmer model fitted with REML) correct?
But what about: lsmeans(model, pairwise ~ FACTOR1 * FACTOR2 | DAY, adjust = "tukey"), where DAY is a continuous variable? Should DAY be treated as a factor to get multiple comparisons for the different days (say, from day 1 to day 10)?
Plus, if you treat day as a factor, you ignore the fact that your observations are ordered (day 1 comes before day 2) and non-independent... This should first be integrated in the model (e.g. in the variance-covariance structure) before trying to perform post-hoc tests. And even then, I am not sure post-hoc tests would be meaningful, because your model already tests whether the effect of "day" is significant (when it is coded as a continuous variable).
I need both: I want to see the effect of day on the response variable, and I also want day as a repeated-measures variable on which the variance-covariance structure is built. I made a new variable called DAYFAC to test the effects of day (how the response differs from day 1 to day 10), and I use day itself for the repeated-measures variance-covariance structure. Is that not the right strategy?
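The strategy described above can be sketched with nlme: DAYFAC as a fixed factor, and the continuous DAY as the time index for a within-subject AR(1) correlation structure. The data and variable names here are simulated placeholders:

```r
library(nlme)

# Simulated longitudinal data: 12 subjects measured on days 1..10
set.seed(1)
d <- expand.grid(subject = factor(1:12), DAY = 1:10)
d$DAYFAC <- factor(d$DAY)  # day as a factor for the fixed effect
d$y <- 0.3 * d$DAY + rnorm(nrow(d))

# Random intercept per subject; AR(1) correlation over DAY within subject
mod <- lme(y ~ DAYFAC,
           random = ~ 1 | subject,
           correlation = corAR1(form = ~ DAY | subject),
           data = d)

anova(mod)  # F test for the overall effect of DAYFAC
```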
Does someone know the difference between Type II and Type III ANOVA following an lmer model? Indeed, the results differ considerably regarding the significance of a variable or of the interactions. I use REML (small sample size: 148 subjects with 5 repetitions each) with the Kenward-Roger df approximation. My fixed variables are all categorical and there is a three-way interaction; I only use a random intercept. Thanks a lot :-)
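Briefly: Type II tests each term after the other terms while respecting marginality (a main effect is not adjusted for interactions containing it), whereas Type III tests each term after all other terms, including the interactions, and is therefore sensitive to the contrast coding (it needs sum-to-zero contrasts to be meaningful). Both tables can be obtained with car::Anova(); a sketch on simulated data, with all names hypothetical:

```r
library(lme4)
library(car)       # Anova() produces Type II / Type III tables
library(pbkrtest)  # enables Kenward-Roger F tests for lmer fits

# Type III results depend on contrast coding: use sum-to-zero contrasts
options(contrasts = c("contr.sum", "contr.poly"))

# Simulated data: three crossed two-level factors, random subject intercept
set.seed(1)
d <- expand.grid(f1 = factor(1:2), f2 = factor(1:2), f3 = factor(1:2),
                 subject = factor(1:20))
d$y <- rnorm(nrow(d)) + as.numeric(d$f1) * as.numeric(d$f2)

mod <- lmer(y ~ f1 * f2 * f3 + (1 | subject), data = d)

Anova(mod, type = "II",  test.statistic = "F")  # Kenward-Roger F tests
Anova(mod, type = "III", test.statistic = "F")
```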
In addition to the previous answers, I would like to add some code for a post-hoc test on an interaction term, together with a classification of its different levels.
library(lme4)
library(lmerTest)
library(lsmeans)   # for lsmeans() and lsm()
library(multcomp)  # for glht() and cld()
model <- lmer(...)  # fit your mixed model here (formula omitted in the original)
summary(glht(model, linfct = mcp(FIXEDFACTOR = "Tukey")))  # replace FIXEDFACTOR with your factor's name
# -- 1) Least square difference (LSD) for the interaction "FACTOR1:FACTOR2":
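A hedged sketch of how the interaction post-hoc might look in full (the data and names below are simulated, not from the original post): the usual trick is to collapse the two factors into a single factor so that mcp() can run Tukey comparisons over all interaction cells, and cld() then gives the letter-based classification of the levels.

```r
library(lme4)
library(multcomp)  # glht() and cld()

# Simulated data: two crossed fixed factors, one random blocking factor
set.seed(1)
d <- expand.grid(FACTOR1 = factor(1:2), FACTOR2 = factor(1:3),
                 block = factor(1:8))
d$y <- rnorm(nrow(d)) + as.numeric(d$FACTOR1)

# Collapse the interaction into one factor for the multiple comparisons
d$F1F2 <- interaction(d$FACTOR1, d$FACTOR2)
model <- lmer(y ~ F1F2 + (1 | block), data = d)

ph <- glht(model, linfct = mcp(F1F2 = "Tukey"))
summary(ph)  # adjusted pairwise p-values between interaction cells
cld(ph)      # compact letter display: the "classification" of the levels
```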