I am co-director of the Centre for Multilevel Modelling at Bristol – we produce MLwiN – so you need to be aware of my potential bias!
Although they are now somewhat dated, the Centre commissioned a series of arm's-length software reviews, using a common suite of data sets for all of them. The following packages were covered:
EGRET, GENSTAT, GLLAMM, HLM, MIXREG, MLwiN, R, SAS, S-Plus, SPSS, STATA, SYSTAT, WinBUGS
The reviews, and indeed the data, are available here:
http://www.bristol.ac.uk/cmm/learning/mmsoftware/
A more recent review for discrete outcomes (where the choice of package can make more of a difference) is "Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes", BMC Medical Research Methodology 2011, 11:77, doi:10.1186/1471-2288-11-77.
“We conclude from our study that for relatively large data sets, the parameter estimates from logistic random effects regression models will probably not be much influenced by the choice of the statistical package. In that case the choice of the statistical implementation should depend on other factors, such as speed and desired flexibility. Based on our study, we conclude that if there is no prior acquaintance with a certain package and preference is given to a frequentist approach, the following packages are to be recommended: MLwiN, the R package lme4 and the SAS procedure GLIMMIX. For a Bayesian implementation, we would recommend MLwiN (MCMC) because of its efficiency. If the user is also interested in (perhaps more complicated) statistical analyses other than multilevel modelling then he/she could choose WinBUGS.”
Since that review, the Centre has started a major project on interoperability (that is, moving between packages). The results so far are (see http://www.bristol.ac.uk/cmm/software/):
Stat-JR: a brand new statistical software system offering a very different data analysis experience. Its features include: an interface to a range of other statistical software packages, which removes the need to learn software-specific techniques each time the functionality of a new package is required, while also providing tools to help teach software-specific knowledge to those wishing to learn; its own in-house MCMC-based estimation engine (eStat) for complex data modelling (including multilevel models); open-source templates allowing users to write their own Stat-JR functions; an eBook interface providing an interactive way of reporting and disseminating science; and an innovative tool for teaching statistics.
R2MLwiN: An R command interface to the MLwiN multilevel modelling software package, allowing users to fit multilevel models using MLwiN from within the R environment. It is designed to be used with versions of MLwiN from v2.25 onwards although some features will work with earlier versions.
runmlwin: A Stata command to fit multilevel models in MLwiN from within Stata.
MLwiN has also had some interoperability with WinBUGS for some time (see the MLwiN MCMC Manual).
Finally, the Centre for Multilevel Modelling has a full online course (LEMMA) that can be followed in R, Stata and MLwiN.
The obvious advantages of R are that (a) you can do everything in one piece of software and (b) there are so many packages that probably all of your multilevel needs are covered! (Personally, my default combination of packages is arm, multilevel, and either nlme or lme4, depending on the problem.)
I found this list of multilevel packages in a different discussion and edited/added some of the entries. It is probably a bit outdated, but still quite comprehensive:
amer -- Additive mixed models with lme4
arm -- Data Analysis Using Regression and Multilevel/Hierarchical Models
coxme -- Mixed Effects Cox Models
gamm4 -- Generalized additive mixed models using mgcv and lme4
GLMMarp -- Generalized Linear Multilevel Model with AR(p) Errors Package
glmmAK -- Generalized Linear Mixed Models
heavy -- Estimation in the linear mixed model using heavy-tailed distributions
hglm -- hglm is used to fit hierarchical generalized linear models
HGLMMM -- Hierarchical Generalized Linear Models
influence.ME -- Tools for detecting influential data in mixed effects models
kinship -- Mixed-effects Cox models, sparse matrices, and modeling data from large pedigrees
lme4 -- Linear mixed-effects models using S4 classes
lmeSplines -- Add smoothing spline modelling capability to nlme
lmec -- Linear Mixed-Effects Models with Censored Responses
lmm -- Linear mixed models
longRPart -- Recursive partitioning of longitudinal data using mixed-effects models
MASS -- Main Package of Venables and Ripley's MASS (see function glmmPQL)
MCMCglmm -- MCMC Generalised Linear Mixed Models
MEMSS -- Data sets from Mixed-effects Models in S
mlmRev -- Examples from Multilevel Modelling Software Review
multilevel -- Multilevel Functions
nlme -- Linear and Nonlinear Mixed Effects Models
nlmeODE -- Non-linear mixed-effects modelling in nlme using differential equations
npde -- Normalised prediction distribution errors for nonlinear mixed-effect models
PSM -- Non-Linear Mixed-Effects modelling using Stochastic Differential Equations
pamm -- Power analysis for random effects in mixed models
pedigreemm -- Pedigree-based mixed-effects models
phmm -- Proportional Hazards Mixed-effects Model (PHMM)
RLRsim -- Exact (Restricted) Likelihood Ratio tests for mixed and additive models
Hi Lukasz, it should be said that the answer very much depends on what, and how much, you need to do with the software. If you need to fit basic two- or perhaps three-level linear models with uncomplicated error assumptions, the built-in routines in the commercial packages mentioned, such as SPSS or SYSTAT, can typically do the job. STATA and SAS offer more advanced capabilities, although with SAS, for one, what you get probably depends on the level of your licence. Online resources of all sorts, either generic (such as UCLA's) or specific to a package (such as STATA's), offer a lot of advice about the modeling, as well as the code and sometimes the data to back up the examples. STATA also allows the incorporation of user-developed routines, although I don't keep pace with what additions to its own multilevel abilities (which are pretty good) there might be. I also believe you can run HLM from within it, just as Kelvyn mentioned for MLwiN.
So, for many, many years now, MLwiN (previously MLn) and HLM have been the two best-known commercial packages for multilevel modeling, both associated with leading experts on multilevel modeling on opposite sides of the Atlantic. It's very hard to generalize about much of anything these days with all the open-source stuff out there, but odds are these two special-purpose packages are going to be more efficient than routines running within other programs, by which I mean faster and/or able to run bigger problems, or to run bigger problems with fewer resources. I would not disagree with a single thing Kelvyn says about MLwiN. HLM and MLwiN are different. You can get information about HLM from its publisher, SSI; I have attached the link. SSI publishes another package called SuperMix. I know less about this, but it has some interesting features, such as a specific routine for multilevel event history (survival) analysis. Among other things, HLM is organized around being a teaching (and learning) tool, so it has a unique (as far as I know) graphical display of its equations that can be exported as graphics for use in papers, for example. Most users stick to the pretty simple features of the program, but there are a lot of choices of model types available. It handles very large problems quite well, although a few users get the 64-bit version for super large stuff.
Having said all this about MLwiN and HLM, if you are just starting out you may be fine with whatever is in the statistical package you already use, until you hit its limits, or with something free, although the commercial packages tend to have better interfaces and support. Also, having been using statistical software since the ice age, I'd be really hesitant to use open-source, user-developed stuff on data that's anything more than routine unless you can confirm your results with another piece of software. Estimating these models is not so easy, and the commercial packages get a lot of testing. I don't have the reference handy, but you can go back to an article by Leland Wilkinson (from the early 1980s, maybe) that demonstrated that early PC-based versions of SPSS, SAS and some other packages did not get the right answers to problems as simple as regressions, due to faults in their algorithms. I am not claiming that we aggressively debugged the reason, but running a very difficult problem in HLM, STATA, SAS and R had interesting results. Eventually HLM, STATA and SAS could be reconciled, but the R results (lme4, if I recall) were dramatically different, and we never figured out why. STATA and SAS changed the model (in other words, HLM was more robust), so when we changed the model in HLM, they all agreed. Sorry, we didn't have MLwiN available. My point is: don't take it as a given that all software gives the right answer. Bob
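Wilkinson's specific test problems are not reproduced here, but a generic illustration of the kind of algorithmic fault Bob describes is the one-pass "computing formula" for the variance, which suffers catastrophic cancellation when the data have a large mean relative to their spread. The sketch below (plain Python, hypothetical function names, not code from any package) compares it with the two-pass formula on four values whose sample variance is exactly 30:

```python
# Illustrative only: hypothetical helper names, not code from SPSS, SAS or any
# other package, and not a reproduction of Wilkinson's test problems.


def naive_variance(xs):
    """One-pass 'computing formula': (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n, s, ss = 0, 0.0, 0.0
    for x in xs:  # plain float accumulation, as a simple one-pass program would do
        n += 1
        s += x
        ss += x * x
    # Subtracting two huge, nearly equal numbers cancels almost all significant digits.
    return (ss - s * s / n) / (n - 1)


def two_pass_variance(xs):
    """Centre the data first, then sum squared deviations: far more stable."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Deviations 4, 7, 13, 16 have sample variance exactly 30; adding a large common
# offset should not change that, but it wrecks the one-pass formula in doubles.
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]
print(naive_variance(data))     # -170.66666666666666 (a negative "variance")
print(two_pass_variance(data))  # 30.0
```

In IEEE double precision the one-pass formula returns a negative "variance" here, while the two-pass version recovers 30 exactly; the same cancellation can affect the raw sums of squares and cross-products used to form regression estimates.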
There's nothing wrong with Soutrik's list, except that I want to reinforce that it still has a lot to do with what you need the software to do. Different software is stronger in different areas in terms of features and, more subtly, estimation methods. I find the Expectation-Maximization (EM) algorithm used in HLM to be more stable than the Iterative Generalized Least Squares (IGLS) used in many other programs, although most users won't run into trouble with either (but I am frequently pushing programs to the limit). Note that Soutrik is making a point about R, as an example of user-developed open-source software, in terms of reliability that I also made. Bob
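To make the EM idea concrete, here is a minimal sketch, assuming the simplest case (a two-level variance components model with no covariates, fitted by maximum likelihood rather than REML) and made-up simulated data. It is not how HLM or any other package actually implements EM, and the function names are hypothetical. Each iteration computes the posterior mean and variance of every group effect (E-step), then updates the intercept and the two variances in closed form (M-step). EM never decreases the likelihood from one iteration to the next, which is one reason it is often described as stable, if sometimes slow.

```python
import math
import random

# Minimal EM sketch for y_ij = mu + u_j + e_ij with u_j ~ N(0, s2u), e_ij ~ N(0, s2e).
# Assumptions: no covariates, ML (not REML), hypothetical function names; this is
# not the implementation used in HLM or any other package.


def log_likelihood(groups, mu, s2u, s2e):
    """Marginal log-likelihood; each group's covariance is s2e*I + s2u*J."""
    total = 0.0
    for g in groups:
        n = len(g)
        r = [y - mu for y in g]
        ss, s = sum(x * x for x in r), sum(r)
        lam = s2e + n * s2u  # eigenvalue of the covariance along the all-ones vector
        logdet = (n - 1) * math.log(s2e) + math.log(lam)
        quad = (ss - s2u * s * s / lam) / s2e
        total += -0.5 * (n * math.log(2 * math.pi) + logdet + quad)
    return total


def em_variance_components(groups, max_iter=500, tol=1e-10):
    all_y = [y for g in groups for y in g]
    N = len(all_y)
    mu, s2u, s2e = sum(all_y) / N, 1.0, 1.0  # crude starting values
    history = [log_likelihood(groups, mu, s2u, s2e)]
    for _ in range(max_iter):
        # E-step: posterior mean m_j and variance v_j of each group effect u_j.
        post = []
        for g in groups:
            v = 1.0 / (len(g) / s2e + 1.0 / s2u)
            m = v * sum(y - mu for y in g) / s2e
            post.append((m, v))
        # M-step: closed-form maximisers of the expected complete-data log-likelihood.
        mu = sum(y - m for g, (m, _) in zip(groups, post) for y in g) / N
        s2u = sum(m * m + v for m, v in post) / len(groups)
        s2e = sum((y - mu - m) ** 2 + v
                  for g, (m, v) in zip(groups, post) for y in g) / N
        history.append(log_likelihood(groups, mu, s2u, s2e))
        if history[-1] - history[-2] < tol:
            break
    return mu, s2u, s2e, history


# Simulated data (made up for illustration): 50 groups of 20, true mu=10, s2u=4, s2e=1.
rng = random.Random(1)
groups = []
for _ in range(50):
    u = rng.gauss(0.0, 2.0)
    groups.append([10.0 + u + rng.gauss(0.0, 1.0) for _ in range(20)])

mu, s2u, s2e, history = em_variance_components(groups)
print(f"mu={mu:.3f}  s2u={s2u:.3f}  s2e={s2e:.3f}  iterations={len(history) - 1}")
```

The `history` list records the log-likelihood after every iteration, so the monotone climb can be checked directly.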
R, simply because of the flexibility of its scripting language. MLwiN is great for fitting models, but it has no formal scripting language, and I have found it challenging to make sense of at best; even reading data into it is a chore. I routinely use R to interface with Bayesian software such as OpenBUGS or JAGS, and it works really well; plus, the nlme and lme4 libraries offer tools that will fit the majority of multilevel models you see in the literature, at least in my discipline.
“MlWin is great for fitting models but has no formal scripting language and I have found it challenging to make sense of at best, even reading data into is a chore”
This shows the difficulty with these comparisons: no one has deep experience of a large number of options, and it is very difficult to keep up to date.
Thus, MLwiN now has great facilities for reading data: it reads Stata, SPSS and Minitab saved files, and you can simply cut and paste from Excel (if you want to do that!). Indeed, some people use MLwiN as an effective ‘stat-transfer’, as you can open an SPSS file and save it in Stata format.
On this, see:
Jon Rasbash, Chris Charlton, Kelvyn Jones, Rebecca Pillinger (2009) Manual supplement for MLwiN Version 2.14
Moreover, if you want to use syntax, you can write macros to automate your analysis in MLwiN. Admittedly, the syntax is rather arcane. However, CMM's interoperability project is bearing fruit: you can stay within Stata or R, call MLwiN from within those packages, and use the high-quality algorithms that have been the subject of extensive published research. Importantly, you can use likelihood-based estimates as starting values for the MCMC analysis.
The syntax for doing this is set out in detail.
I would also endorse Robert's comments: if the response is continuous, you are fitting a two- or three-level variance components model, you have plenty of higher-level units with a good number of lower-level units in each higher-level unit, and you are not trying to analyse millions of records, then the choice does not matter a great deal. But I would aver that for situations outside of this, the software choice can be very important.
I am writing this in the middle of a three-day workshop on MCMC estimation in MLwiN for advanced users, where a colleague is currently describing the tools available for making the algorithms more efficient. The participants are here because they have had difficulty fitting complex models and/or big datasets with the software they have been using.
To help MLwiN users through the forest of support materials, we have just introduced the following site, which is meant to be a guide to all our resources.
I am a huge fan of MLwiN, but I mostly use R these days. What is best will depend on the types of models you need to run. The main advantage of R is being able to run it on all my machines and share code with colleagues and students across multiple platforms. I also find that for certain models (e.g., fully crossed random effects) the lme4 package is a lot quicker and easier than MLwiN. For other models I have found MLwiN superior.
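For reference, the two- and three-level variance components models for a continuous response can be written as follows, in MLwiN-style notation (assumed here; the post itself gives no equations), with $i$ indexing lower-level units, $j$ and $k$ higher-level units, and all random terms mutually independent:

$$
\begin{aligned}
\text{two-level:}\quad & y_{ij} = \beta_0 + u_{0j} + e_{0ij}, && u_{0j} \sim N(0, \sigma^2_{u0}),\; e_{0ij} \sim N(0, \sigma^2_{e0}) \\
\text{three-level:}\quad & y_{ijk} = \beta_0 + v_{0k} + u_{0jk} + e_{0ijk}, && v_{0k} \sim N(0, \sigma^2_{v0}),\; u_{0jk} \sim N(0, \sigma^2_{u0}),\; e_{0ijk} \sim N(0, \sigma^2_{e0})
\end{aligned}
$$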
This is a very insightful discussion and I am so pleased to have found it.
I am a PhD student working on a policy-capturing study and still figuring out what my data analysis needs are. I am thinking of purchasing HLM7, but the user manual was not written for novices like me. Oh yes, I have limited statistical and programming knowledge.
Thanks to Kelvyn I have found the extremely helpful resources on CMM's site and signed up for a LEMMA course.
Going back to Luis' comment on "What's the best car out there?", I would start with Hierarchical - multivariate analysis and Non-linear - cross-classified analysis.
Which software would you recommend given the above 'constraints'?