If I use it to increase the N for cross-sectional analyses (non-repeated measures), does this confound the data? That is, would the scores no longer be fully independent, given that some counts would come from the same subjects?
The long vs wide format distinction just depends on how your software detects the data structure it needs. In an LMM or GLMM that uses long format, you need to identify the structure explicitly (usually by specifying random effects). Some software expects repeated measures data in wide format (and requires you to identify the structure of the columns in some way).
In general if you structure repeated measures data in long format and use that to inflate n (i.e., you don't identify the random effects) then you will either get an error message or (if it runs) you will have an analysis that is incorrect (because it treats non-independent responses as independent).
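To make the point concrete, here is a minimal Python sketch using simulated data. All numbers and variable names are illustrative assumptions, not taken from any real study. It stacks wide data into long format and compares the standard error of the overall mean when every row is treated as independent with the one obtained from one mean per subject.

```python
import random
import statistics

random.seed(1)

n_subjects, n_repeats = 30, 4

# Wide format: one row per subject, one column per occasion.
wide = []
for subject in range(n_subjects):
    subject_effect = random.gauss(0, 2.0)  # shared by all scores of this subject
    wide.append([10 + subject_effect + random.gauss(0, 1.0)
                 for _ in range(n_repeats)])

# Long format: one row per (subject, occasion, score).
long_rows = [(s, t, y) for s, row in enumerate(wide) for t, y in enumerate(row)]

# Naive analysis: pretend all 120 rows are independent observations.
scores = [y for _, _, y in long_rows]
naive_se = statistics.stdev(scores) / len(scores) ** 0.5

# Respecting the structure: one mean per subject, so n is the number of subjects.
subject_means = [statistics.mean(row) for row in wide]
cluster_se = statistics.stdev(subject_means) / n_subjects ** 0.5

print("rows in long format:", len(long_rows))
print("naive SE (inflated n):", round(naive_se, 3))
print("SE from subject means:", round(cluster_se, 3))
```

The naive standard error comes out too small because the stacked rows from one subject share that subject's effect. A mixed model with a subject random effect is the usual way to use all rows without overstating the precision.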
I use the long stacked form in quite a lot of analyses, and not just for repeated measures where the repetition is over time. You can do a multivariate regression (that is, a model with several responses) in this way. So you could have 3 behaviours (e.g. diet, cigarette smoking, exercise) nested within an individual nested within a community (so a 3-level structure). You can then analyse the 3 outcomes simultaneously.
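As a rough sketch of what the stacking looks like, the Python below turns one-row-per-person records into one row per observed response, with a 0/1 indicator for each outcome. The records, the field names and the `stack` helper are all hypothetical, and this is only the data preparation step, not the model fit.

```python
# Hypothetical wide records: one row per person; None marks a missing response.
people = [
    {"person": 1, "community": "A", "diet": 3.2, "smoking": 10, "exercise": 4.0},
    {"person": 2, "community": "A", "diet": 2.5, "smoking": None, "exercise": 1.5},
    {"person": 3, "community": "B", "diet": None, "smoking": 0, "exercise": 6.0},
]
outcomes = ["diet", "smoking", "exercise"]


def stack(records, outcomes):
    """Return one row per observed response (response within person within community)."""
    rows = []
    for rec in records:
        for name in outcomes:
            value = rec[name]
            if value is None:
                continue  # a missing response simply contributes no row
            row = {"community": rec["community"], "person": rec["person"], "y": value}
            # One 0/1 indicator per outcome; exactly one is switched on per row.
            for other in outcomes:
                row["is_" + other] = int(other == name)
            rows.append(row)
    return rows


stacked = stack(people, outcomes)
for row in stacked:
    print(row)
```

In a multivariate multilevel model the indicators typically take the place of the single intercept, so each outcome gets its own intercept, and predictors are interacted with the indicators to give outcome-specific coefficients.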
There are a number of advantages in using a multivariate approach.
· A single overall model is fitted to the data; this means that it is possible to test whether a predictor variable is differently related to each of the outcomes. That is, you can legitimately conduct tests of the coefficients across the different outcome variables, which is not possible if you have fitted separate models to each outcome. It is also possible to perform a single test of the joint effect of an explanatory variable on several responses, thereby lessening the problem of a multiplicity of tests finding results by chance.
· The tests of specific effects for specific dependent variables can be more powerful in the multivariate analysis, as the standard errors can be reduced. This is known, to use John Tukey's evocative term, as 'borrowing strength'. The effect is most marked when the responses are strongly correlated, there is a large amount of imbalance, and many respondents have missing values.
· It is possible to calculate the covariances, and hence the correlations, between the outcome variables at each level. In an examination example, one could calculate the correlation between coursework and unseen examination performance (the two outcomes) at the student level, at the class level and at the school level. It may well be that there is a different pattern of correlation at each level. Moreover, it is possible to calculate this correlation at each level conditional on predictor variables, so that this becomes a form of multilevel partial correlation.
· The multilevel estimates are statistically efficient even when some responses are missing. In the case where the responses do have a conditional multivariate Normal distribution, IGLS estimation provides maximum likelihood estimates. Thus, it is possible to estimate models when not all the responses have been observed on all of the respondents. A classic and extreme case is the matrix sample design, where there is imbalance by design. All respondents are asked a set of core questions, but additional questions are asked of random subsets of the total sample. For example, all pupils are asked a core set of mathematics questions, but random subsets are asked detailed questions on trigonometry, matrix algebra, and calculus. All the responses are modelled simultaneously, so there is a much larger sample for the core questions, based on everyone, and this helps with the estimation for the smaller subsets. There are covariance terms between each pair of outcomes and information is shared through these covariances; the term 'borrowing strength' is again used for this process.
It is even possible to estimate models when the responses are a mixture of continuous and categorical data, e.g. a binomial model for whether you smoke or not, and a Poisson model for the count of cigarettes if you do smoke.
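The idea of a different correlation at each level, mentioned in the list of advantages above, can be illustrated with a small simulation. This is only a descriptive sketch under assumed values (40 schools, 25 students each, invented effect sizes). It correlates school means and within-school deviations directly, and is not the model-based estimate that a multivariate multilevel model would give.

```python
import random

random.seed(2)


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Simulated (coursework, exam) scores: the two outcomes are strongly related
# at the school level and only weakly related at the student level.
schools = []
for _ in range(40):
    u = random.gauss(0, 1)                      # school effect on coursework
    school_exam = 0.9 * u + random.gauss(0, 0.3)  # related school effect on exam
    students = []
    for _ in range(25):
        e = random.gauss(0, 1)                  # student effect on coursework
        coursework = 50 + u + e
        exam = 50 + school_exam + 0.3 * e + random.gauss(0, 1)
        students.append((coursework, exam))
    schools.append(students)

# Between-school correlation: correlate the school means of the two outcomes.
mean_cw = [sum(cw for cw, _ in s) / len(s) for s in schools]
mean_ex = [sum(ex for _, ex in s) / len(s) for s in schools]
between_r = corr(mean_cw, mean_ex)

# Within-school correlation: correlate deviations from each school's own means.
dev_cw, dev_ex = [], []
for s, m_cw, m_ex in zip(schools, mean_cw, mean_ex):
    for cw, ex in s:
        dev_cw.append(cw - m_cw)
        dev_ex.append(ex - m_ex)
within_r = corr(dev_cw, dev_ex)

print("between-school correlation:", round(between_r, 2))
print("within-school correlation:", round(within_r, 2))
```

A full multivariate multilevel model estimates these covariances jointly and allows for the sampling noise in the school means, which this raw decomposition ignores.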
The original development was by Harvey Goldstein in
Goldstein, H. (1989). Models for Multilevel Response Variables with an Application to Growth Curves. "Multilevel Analysis of Educational Data" R D
The full multivariate model for longitudinal data is considered in
Jones, K and Subramanian, V S (2013) Developing multilevel models for analysing contextuality, heterogeneity and change using MLwiN, Volume 2, University of Bristol.