In longitudinal studies examining change in cognition over time, the outcome measures are commonly composite scores formed by combining a number of individual mental tests. Typical examples are cognitive domain scores (memory, executive function, etc.). In a cross-sectional study, composite scores are commonly calculated by first standardizing the individual test scores as Z-scores (i.e. adjusting for age, sex and education) and then forming a composite as either their sum or their average. This composite can then itself be standardized by transforming it to whatever distribution is desired (e.g. Z, T or IQ scores). Since the variances of the standardized test scores are equal by construction, this method gives equal importance to each of the tests in the resulting composite domain score.
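As a concrete illustration, the following sketch (in Python, using hypothetical test names and simulated data; the demographic adjustment step is omitted for brevity) forms a baseline composite in this way:

```python
import numpy as np
import pandas as pd

# Hypothetical baseline data: one row per participant, one column per test.
rng = np.random.default_rng(0)
baseline = pd.DataFrame({
    "recall":     rng.normal(20,  5,   size=200),
    "fluency":    rng.normal(40,  12,  size=200),
    "digit_span": rng.normal(6,   1.5, size=200),
})

# Standardize each test against the baseline sample (mean 0, SD 1),
# so every test carries equal weight in the composite.
test_means = baseline.mean()
test_sds   = baseline.std(ddof=1)
z_baseline = (baseline - test_means) / test_sds

# Composite = average of the Z-scores; record its baseline mean and SD
# so the same rescaling can be reused at later waves.
composite_z = z_baseline.mean(axis=1)
comp_mean, comp_sd = composite_z.mean(), composite_z.std(ddof=1)

# Rescale to, e.g., a T-score metric (mean 50, SD 10).
composite_t = 50 + 10 * (composite_z - comp_mean) / comp_sd
```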
When forming composite scores at later occasions in a longitudinal study, it is desirable for the composite scores to be strictly comparable across time points. This can be achieved by applying to the raw test scores at later waves exactly the same formulae that were used to form the composite score at baseline. (In other words, the baseline sample serves as the normative, or reference, group for the calculation of standard scores at later waves.)
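Continuing the sketch above, the wave-2 composite reuses the baseline test means and SDs and the baseline composite scaling, so that a given raw score maps to the same composite value at every wave (the follow-up data here are again simulated):

```python
# Hypothetical wave-2 scores for the same participants and tests.
wave2 = pd.DataFrame({
    "recall":     rng.normal(19,  6,   size=200),
    "fluency":    rng.normal(38,  14,  size=200),
    "digit_span": rng.normal(5.8, 1.6, size=200),
})

# Apply the *baseline* means and SDs (not wave-2 ones), then the
# *baseline* composite rescaling, keeping both waves on one scale.
z_wave2 = (wave2 - test_means) / test_sds
composite_t_wave2 = 50 + 10 * (z_wave2.mean(axis=1) - comp_mean) / comp_sd
```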
However, a consequence of this method is that there is no assurance that equal importance is given to each of the component tests in the composite scores at later waves. For example, in aging studies it is well known that different tests accepted as markers of the same cognitive domain are affected differently by age, so test scores standardised in this way can no longer be guaranteed to have equal variances at later waves.
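In the running sketch this is easy to see: because the wave-2 scores were standardized with baseline parameters, their variances need not equal 1, and the tests whose spread has changed most contribute disproportionately to the composite.

```python
# Variances of the baseline-standardized wave-2 scores; values far from 1
# reveal the unequal implicit weighting of the tests at this wave.
print(z_wave2.var(ddof=1))
```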
Of course, one could instead use structural equation modelling and fit a latent growth model, with corresponding loadings constrained to be equal across time points. But if the variances of the different tests do in fact change differently over time, the modification indices may indicate that those equality constraints should be relaxed. If that were done, the latent cognition factors would have slightly different construct validities at different occasions. A larger problem is that a selection of tests commonly regarded by neuropsychologists as belonging to a particular cognitive domain may not conform to a single-factor model. (This is particularly true of markers of executive function.)
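For reference, the equality constraint at issue can be written as follows (a generic longitudinal factor model; the notation is illustrative rather than taken from a particular source):

$$ y_{ijt} = \tau_{jt} + \lambda_{jt}\,\eta_{it} + \varepsilon_{ijt}, \qquad \lambda_{j1} = \lambda_{j2} = \cdots = \lambda_{jT} \ \text{for each test } j, $$

where $y_{ijt}$ is person $i$'s score on test $j$ at wave $t$ and $\eta_{it}$ is the latent domain score. Relaxing the loading constraints lets $\lambda_{jt}$ vary with $t$, so $\eta_{it}$ reflects a slightly different weighting of the tests, and hence a slightly different construct, at each occasion.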