I have just started getting familiar with SEM and am currently experimenting with Partial Least Squares (PLS) path modeling. Everything seems relatively clear so far, except for one thing. I work with neuroimaging data (mainly structural) from patients with neurological disorders. My interest is to model cognitive functions and to estimate the impact of different factors (e.g. treatment) on them.

Selecting "candidates" for latent variables from high-dimensional clinical data seems relatively straightforward (e.g. use PCA to form "symptom dimensions"), but I'm not sure what to do with imaging data, given that I usually have ~150 regional brain measurements of different modalities. Although I do have some understanding of brain-cognition associations, I'm not sure how to reduce dimensionality in this particular case, i.e. how to merge anatomically and functionally related brain structures in order to form reflective constructs. My first idea was to perform hierarchical clustering on the imaging data using Pearson r^2 as the similarity metric, but then the resulting clusters would depend on my choice of the r^2 cluster-forming threshold. Therefore, it might be better to perform linear modeling first and then select the most significantly correlated structures from the GLM results as "candidates" for LVs. Does this make sense? What would you suggest?
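
To make the clustering idea concrete, here is a minimal sketch in Python (NumPy and SciPy). It rests on several assumptions that are not part of my actual pipeline: the data are synthetic, 1 - r^2 is used as the dissimilarity, average linkage is chosen arbitrarily, and the helper `cluster_regions` and the threshold values are purely illustrative. It is only meant to show how the number of clusters depends on the cut-off.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def cluster_regions(X, threshold):
    """Group the columns of X (subjects x regions) by 1 - r^2 dissimilarity.

    `cluster_regions` is a hypothetical helper; average linkage is an
    arbitrary choice made for this illustration.
    """
    r = np.corrcoef(X, rowvar=False)      # region-by-region Pearson r
    dist = 1.0 - r ** 2                   # high r^2 means small distance
    dist = (dist + dist.T) / 2.0          # enforce exact symmetry
    dist = np.clip(dist, 0.0, None)       # guard against tiny negatives
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=threshold, criterion="distance")


# Synthetic stand-in for imaging data: 3 latent factors, 5 regions each.
rng = np.random.default_rng(0)
n_subjects, n_per_factor = 200, 5
factors = rng.normal(size=(n_subjects, 3))
X = np.hstack([
    factors[:, [k]] + 0.3 * rng.normal(size=(n_subjects, n_per_factor))
    for k in range(3)
])

labels = cluster_regions(X, threshold=0.5)

# The number of clusters changes with the cut-off, which is the concern
# raised above about choosing the cluster-forming threshold.
for t in (0.05, 0.5, 0.99):
    n_clusters = len(np.unique(cluster_regions(X, t)))
    print(f"threshold={t}: {n_clusters} clusters")
```

On real data the block structure would be far less clean than in this toy example, so the sensitivity to the threshold is likely to be stronger.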
