We are working with two skewed dependent variables. In this particular case, the few people at the upper end of one or both scales are of clinical importance, so we chose to focus the analysis on them, contrasting them with all the others.
We split the scores on each scale at its 0.8 quantile and then constructed four groups: those with a high score on only one scale (2 groups: Beh1 and Beh2), those with high scores on both scales (1 group: Beh12), and those with a high score on neither (1 group: Beh0).
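For concreteness, here is a minimal sketch of the grouping step, assuming "high" means strictly above the 0.8 quantile of that scale (the post does not say how ties at the cut-off were handled). `assign_groups` is a hypothetical helper name.

```python
import numpy as np


def assign_groups(y1, y2, q=0.8):
    """Label each person Beh0, Beh1, Beh2 or Beh12 from two scale scores.

    Assumption: a score counts as high only if it is strictly above the
    q-quantile of its own scale.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    high1 = y1 > np.quantile(y1, q)
    high2 = y2 > np.quantile(y2, q)
    labels = np.full(y1.shape, "Beh0", dtype=object)  # high on neither scale
    labels[high1 & ~high2] = "Beh1"                   # high on scale 1 only
    labels[~high1 & high2] = "Beh2"                   # high on scale 2 only
    labels[high1 & high2] = "Beh12"                   # high on both scales
    return labels
```

With skewed scales many people can share the cut-off value, so the choice between `>` and `>=` can noticeably change the group sizes; reporting the resulting counts per group makes this visible.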
We need to compare Beh1, Beh2, and Beh12 on three continuous variables, X1, X2, and X3, and on one categorical variable, G, with two levels. Because the three continuous variables are correlated, we performed a MANOVA. (The categorical variable G is independent of group membership, whether in the omitted Beh0 group or in any of the groups included in the analysis.) The MANOVA is significant.
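A one-way MANOVA of (X1, X2, X3) on group can be sketched in plain NumPy as below. This assumes Wilks' Lambda as the test statistic; the post does not say which statistic was used (Pillai's trace, Hotelling-Lawley and Roy's root are the usual alternatives). `wilks_lambda` is a hypothetical name; in practice `statsmodels.multivariate.manova.MANOVA` reports all four statistics.

```python
import numpy as np


def wilks_lambda(X, groups):
    """One-way MANOVA via Wilks' Lambda with Bartlett's chi-square approximation.

    X: (n, p) array of continuous outcomes (e.g. columns X1, X2, X3).
    groups: length-n array of group labels (e.g. Beh1, Beh2, Beh12).
    Returns (Lambda, chi-square statistic, degrees of freedom).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n, p = X.shape
    levels = np.unique(groups)
    k = len(levels)
    grand_mean = X.mean(axis=0)
    E = np.zeros((p, p))  # within-group SSCP matrix
    H = np.zeros((p, p))  # between-group SSCP matrix
    for g in levels:
        Xg = X[groups == g]
        centered = Xg - Xg.mean(axis=0)
        E += centered.T @ centered
        d = (Xg.mean(axis=0) - grand_mean).reshape(-1, 1)
        H += len(Xg) * (d @ d.T)
    lam = np.linalg.det(E) / np.linalg.det(E + H)
    # Bartlett: -(n - 1 - (p + k) / 2) * ln(Lambda) ~ chi2 with p * (k - 1) df
    chi2 = -(n - 1 - (p + k) / 2) * np.log(lam)
    df = p * (k - 1)
    return lam, chi2, df
```

A p-value then follows from `scipy.stats.chi2.sf(chi2, df)`; with three groups and three outcomes the test has 6 degrees of freedom.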
We then used LDA to find out whether group membership can also be predicted from a person's pattern of scores on (X1, X2, X3). Because LDA does not take a categorical variable as a predictor, we ran it separately for each level of G. For G1, prediction accuracy is 63%, which differs from chance level; for G2, it is only 58%, which does not differ from chance.
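One way to make "different from chance" concrete is a permutation test on cross-validated accuracy: shuffle the group labels many times and see how often the shuffled accuracy reaches the observed one. With three groups of unequal size, chance accuracy is not simply 1/3, and a permutation test accounts for that. The sketch below assumes this approach (the post does not say which test was used), implements LDA from scratch in NumPy, and uses hypothetical helper names; scikit-learn's `LinearDiscriminantAnalysis` together with `sklearn.model_selection.permutation_test_score` does the same job in practice.

```python
import numpy as np


def lda_fit(X, y):
    """Fit LDA: class means, class priors and the inverse pooled covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # LDA assumes one covariance matrix shared by all classes (pooled within-class)
    resid = np.vstack([X[y == c] - means[i] for i, c in enumerate(classes)])
    cov = resid.T @ resid / (len(X) - len(classes))
    return classes, means, priors, np.linalg.inv(cov)


def lda_predict(model, X):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, means, priors, inv_cov = model
    # score_k(x) = x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log(prior_k)
    scores = (X @ inv_cov @ means.T
              - 0.5 * np.sum((means @ inv_cov) * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]


def cv_accuracy(X, y, n_folds=5, seed=0):
    """k-fold cross-validated accuracy (folds are not stratified here)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    correct = 0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        model = lda_fit(X[train], y[train])
        correct += np.sum(lda_predict(model, X[test]) == y[test])
    return correct / len(y)


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Compare observed CV accuracy with accuracies after shuffling the labels."""
    rng = np.random.default_rng(seed)
    observed = cv_accuracy(X, y, seed=seed)
    null = np.array([cv_accuracy(X, rng.permutation(y), seed=seed)
                     for _ in range(n_perm)])
    # the +1 terms count the observed labelling as one of the permutations
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p
```

To mirror the per-level analysis, call `permutation_p_value(X[G == level], beh[G == level])` once for each level of G, where `X` holds the columns X1, X2, X3 and `beh` the group labels. The same fold split is reused for every permutation so that observed and shuffled accuracies are directly comparable.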
These findings are corroborated by descriptive statistics showing that in every Gi × Behj combination except G1:Beh12, X1, X2, and X3 lie at roughly the same level. Only G1:Beh12 shows a rather low X1, outside the CIs of X2 and X3.
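The descriptive comparison can be reproduced with a small table of means and confidence intervals per G × Beh cell. This is a sketch under two assumptions not stated in the post: the three variables are on a common scale (for example z-scored over the whole sample), since otherwise comparing X1 with the CIs of X2 and X3 is not meaningful, and a normal-approximation 95% interval (`z=1.96`) is adequate; for small cells `scipy.stats.t.ppf(0.975, n - 1)` gives the t-based critical value instead. `cell_means_ci` is a hypothetical name.

```python
import numpy as np


def cell_means_ci(values, g, beh, z=1.96):
    """Mean and approximate 95% CI of one variable in every G x Beh cell.

    Returns {(g_level, beh_level): (mean, lower, upper)}; cells with fewer
    than two observations are skipped because their CI is undefined.
    """
    values = np.asarray(values, dtype=float)
    g = np.asarray(g)
    beh = np.asarray(beh)
    out = {}
    for gi in np.unique(g):
        for bj in np.unique(beh):
            v = values[(g == gi) & (beh == bj)]
            if len(v) < 2:
                continue
            m = v.mean()
            se = v.std(ddof=1) / np.sqrt(len(v))  # standard error of the mean
            out[(gi, bj)] = (m, m - z * se, m + z * se)
    return out
```

Calling it once each for X1, X2 and X3 yields the three sets of intervals that the comparison above relies on.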
We interpret this as follows: X1 is important in predicting Beh12 membership, but only for G1. This is corroborated by a moderation analysis on the same data.
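The post does not specify the moderation model. One common form, assumed here, is a logistic regression of Beh12 membership (1) versus membership in Beh1 or Beh2 (0) on X1, G coded 0/1, and their product; the interaction coefficient then asks whether the effect of X1 differs between the two levels of G. The sketch fits this by Newton-Raphson in NumPy; `logistic_irls` and `moderation_design` are hypothetical names.

```python
import numpy as np


def logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson; return coefficients and Wald SEs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)                      # weights of the Fisher information
        hess = X.T @ (X * W[:, None])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def moderation_design(x1, g):
    """Columns: intercept, X1 (mean-centred), G (0/1), X1 x G interaction."""
    x1 = np.asarray(x1, dtype=float)
    g = np.asarray(g, dtype=float)
    xc = x1 - x1.mean()
    return np.column_stack([np.ones_like(xc), xc, g, xc * g])
```

The Wald statistic `beta[3] / se[3]` tests the interaction. Because G enters the model directly, the difference between G1 and G2 is tested within a single model rather than by comparing the results of separate per-level fits.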
Would you accept this interpretation, given the methods described above? I would appreciate any feedback on this workflow. Thanks in advance!