Good morning,

I would like to ask the following question.

I am conducting a k-means clustering analysis. My aim is to investigate whether I can observe patterns of food consumption. In my dataset, each row corresponds to one person and gives the percentage of that person's selections falling into each of three food categories (CAT1, CAT2, CAT3). These three variables are therefore linked: from any two of them, the third follows from a simple formula (%CAT3 = 100 - (%CAT1 + %CAT2)). As k-means clustering is sensitive to multicollinearity, do you think it is better to run the k-means algorithm with only two of the variables?
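A small sketch of why the choice matters, using made-up numbers (not from my dataset). Because %CAT3 = 100 - (%CAT1 + %CAT2), dropping one column loses no information. However, it does change the squared Euclidean distances that k-means minimises. The helper `sq_dist` and the three example persons are hypothetical, for illustration only.

```python
# Hypothetical rows: percentage of selections in (CAT1, CAT2, CAT3).
# These values are invented for illustration; each row sums to 100.
person_a = (50.0, 30.0, 20.0)
person_b = (40.0, 30.0, 30.0)
person_c = (50.0, 20.0, 30.0)

# Column subsets: all three categories, or two with one category dropped.
ALL3 = (0, 1, 2)
DROP_CAT3 = (0, 1)
DROP_CAT1 = (1, 2)


def sq_dist(p, q, cols):
    """Squared Euclidean distance between rows p and q over the given columns.

    This is the dissimilarity that standard k-means works with.
    """
    return sum((p[i] - q[i]) ** 2 for i in cols)


# Check the compositional constraint on every example row.
for row in (person_a, person_b, person_c):
    assert sum(row) == 100.0

pairs = {"a-b": (person_a, person_b),
         "a-c": (person_a, person_c),
         "b-c": (person_b, person_c)}

for name, cols in (("all 3", ALL3), ("drop CAT3", DROP_CAT3), ("drop CAT1", DROP_CAT1)):
    dists = {k: sq_dist(p, q, cols) for k, (p, q) in pairs.items()}
    print(name, dists)
```

With all three columns, the three hypothetical persons are equally far apart (200 for each pair). Dropping CAT3 makes a-b and a-c look closer than b-c, while dropping CAT1 makes a-b and b-c look closer than a-c. So the clusters found with two variables may depend on which category is left out.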

Thanks for your attention,

Ines
