Unsupervised classification (clustering) is a wonderful tool for discovering patterns in data. I know that it is also an ill-posed problem, but is it feasible to cross-validate a clustering, for example k-means, using internal validity indexes (silhouette, etc.)? For instance, would a k-fold cross-validation following this scheme be applicable? I choose a battery of internal validity indexes, fit k-means on the training folds, and then compute the validity indexes on each held-out fold (not on the training set, as is usually done). In your opinion, is this approach more sound than the classical one, namely computing the validity indexes on the whole dataset treated as the training set? Besides the Gabriel model, are there other cross-validation frameworks for clustering, and which are preferable?
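To make the scheme concrete, here is a minimal sketch of what I have in mind, in plain Python. Several things in it are my own assumptions and not part of any established framework: held-out points are labelled by their nearest training centroid, the silhouette is the only index computed, and the names `assign`, `kmeans`, `silhouette`, `cv_silhouette` and `demo_points` as well as the synthetic three-blob data are purely illustrative.

```python
import math
import random

def assign(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def kmeans(points, k, rng, n_iter=100, n_init=5):
    """Lloyd's algorithm with a few random restarts; returns the best centroids."""
    best, best_inertia = None, math.inf
    for _ in range(n_init):
        centroids = rng.sample(points, k)
        for _ in range(n_iter):
            groups = [[] for _ in range(k)]
            for p in points:
                groups[assign(p, centroids)].append(p)
            # An empty cluster keeps its previous centroid.
            updated = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
                       for i, g in enumerate(groups)]
            if updated == centroids:
                break
            centroids = updated
        inertia = sum(math.dist(p, centroids[assign(p, centroids)]) ** 2 for p in points)
        if inertia < best_inertia:
            best, best_inertia = centroids, inertia
    return best

def silhouette(points, labels):
    """Mean silhouette width; NaN if fewer than two clusters are present."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return float("nan")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # Mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def cv_silhouette(points, k, n_folds=5, seed=0):
    """Fit k-means on the training folds, score the silhouette on each held-out fold."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        centroids = kmeans(train, k, rng)
        # Assumption: held-out points take the label of the nearest training centroid.
        labels = [assign(p, centroids) for p in held_out]
        scores.append(silhouette(held_out, labels))
    return scores

def demo_points(seed=42, per_blob=30):
    """Synthetic data for illustration only: three Gaussian blobs in 2D."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centers for _ in range(per_blob)]

if __name__ == "__main__":
    points = demo_points()
    for k in (2, 3, 4):
        scores = cv_silhouette(points, k)
        print(f"k={k}: mean held-out silhouette = {sum(scores) / len(scores):.3f}")
```

The classical alternative would be a single call to `silhouette` on the whole dataset with labels from a model fitted on that same dataset; comparing the two across candidate values of k is exactly the comparison I am asking about.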
