I applied k-means and k-medoids clustering to the iris dataset, using only the sepal length and sepal width features; the dataset has three classes. With k=3, k-means gave a silhouette score of about 0.42, and a crosstab showed that the cluster labels and the class labels are largely different. With k-medoids and the same k=3, the cluster labels and class labels are almost the same, with only a few mismatches, yet the silhouette score is only about 0.44. Could anyone explain why this is the case?

My understanding is that the silhouette score measures how tightly the data points are grouped within their own cluster relative to the nearest other cluster; a lower value means the clusters are loose or overlapping. I also understand that in an unsupervised setting we are not supposed to use a confusion matrix to evaluate a clustering; I tried it out of curiosity. Why does the silhouette score not reflect how well the clusters agree with the class labels in this context?
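To make the question concrete, here is a minimal sketch of the two measurements being compared. It does not use the iris data: the points, the `classes` list, and the `geometric` partition are made-up 1-D toy values, and `silhouette` and `purity` are hypothetical helper names written for this illustration (purity is one simple way to summarize a crosstab, not necessarily what the crosstab check above used). It only shows that the silhouette score is computed from distances and cluster labels alone, so it can rank a partition differently from a label-based measure.

```python
from collections import Counter


def silhouette(points, labels):
    """Mean silhouette over all points, using absolute difference as distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other points in the same cluster
        a = sum(abs(points[i] - points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(abs(points[i] - points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def purity(classes, labels):
    """Fraction of points that belong to the majority class of their cluster."""
    table = Counter(zip(labels, classes))  # the crosstab as (cluster, class) counts
    best = {}
    for (lab, _cls), count in table.items():
        best[lab] = max(best.get(lab, 0), count)
    return sum(best.values()) / len(classes)


# Toy data (assumed for illustration): class 0 is well separated,
# classes 1 and 2 are interleaved along the axis.
points = [0, 1, 2, 10, 11, 12, 13, 14, 15]
classes = [0, 0, 0, 1, 2, 1, 2, 1, 2]

# A partition that follows the geometry rather than the classes.
geometric = [0, 0, 0, 1, 1, 1, 2, 2, 2]

print("silhouette, geometric partition:", round(silhouette(points, geometric), 3))
print("silhouette, class labels       :", round(silhouette(points, classes), 3))
print("purity, geometric partition    :", round(purity(classes, geometric), 3))
print("purity, class labels           :", round(purity(classes, classes), 3))
```

On this toy data the geometrically compact partition gets the higher silhouette score even though it agrees less with the class labels, because the silhouette computation never looks at `classes`.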
