I am using the CLUTO clustering toolkit to cluster software faults from multiple versions of a software project. The goal is to find a clustering solution that provides insight into the characteristics of the software and its defects. I would like to show that the clusters are consistent across versions, and to identify differences as they emerge over time.
I have not seen any papers where this has been done. So far I am considering a chi-square test for homogeneity. The clusters are ranked, and the ranks also seem to be fairly consistent across versions. Including the ranks in the comparison may strengthen my argument, but I'm not sure which test would be most appropriate. Any suggestions on the best way to do this? Are there any references that make similar comparisons in other applications of clustering?
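To make the chi-square idea concrete, here is a minimal sketch of what I have in mind. It assumes the clusters have already been matched across versions, so that each column refers to the same cluster in every version. The function name `chi_square_homogeneity` and the counts are made up for illustration; they are not CLUTO output or my real data.

```python
def chi_square_homogeneity(table):
    """Return (statistic, degrees_of_freedom) for a versions x clusters count table.

    Assumes every version (row) and every cluster (column) contains at least
    one fault, so that no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if all versions shared the same cluster proportions
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are software versions, columns are clusters
counts = [
    [30, 20, 10],
    [28, 22, 10],
    [15, 25, 20],
]
stat, dof = chi_square_homogeneity(counts)
print(f"chi-square = {stat:.3f}, dof = {dof}")
```

The statistic would then be compared against a chi-square distribution with `dof` degrees of freedom; if SciPy is available, `scipy.stats.chi2_contingency` should give the same statistic along with a p-value. Note that this treats clusters as unordered categories, so it does not use the rank information at all.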