12 December 2015

Hierarchical clustering is a well-known algorithm for grouping data points into clusters. Unlike K-means, it does not require the number of clusters to be specified a priori; instead, a cut-off value can be used to obtain the clusters at a particular point during the clustering operation. The algorithm is either bottom-up (agglomerative) or top-down (divisive).
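To make the bottom-up variant and the cut-off idea concrete, here is a minimal sketch in plain Python. It assumes single linkage and Euclidean distance, neither of which is specified above, and the names `agglomerative` and `cut_distance` are made up for illustration. It is a naive version, roughly cubic in the number of points, so it only shows the idea and is not meant for a large dataset.

```python
import math
from itertools import combinations


def single_linkage_distance(a, b, points):
    """Smallest Euclidean distance between any point of cluster a and any of b."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerative(points, cut_distance):
    """Bottom-up clustering; stops merging once the closest pair of
    clusters is farther apart than cut_distance (illustrative name)."""
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        (x, y), d = min(
            (
                ((x, y), single_linkage_distance(clusters[x], clusters[y], points))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if d > cut_distance:
            break  # the cut-off is reached; keep the current clusters
        # Merge cluster y into cluster x (x < y, so deleting y is safe).
        clusters[x].extend(clusters[y])
        del clusters[y]
    return [sorted(c) for c in clusters]


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agglomerative(points, cut_distance=2.0))   # [[0, 1], [2, 3], [4]]
print(agglomerative(points, cut_distance=10.0))  # [[0, 1, 2, 3], [4]]
```

Note that no k is passed in: the number of clusters that comes out depends only on where the cut is made.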

Recently I needed to cluster a large dataset using the hierarchical agglomerative clustering (HAC) algorithm, so I checked Mahout's list of clustering algorithms. Mahout offers only a limited set of clustering algorithms, and all of them require the number of clusters k to be known in advance. I tried the HAC implementation in WEKA, but the algorithm ran for two days with a 2 GB heap without producing any results. Any advice on this? Thanks
