Is it advantageous to cluster a data set with some clustering algorithm first and then agglomerate the resulting clusters by applying a hierarchical clustering algorithm?
You first need to decide how you want to use the result. Hierarchical clustering is typically used when there are few cases (fewer than about 150), because you will want to interpret the levels of the hierarchy yourself; if you have many cases, don't use hierarchical clustering directly. Support vector clustering (an SVM-based method) is another algorithm you could use.
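As a minimal sketch of why the "few cases" advice matters, assuming SciPy and matplotlib and using synthetic data (all names and parameters here are illustrative): the dendrogram is the interpretable output of hierarchical clustering, and it is only readable when there are few cases to inspect.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# ~90 cases in three synthetic groups: small enough to read the dendrogram
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in (0, 5, 10)])

Z = linkage(X, method="ward")   # linkage matrix over all cases
dendrogram(Z)                   # each leaf is one case: unreadable for large n
plt.title("Dendrogram of ~90 cases")
plt.show()
```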
You usually do this when you think that hierarchical agglomerative clustering will give you a good "high-level" view of your data (depending on the linkage you use, hierarchical clustering can follow complex shapes at a high level), but either you fear that the hierarchical clustering will get lost at the beginning of the process (linkage functions and noise do not always mix well), or you have so many individuals to start with that you face algorithmic/performance problems when computing the linkages (randomly subsampling would also be a possibility).

In that case, you start with a "standard" clustering, say k-means, so as to bring the number of "individuals" fed to the hierarchical clustering into a manageable range, 1000 for instance: you run k-means with k = 1000, and the "individuals" of the hierarchical step are then the resulting cluster centroids (whether they are weighted by the population of their cluster or not is another matter).

(In the process, you also hope that this first step will "smooth" the data somewhat, so that the linkage function will not drive the hierarchical clustering into the void at the first "noise bump". A sketch of this two-stage approach follows.)
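Here is a sketch of the two-stage approach described above, assuming scikit-learn and SciPy; the data, the choice of Ward linkage, and the cut into 5 top-level groups are all illustrative assumptions, not part of the original answer.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
# 20k points: already too many for direct linkage (pairwise distances ~ n^2/2)
X = rng.normal(size=(20_000, 8))

# Stage 1: k-means reduces 20k points to 1000 "individuals" (the centroids).
km = KMeans(n_clusters=1000, n_init=1, random_state=0).fit(X)
centroids = km.cluster_centers_

# Stage 2: hierarchical clustering on the 1000 centroids is cheap.
Z = linkage(centroids, method="ward")
top_level = fcluster(Z, t=5, criterion="maxclust")  # e.g. 5 high-level groups

# Map every original point to its high-level group via its stage-1 centroid.
point_labels = top_level[km.labels_]
```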
Clustering the clusters relies on the assumption that the importance of the first-level clusters is roughly equal across your entire sample set, just as the first level itself assumes that any given sample is as important as any other. Once that first level of clustering is complete, you have cluster centers. If the number of samples in each cluster is approximately the same, then a second-level clustering is reasonable; if the populations of the first-level clusters vary widely, however, the second level has the potential to produce poor-quality centers.

For example, suppose you are clustering colored balls (reds, blues, yellows) and the first-level clustering gives you three nice groupings that correspond to the colors. A second clustering would likely bring the second-level cluster to the center of the color space, which may be what is intuitively expected. But when the three level-1 clusters contain widely differing numbers of samples, uncertainty starts to affect the location of the second-level cluster: a small number of samples in a given cluster will "move" its level-1 center around the parameter space to a much greater degree than can be compensated for by the smoothing effect of a large number of samples. This "noise" in estimating the level-1 centers is then reflected in the second-level clustering more strongly by the lightly populated first-level clusters than by the well-smoothed centers of the others. Rather than mixing the colors in the example above, if there are only a couple of red balls, one of which is "off-color" (noise in the sample), the red cluster center from level 1 will be "pulled" away from true red, and the second-level cluster will in turn be pulled away from the center of the color space. One way to mitigate this, sketched below, is to weight each level-1 center by its population at the second level.
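A sketch of that mitigation, assuming scikit-learn (whose `KMeans.fit` accepts a `sample_weight` argument); the data and cluster counts are illustrative assumptions: weighting each first-level centroid by its cluster population means sparsely populated, noisily estimated centroids pull the second-level centers around less.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(500, 2)),  # large, well-estimated group
    rng.normal(5.0, 0.3, size=(500, 2)),  # large, well-estimated group
    rng.normal(2.5, 1.5, size=(5, 2)),    # tiny, noisy group
])

# Level 1: many small clusters, each center estimated from its own samples.
level1 = KMeans(n_clusters=50, n_init=5, random_state=0).fit(X)
counts = np.bincount(level1.labels_, minlength=50)  # population per cluster

# Level 2: cluster the level-1 centers, weighting each by its population,
# so a center backed by 5 samples counts far less than one backed by 500.
level2 = KMeans(n_clusters=3, n_init=5, random_state=0).fit(
    level1.cluster_centers_, sample_weight=counts
)
```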