Hierarchical clustering is a greedy approach that groups a dataset into a nested, tree-like hierarchy of clusters. This makes perfect sense to me for phylogenetic data, where the tree models evolutionary relationships. Yet I also often see it applied to gene expression datasets, for which the rationale is less clear to me. What are the theoretical and/or empirical justifications for preferring hierarchical clustering over clustering algorithms that assume no hierarchy, such as k-means, in the case of gene expression data?
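
To make concrete what I mean by "greedy" and "tree structure", here is a minimal sketch of bottom-up (agglomerative) clustering in plain Python. It is only an illustration under my own assumptions: the average-linkage criterion, the Euclidean distance, the function name `agglomerate`, and the toy expression values are all made up for this example, not taken from any particular package or dataset.

```python
from itertools import combinations
from math import dist


def agglomerate(points, labels):
    """Greedy bottom-up clustering with average linkage (illustrative sketch).

    Returns the merges as (members_a, members_b, distance),
    in the order they were performed.
    """
    # Each cluster is a tuple of indices into `points`.
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Greedy step: merge the closest pair; a merge is never undone.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((
            tuple(labels[i] for i in a),
            tuple(labels[i] for i in b),
            linkage(a, b),
        ))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy, made-up expression profiles (rows: genes, columns: 3 conditions).
labels = ["geneA", "geneB", "geneC", "geneD"]
profiles = [
    (1.0, 1.1, 0.9),
    (1.1, 1.0, 1.0),
    (5.0, 5.2, 4.9),
    (5.1, 5.0, 5.0),
]

merges = agglomerate(profiles, labels)
for left, right, d in merges:
    print(f"merge {left} + {right} at distance {d:.3f}")
```

The sequence of merges is the tree (dendrogram). k-means, by contrast, would return a single flat partition for a chosen k, with no nesting between solutions for different k.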