If you are looking for the "theory and examples of how to perform a supervised and unsupervised hierarchical clustering", it is unlikely that you will find what you want in a paper. Papers tend to either (i) present an advance in a certain direction, e.g. a new method, or (ii) review a certain area. Papers in the first category will have too many details about a single method, while papers in the second category will mention many methods but are unlikely to provide the theoretical foundation that you seem to want. A textbook is a more likely place to find a good theoretical explanation of all the concepts needed to understand these methods. I personally attempted to provide exactly this in this book:
The chapter on clustering alone has 65 pages, but it does provide everything you need to know to get started with clustering. You can access this chapter online for free on the Amazon website.
They have online tools, examples and very clear tutorials for unsupervised methods (class discovery). I think it is a very good starting point to play around with your dataset using different algorithms (UPGMA, SOTA, ...).
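To give a feel for what one of these algorithms does before you try the tools, here is a minimal pure-Python sketch of UPGMA, i.e. agglomerative clustering with average linkage. It is only an illustration under my own assumptions: the function name `upgma`, the Euclidean distance and the toy points are all made up for this example, and it is not the implementation used by any of the tools or books mentioned above. It is also far too slow for real datasets.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def upgma(points):
    """Average-linkage (UPGMA) agglomerative clustering.

    Returns the merges in order, each as
    (members_of_first_cluster, members_of_second_cluster, distance),
    where members are indices into `points`.
    """
    # Start with every point in its own cluster, keyed by a cluster id.
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    merges = []

    def average_distance(a, b):
        # UPGMA distance: mean of all pairwise distances between members.
        total = sum(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > 1:
        # Pick the pair of clusters that are closest on average.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        d = average_distance(a, b)
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        # Replace the two clusters with their union under a new id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Hypothetical toy data: two tight pairs and one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
merges = upgma(points)
for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 3))
```

Reading the merges from first to last gives you the dendrogram from the leaves up. Cutting it at a chosen distance gives you the clusters, which is the "class discovery" step.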