You can always try Affinity Propagation (AP) by Frey and Dueck. It is already implemented in R as well as Matlab, and it is easy to understand and use. It needs a little preprocessing, though, and does not handle noisy data (in its original implementation).
For noisy data you can use IUC or DBSCAN. IUC needs some effort to implement, but is very versatile. DBSCAN is easier to implement, but in my experience it is very sensitive to its parameter values (the neighborhood radius and the minimum number of points per neighborhood).
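To make the parameter sensitivity concrete, here is a minimal from-scratch sketch of DBSCAN in plain Python. It is only an illustration, not a tuned implementation: the names `dbscan`, `eps` and `min_pts`, the brute-force neighbor search and the toy data are all my own assumptions.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force region query: indices of all points within eps of point i
        # (the point itself is included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels

# Toy data (made up): two tight squares and one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (20, 20)]

for eps in (0.5, 1.5, 8.0):
    print(eps, dbscan(points, eps=eps, min_pts=3))
```

On this toy data the radius alone decides the outcome: with `eps=0.5` every point is labeled noise, with `eps=1.5` the two squares come out as separate clusters and the outlier as noise, and with `eps=8.0` both squares merge into one cluster.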
Hierarchical clustering is probably what you are looking for. Try to find 'Algorithms for Clustering Data' by Anil K. Jain and Richard C. Dubes (Prentice Hall, Advanced Reference Series, 1988). It describes the algorithms, the different distance measures you could use, and the implications for data interpretation.
The number of clusters will probably still be the issue with these algorithms, and moreover you will have a few extra questions as well: which distance measure and which clustering algorithm to pick.
There is a Matlab implementation in the Statistics Toolbox, and if you look around there are probably several more.
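The choices mentioned above (number of clusters, distance measure, merging rule) can be seen in a small sketch of agglomerative hierarchical clustering in plain Python. This is a naive illustration under my own assumptions, not the Matlab toolbox implementation and not taken from the book: the function names, the `linkage` argument and the toy data are hypothetical.

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def agglomerative(points, n_clusters, dist=euclidean, linkage=min):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster-to-cluster distance from all pairwise point distances.
                d = linkage(dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy 1-D data (made up): an evenly spaced chain plus one slightly detached point.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.5,)]

single = agglomerative(points, n_clusters=2, linkage=min)
complete = agglomerative(points, n_clusters=2, linkage=max)
print(single)    # [[0, 1, 2, 3, 4], [5]]
print(complete)  # [[0, 1, 2, 3], [4, 5]]
```

Same data, same number of clusters, but the merging rule changes the answer: single linkage chains the first five points together, while complete linkage splits the chain. Swapping `dist=manhattan` in is the equivalent knob for the distance measure.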