Machine learning algorithms are often classified as "classification", "clustering", or "regression" algorithms.
Support vector machines or k-means are specifically designed to solve a problem (classification in the first case, clustering in the second), and each is indeed just an optimization procedure that maximizes an "expected goodness of classification" or "goodness of clustering" criterion. The beauty lies in the choice of the criterion and the optimization procedure. HMMs are not an algorithm per se. They are a specific kind of probability distribution over sequences of vectors, for which we know good algorithms for parameter estimation and for computing marginal distributions. But asking whether they belong to the "clustering" or "classification" family is as ridiculous as asking whether the Gaussian distribution is supervised or unsupervised learning.
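To make "a probability distribution over sequences of vectors" concrete, here is a minimal, self-contained sketch of a discrete HMM in plain numpy, with made-up parameter values. Given its parameters you can sample observation sequences from it; the estimation algorithms mentioned above work in the opposite direction, recovering parameters from observed sequences.

```python
import numpy as np

rng = np.random.default_rng(0)

# A discrete HMM is fully specified by three sets of parameters
# (values here are arbitrary, for illustration only):
pi = np.array([0.6, 0.4])              # initial state distribution
A  = np.array([[0.9, 0.1],             # state transition matrix
               [0.2, 0.8]])
B  = np.array([[0.7, 0.2, 0.1],        # per-state emission distributions
               [0.1, 0.3, 0.6]])       # over 3 discrete observation symbols

def sample_sequence(T):
    """Draw one observation sequence of length T from the model,
    illustrating that an HMM is a distribution over sequences."""
    states, obs = [], []
    s = rng.choice(2, p=pi)
    for _ in range(T):
        obs.append(rng.choice(3, p=B[s]))
        states.append(s)
        s = rng.choice(2, p=A[s])
    return np.array(states), np.array(obs)

hidden, visible = sample_sequence(10)
print("hidden states :", hidden)
print("observations  :", visible)
```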
Why "both classification and clustering"? Because of the following: Being probability distributions, HMM can be used for classification in a bayesian framework; and being model with hidden states, some latent clustering of the training data can be recovered from their parameters. More precisely:
HMMs can be used for classification. This is a straightforward application of the Bayesian classification framework, with the HMM being used as the probabilistic model describing your data. For example, you have a large database of utterances of digits ("one", "two", etc.) and want to build a system capable of classifying an unknown utterance. For each class in your training data ("one", "two", etc.), you estimate the parameters of an HMM describing the training sequences in that class, and you end up with 10 models. Then, to perform recognition, you compute the 10 likelihood scores (which indicate how likely it is that the sequence you want to recognize was generated by each model), and the model with the highest score gives you the digit. In the Rabiner tutorial on HMMs, the training stage is "Problem 3" and the classification stage is "Problem 2".
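As an illustration of that recipe, here is a rough sketch using the hmmlearn library (just one possible implementation, not something prescribed by the approach itself). The `train_sequences` dictionary, the feature extraction, and the model sizes are assumptions made for the example:

```python
import numpy as np
from hmmlearn import hmm

def train_models(train_sequences, n_states=5):
    """train_sequences maps each digit label to a list of 2-D feature
    arrays (e.g. one array of acoustic frames per recorded utterance)."""
    models = {}
    for digit, seqs in train_sequences.items():
        X = np.vstack(seqs)                       # hmmlearn expects stacked frames
        lengths = [len(s) for s in seqs]          # ...plus per-sequence lengths
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=100)
        m.fit(X, lengths)                         # training: Rabiner's "Problem 3"
        models[digit] = m
    return models

def classify(models, utterance):
    """Rabiner's "Problem 2": score the unknown utterance (a 2-D array of
    frames) under every model and pick the class with the highest
    log-likelihood."""
    scores = {digit: m.score(utterance) for digit, m in models.items()}
    return max(scores, key=scores.get)
```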
HMMs can be used in an unsupervised fashion too, to achieve something akin to clustering. Given a sequence, you can train a k-state HMM on it, and at the end of the training process, run the Viterbi algorithm on your sequence to get the most likely state associated with each input vector (or just pull this from the γ during the training process). This gives you a clustering of your input sequence into k classes, but unlike what you would have obtained by running your data through k-means, your clustering is homogeneous on the time axis. For example, you can extract the color histogram of each frame of a video sequence, run this process on that sequence, and you'll end up with a break-down of the video into homogeneous temporal segments corresponding to scenes (the impractical bit is that you have to set the number of scenes k in advance). This technique is commonly used in automatic, unsupervised structure analysis of video or music.
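A sketch of that segmentation idea, again assuming hmmlearn and a pre-computed `features` array (one feature vector per video frame); the choice of k and the Gaussian emission model are assumptions for illustration:

```python
import numpy as np
from hmmlearn import hmm

def segment(features, k=4):
    """Fit a k-state HMM to one feature sequence and read the Viterbi
    state sequence off as a temporal segmentation."""
    model = hmm.GaussianHMM(n_components=k, covariance_type="diag", n_iter=200)
    model.fit(features)                  # unsupervised: no labels involved
    states = model.predict(features)     # Viterbi decoding, one state per frame
    # Collapse the state sequence into (start_frame, end_frame, state) segments.
    segments, start = [], 0
    for t in range(1, len(states) + 1):
        if t == len(states) or states[t] != states[start]:
            segments.append((start, t - 1, int(states[start])))
            start = t
    return segments
```

Unlike k-means on the same frames, consecutive frames tend to end up in the same state, so the segments are contiguous in time.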
Classification: Identifying which class of a set of pre-defined classes the data belongs to.
Clustering: Learning the set of classes the data belongs to.
Regression: Finding a relationship between one variable and one or more others.
The description of the HMM on Wikipedia (https://en.wikipedia.org/wiki/Hidden_Markov_model) includes a table specifying the model, and in that specification the number of states (classes) is fixed.
That means the algorithm does not try to figure out how many classes (states) there are, so it's not open-ended clustering (where the number of clusters is unknown).
However, as @nikie points out, the HMM will do clustering.
There is not really an independent variable (as exists in the regression context) --- so it's not regression.
So my answer is that the HMM is both a classification and a clustering algorithm; I do not believe it is a regression method.
Hidden Markov models have been around for a pretty long time (the 1970s at least). It's a misnomer to call them machine learning algorithms. The HMM model itself is a stochastic process based on a Markov chain, usually discrete in time and space but not necessarily so.

The estimation problems concerning HMMs, as covered in Rabiner's tutorial paper in the Proceedings of the IEEE, are the computation of the HMM likelihood function, the estimation of the state sequence (forward-backward or Viterbi algorithm), and the re-estimation of the HMM parameters via the EM algorithm, leading to the Baum-Welch formulae. None of these is what I would call a machine learning algorithm, except maybe EM.

Once you have estimated the HMM parameters ("trained" them, in machine learning speak), you can use the model to provide likelihoods for noisy inputs, which is sort of like classification. It is most useful, IMO, for state sequence estimation, which is not a machine learning problem since it concerns a dynamical process, not a static classification task. Here is a link to an example of what you can do with HMMs in state sequence estimation.
Conference paper: "A High Performance 1-D Hidden Markov Model Tracker for Passi..."
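For readers who want to see what state sequence estimation looks like in code, here is a minimal, self-contained Viterbi decoder for a discrete-emission HMM (plain numpy; the parameter names and setup are illustrative, not taken from the linked paper):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state path for a discrete-emission HMM.

    obs : sequence of observation symbol indices, length T
    pi  : (N,)   initial state probabilities
    A   : (N, N) transition probabilities, A[i, j] = P(state j | state i)
    B   : (N, M) emission probabilities,  B[i, k] = P(symbol k | state i)
    """
    T, N = len(obs), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]     # best log-prob of paths ending in each state
    psi = np.zeros((T, N), dtype=int)        # back-pointers
    for t in range(1, T):
        trans = delta[:, None] + logA        # trans[i, j]: best path into j via i
        psi[t] = np.argmax(trans, axis=0)
        delta = trans[psi[t], np.arange(N)] + logB[:, obs[t]]
    # Backtrack from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```

Running it with the kind of (pi, A, B) parameters shown earlier in this thread and an observed symbol sequence returns the maximum a posteriori state path, which is the "state sequence estimation" task described above.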