There are several clustering algorithms developed especially for this problem. Three of the best-known examples are K-Modes (see "Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values" by Huang), ROCK (see "ROCK: A Robust Clustering Algorithm for Categorical Attributes" by Guha et al.), and CACTUS (see "CACTUS–Clustering Categorical Data Using Summaries" by Ganti et al.). Research in this field is clearly still active, so you will continually find new algorithms and variations being published, e.g. "A genetic fuzzy k-Modes algorithm for clustering categorical data" by Gan et al. from a few years ago.
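To give a feel for the idea behind K-Modes, here is a minimal pure-Python sketch (not Huang's reference implementation): it works exactly like k-means, except that centroids are per-attribute modes and the distance is a simple count of mismatched attributes. The data and parameter choices are made up for illustration.

```python
import random
from collections import Counter

def mismatches(a, b):
    """Number of attributes on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, k, n_iter=10, seed=0):
    """Minimal k-modes sketch: alternate between assigning records
    to their nearest mode and recomputing each mode attribute-wise."""
    rng = random.Random(seed)
    modes = rng.sample(records, k)   # naive random initialization
    for _ in range(n_iter):
        # Assignment step: nearest mode by mismatch count.
        clusters = [[] for _ in range(k)]
        for r in records:
            j = min(range(k), key=lambda i: mismatches(r, modes[i]))
            clusters[j].append(r)
        # Update step: per-attribute most frequent value.
        for j, members in enumerate(clusters):
            if members:
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0]
                    for col in zip(*members)
                )
    return modes, clusters

data = [("red", "small", "round"), ("red", "small", "oval"),
        ("blue", "large", "square"), ("blue", "large", "round")]
modes, clusters = k_modes(data, k=2)
```

A real implementation would add smarter initialization and frequency-based dissimilarity weights, which is where much of Huang's contribution lies.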
Also, you can actually use any clustering algorithm as long as you define a suitable distance function over your data. In this respect, you may be interested in the paper "Similarity Measures for Categorical Data: A Comparative Evaluation" by Boriah et al., which includes an extensive experiment using k-NN for anomaly detection.
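As a concrete starting point, the simplest measure in the family surveyed by Boriah et al. is the overlap measure: the fraction of attributes on which two records agree. A sketch, with a made-up example:

```python
def overlap_similarity(a, b):
    """Overlap measure: fraction of attributes on which two
    categorical records take the same value."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def overlap_distance(a, b):
    """Turn the similarity into a dissimilarity usable with any
    distance-based clustering (or k-NN) algorithm."""
    return 1.0 - overlap_similarity(a, b)

# Two records agreeing on 2 of 3 attributes:
s = overlap_similarity(("red", "small", "round"), ("red", "small", "oval"))
```

The more sophisticated measures in that paper differ mainly in how they weight matches and mismatches by value frequency, but they plug into clustering algorithms in exactly the same way.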
Ah, yes... I found a whole universe of references I didn't know about by searching for "categorical data". It was a keyword problem, I think: I had been searching for "nominal data" or "conceptual data". Thank you for the suggestions, they were very useful.
Yes, keywords can be daunting! I have spent countless hours searching the web, only to discover hundreds of references once I knew the "correct" word.
If I remember correctly, nominal data is categorical data that has no order defined on it. However, I often see "categorical" used as a synonym for "nominal" in the literature.
I don't know if this fits here as well: there are algorithms that allow swarm robots to cluster at specific places in their environment. See, for example, the BEECLUST algorithm, which was inspired by the clustering behavior of honeybees.
You could also look for "clustering of (dis-)similarity or relational data". Several classic but robust vector quantization algorithms (self-organizing maps, neural gas, fuzzy c-means, ...) have also been extended to deal with such data.
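The common thread in these relational methods is that they only ever touch a precomputed pairwise dissimilarity matrix, never raw feature vectors. A minimal k-medoids-style sketch (not one of the SOM/neural gas extensions themselves, just an illustration of the relational setting, on a made-up matrix):

```python
def k_medoids(D, k, n_iter=10):
    """Minimal k-medoids sketch on a precomputed dissimilarity
    matrix D (list of lists). Only pairwise dissimilarities are
    used, so D can come from any (dis-)similarity definition."""
    n = len(D)
    medoids = list(range(k))  # deterministic init: first k points
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest medoid.
        labels = [min(medoids, key=lambda m: D[i][m]) for i in range(n)]
        # Update step: in each cluster, pick the point that minimises
        # the total dissimilarity to the other members.
        new_medoids = []
        for m in medoids:
            members = [i for i in range(n) if labels[i] == m]
            best = min(members, key=lambda c: sum(D[c][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

# Toy matrix: items 0/1 and 2/3 form two obvious pairs.
D = [[0, 1, 5, 5],
     [1, 0, 5, 5],
     [5, 5, 0, 1],
     [5, 5, 1, 0]]
medoids, labels = k_medoids(D, k=2)
```

Relational SOM and neural gas follow the same principle but maintain prototypes as weighted combinations of data points rather than single medoids.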
There are various implementations that accept mixed data types as input; however, some of them convert everything into numerical values internally. The Condorcet clustering technique is interesting, and is implemented in the IBM Intelligent Miner system, see:
You may also be interested in Affinity Propagation. (http://www.psi.toronto.edu/index.php?q=affinity%20propagation)
This algorithm makes very weak assumptions about your data: all you need to do is define some similarity between data points. This similarity does not even have to be a proper distance function: it need not be symmetric, and not all pairs of data points need to be related.
You can look at the original Science paper (http://www.sciencemag.org/content/315/5814/972.short) for examples of applications, such as clustering faces, gene expression data, or flight connections.
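To make the "no symmetry required" point concrete, here is a hypothetical similarity for categorical records with missing values (None = unknown): the fraction of the first record's known attributes that the second record agrees with. It is deliberately asymmetric when the two records have different numbers of known attributes, and a full matrix of such values is still valid input for affinity propagation. The records and the measure itself are made up for illustration.

```python
def directed_similarity(a, b):
    """Similarity of record a to record b: the fraction of a's
    known attribute values (None = unknown) that b agrees with.
    Asymmetric whenever a and b have different numbers of known
    attributes -- which affinity propagation happily accepts."""
    known = [(x, y) for x, y in zip(a, b) if x is not None]
    if not known:
        return 0.0
    return sum(x == y for x, y in known) / len(known)

a = ("red", "small", None)      # only two known attributes
b = ("red", "small", "round")   # all three known

s_ab = directed_similarity(a, b)  # judged on a's 2 known attributes
s_ba = directed_similarity(b, a)  # judged on b's 3 known attributes
```

Here s_ab and s_ba differ, which would rule out most metric-based clustering algorithms but not affinity propagation.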