Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means algorithm is best suited for implementing this operation because of its efficiency in clustering large data sets. However, working only on numeric values limits its use in data mining because data sets in data mining often contain categorical values. In this paper we present an algorithm, called k-modes, to extend the k-means paradigm to categorical domains. We introduce new dissimilarity measures to deal with categorical objects, replace means of clusters with modes, and use a frequency based method to update modes in the clustering process to minimise the clustering cost function. Tested with the well known soybean disease data set the algorithm has demonstrated a very good classification performance. Experiments on a very large health insurance data set consisting of half a million records and 34 categorical attributes show that the algorithm is scalable in t
The methods mentioned work well for numerical variables, since they are similar to K-means. I would suggest two approaches for your problem:
1. First approach: Combine K-means (or CLARA, which handles large datasets well) for the numerical variables with K-modes for the categorical variables. I have added a link below to a guide on the most robust and commonly used algorithms for numerical variables.
2. Second approach: Convert the categorical variables to numerical variables using an encoding algorithm (see link below), and then apply K-means or CLARA to all variables after the conversion.
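To give a rough idea of the K-modes part of the first approach, here is a minimal pure-Python sketch of the ingredients named in the abstract above: a simple matching dissimilarity, modes in place of means, and a frequency-based mode update. It is a simplified batch variant written for illustration and not necessarily the exact procedure from the paper. The function names and the toy records are my own, and the initial modes are assumed to be supplied by the caller.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(members):
    """Most frequent category per attribute (frequency-based mode update)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))

def k_modes(records, initial_modes, max_iter=20):
    """Batch k-modes sketch: assign each record to its nearest mode, then
    recompute the modes, until the assignment stops changing."""
    modes = list(initial_modes)
    k = len(modes)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes would not change
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster ends up empty
                modes[j] = column_modes(members)
    return labels, modes

# Hypothetical toy data: three categorical attributes per record.
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "flat"),
    ("blue", "huge", "square"),
]
labels, modes = k_modes(records, [records[0], records[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

The result depends on the initial modes, so in practice it is worth running it from several starting points and keeping the solution with the lowest total dissimilarity.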
I also suggest that you test and evaluate several algorithms to identify which method best fits your dataset.
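For the second approach, one common encoding is one-hot encoding, where each (attribute, category) pair becomes its own 0/1 column. The sketch below is only an illustration in plain Python. The helper names and the toy records are my own assumptions, and the resulting vectors would still need to be passed to a K-means or CLARA implementation of your choice.

```python
def one_hot_encode(records):
    """Turn categorical records into 0/1 vectors, one column per
    (attribute index, category) pair."""
    n_attrs = len(records[0])
    # Sort the categories so the column order is deterministic.
    categories = [sorted({r[i] for r in records}) for i in range(n_attrs)]
    columns = [(i, c) for i in range(n_attrs) for c in categories[i]]
    vectors = [[1.0 if r[i] == c else 0.0 for i, c in columns] for r in records]
    return vectors, columns

def squared_euclidean(u, v):
    """Squared Euclidean distance, the quantity K-means minimises."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical toy data with two categorical attributes.
records = [("red", "small"), ("red", "large"), ("blue", "large")]
vectors, columns = one_hot_encode(records)
print(columns)  # [(0, 'blue'), (0, 'red'), (1, 'large'), (1, 'small')]
print(vectors)  # [[0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]]
print(squared_euclidean(vectors[0], vectors[2]))  # 4.0
```

Note that with this encoding the squared Euclidean distance between two records is twice the number of attributes on which they differ, so the distances behave much like the simple matching measure used by K-modes. The cluster centres, however, become fractional averages rather than actual categories, which is one reason to compare both approaches on your data.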