K-means clustering cannot be used directly on categorical data, because it needs a distance between elements, and such a distance is not as clear for categorical attributes as it is for the numerical part of your data. So the best solution that comes to my mind is to somehow construct a similarity matrix (or a dissimilarity/distance matrix) between your categories and combine it with the distances for your numerical data (for which a simple Euclidean or Manhattan distance will do). Then use the k-medoids algorithm, which accepts a dissimilarity matrix as input. In R, the "cluster" package provides it through its pam() function.
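As an illustration, here is a minimal sketch in R: it assumes a data frame df whose categorical columns are stored as factors, and uses the Gower metric from daisy() as one concrete way to build the mixed dissimilarity matrix (the choice of metric and of k = 4 are assumptions for the example).

library(cluster)

# daisy() with the Gower metric builds a single dissimilarity matrix
# that combines the numeric and the categorical attributes of df
d <- daisy(df, metric = "gower")

# pam() (k-medoids) accepts the dissimilarity matrix directly
fit <- pam(d, k = 4)
fit$medoids            # IDs of the medoid observations
table(fit$clustering)  # cluster sizes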
Then, as with the k-means algorithm, you still have the problem of determining in advance the number of clusters in your data. There are techniques for this, such as the silhouette method or model-based methods (the mclust package in R). However, there is an interesting clustering method, novel compared with the more classical ones, called affinity propagation (see the attached article), which both clusters the data and determines the most appropriate number of clusters from the data itself. Care is needed, though, because this algorithm takes a similarity matrix as input rather than a dissimilarity matrix like the k-medoids implementation mentioned above. Please take a look at that article and at the R implementation in the "apcluster" package.
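To make this concrete, here is a sketch continuing from the dissimilarity matrix d of the previous example; the range of k values tried (2 to 8) is an arbitrary assumption, and negating the dissimilarities is just one simple way to turn them into similarities for apcluster().

library(cluster)
library(apcluster)

# Silhouette method: try several k and keep the best average width
avg_sil <- sapply(2:8, function(k) pam(d, k = k)$silinfo$avg.width)
best_k  <- (2:8)[which.max(avg_sil)]

# Affinity propagation expects a SIMILARITY matrix, so negate d
ap <- apcluster(s = -as.matrix(d))
length(ap@clusters)  # number of clusters chosen by the algorithm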
I hope that helps. Best regards.
Article Clustering by Passing Messages Between Data Points
Article APCluster: an R package for affinity propagation clustering
If there is a logical order among the categories (e.g., category A is more similar to category B than to category C because of some features of the categories), you can apply weighted values to the categories. But this is a typical "false" categorical feature, because it can be decomposed into a vector of numerical features. If the problem involves real categorical features, each category is at the same distance from every other. You can then set a fixed distance for each categorical feature according to the logical importance (weight) of that feature for the clustering.
Example: if you have two categorical features A and B, then from your knowledge of the clustering problem you can decide that a mismatch in category A has weight 10 while a mismatch in category B has only weight 5 (a small sketch of this idea follows below).
This reasoning is valid if you have solid knowledge of the meaning of the categories and of the clustering problem; it may not always be correct.
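A small hypothetical sketch of that weighted-mismatch idea in R (the feature names, values, and weights are purely illustrative):

# Distance between two observations described by categorical
# features A and B: a mismatch in A costs 10, a mismatch in B costs 5
weighted_mismatch <- function(x, y, weights = c(A = 10, B = 5)) {
  sum(weights * (x != y))
}

p1 <- c(A = "red",  B = "small")
p2 <- c(A = "blue", B = "small")
weighted_mismatch(p1, p2)  # 10: only feature A differs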
What do you think about "Two-Step Cluster Analysis", which is incorporated in SPSS? This method can cluster mixed data (categorical and scale variables). Do you have any experience with it?
Can it handle big data (for example, a sample size of 2000)?
Below are some useful references to answer your questions.
About "Two Step Cluster Analysis":
Shi et al., "Two-Step Method for Clustering Mixed Categorical and Numeric Data", 2010 - http://www2.tku.edu.tw/~tkjse/13-1/02-IE435.pdf
About k-Means mixing categorical and numeric data:
Huang, "Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values", 1998 - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.15.4028&rep=rep1&type=pdf
Ahmad et al., "A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets", 2011 - https://www.kau.edu.sa/Files/830/Researches/64870_36338.pdf
About k-modes-extended k-Means:
[Including automatic cluster counting] Bai et al., "An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data", 2011 - https://pdfs.semanticscholar.org/9b99/517332596a4d1332c6151613f5d20e4bf3e7.pdf in association with http://www.mecs-press.org/ijitcs/ijitcs-v8-n3/IJITCS-V8-N3-6.pdf
Bai et al., "A cluster centers initialization method for clustering categorical data", 2012 - http://cicip.sxu.edu.cn/achievement/paper/A_cluster_centers_initialization_method_for_clustering_categorical_data(2012)ESA_20125309070507.pdf
Bai et al., "The k-modes type clustering plus between-cluster information for categorical data", 2014 - https://pdfs.semanticscholar.org/a083/886a95c923a4ccf5ebc757897cbee5dbc851.pdf
About robust categorical data clustering:
Chawla et al., "k-means--: A unified approach to clustering and outlier detection", 2013 - http://epubs.siam.org/doi/pdf/10.1137/1.9781611972832.21
Jiang et al., "Initialization of K-modes clustering using outlier detection techniques", 2016 - https://ccc.inaoep.mx/~ariel/2016/2016%20Initialization%20of%20kmodes%20clustering%20using%20outlier%20detection.pdf
We proposed a distance measure (ConDist) for mixed-type data at ECML 2015. It calculates distances between categorical values by extracting information from correlated context attributes.
With ConDist, you can calculate distances between two data points that have both continuous and categorical attributes. Consequently, you can use ConDist with clustering algorithms that only need pairwise distances, such as DBSCAN or Ward's hierarchical clustering; centroid calculation (which k-means needs) is not possible.
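For illustration, here is a sketch in R of feeding a precomputed distance matrix to such algorithms; since the ConDist implementation is not shown here, the Gower distance from daisy() stands in for it, and df, eps, and k are assumptions of the example (the dbscan package is assumed to be installed).

library(cluster)
library(dbscan)

# stand-in for a ConDist matrix over a mixed data frame df
d <- daisy(df, metric = "gower")

# Ward's hierarchical clustering on the precomputed distances
hc     <- hclust(as.dist(d), method = "ward.D2")
groups <- cutree(hc, k = 3)

# DBSCAN also accepts a dist object; eps is data-dependent
db <- dbscan(as.dist(d), eps = 0.15, minPts = 5)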
You can use the k-prototypes algorithm for mixed numerical and categorical values.
Here (https://github.com/nicodv/kmodes) you can find a Python package for k-prototypes.
[1] Z. Huang, "Clustering large data sets with mixed numeric and categorical values," in The First Pacific-Asia Conference on Knowledge Discovery and Data Mining, 1997, pp. 21–34.
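For readers working in R rather than Python, a comparable sketch using the clustMixType package (an assumption of this example, not the package linked above) could look as follows; it expects a data frame df whose categorical columns are factors:

library(clustMixType)

# k-prototypes: when lambda (the numeric/categorical trade-off) is
# not supplied, kproto() estimates it from the data
fit <- kproto(df, k = 3)
fit$cluster  # cluster assignment for each observation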
The Weka tool also lets you cluster numeric and categorical data together. For categorical attributes it uses the mode rather than the mean, so there are limitations and implications to keep in mind, but it can be done.