I am mostly a qualitative researcher and have obtained nominal data (10 variables) from interviews with 36 students. I would now like to see whether the cases (students) can be grouped into about 3-4 clusters using cluster analysis.
I think it depends on your interest and motivation, i.e. on which needs the analysis would fulfill. You can do it with a small dataset, but the data must meet the basic requirements of cluster analysis. You should know the clustering methods (k-means, hierarchical clustering, etc.) and which one suits your problem.
I do not have experience with SPSS, but I can give you an answer in R!
We used principal component (factor) analysis and ipsative cluster analyses in SPSS for a larger dataset (n = ~450) (https://www.researchgate.net/publication/222382079_Segmentation_by_visitor_motivation_in_three_Kenyan_national_reserves). It was a fairly straightforward process using the software.
Though with 10 variables across 36 students, wouldn't your qualitative approach be more meaningful and appropriate? It could be neat to see what the numbers say, but one might give more weight to the qualitative clustering.
Best,
Adam
Article Segmentation by Visitor Motivation in Three Kenyan National Reserves
With such a small data set (n = 36), you can use hierarchical clustering to explore the whole tree of successive partitions P_k into k clusters (k ranging from n-1 down to 2), formed by successive binary merges of individuals and clusters.
SPSS provides a menu-driven agglomerative hierarchical clustering procedure (Hierarchical Cluster Analysis).
Given the qualitative nature of your data, choose the Counts option in the Measure panel of the Method dialog: the standard choice there is the chi-square measure, which applies to multinomial (categorical) data.
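As an illustration of what a chi-square dissimilarity between two cases does, here is a minimal Python sketch, assuming NumPy/SciPy and assuming that SPSS's chi-square measure is the square root of Pearson's chi-square statistic computed on the 2 × p table formed by the two cases' counts (check the PROXIMITIES documentation before relying on this). The function name `chisq_measure` and the count data are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def chisq_measure(x, y):
    """Chi-square dissimilarity between the count vectors of two cases.

    The two cases are treated as the rows of a 2 x p contingency table, and the
    result is the square root of Pearson's chi-square statistic for the
    hypothesis that both rows share the same frequency profile.
    """
    table = np.vstack([x, y]).astype(float)
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    keep = expected > 0  # categories never used by either case carry no information
    return float(np.sqrt((((table - expected) ** 2)[keep] / expected[keep]).sum()))

# Hypothetical count data: 4 cases x 3 categories.
counts = np.array([[3, 1, 0], [2, 2, 0], [0, 1, 4], [0, 0, 5]])
D = pdist(counts, metric=chisq_measure)   # condensed pairwise dissimilarities
labels = fcluster(linkage(D, method="average"), t=2, criterion="maxclust")
print(labels)
```

Note that two cases with proportional counts (e.g. [1, 2, 3] and [2, 4, 6]) are at distance 0: the measure compares profiles, not totals.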
Alternatively, as a preliminary data transformation, you can convert your categorical data into a complete disjunctive (logical) coding, in which each category of a qualitative variable generates its own 0/1 dummy. In that case, choose the Binary option in the Measure panel of the Method dialog, where the standard choice is the squared Euclidean distance.
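The complete logical (disjunctive) coding described above, followed by squared Euclidean distance, can be sketched in Python with NumPy/SciPy (an assumption; the answers and category labels are hypothetical). The sketch also checks a useful fact: when two students answer a question differently, two dummies differ (one 1 becomes 0 and another 0 becomes 1), so the squared Euclidean distance equals twice the number of questions on which they disagree.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical nominal answers: 4 students x 3 questions.
answers = np.array([
    ["yes", "urban", "A"],
    ["yes", "rural", "A"],
    ["no", "rural", "B"],
    ["no", "rural", "C"],
])

def disjunctive_coding(data):
    """Complete disjunctive coding: one 0/1 dummy column per observed category."""
    columns = []
    for j in range(data.shape[1]):
        for category in np.unique(data[:, j]):
            columns.append((data[:, j] == category).astype(float))
    return np.column_stack(columns)

dummies = disjunctive_coding(answers)                     # shape (4, 7) here
D = squareform(pdist(dummies, metric="sqeuclidean"))

# Number of questions on which each pair of students gives different answers.
mismatches = (answers[:, None, :] != answers[None, :, :]).sum(axis=2)
print(D)
print(2 * mismatches)
```

So on dummy-coded nominal data this distance ranks pairs exactly like a simple count of disagreements, which makes the resulting clusters easy to interpret.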
A classical reference for hierarchical clustering methods, rather historical by now, is:
"A general theory of classificatory sorting strategies: I. Hierarchical systems" by G. N. Lance and W. T. Williams,
available at http://biocomparison.ucoz.ru/_ld/0/50_Lance_Willams_1.pdf
A modern introduction is provided by:
"Data Clustering: A Review" by A. K. Jain, M. N. Murty and P. J. Flynn,
which can be found at http://eprints.iisc.ernet.in/273/1/p264-jain.pdf
Other SPSS procedures can be used with categorical data: the K-means procedure (where you fix the partition, i.e. the number of clusters, in advance) and the TwoStep procedure (which finds the best number of clusters according to two types of criteria in a first step, then aggregates the clusters in a second step).
I have started using HCA for binary data with squared Euclidean distance, since the data are nominal. Given the relatively small sample and the nature of the data, I will use cluster analysis as an exploratory tool to identify possible patterns in the data.
Why not cluster analysis? There are examples where HCA is used to classify 5-6 objects on the basis of a few variables. Try several distances and several clustering methods: if there is some structure in your data, the results will not differ much. Avoid the nearest-neighbor method (single linkage). For choosing a dissimilarity measure, the SPSS help offers good guidance; the help for the PROXIMITIES procedure is more explanatory than the help for CLUSTER.
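The advice to "try several distances and several methods" can be checked numerically by measuring how much the resulting partitions agree. Below is a minimal Python sketch, assuming NumPy/SciPy; the adjusted Rand index (1.0 = identical partitions) is implemented by hand, and the three-group data, the choice of 3 clusters and the list of combinations are all hypothetical. Single linkage is left out, as advised above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def adjusted_rand_index(a, b):
    """Adjusted Rand index between two flat partitions (1.0 = identical)."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)                  # contingency table of the two partitions
    pairs = lambda x: x * (x - 1) / 2.0
    index = pairs(table).sum()
    rows, cols = pairs(table.sum(axis=1)).sum(), pairs(table.sum(axis=0)).sum()
    expected = rows * cols / pairs(len(a))
    return (index - expected) / ((rows + cols) / 2.0 - expected)

# Hypothetical data with clear structure: three well-separated groups of 10.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(c, 0.5, size=(10, 2)) for c in centers])

# Several (linkage, distance) combinations; Ward is only defined for Euclidean.
combos = [("average", "euclidean"), ("average", "cityblock"),
          ("complete", "euclidean"), ("complete", "cityblock"), ("ward", "euclidean")]
labels = {c: fcluster(linkage(X, method=c[0], metric=c[1]), t=3, criterion="maxclust")
          for c in combos}

for c in combos[1:]:
    print(c, round(adjusted_rand_index(labels[combos[0]], labels[c]), 3))
```

With real, less clear-cut data, indices well below 1 across combinations are a sign that the structure depends on the method rather than on the data.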
I need to know whether there is a minimum number of subjects for running a hierarchical cluster analysis. My sample has 546 subjects, and I have two tests of five factors each. A reviewer asked me why I think this number is enough; it would be better if I could answer by citing some research.
I do not understand why your reviewer asks this question for a hierarchical clustering: is it related to the context of your research (e.g. questions about the cardinality and/or stability of the partitions and clusters to be obtained)?
" Since the authors stated that samples from previous studies were too small in terms of sample size, they should indicate why, on the contrary, their sample is adequate (e.g., power analysis)."
This is because there are two similar previous studies with smaller samples (the first with 92 subjects, the second with 20).
Thanks to Marco for this reference, which I'll try to get.
In the meantime, here is an open-access reference on the problem of determining the relevant number of clusters. It may help you partly answer your referee, since it documents the R package NbClust:
NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software 61(6):1-36, October 2014.
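NbClust computes many indices for choosing the number of clusters; one of the best known is the average silhouette width. The following Python sketch (NumPy/SciPy assumed) computes only that single index, by hand, for Ward partitions with k = 2..6; it is not a port of NbClust, and the 36 cases in three groups are hypothetical placeholder data.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

def mean_silhouette(D, labels):
    """Average silhouette width for a partition, given a square distance matrix."""
    s = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False
        if not same.any():          # singleton cluster: silhouette left at 0
            continue
        a = D[i, same].mean()       # mean distance to the rest of the own cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != own)
        s[i] = (b - a) / max(a, b)
    return s.mean()

# Hypothetical data: 36 cases in three groups of 12.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X = np.vstack([rng.normal(c, 0.6, size=(12, 2)) for c in centers])

D = squareform(pdist(X))
Z = linkage(X, method="ward")
scores = {k: mean_silhouette(D, fcluster(Z, t=k, criterion="maxclust")) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print({k: round(v, 3) for k, v in scores.items()}, "best k =", best_k)
```

Silhouette values range from -1 to 1; a clear peak at one k supports that choice, while a flat profile suggests the data have no strong cluster structure.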
The original question (4 years ago) mentions "nominal data from interviewing". In this case I recommend first applying correspondence analysis to the qualitative variables. The correspondence analysis will produce a perceptual map. Then, in the next step, apply the cluster analysis to the coordinates on the perceptual map.
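With several nominal variables, this two-step approach is usually done with multiple correspondence analysis (MCA), i.e. a correspondence analysis of the 0/1 indicator matrix. Here is a minimal Python sketch, assuming NumPy/SciPy; the answers, the number of map dimensions kept (2) and the use of Ward's method on the coordinates are hypothetical choices, not the poster's exact procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def indicator_matrix(data):
    """Complete disjunctive (0/1) coding, one column per observed category."""
    return np.column_stack([(data[:, j] == cat).astype(float)
                            for j in range(data.shape[1]) for cat in np.unique(data[:, j])])

def mca_row_coordinates(data):
    """Multiple correspondence analysis as a correspondence analysis of the indicator matrix."""
    P = indicator_matrix(data)
    P = P / P.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)                  # row and category masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))    # standardized residuals
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    F = U * sv / np.sqrt(r)[:, None]                      # row principal coordinates
    return F, sv ** 2                                     # coordinates, principal inertias

# Hypothetical nominal answers: 12 students x 4 questions.
answers = np.array([
    ["yes", "urban", "A", "high"], ["yes", "urban", "A", "high"], ["yes", "urban", "A", "low"],
    ["yes", "rural", "A", "high"], ["yes", "urban", "A", "high"], ["no", "urban", "A", "high"],
    ["no", "rural", "B", "low"], ["no", "rural", "B", "low"], ["no", "rural", "B", "high"],
    ["yes", "rural", "B", "low"], ["no", "rural", "B", "low"], ["no", "urban", "B", "low"],
])

F, inertias = mca_row_coordinates(answers)
n_dims = 2  # hypothetical number of map dimensions to keep
labels = fcluster(linkage(F[:, :n_dims], method="ward"), t=2, criterion="maxclust")
print(np.round(inertias[:n_dims] / inertias.sum(), 3), labels)
```

A convenient sanity check: for Q variables with J categories in total, the principal inertias of the indicator matrix sum to J/Q - 1 (here 8/4 - 1 = 1).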
José Francisco Moreira Pessanha, Dominique Desbois: please, I have a related question.
I actually have an experiment with a sample size of 13 and 22 variables. I want to perform cluster analysis, precisely hierarchical clustering followed by non-hierarchical clustering (k-means). What can I do in this case?
José Francisco Moreira Pessanha, many thanks for your response, sir. Actually, I did this: I used SPSS for my experiment, performed Ward's algorithm, and will then perform k-means. Can I do anything further to improve the k-means clustering?
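The "Ward first, then k-means" strategy can be sketched in Python, assuming NumPy/SciPy (the poster works in SPSS; the data, k = 3 and the 13 × 4 shape are hypothetical placeholders). The Ward cluster means are passed to SciPy's `kmeans2` as starting centroids via `minit="matrix"`, and the sketch counts how many cases k-means moves to another cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2

def within_ss(X, labels):
    """Total within-cluster sum of squares of a partition."""
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))

# Hypothetical data: 13 cases x 4 (already standardized or reduced) variables.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 1.0, size=(m, 4)) for loc, m in [(-3.0, 4), (0.0, 5), (3.0, 4)]])

k = 3
# Step 1: Ward's hierarchical clustering, cut at k clusters.
ward_labels = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")
# Step 2: the Ward cluster means become the starting centroids of k-means.
ward_ids, ward_index = np.unique(ward_labels, return_inverse=True)
seeds = np.array([X[ward_labels == c].mean(axis=0) for c in ward_ids])
centroids, km_labels = kmeans2(X, seeds, minit="matrix")

# k-means label i refers to seed row i, so the two labelings are directly comparable.
moved = int(np.sum(ward_index != km_labels))
print(within_ss(X, ward_labels), within_ss(X, km_labels), moved)
```

Each k-means iteration can only lower the within-cluster sum of squares, so the final partition should not be worse than the Ward partition it started from on that criterion. If many cases move, the two methods disagree, which is a warning sign about the cluster structure or the chosen number of clusters.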
In my opinion, with P > N (more variables than subjects), the dependence among variables compromises the real usefulness of the cluster analysis. Personally, in this case I prefer to extract the principal components and cluster on the PCs. That is my opinion, but with so few subjects, if k-means differs substantially from the groups created by Ward's method, it is because the clustering is not clear (or the number of clusters has been poorly identified).
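Clustering on principal components can be sketched in Python with NumPy/SciPy (an assumption), using the sizes from the question above (13 cases, 22 variables) but random placeholder data. The sketch also shows why PCA helps here: after centring, a 13 × 22 data matrix has at most 12 components with non-zero variance, so the cases really live in a space of at most n - 1 dimensions. The number of retained components (3) and of clusters (3) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
n, p = 13, 22                        # sizes from the question above
X = rng.normal(size=(n, p))          # placeholder data, not real measurements

# PCA on standardized variables (i.e. on the correlation matrix) via the SVD.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, sv, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = U * sv                      # PC scores; column j is the j-th component
explained = sv ** 2 / np.sum(sv ** 2)

# With p > n, centring leaves at most n - 1 components with non-zero variance.
n_nonzero = int(np.sum(sv > 1e-10 * sv[0]))

m = 3  # hypothetical number of retained components
labels = fcluster(linkage(scores[:, :m], method="ward"), t=3, criterion="maxclust")
print(n_nonzero, np.round(np.cumsum(explained)[:m], 3), labels)
```

The cumulative explained-variance share printed for the first m components is one way to justify how many PCs to keep before clustering.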
I do not know the variables in Khawla Domi's study, but it may be possible to separate the variables by type into groups (economic, demographic, social, etc.) and apply PCA to each group of variables. The first principal component of each group then becomes one variable for the cluster analysis. This strategy can reduce the number of variables so that n > p.
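The grouped-PCA idea can be sketched in Python with NumPy (an assumption). The "first component" of a group is the direction of maximum variance of its standardized variables, and the case scores on it become the new variable. The split of 22 columns into "economic", "demographic" and "social" blocks and the data are hypothetical placeholders.

```python
import numpy as np

def first_pc(block):
    """Scores on, and variance share of, the first principal component of a block."""
    Z = (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)   # standardize
    U, sv, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0] * sv[0], sv[0] ** 2 / np.sum(sv ** 2)

rng = np.random.default_rng(3)
X = rng.normal(size=(13, 22))        # placeholder data: 13 cases x 22 variables
# Hypothetical split of the 22 columns into thematic groups.
groups = {"economic": range(0, 8), "demographic": range(8, 15), "social": range(15, 22)}

new_vars, shares = {}, {}
for name, cols in groups.items():
    new_vars[name], shares[name] = first_pc(X[:, list(cols)])

reduced = np.column_stack([new_vars[name] for name in groups])   # 13 x 3, so n > p
print(reduced.shape, {k: round(v, 3) for k, v in shares.items()})
```

The printed share tells how much of each group's variance its first component retains; a low share means one component summarizes that group poorly. The sign of a principal component is arbitrary, so only relative positions of cases matter.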
José Francisco Moreira Pessanha's answer is very good, and I like it because it connects the statistical procedures to the conceptual problem. This procedure does not guarantee the orthogonality of the new variables, so it may be appropriate to evaluate the correlations among the new variables before clustering.
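Checking the correlations among the new group variables takes only a few lines; the point is that two strongly correlated variables effectively count the same dimension twice in Euclidean distances. A minimal Python/NumPy sketch follows; the three derived variables are hypothetical (the "social" one is built to track "economic"), and the 0.7 cut-off is an arbitrary illustration, not a rule from the thread.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical derived variables for 13 cases; "social" is built to track "economic".
economic = rng.normal(size=13)
demographic = rng.normal(size=13)
social = 0.9 * economic + 0.1 * rng.normal(size=13)
names = ["economic", "demographic", "social"]
reduced = np.column_stack([economic, demographic, social])

R = np.corrcoef(reduced, rowvar=False)     # 3 x 3 correlation matrix
threshold = 0.7                            # hypothetical cut-off, not from the thread
flagged = [(names[i], names[j], round(R[i, j], 2))
           for i in range(len(names)) for j in range(i + 1, len(names))
           if abs(R[i, j]) > threshold]
print(flagged)
```

If a pair is flagged, one option is to drop or merge one of the two variables before clustering.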
José Francisco Moreira Pessanha, Marco Acutis: thank you very much for your time. I have just one more question: what did you mean by the first component? I performed what is called dimension reduction (factor analysis). How can I determine the first component? Do you mean the threshold from the graph? Thank you again.