I have a data set with 35 independent variables and no response variable. Density plots showed multimodal distributions in my independent variables, so I used Gaussian mixture clustering to group the data. The clustering produced 6 clusters.

I set up the following hypothesis to test my results:

Hypothesis 1: H0: there is no difference in means among the clusters formed.

Before proceeding to ANOVA, I ran a Shapiro-Wilk normality test (the null hypothesis was rejected: W = 0.99132, p-value = 1.623e-12) and an outlier test (which found outliers in the data).

Next, I ran Levene's test for homogeneity of variance and found that the group variances are unequal. The Bartlett and Fligner-Killeen tests gave the same result.
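For readers who want to reproduce these checks, here is a minimal sketch using SciPy. It is not the code used for the results above: the three synthetic groups are a hypothetical stand-in for one variable split by cluster label, and running Shapiro-Wilk on within-group residuals is an assumption about how the normality check was set up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-in data: one variable, three clusters with unequal spread.
groups = [
    rng.normal(loc=0.0, scale=1.0, size=200),
    rng.normal(loc=0.5, scale=2.0, size=200),
    rng.normal(loc=1.0, scale=4.0, size=200),
]

# ANOVA's normality assumption concerns the residuals, so each group is
# centered on its own mean before the Shapiro-Wilk test (an assumption here).
residuals = np.concatenate([g - g.mean() for g in groups])
w_stat, p_shapiro = stats.shapiro(residuals)

# Three tests of equal variances across groups.
lev_stat, p_levene = stats.levene(*groups, center="median")
bart_stat, p_bartlett = stats.bartlett(*groups)
flig_stat, p_fligner = stats.fligner(*groups)

print(f"Shapiro-Wilk:    W = {w_stat:.5f}, p = {p_shapiro:.3g}")
print(f"Levene:          p = {p_levene:.3g}")
print(f"Bartlett:        p = {p_bartlett:.3g}")
print(f"Fligner-Killeen: p = {p_fligner:.3g}")
```

Bartlett's test is itself sensitive to non-normality, so when the Shapiro-Wilk test rejects, the Levene and Fligner-Killeen results are usually the more trustworthy of the three.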

So my data do not meet the ANOVA assumptions. Now that ANOVA is out of the picture, what non-parametric test should I use to test whether the clusters I have are unique or distinct? Or should I use a test for homogeneity of variance to redefine my hypothesis and conclude my finding?
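One candidate for the non-parametric route is the Kruskal-Wallis H-test, applied to each variable across the cluster labels. The sketch below is only an illustration: `kruskal_per_variable` is a hypothetical helper, the Bonferroni adjustment is an assumed choice for handling many variables, and the data are synthetic. Note that Kruskal-Wallis compares distributions (rank sums) rather than means, so with unequal variances a rejection does not translate directly into a statement about means.

```python
import numpy as np
from scipy import stats


def kruskal_per_variable(X, labels, alpha=0.05):
    """Run a Kruskal-Wallis H-test on each column of X across cluster labels.

    Returns a list of (column index, H statistic, Bonferroni-adjusted p, reject).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    n_vars = X.shape[1]
    results = []
    for j in range(n_vars):
        # One sample per cluster for variable j.
        samples = [X[labels == c, j] for c in cluster_ids]
        h_stat, p_value = stats.kruskal(*samples)
        # Bonferroni adjustment for testing n_vars variables (assumed choice).
        p_adj = min(1.0, float(p_value) * n_vars)
        results.append((j, float(h_stat), p_adj, p_adj < alpha))
    return results


# Synthetic example: column 0 shifts with the cluster label, column 1 does not.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 100)
X = rng.normal(size=(300, 2))
X[:, 0] += labels * 3.0

results = kruskal_per_variable(X, labels)
for j, h_stat, p_adj, reject in results:
    print(f"variable {j}: H = {h_stat:.2f}, adjusted p = {p_adj:.3g}, reject H0 = {reject}")
```

For the real data, `X` would be the normalized 35-column matrix and `labels` the cluster assignments from the Gaussian mixture model.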

Just out of curiosity, I ran ANOVA and MANOVA anyway, and from the results I could reject the null hypothesis.

I have also looked into different validation indices such as silhouette width (which I am using to confirm that the number of clusters formed is optimal), Dunn, pearsongamma, etc.
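To make the silhouette index concrete, here is a small NumPy sketch of the mean silhouette width. It assumes Euclidean distance and at least two clusters; `mean_silhouette_width` is a hypothetical helper written for illustration, not the routine used for the results above, and the two-blob data set is synthetic.

```python
import numpy as np


def mean_silhouette_width(X, labels):
    """Mean silhouette width under Euclidean distance (needs >= 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    # Full pairwise distance matrix; fine for small n, memory-heavy for large n.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    widths = np.zeros(X.shape[0])
    for i in range(X.shape[0]):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: silhouette is defined as 0
        # a: mean distance to the other members of the point's own cluster.
        a = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in cluster_ids if c != labels[i])
        denom = max(a, b)
        widths[i] = (b - a) / denom if denom > 0 else 0.0
    return float(widths.mean())


# Synthetic example: two well-separated blobs, scored with true and shuffled labels.
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
true_labels = np.repeat([0, 1], 50)
shuffled_labels = rng.permutation(true_labels)

good = mean_silhouette_width(X, true_labels)
bad = mean_silhouette_width(X, shuffled_labels)
print(f"true labels: {good:.3f}, shuffled labels: {bad:.3f}")
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping clusters or misassigned points.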

I have over 100 similar data sets that I need to validate. Any help with this is very much appreciated.

PS: my data are normalized to mean = 0 and SD = 1.
