I am doing k-means cluster analysis for a set of data using SPSS. There is an option to specify the number of clusters to be extracted. I believe there should be a scientific criterion to decide which number of clusters is right.
I worked on "Isotropic Dynamic Hierarchical Clustering." My assumption was that the clustering should determine the number of clusters and the levels of the hierarchy automatically (much like a B-tree).
To determine the best cluster number for k-means classification, cluster validity indices such as the Silhouette index, the Davies-Bouldin (DB) index, the Xie-Beni index, SSW (sum of squares within) and the partition coefficient can be used.
Each index has its own criterion (either the minimum or the maximum of the index indicates the best cluster number).
These indices measure the compactness and separation of the clusters via intra-cluster and inter-cluster distances between data points, and their optimum over a range of candidate k values indicates the best cluster number to use for classification.
It's always good to use two or three cluster validity measures and compare the cluster numbers they suggest before further analysis.
All the above-mentioned measures are available as R functions.
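To make the idea concrete, here is a minimal sketch of one of the indices above, the average silhouette width, written in Python/NumPy rather than R purely for illustration (the two-blob data and both labelings are made up): a labeling with the right number of clusters should score higher than one that over-splits.

```python
import numpy as np

def silhouette(X, labels):
    """Average silhouette width: for each point, a = mean distance to its
    own cluster, b = mean distance to the nearest other cluster, and the
    point's score is (b - a) / max(a, b). Higher is better."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    scores = []
    for i in range(n):
        same = (labels == labels[i])
        same[i] = False
        if not same.any():            # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean()
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two well-separated blobs: a k=2 labeling should beat a k=3 labeling
# that artificially splits the first blob in half.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels_k2 = np.array([0] * 20 + [1] * 20)
labels_k3 = np.array([0] * 10 + [2] * 10 + [1] * 20)
print(silhouette(X, labels_k2) > silhouette(X, labels_k3))  # True
```

In practice one would run k-means for each candidate k and pick the k whose labeling maximizes the index; the comparison of two fixed labelings here just shows the mechanics.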
I appreciate the responses given to Ali, as I also use R! But I think they miss the point: Ali asks about a cut-off criterion to choose the number of clusters in a data set according to certain parameters/variables, and the coefficients you suggest are NOT available in SPSS. Maybe Ali should start using R... But if he wants to continue using SPSS, he should know that there is - I know SPSS well and have computed many cluster analyses; if I'm wrong please tell me - no measure/value computed by SPSS to decide on the number of clusters. There are only some rule-of-thumb procedures. Here is how I proceed:
1. Observe the data with descriptive statistics - a scatterplot matrix is of great help for forming a hypothesis about an approximate range of minimum and maximum cluster numbers - and, when there are not too many variables, multidimensional scaling may also be instructive.
2. Run a hierarchical cluster analysis using an agglomerative algorithm (mostly Ward's, but there are several methods and the one you use depends on your research question/purpose). I ask SPSS to provide the dendrogram, on which I apply the best-cut criterion (first longest fusion distance); in the agglomeration schedule I likewise look for the first largest jump in distance.
3a. Using another agglomerative method is sometimes useful, but you should know that a perfect match between solutions is difficult to reach when you use different algorithms. To examine how well two clusterings fit, I cross-tabulate the two solutions and use a Chi² test (a significant p indicates the solutions agree).
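The best-cut rule in step 2 can be sketched outside SPSS as well. With n cases the agglomeration schedule lists n-1 merge distances in increasing order; cutting the dendrogram just before the first largest jump between consecutive distances yields the suggested number of clusters. A minimal Python sketch, with a made-up schedule for illustration:

```python
def best_cut(merge_distances):
    """Return the cluster count obtained by cutting an agglomeration
    schedule just before the largest jump between consecutive merge
    distances. With n cases there are n-1 merges; after i merges,
    n - i clusters remain."""
    n = len(merge_distances) + 1                      # number of original cases
    gaps = [b - a for a, b in zip(merge_distances, merge_distances[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)   # index of largest jump
    return n - (i + 1)                                # clusters left before that merge

# Hypothetical schedule for 7 cases: small within-cluster fusions,
# then a big jump from 1.1 to 6.0 -> cut there, leaving 3 clusters.
schedule = [0.5, 0.7, 0.9, 1.1, 6.0, 7.5]
print(best_cut(schedule))  # -> 3
```

The same reading-off can of course be done by eye from the SPSS agglomeration schedule; the code only formalizes "first longest distance."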
The most important "internal clustering criteria" - used to compare clustering results and to choose the best number of clusters - are available in SPSS too. Google "Kirill's spss macros page" and download the "Internal clustering criteria" collection. It is also possible to run the macros from menu dialogs (see the KO_macros.spe extension).
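As an example of the kind of internal criterion such collections typically compute (whether this particular one is in that macro set is an assumption), here is a sketch of the Calinski-Harabasz index in Python/NumPy: the ratio of between-cluster to within-cluster dispersion, scaled by degrees of freedom, where higher values indicate a better partition. The data and labelings below are made up for illustration.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Calinski-Harabasz index: (SSB / (k-1)) / (SSW / (n-k)),
    where SSB is between-cluster and SSW within-cluster dispersion.
    Higher is better."""
    n = len(X)
    clusters = sorted(set(labels))
    k = len(clusters)
    overall = X.mean(axis=0)
    ssb = sum((labels == c).sum()
              * np.sum((X[labels == c].mean(axis=0) - overall) ** 2)
              for c in clusters)
    ssw = sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
              for c in clusters)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Two tight, well-separated blobs: the correct 2-cluster labeling should
# score far higher than an arbitrary labeling ignoring the structure.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (15, 2)), rng.normal(4, 0.2, (15, 2))])
good = np.array([0] * 15 + [1] * 15)
bad = np.array([0, 1] * 15)
print(calinski_harabasz(X, good) > calinski_harabasz(X, bad))  # True
```

Computed over a range of candidate cluster numbers, the k that maximizes this index is the one such criteria would recommend.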