For a k-means cluster analysis, when the variables are percentages, do we have to use n-1 variables?

07 December 2022

Good morning,

I would like to share this question.

I am conducting a k-means cluster analysis. My aim is to investigate whether I can observe patterns of food consumption. In my dataset, each row gives, for one person, the percentage of selections in each of three food categories (CAT1, CAT2, CAT3). These three variables are therefore linearly dependent: from two of them, the third follows by a simple formula, %CAT3 = 100 - (%CAT1 + %CAT2). As k-means clustering is sensitive to multicollinearity, do you think it is better to run the k-means algorithm with two variables only?
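The dependence described above can be checked numerically: when every row sums to 100, the 3x3 covariance matrix of the three columns is singular (its determinant is exactly zero), whereas the 2x2 covariance matrix of CAT1 and CAT2 alone is not. A minimal stdlib-only sketch, assuming made-up example rows; the helper names `covariance_matrix`, `det2` and `det3` are hypothetical, and `Fraction` arithmetic is used so the zero is exact rather than blurred by floating-point rounding.

```python
from fractions import Fraction


def covariance_matrix(rows):
    """Exact sample covariance matrix of the columns of `rows`."""
    n = len(rows)
    cols = [[Fraction(v) for v in col] for col in zip(*rows)]
    means = [sum(col) / n for col in cols]
    return [
        [sum((ci - mi) * (cj - mj) for ci, cj in zip(col_i, col_j)) / (n - 1)
         for col_j, mj in zip(cols, means)]
        for col_i, mi in zip(cols, means)
    ]


def det2(m):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical rows: (%CAT1, %CAT2, %CAT3) for one person each; every row sums to 100.
rows = [(50, 30, 20), (10, 60, 30), (33, 33, 34), (70, 20, 10), (25, 25, 50)]

cov3 = covariance_matrix(rows)                           # all three percentages
cov2 = covariance_matrix([(r[0], r[1]) for r in rows])   # CAT1 and CAT2 only

print(det3(cov3))  # → 0: perfectly collinear, CAT3 is redundant
print(det2(cov2))  # non-zero: CAT1 and CAT2 are not collinear here
```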

Thanks for your attention,

Inès François

Aki Koivu

Yes, you should run k-means with the variables CAT1 and CAT2 only. Because CAT3 is a derived variable, calculated linearly from CAT1 and CAT2, it provides no additional information; including it only introduces unnecessary, perfect collinearity.
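A minimal sketch of the suggested setup, assuming plain Lloyd's k-means with Euclidean distance and hand-picked starting centroids. The rows and the `kmeans` helper are hypothetical; in practice a library implementation with several random restarts (for example scikit-learn's `KMeans` and its `n_init` parameter) would normally be used. Only the CAT1 and CAT2 columns are passed to the algorithm.

```python
from math import dist


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assigning each point to its nearest centroid
    and moving each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: mean of each cluster; keep the old centroid if a cluster is empty.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical rows: (%CAT1, %CAT2, %CAT3) for one person each.
rows = [(60, 30, 10), (65, 25, 10), (62, 28, 10), (10, 20, 70), (15, 15, 70), (12, 18, 70)]
points = [(cat1, cat2) for cat1, cat2, _cat3 in rows]  # drop CAT3 before clustering

labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Because %CAT3 is fixed once %CAT1 and %CAT2 are known, each centroid's CAT3 share can still be reported afterwards as 100 minus the sum of its other two coordinates.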
