Simplifying panel data for clustering: how should I handle highly correlated variables?

19 June 2023

I have 15 years of panel (longitudinal) data: vegetation indices measured monthly, across all months and years. I want to try both multivariate time-series clustering (I have tried the sktime k-means clustering) and a simplified k-means. My question concerns the variables for the simplified k-means.

I have simplified the data by taking the mean and standard deviation of each of the 12 months across the years (e.g. Jan_mean, Feb_mean, ..., Dec_mean and Jan_sd, Feb_sd, ..., Dec_sd; 24 variables in total). The simplified data resembles a multivariate time series.
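For readers who want to reproduce this simplification, here is a minimal sketch. It assumes the index values are held in a NumPy array of shape (sites, years, 12 months); that layout and the function name `monthly_summary_features` are illustrative assumptions, not something stated in the question.

```python
import numpy as np

def monthly_summary_features(vi):
    """Collapse a (n_sites, n_years, 12) array into 24 features per site.

    Columns 0-11 are the per-month means across years (Jan_mean ... Dec_mean);
    columns 12-23 are the per-month sample standard deviations across years
    (Jan_sd ... Dec_sd).
    """
    vi = np.asarray(vi, dtype=float)
    if vi.ndim != 3 or vi.shape[2] != 12:
        raise ValueError("expected an array of shape (n_sites, n_years, 12)")
    month_mean = vi.mean(axis=1)        # average each calendar month over years
    month_sd = vi.std(axis=1, ddof=1)   # year-to-year spread of each month
    return np.hstack([month_mean, month_sd])
```

Each row of the result is one area, so the array can be passed directly to a clustering routine after whatever scaling is chosen.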

My aim is to cluster areas that have similar growing conditions, and I am using k-means to do so. However, the new data structure shows high correlation coefficients between months. This is expected because of phenology: plant growth starts when the temperature reaches >=6 degrees in spring, peaks in July, and then declines towards the end of the year. I don't want to lose the importance of each month.

However, k-means handles highly correlated columns poorly: with Euclidean distance, a group of correlated columns effectively counts the same information several times and so receives more weight. This can be corrected by substituting Mahalanobis distance for Euclidean distance, but because of the complexity of the calculation I cannot use that solution. I therefore need help addressing the correlation issue, so that I can reduce the correlation and keep the default distance.
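One way to get the effect of Mahalanobis distance while keeping the default distance is to whiten the features first: Euclidean distance between whitened rows equals Mahalanobis distance between the original rows, computed with the pooled covariance of the whole data set. The sketch below is a plain NumPy illustration under that assumption; the function name `whiten` and the `eps` cut-off are hypothetical choices, not part of the question.

```python
import numpy as np

def whiten(X, eps=1e-10):
    """Decorrelate and rescale the columns of X (rows are observations).

    Euclidean distances between rows of the result equal Mahalanobis
    distances between rows of X under the pooled sample covariance.
    Directions whose variance is negligible relative to the largest
    eigenvalue (threshold eps, an assumed value) are dropped.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each column
    cov = np.cov(Xc, rowvar=False)           # pooled covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric eigendecomposition
    keep = eigvals > eps * eigvals.max()     # discard near-singular directions
    return Xc @ eigvecs[:, keep] / np.sqrt(eigvals[keep])
```

The whitened array can then be given to a standard Euclidean k-means. Note that whitening gives every retained direction equal weight, including directions that are mostly noise, so keeping only the leading components may be preferable in practice.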

I have attempted to aggregate the months into seasons/phenological stages (again taking the mean and standard deviation). Despite this, the correlation between some seasons is still high, while the correlation between the majority of seasons is moderate (>0.7).

Can I take month-to-month differences (current minus previous) to substantially reduce the correlation?
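Whether differencing helps can be checked empirically rather than assumed. The sketch below computes the differences and a simple summary of how correlated a feature set is; both function names are hypothetical, and the input is assumed to be a (sites, 12) array of monthly means.

```python
import numpy as np

def month_differences(month_means):
    """Return current-minus-previous differences for a (n_sites, 12) array.

    The result has 11 columns: Feb-Jan, Mar-Feb, ..., Dec-Nov.
    """
    m = np.asarray(month_means, dtype=float)
    return np.diff(m, axis=1)

def mean_abs_offdiag_corr(X):
    """Average absolute correlation between distinct columns of X."""
    c = np.corrcoef(X, rowvar=False)
    mask = ~np.eye(c.shape[0], dtype=bool)   # ignore the diagonal of ones
    return np.abs(c[mask]).mean()
```

Comparing `mean_abs_offdiag_corr` on the monthly means and on their differences shows how much correlation remains. Differencing removes the overall level of the index, so a single level column (for example the annual mean) may need to be kept alongside the differences if level matters for the clusters.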

Because the data are in panel form, I would also like to include within-year variation (annual mean and standard deviation, e.g. Year2001_mean, Year2002_mean, ...). Would that be appropriate?
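As a minimal sketch of those annual features, again assuming a (sites, years, 12) NumPy array: the function name and the `first_year` argument are illustrative, and the column names simply follow the Year2001_mean pattern above.

```python
import numpy as np

def annual_summary_features(vi, first_year):
    """Per-year mean and standard deviation across the 12 months.

    vi has shape (n_sites, n_years, 12). Returns an array of shape
    (n_sites, 2 * n_years) and the matching list of column names.
    """
    vi = np.asarray(vi, dtype=float)
    if vi.ndim != 3 or vi.shape[2] != 12:
        raise ValueError("expected an array of shape (n_sites, n_years, 12)")
    year_mean = vi.mean(axis=2)        # average level within each year
    year_sd = vi.std(axis=2, ddof=1)   # spread across months within each year
    years = range(first_year, first_year + vi.shape[1])
    names = [f"Year{y}_mean" for y in years] + [f"Year{y}_sd" for y in years]
    return np.hstack([year_mean, year_sd]), names
```

These columns are likely to be correlated with one another and with the monthly features, so the same correlation check applies before adding them to the k-means input.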

I sincerely need your advice.

Andrew Paul McKenzie Pegman

You cannot simplify the data like that. You must analyse all of the raw data in one big table to include all possible interactions :)
