I'm working on a binary classification task using the Unscrambler software, but my data overlap greatly and I can't get any distinct grouping of my samples from the PCA scores plot. How do I proceed with my analysis?
You could try discriminant analysis (also known as canonical discriminant analysis), but Unscrambler unfortunately doesn't offer this algorithm; many other software packages do. Depending on the shape of your matrix, you can either use it directly or use the PCs from your PCA analysis (as mentioned by Michel).
Best regards
Damien
PS: maybe you can tell us a bit more about your data matrix.
If a scores plot does not give you a clear distinction between groups, it may not mean that there is no distinction; it could simply mean that the largest source(s) of variation is/are similar in both groups. Simple things to try in that case are (i) look at higher PCs, where your clusters may sometimes pop up, or (ii) use some form of preprocessing such as scaling (autoscaling, etc.) or a transformation (log10). This may help reduce the overall variation present in both groups.
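The two suggestions above can be sketched outside Unscrambler as well. Below is a minimal Python illustration (using scikit-learn, not Unscrambler; the data are synthetic and purely for demonstration) of autoscaling followed by extracting more than two components so that higher score pairs such as PC3 vs PC4 can be inspected:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data matrix: 100 samples x 20 features with very different scales
X = rng.normal(size=(100, 20)) * rng.uniform(0.1, 10, size=20)

# Autoscaling: zero mean and unit variance per feature
X_scaled = StandardScaler().fit_transform(X)

# Fit enough components to look beyond PC1/PC2
pca = PCA(n_components=6).fit(X_scaled)
scores = pca.transform(X_scaled)

# Score pairs beyond the first two, e.g. PC3 vs PC4, for plotting
pc3_pc4 = scores[:, 2:4]
variance_explained = pca.explained_variance_ratio_
```

In Unscrambler the equivalent is choosing autoscaling (1/SDev weighting) in the model setup and then switching the scores plot to higher component pairs.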
If it is just about knowing which features differ between groups, you can also find significant ones using a simple t-test. If you want to do that multivariately, try a classification method such as PLS-DA or PC-DA. In these cases you will need good validation, since these methods will always come up with significant differences.
In addition to all the above comments, I would like to emphasize some important issues:
1. In order to make your features comparable, first make sure you have normalized them before applying PCA or any classification algorithm to your data, i.e. zero mean and unit variance for all features.
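The normalization in point 1 (often called autoscaling in chemometrics) is just a per-column operation; a minimal numpy sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic matrix: three features on very different scales
X = rng.normal(loc=[5.0, -3.0, 100.0], scale=[0.1, 1.0, 50.0], size=(200, 3))

mu = X.mean(axis=0)
sigma = X.std(axis=0, ddof=0)
X_auto = (X - mu) / sigma   # each column now has mean 0 and variance 1
```

When classifying new samples later (e.g. in SIMCA), the same `mu` and `sigma` from the training set must be applied to the new data, not recomputed.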
2. PCA stands for Principal Component "Analysis". As its name indicates, this algorithm is aimed at better understanding and analyzing a data set and its features. PCA is used for dimension reduction, but it is not itself a classifier. Nevertheless, one can use the resulting lower-dimensional data set for classification with no problem.
3. There are also extensions of PCA that make it applicable to non-linear feature sets. You may want to study those extended versions of PCA.
4. While using PCA, be aware of the effect of high-variance features on the results. Such a feature may carry no information at all and can mislead you and your classifier. Imagine noise in the data (signal noise) that comes up as the most important feature in PCA but is actually worthless!
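Point 4 is easy to demonstrate on synthetic data (my own toy construction, not from the thread): one noisy column with huge variance but no information will dominate PC1 of unscaled data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
informative = rng.normal(0, 1, size=(300, 5))    # modest-variance features
noise = rng.normal(0, 100, size=(300, 1))        # huge variance, pure noise
X = np.hstack([informative, noise])

pca = PCA(n_components=2).fit(X)                 # no scaling on purpose
loading_pc1 = pca.components_[0]

# PC1's loading is almost entirely on the noise column (index 5)
dominant = int(np.argmax(np.abs(loading_pc1)))
```

Autoscaling the columns first (as in point 1) removes this artifact, which is exactly why scaling and variance inspection go together.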
Thank you all for your submissions. My data matrix is 335 x 167. I'm starting with the NIPALS algorithm and was planning to use SIMCA for final classification/testing of new samples in the Unscrambler software. I will continue based on your advice. However, class A of my dependent variable has 312 of the 335 samples, with class B having only 23 samples. Do you think this imbalance might be responsible for the inseparability? Dr Peyman, can you please suggest an example of a PCA extension to try out? Thank you.
Search for the keywords "Non-Linear PCA" and "Weighted PCA".
If I find any articles, I will suggest them to you.
The number of samples is no concern for PCA (although it is very important for classification); instead, the number of features is the main concern here (the number of fields in each record), i.e. 167 in your case.
You may also have to work on increasing the number of your samples and on making your sampling more uniformly distributed across your sample space.
One important point to keep in mind is that the principal components with the largest variances, say PC1 or PC2, are NOT necessarily the ones with discriminating ability. PCA only provides variance information about the data, not about sample separation. So it can happen that, for example, the 3rd or the 5th PC discriminates samples from different classes.
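This point can be illustrated with a small synthetic example (my own construction): the between-class difference is placed along a low-variance feature, so it shows up in PC3 rather than PC1 or PC2.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
n = 200
# Two high-variance features shared by both classes (no class information)
shared = rng.normal(0, 10, size=(2 * n, 2))
# One low-variance feature that actually separates the classes
disc = np.concatenate([rng.normal(-1, 0.3, n), rng.normal(1, 0.3, n)])
X = np.column_stack([shared, disc])
y = np.array([0] * n + [1] * n)

scores = PCA(n_components=3).fit_transform(X)

def separation(s):
    # Standardized mean difference between the two classes along one score vector
    return abs(s[y == 0].mean() - s[y == 1].mean()) / s.std()
```

Here `separation(scores[:, 0])` is near zero while `separation(scores[:, 2])` is large, so a PC1 vs PC2 scores plot would look hopeless even though the classes are cleanly separable on PC3.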