Suppose I have a data set where the vectors are described by 10 features. If I do PCA and keep all principal components (i.e. no dimensionality reduction), is the classification accuracy expected to be equivalent to that obtained when I do not use PCA?
PCA is basically a dimensionality reduction technique: you choose the top k principal components. PCA is not used for classification purposes. If your goal is classification, then choose Linear Discriminant Analysis (LDA), or a support vector machine (SVM) if the data is linearly separable; otherwise you can choose a non-linear SVM for non-linear data.
I suppose you are asking: if I do PCA and use all the PCs in a classification algorithm, will I get equivalent results? It depends on your data, especially its distribution. I suppose you already scale; if that is the case, then you should preprocess your data first to bring it closer to a normal distribution, and then do the centering and scaling.
If the differences in units are meaningful for your classification and you have no outliers in your data (outliers have a noticeable influence on the axes of the PCs), then your classification results should be similar.
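A minimal sketch of that preprocessing order in Python/scikit-learn, assuming PowerTransformer as the "bring to normal" step (the specific method is my assumption, not stated above) and a synthetic data set:

```python
# One possible reading of the advice above: transform the features towards
# normality, then centre and scale, before running PCA.
# PowerTransformer is an assumed choice; the poster does not name a method.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Yeo-Johnson transform towards normality; standardize=True also centres and scales.
prep = make_pipeline(PowerTransformer(method="yeo-johnson", standardize=True),
                     PCA(n_components=None))
X_pcs = prep.fit_transform(X)
print(X_pcs.shape)  # (300, 10): all principal components are kept
```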
Thanks for your reactions. I understand that PCA is a dimensionality reduction technique; however, nothing keeps us from using all the principal components. In fact, I was experimenting on a data set with 18 features. I normalise the feature vectors using the L2 norm; I do not apply a z-transform. When I transform the data using all principal components, I again obtain a data set with 18 features. The classification results that I obtain with these transformed features (using Naive Bayes) are significantly higher than those when I do not transform the data. Am I missing something? I was expecting to achieve the same results.
George, applying PCA will attempt to de-correlate the components of your data. In cases where your data is highly correlated before PCA, this may improve the independence of the components. Naïve Bayes is often applied per component (although you do not provide such details right now) and does assume independence of its inputs (that is the naïve part, right?). So after PCA the features will be closer to independent, and thus you might get better results. I guess in effect you perform a whitening operation, as described in https://en.wikipedia.org/wiki/Whitening_transformation. Best regards, Klamer
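A minimal sketch of this effect, assuming scikit-learn, a synthetic data set with deliberately redundant (correlated) features, and Gaussian Naive Bayes; setting whiten=True makes the PCA step an explicit whitening transformation in the sense of the link above:

```python
# Sketch: GaussianNB on correlated synthetic data, with and without a full
# (no-dimensionality-reduction) PCA/whitening step beforehand.
# The data set and its correlation structure are made up for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, random_state=0)  # redundant => correlated features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

nb_raw = GaussianNB().fit(X_tr, y_tr)
nb_pca = make_pipeline(PCA(n_components=10, whiten=True), GaussianNB()).fit(X_tr, y_tr)

print("NB on raw features      :", nb_raw.score(X_te, y_te))
print("NB after full PCA/whiten:", nb_pca.score(X_te, y_te))
```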
Each PC found by PCA is a normalized linear combination of the n (= 10 in your case) initial predictors, where normalized means that the L2 norm of the linear coefficients is equal to one. So, assuming that the number of observations is greater than 10 and that the rank of the predictor matrix equals 10, using {PC1, ..., PC10} instead of {X1, ..., X10} as the predictor matrix against the same output variable will give you the same hypothesis function (= prediction on the test set) if the solution of the optimization problem behind your model is invariant to this kind of linear transformation of the predictor space.

For example, if you were using a logistic regression model you would get the same prediction and, as a consequence, the same accuracy. On the other hand, if you were using a non-linear model like an SVM, you would get a different prediction and, in general, a different accuracy.

The mathematical proof of this is a bit long and complex, so you can find here an R simulation where, with the same {Xtrain, Ytrain} and {Xtest, Ytest}, you get the same prediction on Xtest (and hence the same accuracy = 0.45) both using the initial predictors {Xtrain, Xtest} and their PCs {PC.train, PC.test}. On the contrary, the same does not hold for the SVM (the accuracy from the initial predictors equals 0.5, while using the PCs as predictors it equals 0.45). You can easily change {Xtrain, Ytrain} and {Xtest, Ytest} just by changing the initial seed (here set to 333), and you will find that the same pattern holds.
Example: a "linear" model (logistic regression) vs. a non linear model (SVM)
It is now clear to me why the Naive Bayes classifier is likely to work better after rotating the axes (i.e. decorrelating) with PCA. Moreover, classifiers that are based on comparing the pairwise distances of samples should not be affected when we rotate the axes, because the pairwise distances remain exactly the same. I confirmed this when I used KNN. However, when I used an SVM with a linear kernel, the classification rate improved after I rotated the axes with PCA. My understanding is that the first step of an SVM with a linear kernel is to compute the pairwise similarities using the dot product, which is a linear operation. What is causing such an improvement then?
It only behaves in the same way when the classifier is based on some distance measure, such as KNN with the Euclidean distance. In the case of Naive Bayes (NB), for instance, by rotating the axes (the role of PCA) the features become uncorrelated. This satisfies the basic independence assumption of NB, and as a result NB performs much better.
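A quick numerical check of the distance-preservation point (synthetic data assumed): a full PCA rotation leaves all pairwise Euclidean distances unchanged, which is why KNN with that metric is unaffected.

```python
# Check that a full PCA (all components kept) preserves pairwise Euclidean
# distances; the data here is random and purely illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

X_rot = PCA(n_components=10).fit_transform(X)  # centering + rotation, nothing dropped

D_raw = pairwise_distances(X)
D_rot = pairwise_distances(X_rot)
print("max |distance difference|:", np.abs(D_raw - D_rot).max())  # ~0 up to floating-point error
```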
I applied PCA before training an SVM+RBF classification model. I used all PCs and it substantially increased the performance (accuracy, sensitivity, specificity, ...). I also asked myself the same question: "Does using all components make sense?" My experience says "yes".
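For reference, a minimal sketch of that kind of pipeline (all principal components kept, then an RBF-kernel SVM) with cross-validated accuracy and sensitivity; the data set and settings are illustrative assumptions, not the original experiment:

```python
# PCA keeping every component, followed by an RBF-kernel SVM, evaluated with
# 5-fold cross-validation; recall is reported as the sensitivity.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=18, random_state=1)

clf = make_pipeline(PCA(n_components=None), SVC(kernel="rbf"))
scores = cross_validate(clf, X, y, cv=5, scoring=["accuracy", "recall"])
print("accuracy   :", scores["test_accuracy"].mean())
print("sensitivity:", scores["test_recall"].mean())
```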