How do you select the dimensions of the input vector in principal component analysis (PCA)?

I am not sure I understand your question correctly. Are you asking about the right number of principal components?

An easy but comprehensive and well-visualized tutorial on PCA can be found here: http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/

You run PCA on a data frame (rows are samples, columns are variables). Using the eigenvalues and the scree plot, you can decide how many principal components to keep.
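A minimal base-R sketch of reading the eigenvalues as numbers alongside the scree plot. The data matrix X is simulated (hypothetical), and the two cutoffs shown, eigenvalue > 1 (the Kaiser criterion) and 80% cumulative variance, are common rules of thumb assumed here rather than taken from the tutorial.

```r
# Hypothetical data: 100 samples (rows) x 5 variables (columns)
set.seed(1)
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
X[, 2] <- X[, 1] + rnorm(100, sd = 0.1)   # make two variables strongly correlated

# PCA on standardized variables, i.e. on the correlation matrix
pca <- prcomp(X, center = TRUE, scale. = TRUE)

eig  <- pca$sdev^2          # eigenvalues, largest first
prop <- eig / sum(eig)      # proportion of variance per component
cum  <- cumsum(prop)        # cumulative proportion

n_kaiser <- sum(eig > 1)            # rule of thumb 1: eigenvalue > 1
n_80     <- which(cum >= 0.80)[1]   # rule of thumb 2: at least 80% of the variance

print(round(cbind(eigenvalue = eig, proportion = prop, cumulative = cum), 3))
cat("Kaiser criterion:", n_kaiser, "| 80% cutoff:", n_80, "\n")
```

`summary(pca)` reports the same proportions. Because the variables are standardized, the eigenvalues sum to the number of variables, which is why "eigenvalue > 1" means "explains more than one original variable's worth of variance".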

Daniel Wright

It depends on what you are using the result for, and on how many dimensions you think should account for the interesting variation in the sample.

As previous commenters noted, if you have no prior ideas about these two aspects, the eigenvalues of the correlation matrix can offer clues. If your data are in X and you are using R, try

```r
# scree plot: eigenvalues of the correlation matrix of X, in decreasing order
plot(1:ncol(X), eigen(cor(X))$values)
```

and look up the scree test to see how to interpret this. But first, think about those two questions.
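Reading a scree plot by eye can be subjective. One commonly used, more formal companion is Horn's parallel analysis, which is not something the commenters proposed: it keeps the leading components whose eigenvalues exceed those obtained from random, uncorrelated data of the same size. The sketch below assumes normal random data, 200 simulations and the mean as threshold; the function name `parallel_analysis` and the simulated two-factor data are hypothetical.

```r
# Horn's parallel analysis (sketch): keep the leading components whose
# eigenvalues exceed the mean eigenvalues of random, uncorrelated data
# with the same number of rows and columns.
parallel_analysis <- function(X, n_sim = 200, seed = 42) {
  set.seed(seed)
  n <- nrow(X)
  p <- ncol(X)
  observed <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values
  # p x n_sim matrix: each column holds the eigenvalues of one random data set
  random <- replicate(n_sim, {
    Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(Z), symmetric = TRUE, only.values = TRUE)$values
  })
  threshold <- rowMeans(random)
  # count components up to the first one that falls below the threshold
  n_keep <- sum(cumprod(observed > threshold))
  list(observed = observed, threshold = threshold, n_keep = n_keep)
}

# Hypothetical data: six variables driven by two independent latent factors
set.seed(1)
n <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- cbind(f1, f1, f1, f2, f2, f2) + matrix(rnorm(n * 6, sd = 0.3), nrow = n)

pa <- parallel_analysis(X)
print(round(cbind(observed = pa$observed, random_mean = pa$threshold), 3))
cat("Components to keep:", pa$n_keep, "\n")
```

For this two-factor toy data the sketch keeps two components. With real data, a higher quantile of the random eigenvalues (for example the 95th percentile) is sometimes used instead of the mean as a more conservative threshold.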

Ankit Soni

@Michael Heming

Thanks, but I am not asking about the right number of principal components.

I am asking how to reshape the input vector. For example, if I have an input sequence of size 2000 × 1 and want to apply PCA to it, I first have to convert it into an input matrix of dimension n × k. After PCA, depending on the explained variation, this becomes an output matrix of dimension n × s (with s smaller than k).

Is there any criterion for selecting n and k, or can any combination n × k be chosen at random?
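To make the setup in the question concrete, here is a minimal sketch assuming non-overlapping windows (one of several ways to build the n × k matrix; overlapping sliding windows are another) and a hypothetical noisy sine wave in place of the poster's 2000 × 1 sequence. The helper names `reshape_windows` and `components_needed`, the 90% variance target and the candidate values of k are illustrative assumptions. It does not give a universal rule; it only shows how the resulting s can be compared across candidate values of k.

```r
# Hypothetical stand-in for the 2000 x 1 input sequence: a noisy sine wave
# with a period of 50 samples (not the poster's data).
set.seed(7)
x <- sin(2 * pi * (1:2000) / 50) + rnorm(2000, sd = 0.2)

# Reshape into an n x k matrix of non-overlapping windows: row i holds samples
# (i - 1) * k + 1, ..., i * k, so n = length(x) / k and k must divide length(x).
reshape_windows <- function(x, k) {
  stopifnot(length(x) %% k == 0)
  matrix(x, ncol = k, byrow = TRUE)
}

# Run PCA on the n x k matrix and keep the smallest number s of components
# that explains at least `target` of the total variance (n x s output).
components_needed <- function(x, k, target = 0.9) {
  W <- reshape_windows(x, k)
  pca <- prcomp(W, center = TRUE, scale. = FALSE)  # same units in every column
  cum <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  s <- which(cum >= target)[1]
  list(n = nrow(W), k = k, s = s, scores = pca$x[, 1:s, drop = FALSE])
}

ks <- c(10, 20, 25, 50, 100)   # candidate window lengths (all divide 2000)
res <- lapply(ks, function(k) components_needed(x, k))
summary_tbl <- do.call(rbind, lapply(res, function(r) data.frame(n = r$n, k = r$k, s = r$s)))
print(summary_tbl)
```

Because the toy signal has a period of 50 samples, choosing k = 50 or k = 100 makes every window start at the same phase, so centering subtracts the repeating pattern and what remains is essentially the noise. This is one concrete reason the choice of k should follow the structure of the signal rather than be made at random.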

Michael Heming

Ankit Soni

Maybe it's easiest if we start with a minimal working example.

If we follow the easy tutorial at http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/ we can simply do:

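```r
library(factoextra)   # provides the decathlon2 data set
library(FactoMineR)   # provides PCA()
data(decathlon2)
decathlon_active <- decathlon2[1:23, 1:10]   # 23 active individuals, 10 active variables
res <- PCA(decathlon_active, graph = FALSE)
print(res)
```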

```
**Results for the Principal Component Analysis (PCA)**
The analysis was performed on 23 individuals, described by 10 variables
*The results are available in the following objects:

   name               description
1  "$eig"             "eigenvalues"
2  "$var"             "results for the variables"
3  "$var$coord"       "coord. for the variables"
4  "$var$cor"         "correlations variables - dimensions"
5  "$var$cos2"        "cos2 for the variables"
6  "$var$contrib"     "contributions of the variables"
7  "$ind"             "results for the individuals"
8  "$ind$coord"       "coord. for the individuals"
9  "$ind$cos2"        "cos2 for the individuals"
10 "$ind$contrib"     "contributions of the individuals"
11 "$call"            "summary statistics"
12 "$call$centre"     "mean of the variables"
13 "$call$ecart.type" "standard error of the variables"
14 "$call$row.w"      "weights for the individuals"
15 "$call$col.w"      "weights for the variables"
```

If this doesn't answer your question, I would suggest that you provide a minimal working example for your data.
