In our character recognition problem we have roughly 80-dimensional feature vectors. Should we apply PCA before feeding them to an ANN for recognition, given that we are already getting good results without PCA?
If you want to apply PCA beforehand to reduce the dimensionality of the features, you can keep enough components to preserve most of the cumulative percentage of inertia (explained variance), e.g. 98-99%. Moreover, depending on your features, you may need other dimensionality reduction methods, such as those covered in this reference (a small PCA sketch follows the reference below).
John A. Lee, Michel Verleysen, Nonlinear Dimensionality Reduction, Springer, 2007.
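A minimal sketch of the "keep 98-99% of the inertia" idea using scikit-learn, assuming a feature matrix X of shape (n_samples, 80); the data here is only an illustrative placeholder:

    # Keep enough principal components to preserve ~98% of the cumulative
    # explained variance (inertia).
    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(1000, 80)        # stand-in for the 80-D character features

    pca = PCA(n_components=0.98)        # float in (0, 1): keep 98% of the variance
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape[1], "components retained")
    print(np.cumsum(pca.explained_variance_ratio_)[-1])  # cumulative variance kept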
Indeed, PCA can be a good tool to compress your data. However, fixing a cumulative variance threshold is not a good idea, because noise can account for a relatively large share of the variance. In conclusion, it can be worthwhile to test the ANN with very different numbers of PCs.
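One way to act on this advice is to cross-validate the ANN over several PCA sizes instead of trusting a variance threshold. A sketch, where X, y, the candidate component counts and the MLP settings are all assumptions for illustration:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import GridSearchCV

    X = np.random.rand(500, 80)
    y = np.random.randint(0, 10, size=500)      # e.g. 10 character classes

    pipe = Pipeline([
        ("pca", PCA()),
        ("ann", MLPClassifier(hidden_layer_sizes=(50,), max_iter=500)),
    ])

    # Try very different numbers of PCs and keep the best by cross-validation.
    search = GridSearchCV(pipe, {"pca__n_components": [5, 10, 20, 40, 80]}, cv=3)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)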
This really depends on the amount of training data you have. If you had a theoretically infinite amount of training data, applying PCA would only degrade your results. The reason to perform some kind of dimensionality reduction is related to the curse of dimensionality: estimating a lot of parameters (e.g. a high-dimensional neural net) from only a few training samples results in overfitting. You can improve generalization, and thus avoid overfitting, either by increasing the amount of training data (which is usually not possible) or by reducing the number of dimensions used (thereby reducing the number of parameters to be estimated).
PCA is one technique that can be used for dimensionality reduction. It finds a new, lower-dimensional orthonormal basis such that the largest variance of the original data is kept. However, the discriminative information in your data is not necessarily captured by the directions of largest variance. Therefore, if you don't need PCA, don't use it. You could also have a look at other dimensionality reduction methods such as LDA (a small PCA-vs-LDA sketch follows the link below).
I recently wrote an article on my blog about the curse of dimensionality in classification problems: http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/
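To make the PCA/LDA contrast concrete: PCA only looks at variance, while LDA uses the class labels and so can keep discriminative directions that PCA might discard. A sketch with illustrative X and y (10 character classes, so LDA yields at most 9 components):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X = np.random.rand(500, 80)
    y = np.random.randint(0, 10, size=500)

    X_pca = PCA(n_components=9).fit_transform(X)                             # ignores labels
    X_lda = LinearDiscriminantAnalysis(n_components=9).fit_transform(X, y)   # uses labels

    print(X_pca.shape, X_lda.shape)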
Depending on the size and nature of your problem, I would like to refer you to the family of SNE-based approaches, in particular t-SNE or BH-SNE (http://homepage.tudelft.nl/19j49/t-SNE.html). Roughly speaking, these approaches try to preserve the neighborhood structure of the points in the high-dimensional space in the low-dimensional embedding as well. They are nonlinear and nonparametric, which may be desirable depending on the problem at hand. BH-SNE allows t-SNE to be applied to problems with many data points, but currently provides only two-dimensional embeddings. Hence, if you want to run the ANN on 2D points, you might want to give BH-SNE a try. It can also help in checking whether the computations done on the original data would still make sense if the data were embedded into 2D.
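A quick sketch of a 2-D Barnes-Hut t-SNE embedding using scikit-learn's implementation (method='barnes_hut' is its default), as an alternative to the code on the linked page; X is an illustrative stand-in for the 80-D character features:

    import numpy as np
    from sklearn.manifold import TSNE

    X = np.random.rand(1000, 80)

    X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
    print(X_2d.shape)   # (1000, 2) -- 2-D points that could be plotted or fed to an ANN

Note that t-SNE is nonparametric, so it does not give you a mapping you can apply to new, unseen samples.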
Given that you are working with a problem that seems amenable to supervised approaches, LDA, as suggested by Vincent Spruyt, could be an interesting option. Another option worth looking at might be LMNN: http://www.cse.wustl.edu/~kilian/code/lmnn/lmnn.html.
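A rough sketch of trying LMNN in Python, assuming the third-party metric-learn package (the page linked above ships Matlab code instead) and its scikit-learn-style fit/transform interface; X and y are illustrative placeholders:

    import numpy as np
    from metric_learn import LMNN

    X = np.random.rand(500, 80)
    y = np.random.randint(0, 10, size=500)

    lmnn = LMNN()               # learns a Mahalanobis metric pulling same-class neighbors together
    lmnn.fit(X, y)
    X_lmnn = lmnn.transform(X)  # data re-expressed in the learned metric space
    print(X_lmnn.shape)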
A word of caution, along the lines of Vincent Spruyt's answer above: before doing PCA, you should perform feature selection, i.e. discard the features that are of little or no relevance for discrimination. PCA looks only at the input space, hence it says nothing about the relevance of your features for classification. PCA may provide a compact representation of your input data by finding linear combinations of the features; but if the features themselves are irrelevant, linear combinations of them will not help. Therefore, you should first perform feature selection and, once you have discarded the irrelevant features, run PCA on the ones that remain (see the sketch below). PCA may, or may not, be useful, depending on the geometry of the data in the space of relevant features.
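A sketch of "feature selection first, PCA second", using a simple univariate ANOVA F-test as the selection step; X, y, the number of kept features and the variance cut-off are all illustrative choices, not prescriptions:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.decomposition import PCA

    X = np.random.rand(500, 80)
    y = np.random.randint(0, 10, size=500)

    pipe = Pipeline([
        ("select", SelectKBest(f_classif, k=40)),   # keep the 40 most relevant features
        ("pca", PCA(n_components=0.98)),            # then compress the survivors
    ])
    X_reduced = pipe.fit_transform(X, y)
    print(X_reduced.shape)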
Typically, the cumulative sum of the eigenvalues can be used as guidance for how many PCs to keep; a common rule of thumb is to retain enough PCs to account for roughly 70% of the total sum of the eigenvalues.
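A small sketch of this rule of thumb done directly on the covariance matrix with NumPy; the 70% cut-off and the data X are illustrative:

    import numpy as np

    X = np.random.rand(500, 80)

    cov = np.cov(X, rowvar=False)                      # 80 x 80 covariance matrix
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # eigenvalues, largest first

    fraction = np.cumsum(eigvals) / eigvals.sum()
    n_pcs = int(np.searchsorted(fraction, 0.70)) + 1   # first count reaching 70% of the sum
    print(n_pcs, "PCs cover", fraction[n_pcs - 1])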