In our character recognition problem we have roughly 80-dimensional feature vectors. Should we apply PCA before feeding them to an ANN for recognition, given that we are already getting good results without PCA?
If you want to apply PCA beforehand to reduce the dimensionality of the features, you can keep enough components to preserve most of the cumulative percentage of inertia (explained variance), e.g. 98-99%. Moreover, depending on your features, you may need other dimensionality reduction methods, such as those covered in this reference (a small PCA sketch follows the reference below).
John A. Lee, Michel Verleysen, Nonlinear Dimensionality Reduction, Springer, 2007.
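A minimal sketch of the "keep 98-99% of the inertia" idea using scikit-learn, assuming a feature matrix X of shape (n_samples, 80); the data here is only an illustrative placeholder:

    # Keep enough principal components to preserve ~98% of the cumulative
    # explained variance (inertia).
    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(1000, 80)        # stand-in for the 80-D character features

    pca = PCA(n_components=0.98)        # float in (0, 1): keep 98% of the variance
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape[1], "components retained")
    print(np.cumsum(pca.explained_variance_ratio_)[-1])  # cumulative variance kept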
Indeed, PCA can be a good tool to compress your data. However, fixing a cumulative variance threshold is not a good idea, because noise can account for a relatively large share of the variance. In conclusion, it can be worthwhile to test the ANN with very different numbers of PCs.
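One way to act on this advice is to cross-validate the ANN over several PCA sizes instead of trusting a variance threshold. A sketch, where X, y, the candidate component counts and the MLP settings are all assumptions for illustration:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import GridSearchCV

    X = np.random.rand(500, 80)
    y = np.random.randint(0, 10, size=500)      # e.g. 10 character classes

    pipe = Pipeline([
        ("pca", PCA()),
        ("ann", MLPClassifier(hidden_layer_sizes=(50,), max_iter=500)),
    ])

    # Try very different numbers of PCs and keep the best by cross-validation.
    search = GridSearchCV(pipe, {"pca__n_components": [5, 10, 20, 40, 80]}, cv=3)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)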
This really depends on the amount of training data you have. If you had a theoretically infinite amount of training data, applying PCA would only degrade your results. The reason to perform some kind of dimensionality reduction is related to the curse of dimensionality: estimating a lot of parameters (e.g. a high-dimensional neural net) from only a few training samples results in overfitting. You can improve generalization, and thus avoid overfitting, either by increasing the amount of training data (which is usually not possible) or by reducing the number of dimensions used (thereby reducing the number of parameters to be estimated).
PCA is one technique that can be used for dimensionality reduction. It finds a new, lower-dimensional orthonormal basis such that the largest variance of the original data is kept. However, the discriminative information in your data is not necessarily captured by the directions of largest variance. Therefore, if you don't need PCA, don't use it. You could also have a look at other dimensionality reduction methods such as LDA (a small PCA-vs-LDA sketch follows the link below).
I recently wrote an article on my blog about the curse of dimensionality in classification problems: http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/
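To make the PCA/LDA contrast concrete: PCA only looks at variance, while LDA uses the class labels and so can keep discriminative directions that PCA might discard. A sketch with illustrative X and y (10 character classes, so LDA yields at most 9 components):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X = np.random.rand(500, 80)
    y = np.random.randint(0, 10, size=500)

    X_pca = PCA(n_components=9).fit_transform(X)                             # ignores labels
    X_lda = LinearDiscriminantAnalysis(n_components=9).fit_transform(X, y)   # uses labels

    print(X_pca.shape, X_lda.shape)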
Depending on the size and nature of your problem, I would like to refer you to the family of SNE-based approaches, in particular t-SNE or BH-SNE (http://homepage.tudelft.nl/19j49/t-SNE.html). Roughly speaking, these approaches try to preserve the neighborhood structure of the points in the high-dimensional space in the low-dimensional embedding as well. They are nonlinear and nonparametric, which may be desirable depending on the problem at hand. BH-SNE allows t-SNE to be applied to problems with many data points, but currently provides only two-dimensional embeddings. Hence, if you want to run the ANN on 2D points, you might want to give BH-SNE a try. It can also help in checking whether the computations done on the original data would still make sense if the data were embedded into 2D.
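A quick sketch of a 2-D Barnes-Hut t-SNE embedding using scikit-learn's implementation (method='barnes_hut' is its default), as an alternative to the code on the linked page; X is an illustrative stand-in for the 80-D character features:

    import numpy as np
    from sklearn.manifold import TSNE

    X = np.random.rand(1000, 80)

    X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
    print(X_2d.shape)   # (1000, 2) -- 2-D points that could be plotted or fed to an ANN

Note that t-SNE is nonparametric, so it does not give you a mapping you can apply to new, unseen samples.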
Given that you are working with a problem that seems amenable to supervised approaches, LDA, as suggested by Vincent Spruyt, could be an interesting option. Another option worth looking at might be LMNN: http://www.cse.wustl.edu/~kilian/code/lmnn/lmnn.html.
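A rough sketch of trying LMNN in Python, assuming the third-party metric-learn package (the page linked above ships Matlab code instead) and its scikit-learn-style fit/transform interface; X and y are illustrative placeholders:

    import numpy as np
    from metric_learn import LMNN

    X = np.random.rand(500, 80)
    y = np.random.randint(0, 10, size=500)

    lmnn = LMNN()               # learns a Mahalanobis metric pulling same-class neighbors together
    lmnn.fit(X, y)
    X_lmnn = lmnn.transform(X)  # data re-expressed in the learned metric space
    print(X_lmnn.shape)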
A word of caution, along the lines of Vincent Spruyt's answer above: before doing PCA, you should perform feature selection, i.e. discard the features that are of little or no relevance for discrimination. PCA looks only at the input space, hence it says nothing about the relevance of your features for classification. PCA may provide a compact representation of your input data by finding linear combinations of the features; but if the features themselves are irrelevant, linear combinations of them will not help. Therefore, you should first perform feature selection and, once you have discarded the irrelevant features, run PCA on the ones that remain (see the sketch below). PCA may, or may not, be useful, depending on the geometry of the data in the space of relevant features.
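A sketch of "feature selection first, PCA second", using a simple univariate ANOVA F-test as the selection step; X, y, the number of kept features and the variance cut-off are all illustrative choices, not prescriptions:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.decomposition import PCA

    X = np.random.rand(500, 80)
    y = np.random.randint(0, 10, size=500)

    pipe = Pipeline([
        ("select", SelectKBest(f_classif, k=40)),   # keep the 40 most relevant features
        ("pca", PCA(n_components=0.98)),            # then compress the survivors
    ])
    X_reduced = pipe.fit_transform(X, y)
    print(X_reduced.shape)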
Typically, the cumulative sum of the eigenvalues can be used as guidance for how many PCs to keep; a common rule of thumb is to retain enough PCs to account for roughly 70% of the total sum of the eigenvalues.
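A small sketch of this rule of thumb done directly on the covariance matrix with NumPy; the 70% cut-off and the data X are illustrative:

    import numpy as np

    X = np.random.rand(500, 80)

    cov = np.cov(X, rowvar=False)                      # 80 x 80 covariance matrix
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # eigenvalues, largest first

    fraction = np.cumsum(eigvals) / eigvals.sum()
    n_pcs = int(np.searchsorted(fraction, 0.70)) + 1   # first count reaching 70% of the sum
    print(n_pcs, "PCs cover", fraction[n_pcs - 1])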