Certainly, 21 features is regarded as small in machine learning terms, and you should be able to run your ANN code without difficulty. Nowadays we are faced with 100, 1000 or even more features in NIR and OCR datasets, frequency-domain techniques and similar optical applications. To make your code run fast, or for real-time applications, it is better to do some pre-processing on big data, such as feature reduction using PCA and other techniques (see the links below), and then apply the ANN.
You can use PCA, which performs a linear mapping of the original 21 features to a lower-dimensional space (feature reduction with some information loss) in such a way that the variance of the data in the low-dimensional representation is maximized. Basically, PCA converts a correlated feature space into an uncorrelated one. In the new space, the features are reordered by decreasing variance, so that the first transformed feature accounts for the most variability in the data.
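As a minimal sketch of this idea, the snippet below (Python with scikit-learn, using randomly generated data as a stand-in for your 21-feature dataset) standardizes the inputs, keeps enough principal components to explain roughly 95% of the variance, and then trains a small ANN on the reduced features. The data, layer sizes and variance threshold are illustrative assumptions, not a definitive recipe.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 500 samples, 21 features, a toy binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 21))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Standardize, keep components explaining ~95% of the variance,
# then train a small ANN on the reduced, uncorrelated features.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0),
)
model.fit(X, y)

print(model.named_steps["pca"].n_components_)  # components actually kept
print(model.score(X, y))                       # training accuracy
```

Because the retained components are ordered by decreasing variance, dropping the trailing ones discards the directions that carry the least information, which is the data loss mentioned above.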
This is surely high-dimensional data, and the size of the dataset will be determined by the total population. Such high dimensionality will reduce the overall performance of the ANN and increase the number of training cycles required.
However, feature selection depends on the problem: whether the selected features actually address the task at hand and are sufficient once all aspects are taken into account.
21 dimensions is relatively high for data analysis purposes, so it is recommended to use PCA and reduce the dimensionality if possible. It all depends on the nature of the problem.
It is normal to use PCA for dimension reduction when you have, say, 100 input dimensions and want to get down to 20. I used 381 input dimensions and obtained 16 features in the output. But why do you want to use PCA? Why not ICA, or a deep neural network? A DNN (deep neural network) is the state-of-the-art method for feature extraction (dimensionality reduction).
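One common way a DNN performs this kind of reduction is an undercomplete autoencoder: a network trained to reconstruct its input through a narrow bottleneck, whose bottleneck activations serve as the learned low-dimensional features. The sketch below (Python with TensorFlow/Keras, on random placeholder data) illustrates a 381 -> 16 -> 381 autoencoder matching the sizes mentioned above; the architecture, activations and training settings are assumptions for illustration only.

```python
import numpy as np
from tensorflow.keras import Model, layers

# Placeholder data: 1000 samples with 381 features; substitute your own matrix.
X = np.random.rand(1000, 381).astype("float32")

# Undercomplete autoencoder: compress 381 features to a 16-dimensional code,
# then reconstruct the original 381 features from that code.
inputs = layers.Input(shape=(381,))
encoded = layers.Dense(16, activation="relu")(inputs)
decoded = layers.Dense(381, activation="linear")(encoded)

autoencoder = Model(inputs, decoded)   # trained to reproduce its input
encoder = Model(inputs, encoded)       # extracts the 16-dimensional features

autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=20, batch_size=32, verbose=0)

X_reduced = encoder.predict(X)
print(X_reduced.shape)  # (1000, 16)
```

Unlike PCA, the mapping learned here is nonlinear, which is one reason such networks can extract more compact features when the data do not lie near a linear subspace.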