The simple answer is a big NO. DL methods only extract features, and those features progress from the local to the abstract level. Feature selection in DL is still an unexplored area because of the black-box nature of DL.
Deep learning networks can perform both feature extraction and feature selection, so in many cases they are employed to extract and select the features. A classifier such as an SVM is then used to carry out the classification task.
Indeed, one of the strong points of deep learning is precisely the hierarchical feature selection along successive levels of increasing abstraction in detecting patterns.
Deep learning performs feature selection in its layers as well. So I believe you basically do not need to apply PCA or any other feature selection technique. However, it might be a good idea to explore applying such a technique, for example PCA, to the last layer of the NN and see the results.
Deep learning algorithms learn the features from the data instead of relying on handcrafted feature extraction. They do not use PCA. The learned features depend on the objective function and its optimization procedure.
If your data have a huge number of dimensions (like SNPs, micro-arrays, or text with a huge "dictionary") and are additionally very sparse, you cannot use DL directly because of the rather limited number of neurons in the input layer (usually a few thousand at most). So a good solution is to use random projection (RP) algorithms to reduce the number of features to a reasonable number that DL can handle (i.e. a few thousand). However, this procedure should be treated as "feature extraction" rather than "feature selection". That is, the "new" features are mixtures of the original ones and are meaningless on their own.
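To make the random projection idea concrete, here is a minimal sketch in plain Python. It assumes a Gaussian projection matrix and a sparse input stored as a dict of feature index to value; the function names, the seeding scheme, and the sample vectors are all hypothetical and chosen only for illustration. Each column of the projection matrix is regenerated from a per-feature seed, so the full matrix is never stored.

```python
import math
import random


def random_projection(sparse_x, k, seed=0):
    """Project a sparse vector {feature_index: value} onto k dense dimensions.

    Assumption: entries of the implicit projection matrix are N(0, 1) / sqrt(k).
    """
    out = [0.0] * k
    scale = 1.0 / math.sqrt(k)
    for j, value in sparse_x.items():
        # Column j of the matrix is rebuilt from a seed derived from j,
        # so the same feature always maps to the same random direction.
        rng = random.Random(seed * 1_000_003 + j)
        for i in range(k):
            out[i] += value * rng.gauss(0.0, 1.0) * scale
    return out


def sparse_distance(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(j, 0.0) - b.get(j, 0.0)) ** 2 for j in keys))


def dense_distance(a, b):
    """Euclidean distance between two dense vectors stored as lists."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Two made-up sparse samples from a space with about a million features.
a = {3: 1.0, 70_000: 2.0, 999_999: -1.5}
b = {3: 0.5, 42: 1.0, 500_000: 3.0}

k = 512
pa = random_projection(a, k)
pb = random_projection(b, k)

print("original distance: ", sparse_distance(a, b))
print("projected distance:", dense_distance(pa, pb))
```

The two printed distances should be close, which is the property that makes the projected vectors usable as network input. Note that none of the 512 output coordinates corresponds to a single original feature, which is why this counts as extraction rather than selection.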
Yes. Please refer to "Deep Neural Networks Versus Support Vector Machines for ECG Arrhythmia Classification" by Sean Shensheng Xu, Man-Wai Mak and Chi-Chung Cheung.
It is an interesting question indeed and, as with the NFL (no free lunch) theorem, there is no single right answer to it. My personal view is that neural networks are inherently black boxes, and when they generate nonlinear combinations, the features with less discriminating power will get smaller weights. I have worked a little on feature selection, and what I have understood is that it is completely dataset and domain dependent.
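The point about weak features getting smaller weights can be illustrated with a toy sketch. It assumes the simplest possible stand-in for a network, a single logistic neuron trained by gradient descent, and synthetic data with one informative feature and one pure-noise feature; the data, function names, and hyperparameters are all made up for illustration, and a deep network will not necessarily behave this cleanly.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n, rng):
    """Feature 0 carries the class signal; feature 1 is pure noise."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        informative = (1.0 if y == 1 else -1.0) + rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        rows.append((informative, noise))
        labels.append(y)
    return rows, labels


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Full-batch gradient descent on the mean log-loss of one neuron."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x0, x1), y in zip(rows, labels):
            err = sigmoid(w[0] * x0 + w[1] * x1 + b) - y
            grad_w[0] += err * x0
            grad_w[1] += err * x1
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)
rows, labels = make_data(400, rng)
weights, bias = train_logistic(rows, labels)

print("weight on informative feature:", round(weights[0], 3))
print("weight on noise feature:      ", round(weights[1], 3))
```

The noise feature ends up with a weight near zero while the informative one does not. This is implicit down-weighting, not explicit selection: the noise feature is still an input, and how well this carries over to a real dataset is, as said above, dataset and domain dependent.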