I used Shannon entropy for feature selection to reduce the number of inputs to artificial neural networks. It improved the accuracy of the model and reduced the number of neurons in the hidden layer.
You can use Shannon (information) entropy as a means of compressing the original system, which is often a complex one, into a restricted set of features. In other words, entropy serves as a probe of the full, often complex, system.
ANN or other ML techniques can then be applied to such preselected features.
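As a minimal sketch of this idea (not the poster's actual pipeline): estimate the Shannon entropy of each feature from a histogram and keep only the most informative columns before training a network. The function names, the bin count, and the ranking criterion here are illustrative assumptions.

```python
import math
from collections import Counter

def shannon_entropy(values, bins=10):
    """Shannon entropy (bits) of one feature, estimated via a histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def select_features(X, k):
    """Indices of the k columns of X (list of rows) with highest entropy."""
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)),
                    key=lambda j: shannon_entropy(cols[j]),
                    reverse=True)
    return sorted(ranked[:k])
```

A constant feature carries zero entropy and is dropped first, so the downstream network sees fewer, more informative inputs.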
I found this approach very useful during the design of a predictive model capable of predicting deadly heart arrhythmias up to one hour before their onset from ECG recordings, processed by permutation entropy fed into the Shannon formula.
Details can be found in the preprint of the paper on my RG profile, "Application of Machine Learning and Complexity in Medicine: Prediction of Drug Induced Arrhythmias in Rabbit Model up to One Hour Before Their Onset ...". The ML methods were evaluated by my coworker.
The paper itself is quite extensive; the full version should appear in the coming months.
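For readers unfamiliar with the technique mentioned above: permutation entropy (Bandt and Pompe, 2002) maps each short window of a time series to the ordinal pattern that sorts it, then applies the Shannon formula to the pattern frequencies. The sketch below is a generic textbook implementation, not the code from the paper; the parameter choices are illustrative.

```python
from math import log2

def permutation_entropy(signal, order=3, delay=1):
    """Shannon entropy (bits) of ordinal-pattern frequencies.

    Each window of `order` samples (spaced `delay` apart) is reduced
    to the permutation of indices that sorts it; the entropy of the
    resulting pattern distribution measures signal complexity.
    """
    counts = {}
    n = len(signal) - (order - 1) * delay
    for i in range(n):
        window = signal[i:i + order * delay:delay]
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = [c / n for c in counts.values()]
    return -sum(p * log2(p) for p in probs)
```

A strictly monotonic segment produces a single ordinal pattern and hence zero entropy, while irregular activity spreads probability over many patterns, which is what makes the measure sensitive to changes in ECG dynamics.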
Yes, but not in general. A good survey on feature selection is here: https://upcommons.upc.edu/bitstream/handle/2117/97413/R02-62.pdf, and the conference paper "A Survey Of Feature Selection And Feature Extraction Techniq...".