It really depends on the actual task and data (e.g., image data vs. tabular structured data). With deep-learning methods, the artificial neural network (e.g., a CNN) usually takes over the job of feature extractor and generator. For other purposes and applications, the following papers might help:
Article Performance Comparison of Feature Selection and Extraction M...
Conference Paper Comparison on Feature Selection Methods for Text Classification
Conference Paper Comparison of feature selection methods for machine learning...
Article Feature selection and dimensionality reduction: An extensive...
Allow me to depart from the norm on this topic. While ML algorithms can help eliminate features as non-useful because they do not correlate with the object classification (non-correlated implies non-causal/non-informative), that ought not be the only thing employed in selecting useful features, because correlation does not imply causation (i.e., being informative).
I personally use ML algorithms to narrow down all the possible features to those that are potentially causal and informative (this saves tons of time). Then, from a scientific point of view, I determine whether the features fit a causal model: do certain objects exhibit certain behaviors, captured in the feature space, BECAUSE the object is what it is? Lastly, I use only the informative features that have causal explanations, and I set the rest aside until they too have causal explanations and are informative.
If you have no causal explanation, correlation alone cannot tell you whether the value you seem to be getting out of a feature is causal or just data bias (which makes the feature appear valuable when it really is not). In low-risk applications the eventual poor behavior of a non-causal feature may not be devastating, but in high-risk applications it can be. In certain low-risk applications it can be better to use features without causal explanations than to use no features at all, but if you can find causal features, my advice is to do so and use them.
So to answer your question: you are the best feature selector; leverage ML algorithms to narrow your focus to potentially causal and informative features.
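As a minimal illustration of the first (narrowing) step, here is a sketch of a simple correlation screen standing in for any ML-based ranking; the function name and the choice of Pearson correlation are my own assumptions, not a prescribed method, and the causal vetting that follows the shortlist is a human judgment that code cannot do:

```python
import numpy as np

def shortlist_features(X, y, k=3):
    """Rank features by absolute correlation with the target and
    return the indices of the top-k candidates.

    This only finds *potentially* informative features; whether each
    one is causal still has to be argued from domain knowledge.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pearson correlation of each column of X with y
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum())
    corr = (Xc * yc[:, None]).sum(axis=0) / denom
    # Sort feature indices by descending |correlation|
    order = np.argsort(-np.abs(corr))
    return order[:k], corr
```

The shortlist is only where the work starts: each surviving feature should then be checked against a causal model of the system before it is trusted.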
Check out this source for more details on the causal approach to ML: https://ieeexplore.ieee.org/document/9438325
Maybe you can consider the recursive least squares (RLS) algorithm. RLS is the recursive application of the well-known least squares (LS) regression algorithm, so that each new data point is taken into account to modify (correct) a previous estimate of the parameters of some linear (or linearized) correlation thought to model the observed system. The method allows for the dynamic application of LS to time series acquired in real time. As with LS, there may be several correlation equations with corresponding sets of dependent (observed) variables. For RLS with a forgetting factor (RLS-FF), acquired data is weighted according to its age, with increased weight given to the most recent data.
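The update described above can be sketched in a few lines; this is a generic textbook form of RLS-FF, not the specific implementation from the work below, and the function name, the forgetting factor `lam`, and the initial covariance scale `delta` are illustrative choices:

```python
import numpy as np

def rls_ff(phi_stream, y_stream, n_params, lam=0.98, delta=1e3):
    """Recursive least squares with forgetting factor `lam` (0 < lam <= 1).

    phi_stream : iterable of regressor vectors (length n_params each)
    y_stream   : iterable of scalar observations
    Returns the history of parameter estimates, one row per sample.
    """
    theta = np.zeros(n_params)        # current parameter estimate
    P = delta * np.eye(n_params)      # large initial covariance = weak prior
    history = []
    for phi, y in zip(phi_stream, y_stream):
        phi = np.asarray(phi, dtype=float)
        # Gain vector: how strongly this sample corrects the estimate
        Pphi = P @ phi
        k = Pphi / (lam + phi @ Pphi)
        # Prediction error on the new sample
        e = y - phi @ theta
        # Correct the estimate, then discount old information by 1/lam
        theta = theta + k * e
        P = (P - np.outer(k, Pphi)) / lam
        history.append(theta.copy())
    return np.array(history)
```

With `lam = 1` this reduces to ordinary recursive LS; values below 1 make the estimator track slowly drifting parameters, at the cost of noisier estimates, since the effective memory is roughly `1 / (1 - lam)` samples.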
Years ago, while investigating adaptive control and energetic optimization of aerobic fermenters, I applied the RLS-FF algorithm to estimate the parameters of the KLa correlation used to predict O2 gas-liquid mass transfer, hence giving increased weight to the most recent data. Estimates were improved by imposing a sinusoidal disturbance on air flow and agitation speed (the manipulated variables). The power dissipated by agitation was measured with a torque meter (pilot plant). The proposed (adaptive) control algorithm compared favourably with PID. Simulations assessed the effect of numerically generated white Gaussian noise (2-sigma truncated) and of first-order delay. This investigation was reported in my MSc thesis:
Thesis Controlo do Oxigénio Dissolvido em Fermentadores para Minimi...