[Feature Selection] How to determine a "good" number of features to be selected?

Professor G R Sinha

The ultimate challenge in such tasks is robustness, and to achieve it you can select an optimal number of feature sets rather than an optimal number of features.

Jean Paul Barddal

What exactly do you mean by an optimal number of feature sets? Should I think about ensembles here?

Samer Sarsam

Hi Jean,

There is no magic number that works for all tasks. You can simply evaluate the selected features by the performance of a classifier trained on them, and increase or decrease the number of selected features accordingly.

HTH.

Samer
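A minimal sketch of the evaluation loop described above, assuming scikit-learn is available. The synthetic dataset, the `f_classif` ranking score, the logistic regression classifier and the helper name `score_for_k` are all illustrative assumptions, not something taken from the thread.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption for illustration): 20 features, 5 informative.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=2, random_state=0)


def score_for_k(k):
    """Mean 5-fold CV accuracy when only the k top-ranked features are kept."""
    # The selector sits inside the pipeline, so it is refit on every
    # training fold and never sees the held-out fold.
    model = make_pipeline(SelectKBest(f_classif, k=k),
                          LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, y, cv=5).mean()


# Try every possible number of features and keep the best-scoring one.
scores = {k: score_for_k(k) for k in range(1, X.shape[1] + 1)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

The resulting `best_k` is specific to this dataset and classifier, which is the point of the answer: the number is found empirically rather than fixed in advance.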

Hi Samer, thanks for the answer. I strongly agree with you, yet several papers still use some "magical" numbers to select the number of features. My idea would be to find a proper rationale for these specific numbers, or to establish whether they are rules of thumb with no solid rationale behind them.

Dear Jean,

In fact, it is not an easy task, for several reasons. The number of original features (loaded/utilized) changes from one task to another. In addition, the nature of those features plays a significant role in determining how many are selected. Finally, the relations among the features, on the one hand, and the algorithm used for feature selection, on the other, are further factors to be aware of during this process.

Peyman Kabiri

Dear Jean Paul,

From my point of view, this problem area does not have an exact answer.

The goal is to find a trade-off between accuracy and execution time (computational cost).

The execution time is directly connected to the number of features used.

Hence, this is an optimization problem in which the intention is to achieve the highest accuracy together with the lowest execution time (a smaller number of features).

Feature selection aims to reach this point of equilibrium.

Best

Peyman

Xingyu Zhang

I think this is quite subjective. When I run principal component analysis to get the main features of a large dataset, I usually keep the first several components, which together explain 80% or 90% of the total variance.
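The variance threshold mentioned above can be sketched as follows, assuming scikit-learn and its bundled wine dataset (both chosen here only for illustration, as is the helper name `n_components_for`). Note that PCA builds new components from all the original features rather than selecting a subset of them.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example data (assumption): the wine dataset shipped with scikit-learn.
X, _ = load_wine(return_X_y=True)
# Standardize first so no feature dominates just because of its units.
X_scaled = StandardScaler().fit_transform(X)

# Fit all components and accumulate their share of the total variance.
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)


def n_components_for(threshold):
    """Smallest number of leading components whose cumulative
    explained variance reaches the given threshold."""
    return int(np.searchsorted(cumulative, threshold) + 1)


n80 = n_components_for(0.80)
n90 = n_components_for(0.90)
print(n80, n90)
```

scikit-learn can also take the threshold directly, for example `PCA(n_components=0.90, svd_solver="full")`, which keeps enough components to explain more than that share of the variance.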

Rekha B S

I agree, it is quite subjective. But PCA is a good suggestion.

Majid Mohammadi

I have used an evolutionary optimization methodology to find the optimal number of features to select (link attached). Although it was developed for gene selection, it is also applicable to general feature selection.

Majid

Article Robust and stable gene selection via Maximum–Minimum Corrent...

Chukwuka Obi

I have a question: is using SelectKBest to select 50% of the features in a dataset an optimal way of doing feature selection? I tend to select my features using this method, but is it the right way of doing it?

Hamed Naseri

Hi,

If computational time does not matter and you want the highest accuracy, one possible approach is a loop in which features are added to the prediction model gradually (one by one) in order of their importance weight. The optimal feature set can then be taken as the one that leads to the highest prediction accuracy.
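One way the loop above might look, as a sketch under stated assumptions: importance weights come from a random forest's `feature_importances_`, accuracy is 5-fold cross-validated, and the data is synthetic. None of these choices is specified in the answer. The ranking here is computed once on the full dataset for brevity, which leaks some information into the cross-validation scores; a more careful version would rank inside each training fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data (assumption for illustration).
X, y = make_classification(n_samples=300, n_features=15, n_informative=4,
                           n_redundant=2, random_state=0)

# Rank features from most to least important using forest importances.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]

# Add features one by one in ranked order and record the CV accuracy.
scores = []
for n in range(1, len(ranking) + 1):
    subset = ranking[:n]
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    scores.append(cross_val_score(model, X[:, subset], y, cv=5).mean())

# The best subset is the shortest prefix reaching the top accuracy.
best_n = int(np.argmax(scores)) + 1
selected = sorted(ranking[:best_n].tolist())
print(best_n, selected)
```

Because `np.argmax` returns the first maximum, ties are resolved in favor of the smaller feature set.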

Raoul G. C. Schönhof

You could perhaps take a look at autoencoders for advanced feature selection: Preprint Fractal Autoencoders for Feature Selection

Cheers, Raoul

Tim Würger

Hey Jean,

I usually plot the training and test error (as obtained in cross-validation) against a decreasing number of features used to train the underlying model. Your training error should increase steadily as the number of features decreases. Your test error should decrease up to a certain point before it increases again. This minimum, the sweet spot, defines the optimal number of features.

I hope this helps!

Cheers,

Tim
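The two error curves described above could be computed roughly like this. It is a sketch that assumes scikit-learn, synthetic data, univariate `f_classif` ranking and a logistic regression model, none of which come from the answer itself. On real data the curves are often noisy rather than smoothly monotone.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption): many noisy features, few informative ones.
X, y = make_classification(n_samples=200, n_features=30, n_informative=4,
                           n_redundant=0, random_state=1)

# Go from all features down to a single one.
feature_counts = list(range(X.shape[1], 0, -1))
train_error, test_error = [], []
for k in feature_counts:
    model = make_pipeline(SelectKBest(f_classif, k=k),
                          LogisticRegression(max_iter=1000))
    cv = cross_validate(model, X, y, cv=5, return_train_score=True)
    # Error is 1 - accuracy, averaged over the five folds.
    train_error.append(1 - cv["train_score"].mean())
    test_error.append(1 - cv["test_score"].mean())

# The sweet spot is the feature count with the lowest CV test error.
sweet_spot = feature_counts[int(np.argmin(test_error))]
print(sweet_spot)
```

Plotting `train_error` and `test_error` against `feature_counts` with any plotting library gives the figure the answer refers to.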

Ferdib Al Islam

You can use RFECV.
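For reference, a short sketch of scikit-learn's `RFECV` (recursive feature elimination with cross-validation). The synthetic data and the logistic regression estimator are assumptions for illustration; any estimator that exposes `coef_` or `feature_importances_` should work.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic data (assumption for illustration).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=2, random_state=0)

# RFECV drops the weakest feature at each step and uses cross-validation
# to decide how many features to keep.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=StratifiedKFold(5),
    scoring="accuracy",
    min_features_to_select=1,
)
selector.fit(X, y)

print(selector.n_features_)   # number of features chosen by CV
print(selector.support_)      # boolean mask over the original features
X_reduced = selector.transform(X)
```

Here the number of features is an output of the procedure (`n_features_`) rather than something fixed beforehand.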