There is no single rule for choosing a feature selection algorithm: it depends on the dataset under study. There are filter-based and wrapper-based approaches to feature selection. Since wrapper procedures are based on the performance the learner obtains on each feature subspace, they tend to give better results than filter methods, which are independent of the prediction model.
Note that PCA and LDA are feature transformation methods (not feature selection).
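To make the filter/wrapper distinction concrete, here is a small sketch assuming scikit-learn; the dataset, the k-NN learner, and the choice of 5 features are all illustrative, not prescriptive:

```python
# Filter vs. wrapper feature selection, sketched with scikit-learn
# (illustrative choices: wine dataset, k-NN learner, 5 features kept).
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, f_classif
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

# Filter: ranks features by an ANOVA F-score, ignoring the prediction model.
filt = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# Wrapper: greedily adds the feature whose inclusion most improves the
# cross-validated accuracy of the actual learner.
wrap = SequentialFeatureSelector(
    KNeighborsClassifier(), n_features_to_select=5, cv=5
).fit(X, y)

print("filter picks: ", sorted(filt.get_support(indices=True)))
print("wrapper picks:", sorted(wrap.get_support(indices=True)))
```

The two methods may pick different subsets, because the wrapper optimizes the learner's own score while the filter uses a model-agnostic statistic.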
Given that your dataset is small, you will not pay much of a penalty for using wrapper techniques, since each run will be computationally cheap. As for which search algorithm to use over the space of feature subsets, there is no clear answer; both forward selection and backward elimination will work fine. If the number of features is quite large, genetic algorithms might be better at finding a near-optimal subset. In any case, I would base the evaluation function (the scoring metric for a given feature subset) on some form of cross-validation.
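A minimal sketch of the backward direction, again assuming scikit-learn: `RFECV` drops the weakest feature at each step (using the model's feature importances) and picks the subset size by cross-validated score, so it serves here as a cheap stand-in for a full backward wrapper search. The tree learner and accuracy scoring are illustrative assumptions.

```python
# Backward elimination scored by cross-validation, sketched with RFECV
# (illustrative: decision tree learner, wine dataset, accuracy scoring).
from sklearn.datasets import load_wine
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Eliminate one feature per step; keep the subset size whose
# cross-validated accuracy is highest.
selector = RFECV(
    DecisionTreeClassifier(random_state=0),
    step=1,
    cv=StratifiedKFold(5),
    scoring="accuracy",
).fit(X, y)

print("best subset size:", selector.n_features_)
print("kept features:   ", sorted(selector.get_support(indices=True)))
```

Swapping `RFECV` for `SequentialFeatureSelector(..., direction="backward")` gives a pure wrapper variant that rescores the learner itself at every elimination step, at a higher computational cost.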