The primary purpose of information gain is to determine the relevance of an attribute and thus its position in the decision tree. For attributes (variables) with many distinct values, however, information gain fails to discriminate accurately among the candidate attributes.
No, it does not work well for attributes with a large number of distinct values (an overfitting issue) [1]. To address this, the information gain ratio has been proposed; however, it has problems of its own [2]. For a comprehensive review of feature selection methods, their advantages, and their disadvantages, please refer to [3].
Please note that information gain (IG) is biased toward variables with a large number of distinct values, not variables whose observations have large values. Before explaining the reason for this behavior, let's review the definition of IG.
Information gain is the amount of information gained by knowing the value of the attribute: the entropy of the class distribution before the split minus the (weighted) entropy of the distributions after it. The largest information gain therefore corresponds to the smallest post-split entropy.
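Written out in standard notation (the symbols $T$, $A$, $T_v$ here are a generic formulation of this definition, not taken from the cited papers):

$$IG(T, A) = H(T) - \sum_{v \in \mathrm{values}(A)} \frac{|T_v|}{|T|}\, H(T_v), \qquad H(T) = -\sum_{c} p_c \log_2 p_c,$$

where $T$ is the set of training samples, $A$ is the candidate attribute, $T_v$ is the subset of samples with $A = v$, and $p_c$ is the proportion of class $c$ in $T$.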
In other words, the variable with the largest number of distinct values can split the data into the smallest chunks, and a smaller number of observations in each chunk lowers the chance of class variation within it, so each chunk tends to be pure and the post-split entropy shrinks.
Using an ID variable to split the data is a common example of this issue. Since each sample has its own distinct value, splitting on the ID feature produces one partition per sample, each with an entropy of zero. A decision tree driven by IG therefore selects the ID as the first splitting attribute, because the post-split entropy drops to zero. However, we are not interested in such a feature; we want features that actually explain the variation of the dependent variable.
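A minimal sketch of this effect, assuming a tiny made-up dataset (the toy data and helper names below are illustrative, not from any of the cited references):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy before the split minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    after = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - after

# Toy dataset: 'outlook' is a somewhat informative feature,
# 'ids' is just a unique row identifier.
labels  = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "sunny", "sunny", "rain", "rain", "rain"]
ids     = [1, 2, 3, 4, 5, 6]

print(information_gain(outlook, labels))  # ~0.08: the informative feature still leaves impurity
print(information_gain(ids, labels))      # 1.0: maximal gain, since every singleton chunk has entropy 0
```

The ID column wins by construction, which is exactly the bias described above; the gain ratio mentioned earlier counteracts it by dividing IG by the split information (the entropy of the partition sizes), which is large for many-valued splits.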