Information Value is NOT a Widely Used Metric in Industry
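While Information Value (IV) is a popular concept for feature selection in academic settings and some data science competitions, it's not commonly used in industry outside of credit-risk scorecard modeling, for a few key reasons:
Oversimplification: IV reduces each feature to a single score computed from binned counts against a binary target. The score depends on how the feature is binned and does not apply directly to continuous or multi-class targets, which can be too restrictive for complex, real-world datasets.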
Limited Scope: IV only measures the predictive power of individual features in isolation, neglecting potential interactions between features.
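Overfitting Risk: IV estimated on sparse or very fine bins can be inflated by noise, so features with high IV might not generalize well to unseen data. An unusually high IV can also point to target leakage rather than genuine predictive power.
Lack of Interpretability: IV is a single non-negative score, so on its own it doesn't show the direction or shape of a feature's effect on the target variable. Without inspecting the underlying per-bin values, this can hinder understanding and model explainability.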
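For context, IV is computed one feature at a time from per-bin event and non-event counts, which is why it cannot see interactions and is sensitive to binning. Below is a minimal sketch, assuming a binary target (1 = event) and a feature that has already been binned. The function name `information_value` and the `eps` smoothing term are illustrative assumptions, not part of any particular library.

```python
import math
from collections import Counter


def information_value(bins, target, eps=0.5):
    """Information Value of a pre-binned feature against a binary target.

    bins   : iterable of bin labels, one per row
    target : iterable of 0/1 values (1 = event), one per row
    eps    : additive smoothing so empty cells do not produce log(0)
             (an assumption of this sketch)
    """
    events, non_events = Counter(), Counter()
    for b, y in zip(bins, target):
        if y == 1:
            events[b] += 1
        else:
            non_events[b] += 1

    labels = sorted(set(events) | set(non_events))
    total_events = sum(events.values()) + eps * len(labels)
    total_non_events = sum(non_events.values()) + eps * len(labels)

    iv = 0.0
    for b in labels:
        p_event = (events[b] + eps) / total_events
        p_non_event = (non_events[b] + eps) / total_non_events
        # Weight of Evidence for this bin
        woe = math.log(p_non_event / p_event)
        # Each term is non-negative, so IV hides the direction of the effect
        iv += (p_non_event - p_event) * woe
    return iv


# Events are concentrated in the "low" bin, so the feature carries signal
iv_informative = information_value(
    ["low"] * 4 + ["high"] * 4, [1, 1, 1, 0, 0, 0, 0, 1]
)

# Each bin has the same share of events and non-events, so IV is zero
iv_uninformative = information_value(
    ["low", "low", "high", "high"], [0, 1, 0, 1]
)
```

Because the function only ever sees one feature's bins, two features that are predictive only in combination would both receive an IV near zero.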
What Industry Uses Instead:
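Feature Importance: Most industry practices rely on techniques like: Random Forest Feature Importance: This measures how much each feature's splits reduce impurity, averaged over the trees; a common alternative, permutation importance, measures the drop in model performance when a feature's values are shuffled. Gradient Boosting Feature Importance: Similar to Random Forest, but based on how much each feature's splits reduce the loss across the boosted trees. Lasso/Ridge Regression Coefficients: The absolute value of coefficients in regularized linear models can indicate feature importance, provided the features are on a comparable scale; Lasso can also shrink the coefficients of uninformative features to exactly zero.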
Feature Engineering: Instead of just relying on IV, industry focuses on creating features that capture more complex relationships and interactions.
Data Exploration & Visualization: Industry professionals heavily use data visualization and domain expertise to understand the data and identify key features.
Model Validation & Evaluation: Emphasis is placed on evaluating model performance on unseen data using techniques like cross-validation and metrics like AUC, precision, recall, and F1-score.
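The tree-based importances and cross-validated evaluation mentioned above can be sketched with scikit-learn. This is an illustrative example on synthetic data, not a recommended pipeline: the dataset, the hyperparameters, and the variable names are all assumptions made for the sketch. Note that `feature_importances_` on these estimators is the impurity-based importance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data: 3 informative features, 5 noise features
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# Impurity-based importances from two tree ensembles
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
gb = GradientBoostingClassifier(random_state=0).fit(X, y)
rf_importance = rf.feature_importances_
gb_importance = gb.feature_importances_

# Rank features from most to least important according to the random forest
rf_ranking = rf_importance.argsort()[::-1]

# 5-fold cross-validated AUC, so the score reflects held-out data
auc_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X,
    y,
    cv=5,
    scoring="roc_auc",
)
mean_auc = auc_scores.mean()
```

Importances computed on the training data show what the model relied on, while the cross-validated AUC indicates whether that reliance carries over to unseen data; the two are best read together.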
The Takeaway:
While IV can be a helpful tool for initial feature screening, it's not a definitive solution for real-world data science. Industry relies on a combination of more robust feature selection techniques, feature engineering, and thorough model evaluation to make informed decisions about which features to include in their models.