Overfitting is a modeling error in which a model fails to predict future observations or fit additional data well. It occurs when a function fits a limited set of data points too closely, usually because the model has more parameters than the data can justify. Large data sets commonly contain anomalies, and a model flexible enough to fit those anomalies ends up capturing noise rather than the underlying pattern, which leads to inaccurate analysis.

Overfitting can be mitigated with a few common methods:

  • Cross-validation: the training data is split into several folds; the model is trained on some folds and validated on the held-out fold, and the validation results are used to tune the model.
  • Remove features: manually remove irrelevant features from the model and use feature-selection heuristics to identify the important ones.
  • Regularisation: make the model simpler so it has less capacity to fit noise, for example by adding a penalty term to the loss function or by pruning a decision tree.
  • Ensembling: machine-learning techniques that combine the predictions of multiple separately trained models. The most popular ensembling methods are bagging and boosting.
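
To make the cross-validation bullet concrete, here is a minimal pure-Python sketch (not from the original answer; all function names are illustrative) of k-fold cross-validation around a simple least-squares line fit:

```python
import random

def kfold_indices(n, k):
    # shuffle indices 0..n-1 and deal them into k roughly equal folds
    idx = list(range(n))
    random.Random(0).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    # ordinary least squares for y = a*x + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def cross_val_mse(xs, ys, k=5):
    # average held-out mean squared error across k folds
    folds = kfold_indices(len(xs), k)
    scores = []
    for i in range(k):
        held_out = set(folds[i])
        tr_x = [x for j, x in enumerate(xs) if j not in held_out]
        tr_y = [y for j, y in enumerate(ys) if j not in held_out]
        a, b = fit_line(tr_x, tr_y)
        mse = sum((ys[j] - (a * xs[j] + b)) ** 2 for j in folds[i]) / len(folds[i])
        scores.append(mse)
    return sum(scores) / k
```

A model that overfits its training folds will show a noticeably higher held-out error here than it does on the data it was fit to, which is exactly the signal used to tune it.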
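
The regularisation bullet mentions adding penalty parameters; a minimal sketch of that idea (illustrative, not from the original answer) is ridge-style shrinkage on a one-variable line fit, where a penalty `lam` on the squared slope biases the model toward a simpler, flatter line:

```python
def ridge_fit(xs, ys, lam):
    # penalized least squares for y = a*x + b; only the slope is penalized,
    # so the closed-form slope becomes sxy / (sxx + lam)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / (sxx + lam)  # larger lam shrinks the slope toward zero
    return a, my - a * mx
```

With `lam = 0` this reduces to ordinary least squares; as `lam` grows the slope shrinks, trading a little bias for less sensitivity to noise in the training data.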
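
Finally, the ensembling bullet names bagging; a minimal sketch of bagging (illustrative function names, assuming the same simple line fit) trains each model on a bootstrap resample of the data and averages their predictions:

```python
import random

def fit_line(xs, ys):
    # ordinary least squares for y = a*x + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def bagged_predict(xs, ys, x_new, n_models=25, seed=0):
    # fit each model on a bootstrap sample (drawn with replacement),
    # then average the individual predictions at x_new
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a * x_new + b)
    return sum(preds) / len(preds)
```

Averaging over many resampled fits reduces the variance of the prediction, which is why bagging helps most with high-variance, overfitting-prone models.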