Residual analysis is a good way to validate the performance of a regression model. A residual plot can reveal how other variables influence the regression accuracy. For example, if the residuals are close to 0 on the left side of the plot but far from 0 on the right side, you can conclude that the model predicts smaller values well but consistently mispredicts larger values. Actions (e.g., a data transformation) could then be taken to improve the regression model.
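As a rough illustration of that left/right pattern (a minimal sketch assuming scikit-learn and matplotlib are available; the synthetic data and all names here are only for demonstration), fitting a linear model to nonlinear data produces residuals that fan out for larger predicted values:

```python
# Minimal sketch: residual plot of a linear fit to nonlinear data.
# Assumes scikit-learn and matplotlib; the data are synthetic.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
# The true relation is quadratic, so a linear fit will mispredict large values.
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=1.0, size=200)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

plt.scatter(model.predict(X), residuals, alpha=0.5)
plt.axhline(0, color="red", linestyle="--")  # reference line at zero error
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.title("Residuals vs. predicted values")
plt.show()
```

In this sketch the residuals hug 0 for small predictions and drift away for larger ones, which is the signal that a transformation (e.g., fitting on a squared or log scale) might help.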
The following excerpt is a good source for your understanding:
The analysis of residuals plays an important role in validating the regression model. Using residual plots, you can assess whether the observed error (residuals) is consistent with stochastic error. You shouldn’t be able to predict the error for any given observation. And, for a series of observations, you can determine whether the residuals are consistent with random error.
I know residual analysis is a must for statistical models like regression. But machine learning approaches do not belong to that family; they are free of the usual regression assumptions, and we also cross-validate the results. So is it still necessary to check residuals?
Pankaj Das, I think residual analysis is still helpful when using machine learning approaches (e.g., random forests, deep learning) for a regression task. Regardless of the type of approach used, residual analysis remains informative for suggesting actions that could be taken to improve the regression performance.
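For instance (a sketch under the assumption that scikit-learn is available; the Friedman benchmark dataset and the hyperparameters are arbitrary choices for illustration), you can combine cross-validation and residual analysis by plotting out-of-fold residuals:

```python
# Sketch: residual analysis for an ML regressor using out-of-fold
# predictions, so the residuals are not flattered by in-sample overfitting.
import matplotlib.pyplot as plt
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)

# cross_val_predict returns predictions made on held-out folds only.
y_pred = cross_val_predict(model, X, y, cv=5)
residuals = y - y_pred

plt.scatter(y_pred, residuals, s=10, alpha=0.4)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Out-of-fold prediction")
plt.ylabel("Residual")
plt.show()
```

Because the predictions come from held-out folds, any pattern in this plot reflects a genuine shortcoming of the model rather than overfitting, so the two practices complement each other.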
Residuals indicate the quality of a model. If the expected value of the residuals is not close to 0, the model is systematically biased toward either over-prediction or under-prediction. Moreover, if the residuals contain a pattern, the model is failing to capture some relationship within the data and is therefore qualitatively inadequate. It is also worth checking whether the residuals are normally distributed and homoscedastic (their variance does not change across the data). If the residuals are heteroscedastic, the predictive power of the model differs across sections of the data, and it may be worth dividing the dataset into two (or more) subsets and training two (or more) models, such that each model specializes on its corresponding subset.
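To make those checks concrete, here is an illustrative sketch (assuming scipy and statsmodels are installed; the residuals below are simulated and deliberately heteroscedastic, so in practice you would substitute your own model's residuals and predictions):

```python
# Sketch of numeric residual checks: bias, normality, heteroscedasticity.
# Assumes scipy and statsmodels; the data are simulated placeholders.
import numpy as np
import statsmodels.api as sm
from scipy.stats import shapiro
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
y_pred = rng.uniform(0, 10, 300)                # placeholder predictions
residuals = rng.normal(scale=1 + 0.3 * y_pred)  # variance grows with y_pred

# 1) Systematic bias: the mean residual should be close to 0.
print("Mean residual:", residuals.mean())

# 2) Normality: a small p-value suggests the residuals are not normal.
stat, p_norm = shapiro(residuals)
print("Shapiro-Wilk p-value:", p_norm)

# 3) Heteroscedasticity (Breusch-Pagan): a small p-value suggests the
#    residual variance changes with the predictions.
exog = sm.add_constant(y_pred)
lm_stat, p_bp, f_stat, p_f = het_breuschpagan(residuals, exog)
print("Breusch-Pagan p-value:", p_bp)
```

If the Breusch-Pagan test flags heteroscedasticity, that is the quantitative counterpart of the advice above: consider splitting the data, transforming the target, or using a model that handles non-constant variance.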