For one of my use cases, I am building a credit approval model.
One of the numeric variables has missing values. I am using a linear regression model to replace these missing values rather than the mean, since imputing the mean could reduce the variability of the variable.
A research article I read suggested that, in this regression, the independent variable should be the one most correlated with the variable whose missing values I am trying to impute.
I would like to understand the logic behind this, as the paper did not discuss it in enough detail.
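For concreteness, the suggested approach might look something like the sketch below. This is only an illustration, not code from the paper: the helper name `impute_with_most_correlated`, the column layout, and the toy numbers are all made up, and it assumes the other columns have no missing values themselves.

```python
import numpy as np

def impute_with_most_correlated(X, target_col):
    """Hypothetical helper: fill NaNs in one column by simple linear
    regression on the other column most correlated with it."""
    X = np.asarray(X, dtype=float).copy()
    y = X[:, target_col]
    observed = ~np.isnan(y)

    # Choose the predictor with the largest absolute correlation,
    # computed only on rows where the target is observed.
    best_col, best_r = None, 0.0
    for j in range(X.shape[1]):
        if j == target_col:
            continue
        r = np.corrcoef(X[observed, j], y[observed])[0, 1]
        if abs(r) > abs(best_r):
            best_col, best_r = j, r

    # Fit target = slope * predictor + intercept on the observed rows.
    slope, intercept = np.polyfit(X[observed, best_col], y[observed], deg=1)

    # Replace the missing target values with the fitted predictions.
    X[~observed, target_col] = slope * X[~observed, best_col] + intercept
    return X, best_col, best_r

# Made-up data: column 0 = income, column 1 = an unrelated score,
# column 2 = loan amount, with one missing value.
data = np.array([
    [30.0, 5.0, 61.0],
    [40.0, 1.0, 79.0],
    [50.0, 4.0, 102.0],
    [60.0, 2.0, 119.0],
    [70.0, 3.0, np.nan],
])

filled, best_col, best_r = impute_with_most_correlated(data, target_col=2)
print("predictor column:", best_col)
print("imputed value:", filled[4, 2])
```

In this toy data the income column is chosen because it tracks the loan amount much more closely than the unrelated score does, so its regression predictions have smaller errors on the observed rows.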
Similarly, it was suggested to normalize all the numeric variables using a z-score, so that they are measured in standard deviations rather than in absolute numbers. What does normalizing achieve, and why do I need to normalize my variables? For one thing, I know there could be multicollinearity.
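As I understand it, the z-score of a value is (value - mean) / standard deviation. A minimal sketch of that, with made-up numbers and a hypothetical helper name `zscore`, assuming the sample standard deviation (ddof=1) is the one intended:

```python
import numpy as np

def zscore(x):
    """Hypothetical helper: rescale a variable to mean 0 and
    sample standard deviation 1, ignoring missing values."""
    x = np.asarray(x, dtype=float)
    mu = np.nanmean(x)
    sigma = np.nanstd(x, ddof=1)  # assumption: sample SD, not population SD
    return (x - mu) / sigma

# Made-up variables on very different scales.
income = np.array([30.0, 40.0, 50.0, 60.0, 70.0])
loan = np.array([6100.0, 7900.0, 10200.0, 11900.0, 14100.0])

income_z = zscore(income)
loan_z = zscore(loan)

# After standardizing, both variables are expressed in SD units.
print(income_z)
print(loan_z)

# Standardizing is a linear rescaling, so the pairwise correlation
# between the two variables is the same before and after.
r_raw = np.corrcoef(income, loan)[0, 1]
r_z = np.corrcoef(income_z, loan_z)[0, 1]
print(r_raw, r_z)
```

Because the correlation between the two variables is unchanged by this rescaling, I am not sure how normalizing on its own relates to multicollinearity.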
I would appreciate expert advice on both of the above questions.
Thank you. Shivi