In univariate analysis, it is advised to perform statistical imputation when there are missing data. What are the expected repercussions on the analysis?
It depends on whether the data are missing at random, i.e., whether the missingness is unrelated to the values themselves. If it is, the estimates will be unbiased with little loss of power. There are two main methods of performing imputation:
1. Multiple Imputation (MI) fills in estimates for the missing data; to capture the uncertainty in those estimates, MI generates the estimates multiple times. The result is several data sets that are identical for all of the observed values but have slightly different imputed values in each data set.
2. The second method is to analyze the full, incomplete data set using maximum likelihood estimation. This method does not actually impute any data; rather, it uses all of the available data to compute maximum likelihood estimates. The maximum likelihood estimate of a parameter is the value of the parameter that is most likely to have produced the observed data.
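The multiple-imputation idea above can be sketched with a toy example using only numpy: draw the missing values from a normal distribution fit to the observed data, repeat the analysis (here, estimating the mean) on each completed data set, and pool the results with Rubin's rules. The data values and the number of imputations are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: NaN marks missing observations (invented for illustration)
y = np.array([4.1, 3.8, np.nan, 5.0, 4.4, np.nan, 4.7, 3.9])
obs = y[~np.isnan(y)]
n_missing = int(np.isnan(y).sum())

M = 20  # number of imputed data sets
estimates, variances = [], []
for _ in range(M):
    completed = y.copy()
    # Draw imputations from a normal fit to the observed values,
    # so each completed data set differs slightly in the imputed cells
    completed[np.isnan(completed)] = rng.normal(obs.mean(), obs.std(ddof=1), n_missing)
    estimates.append(completed.mean())
    variances.append(completed.var(ddof=1) / len(completed))

# Rubin's rules: pool the M analyses into one estimate and one variance
q_bar = np.mean(estimates)              # pooled point estimate
u_bar = np.mean(variances)              # within-imputation variance
b = np.var(estimates, ddof=1)           # between-imputation variance
total_var = u_bar + (1 + 1 / M) * b
print(q_bar, total_var)
```

The between-imputation term `b` is exactly the uncertainty that single imputation throws away: the total variance is always at least the within-imputation variance.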
There are several ways data can be missing, but essentially: if a missing datum is an "ignorable" nonresponse, you might assume the mean of the collected data can stand in for it, but the imputed value should not count toward your variance estimate. If a missing datum is a "nonignorable" nonresponse, it cannot be considered to be generated by the same mechanism as the collected data, and using the mean of the collected data would bias the results.
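A quick way to see why imputed values should not count toward the variance estimate: filling in the observed mean leaves the point estimate unchanged under ignorable nonresponse, but the filled-in values add no spread, so the naive sample variance is deflated. A small sketch (data invented):

```python
import numpy as np

# Toy data with two missing values (invented for illustration)
y = np.array([2.0, 3.5, np.nan, 4.0, np.nan, 2.5, 3.0])
obs = y[~np.isnan(y)]

# Mean imputation: the overall mean is unchanged...
filled = np.where(np.isnan(y), obs.mean(), y)

# ...but the imputed cells sit exactly at the mean, shrinking the variance
print(obs.var(ddof=1), filled.var(ddof=1))
```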
If you have related auxiliary data, they might be used in a regression to "predict" the missing data.
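For instance, with an auxiliary variable x observed for every unit, a least-squares fit on the complete cases can "predict" the missing y values. A minimal sketch with invented data, using `np.polyfit` for the fit:

```python
import numpy as np

# Auxiliary variable x observed for all units; y partially missing (invented data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, np.nan, 8.2, np.nan, 12.1])

miss = np.isnan(y)

# Fit y ~ x on the complete cases only
slope, intercept = np.polyfit(x[~miss], y[~miss], 1)

# Predict the missing y from the observed auxiliary x
y_imp = y.copy()
y_imp[miss] = slope * x[miss] + intercept
print(y_imp)
```

Note that deterministic regression predictions share the variance problem of mean imputation; in practice a residual draw is often added to each prediction.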
You might research "response propensity" groups.
If you can stratify the data by like characteristics, you may reduce bias from nonignorable nonresponse.
The idea is that there may be reasons for nonresponse that tend to make the observed responses, say for continuous data, systematically larger (or smaller) than the nonresponses would have been had it been possible to reliably collect them.
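One common form of this stratification idea is post-stratification: compute the respondent mean within each stratum and weight the stratum means by their known population shares, so strata with heavier nonresponse are not underrepresented. A minimal sketch; the stratum names, shares, and values are all invented:

```python
import numpy as np

# Respondent values grouped by stratum; population shares assumed known
respondents = {"urban": [5.0, 6.0, 5.5], "rural": [2.0, 2.5]}
shares = {"urban": 0.5, "rural": 0.5}  # true population proportions

# Naive respondent mean overweights the stratum that responded more
all_values = np.concatenate([np.asarray(v) for v in respondents.values()])
naive_mean = all_values.mean()

# Post-stratified mean weights each stratum mean by its known share
strat_mean = sum(shares[s] * np.mean(v) for s, v in respondents.items())
print(naive_mean, strat_mean)
```

Here the urban stratum has three respondents to the rural stratum's two, so the naive mean (4.2) is pulled above the post-stratified mean (3.875).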
Cheers - Jim
PS - So the consequences you asked about are variance and bias considerations, which depend upon the type of nonresponse and upon the imputation procedure, of which there are several.
If you use a dedicated package (e.g. Solas), any remaining risk will be "empirical" - somewhat more so if you use "Bayesian imputation". But an imputation is never "real"; it is just the most theoretically appropriate value for estimating the missing one. If we carry out a posterior survey (i.e., if the missingness arises in a sampling experiment), we may find that the actual value is an extreme one (an outlier), which could be why it was missing in the first place.
I also suggest adding a dummy variable (yes/no) if you decide to impute a value. Include the dummy in your model; if it is non-significant, then you know that imputing did not have an important effect on your results.
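That check can be sketched with numpy alone: mean-impute the predictor, add a 0/1 dummy flagging the imputed rows, and inspect the dummy's fitted coefficient (with real data you would test its significance in your modeling package; everything below is invented for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: y depends linearly on x; some x values are lost
x = rng.normal(0, 1, 50)
y = 2.0 * x + rng.normal(0, 0.5, 50)
x_missing = x.copy()
x_missing[:10] = np.nan  # first 10 predictor values missing

# Mean-impute x and flag the imputed rows with a dummy
dummy = np.isnan(x_missing).astype(float)
x_imp = np.where(np.isnan(x_missing), np.nanmean(x_missing), x_missing)

# Design matrix: intercept, imputed x, and the imputation dummy
X = np.column_stack([np.ones_like(y), x_imp, dummy])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # [intercept, slope, dummy effect]
```

A dummy coefficient far from zero warns you that the imputed rows behave differently from the observed ones, i.e., that the imputation is affecting your conclusions.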
It is ABSOLUTELY critical to know if your data are missing not at random - a very hard lesson I learned early in my career. I recommend the Analysis of Messy Data series by Milliken and Johnson: Volume 1 (Designed Experiments), Volume 2 (Nonreplicated Experiments), Volume 3 (Analysis of Covariance). They seem to be available used, new, and as e-books.