Could you define "impute"? You use that word in many of your questions, and the meaning seems to differ from question to question. You may get more answers if people understand what you want to know. It is a good idea to be as specific as possible; in this question, for example, I haven't the foggiest idea what you may mean, no matter what possible synonym I try for "impute".
I mean replacing the missing values with proper substitutions!
For example, you can replace the missing values in a variable with the mean of that variable. R offers many other methods through commands like irmi(), kNN(), ... I need a way to compare these methods to see which one is best for my data.
As one way to compare these methods, I extracted the complete records from my data, created missing values in them at random, and imputed those values with many different methods. Then, for each variable, I subtracted the real values from the imputed ones and drew a box plot of the differences to see which method approximates my data better!
There is a problem with this approach, though: the missing values I created are missing completely at random, which might not be the case in my real data!
I want to see if anyone can come up with a better idea so that I can figure out which method of imputation is the best for my data!
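For anyone following along, the comparison described above can be sketched roughly as follows. This is only an illustration in plain Python (standard library only), not the R workflow itself: the data are simulated, mean and median imputation stand in for irmi() and kNN(), and the "high" masking rule (larger values are more likely to go missing) is a made-up assumption that imitates missingness which is not completely at random. The function names are hypothetical.

```python
import random
import statistics


def hide_values(values, frac, rng, mechanism="mcar"):
    """Pick the indices whose values will be treated as missing."""
    n_hide = max(1, round(frac * len(values)))
    indices = range(len(values))
    if mechanism == "mcar":
        # Missing completely at random: every record is equally likely to be hidden.
        return set(rng.sample(indices, n_hide))
    # Hypothetical "high" mechanism: larger values are more likely to be hidden.
    spread = statistics.pstdev(values)
    score = {i: values[i] + rng.gauss(0, spread) for i in indices}
    return set(sorted(indices, key=score.get, reverse=True)[:n_hide])


def imputation_errors(values, hidden, imputer):
    """Imputed minus real value for every hidden record."""
    observed = [v for i, v in enumerate(values) if i not in hidden]
    fill = imputer(observed)  # single-value imputers only, e.g. mean or median
    return [fill - values[i] for i in sorted(hidden)]


def summarize(errors):
    """Return (bias, RMSE) of a list of imputation errors."""
    bias = statistics.fmean(errors)
    rmse = statistics.fmean([e * e for e in errors]) ** 0.5
    return bias, rmse


rng = random.Random(0)
# Simulated, skewed stand-in for one variable of the complete records.
complete = [rng.lognormvariate(0, 0.5) for _ in range(500)]
imputers = {"mean": statistics.fmean, "median": statistics.median}

results = {}
for mechanism in ("mcar", "high"):
    hidden = hide_values(complete, 0.2, rng, mechanism)
    for name, imputer in imputers.items():
        errors = imputation_errors(complete, hidden, imputer)
        results[(mechanism, name)] = summarize(errors)

for (mechanism, name), (bias, rmse) in results.items():
    print(f"{mechanism:>4} {name:>6}  bias={bias:+.3f}  rmse={rmse:.3f}")
```

Running the same comparison under more than one masking rule at least shows how sensitive the ranking of methods is to the assumption that the values are missing completely at random. The lists of errors could equally be fed into box plots, as in the original approach.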
This thread will turn into a mess if it gets too long, because keeping track of what was asked first and what second will be problematic. So it would be best if you could edit the question, or delete it and re-ask.
The imputed values are not completely random. There has to be noise in your data. That part is random, and you have to include it in the imputed values. But the random value will be constrained by features of the model you apply, which comes back to your question. The different imputation methods follow different models of the data. They're not just different ways to get numbers; they express various things about what you think the data mean and how they can be modelled or represented more simply. Therefore, it's hard to quantify "best" in terms of a single measurement.
Perhaps what you really need to ask is, "Given research design X, missing values Y, and model M, what would be the best way to impute missing values?"
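To make the point about noise concrete, here is a minimal sketch in plain Python, assuming a simple linear model between two simulated variables (the data, the model, and all names are illustrative assumptions, not anything taken from the question). It fills the missing values once with the bare model prediction and once with the prediction plus random noise drawn from the residual spread, then compares the standard deviation of the result with that of the real data.

```python
import random
import statistics

rng = random.Random(1)

# Simulated data: y depends linearly on x, plus noise.
n = 1000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 * xi + rng.gauss(0, 1) for xi in x]

# Hide 40% of y completely at random.
missing = set(rng.sample(range(n), 400))
observed = [i for i in range(n) if i not in missing]

# Fit y = intercept + slope * x by least squares on the observed records.
mean_x = statistics.fmean([x[i] for i in observed])
mean_y = statistics.fmean([y[i] for i in observed])
sxy = sum((x[i] - mean_x) * (y[i] - mean_y) for i in observed)
sxx = sum((x[i] - mean_x) ** 2 for i in observed)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual spread: the noise the model does not explain.
residuals = [y[i] - (intercept + slope * x[i]) for i in observed]
resid_sd = statistics.stdev(residuals)

deterministic = list(y)
stochastic = list(y)
for i in missing:
    prediction = intercept + slope * x[i]
    deterministic[i] = prediction                           # model only
    stochastic[i] = prediction + rng.gauss(0, resid_sd)     # model plus noise

sd_true = statistics.stdev(y)
sd_det = statistics.stdev(deterministic)
sd_sto = statistics.stdev(stochastic)
print(f"true sd={sd_true:.3f}  deterministic sd={sd_det:.3f}  stochastic sd={sd_sto:.3f}")
```

Without the noise term the imputed values all sit exactly on the fitted line, so the completed variable looks less variable than the real one. The noise is random, but its size and the line it is added to both come from the model, which is why the choice of model matters more than the choice of command.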