I feel like everyone I've talked to who is an expert on the subject is of the opinion that SNP imputation adds power to genome-wide association and that there is no downside: more data points = a more powerful linear regression. Sometimes, though, I see unrealistic results that seem to be caused by including imputed genotypes. I should note that I am working in a non-model organism with no reference genome, and it is very unlikely that I have enough SNPs to cover every block of LD. I used rrBLUP to impute missing genotypes based on relatedness (roughly the call sketched below).

The attached image is one example of something crazy going on. In a large, multi-location, multi-trait study (this is a crop plant), there were a few heterozygotes and minor-allele homozygotes at the SNP pictured. However, for the site*trait combination presented in the figure, all individuals with non-missing data were homozygous for the major allele. Some individuals with missing genotype data at this SNP were scored by rrBLUP as having a slight chance of carrying the minor allele, so their genotype is coded as ~1.9. One of these individuals also happened to have the highest value in the study for the trait. As you can see from the red line, the predicted allelic effect is utterly unrealistic.
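For context, the imputation step is something along these lines (a minimal sketch with simulated placeholder data rather than my actual script; A.mat expects markers coded {-1, 0, 1}):

## Sketch only: relatedness-based imputation with rrBLUP's A.mat.
## Rows = individuals, columns = SNPs, coded {-1, 0, 1}, NA = missing call.
library(rrBLUP)

set.seed(1)
geno <- matrix(sample(c(-1, 0, 1), 100 * 300, replace = TRUE), nrow = 100)
geno[sample(length(geno), 2000)] <- NA   # randomly distributed missing calls

out <- A.mat(geno,
             impute.method = "EM",    # EM imputation exploits the relationship structure
             return.imputed = TRUE)

A            <- out$A        # additive relationship matrix
geno_imputed <- out$imputed  # marker matrix with missing calls filled in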
Is imputation my problem? Or should I instead be re-filtering for minor allele frequency within each site*trait combination?
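By re-filtering I mean something like the sketch below (placeholder object and column names; the maf helper assumes the {-1, 0, 1} coding), applied before imputation and association testing for each site*trait subset:

## Sketch only: recompute MAF using just the individuals phenotyped for a
## given site*trait, then drop monomorphic / rare markers for that analysis.
maf <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(0)  # all calls missing -> marker gets filtered out
  p <- mean(x + 1) / 2           # frequency of one allele under {-1, 0, 1} coding
  min(p, 1 - p)
}

filter_markers <- function(geno, phenotyped_ids, min_maf = 0.05) {
  sub  <- geno[phenotyped_ids, , drop = FALSE]
  keep <- apply(sub, 2, maf) >= min_maf
  sub[, keep, drop = FALSE]
}

## e.g. for a hypothetical trait/site column in the phenotype table:
## geno_sub <- filter_markers(geno, rownames(pheno)[!is.na(pheno$yield_site1)])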
Edit: The SNPs come from RAD-seq and thus there is a lot of randomly distributed missing data, as well as some heterozygotes mis-called as homozygotes. If you are going to point me to an article focusing on imputation in the human HapMap collection, which is a rather different situation, please explain how the article answers my question!