I understand that including related individuals can cause over- or under-estimation of risk prediction. What could I read to enhance my understanding of why this occurs?
This happens because relatives share a large part of their genomes. For example, full siblings (same two biological parents) share about 50% of their segregating alleles on average. If your data contain many related individuals, they form sub-populations (family clusters) within the sample. Those clusters can drive your prediction and bias it.
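To make the "about 50%" figure concrete, here is a minimal simulation sketch in Python with NumPy. It is only an illustration under simplifying assumptions that are not from this thread: SNPs are treated as independent (no linkage), mating is random, and relatedness is estimated as the average product of standardized genotypes. The helper names `random_parent`, `child_of` and `relatedness` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps = 20_000
# Assumed allele frequencies, kept away from 0 and 1 for stable scaling.
p = rng.uniform(0.1, 0.9, size=n_snps)

def random_parent():
    """Two haplotypes (rows) of 0/1 alleles drawn at the population frequencies."""
    return rng.binomial(1, p, size=(2, n_snps))

def child_of(mother, father):
    """Child genotype (0/1/2): one randomly chosen allele per SNP from each parent."""
    idx = np.arange(n_snps)
    from_mother = mother[rng.integers(0, 2, size=n_snps), idx]
    from_father = father[rng.integers(0, 2, size=n_snps), idx]
    return from_mother + from_father

def relatedness(g1, g2):
    """Mean product of standardized genotypes, in the style of a GRM entry."""
    scale = np.sqrt(2 * p * (1 - p))
    z1 = (g1 - 2 * p) / scale
    z2 = (g2 - 2 * p) / scale
    return float(np.mean(z1 * z2))

mother, father = random_parent(), random_parent()
sib_a = child_of(mother, father)
sib_b = child_of(mother, father)
stranger = random_parent().sum(axis=0)  # genotype of an unrelated person

sib_rel = relatedness(sib_a, sib_b)
unrel = relatedness(sib_a, stranger)
print(f"full siblings: {sib_rel:.2f}, unrelated pair: {unrel:.2f}")
```

Under these assumptions the sibling estimate should land near 0.5 and the unrelated pair near 0, which is the kind of clustering that makes a sample with many relatives behave like a set of small sub-populations.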
Hi Jenna, adding to what Wen wrote: consider a causal SNP that is known in advance to be associated with a particular trait. One problem with including related individuals is that you can end up with spurious associations, because within that related subgroup the causal SNP may be correlated with many other SNPs that have absolutely nothing to do with the trait.
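One way to see how relatedness can produce spurious signals is a small simulation in which no SNP affects the trait at all. The sketch below is an illustration only, with assumptions of my own: families of four full siblings, independent SNPs, and a trait made of a shared family effect plus individual noise. The function names `draw_children` and `mean_chi2` are hypothetical, and the test statistic is a simple correlation-based chi-square, not the output of any particular GWAS tool.

```python
import numpy as np

rng = np.random.default_rng(1)
n_fam, sibs_per_fam, n_snps = 300, 4, 2000
p = rng.uniform(0.2, 0.8, size=n_snps)  # assumed allele frequencies

# Parental haplotypes, shape (family, 2 haplotypes, SNP).
mothers = rng.binomial(1, p, size=(n_fam, 2, n_snps))
fathers = rng.binomial(1, p, size=(n_fam, 2, n_snps))

def draw_children(parents):
    """For every child and SNP, transmit one of the parent's two alleles at random."""
    pick = rng.integers(0, 2, size=(n_fam, sibs_per_fam, n_snps))
    return np.where(pick == 0, parents[:, None, 0, :], parents[:, None, 1, :])

# Sibling genotypes (0/1/2), shape (family, sibling, SNP).
geno = draw_children(mothers) + draw_children(fathers)

# Null trait: shared family effect plus individual noise, no SNP effect at all.
trait = rng.normal(size=(n_fam, 1)) + rng.normal(size=(n_fam, sibs_per_fam))

def mean_chi2(g, y):
    """Average of n * r^2 over SNPs, where r is the genotype-trait correlation."""
    n = len(y)
    gz = (g - g.mean(axis=0)) / g.std(axis=0)
    yz = (y - y.mean()) / y.std()
    r = gz.T @ yz / n
    return float(np.mean(n * r ** 2))

# Naive analysis that treats all siblings as independent observations.
chi2_all_sibs = mean_chi2(geno.reshape(-1, n_snps), trait.reshape(-1))
# Analysis restricted to one (unrelated) person per family.
chi2_unrelated = mean_chi2(geno[:, 0, :], trait[:, 0])

print(f"mean chi2, all siblings: {chi2_all_sibs:.2f}")
print(f"mean chi2, one per family: {chi2_unrelated:.2f}")
```

With truly independent samples the mean statistic should sit close to 1. Under the assumptions above, the analysis that pools siblings should come out clearly higher, because siblings share both genotypes and the family component of the trait, so null SNPs look associated more often than they should.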
Hi Jenna, I agree with Wang. Sample size plays a very important role in association studies. With that in mind, genetically related individuals should be avoided if the cohort is small, because the study will be biased. If the cohort is very large, relatedness won't affect the results much, but it is still statistically incorrect, since one of the important parameters of a cohort study is its distribution. Also, I hope that by SNP validation you mean genetic validation, because genetic association is based on tag SNPs and says nothing about mechanism unless it is a singlet SNP.