Hello
I'm reading a paper, and the experimental procedures include a section on variant filtering. The authors excluded variants that were (1) labeled as VQLOW in all individuals, (2) clustered in >10% of individuals from either cohort, (3) missing in >10% of either cohort, (4) covered at a median depth of >100 in either cohort, (5) located in simple repeats, homopolymer repeats >6 bp, segmental duplications, microsatellite repeats, or low-complexity repeats, (6) out of Hardy-Weinberg equilibrium, or (7) located in non-unique 36-mers.
I don't fully understand criteria (2), (4) and (7). Why is it a problem when variants are clustered? Why is a coverage of more than 100 a problem? What do these 36-mers refer to? And does (5) have something to do with the difficulty of mapping reads to repeat regions?
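
To check that I at least read criteria (3) and (4) correctly, here is a minimal Python sketch of how I picture them being applied to a single variant. The function name, the input layout (per-individual read depths for each cohort, with `None` marking a missing genotype) and the reading of (4) as "median depth above 100" are my own assumptions and are not taken from the paper.

```python
from statistics import median


def passes_missing_and_coverage(depths_by_cohort, max_missing=0.10, max_median_depth=100):
    """Return True if one variant survives criteria (3) and (4) as I read them.

    depths_by_cohort maps a cohort name to a list of per-individual read
    depths at the variant site; None means the genotype is missing.
    This layout is a hypothetical one chosen for illustration.
    """
    for depths in depths_by_cohort.values():
        if not depths:
            continue  # nothing to evaluate for an empty cohort
        called = [d for d in depths if d is not None]
        # Criterion (3): variant missing in more than 10% of the cohort.
        missing_fraction = (len(depths) - len(called)) / len(depths)
        if missing_fraction > max_missing:
            return False
        # Criterion (4): median depth above the threshold in this cohort.
        if called and median(called) > max_median_depth:
            return False
    return True


example = {
    "cases": [30, 40, 35, None, 45, 38, 42, 36, 33, 41],
    "controls": [150] * 10,
}
print(passes_missing_and_coverage(example))
```

In this example the variant would be removed because the median depth in the control cohort is above 100, even though the case cohort looks fine. What I don't understand is why such a variant should be considered unreliable.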