Hi researchers,

I need your help, as I am not sure my thinking is correct about using a repeat-masked genome as the reference for read mapping and SNP calling. I am not familiar with whole-genome analyses, and I was wondering whether there are good reasons for or against using a (hard-)masked genome. Running the analysis with both genomes would be very time-consuming.

I thought using a masked genome would reduce the computational cost. I am mapping short reads to detect SNPs for a population genomics study (detecting population structure). I have many reads per sample and a 2.4 Gbp genome. If I am only interested in SNPs, does it matter whether I cover the repetitive regions? Does masking have any effect on the mapping quality? If reads that originate from masked regions instead map incorrectly to another region, can I filter them out by mapping quality?
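To make the last question concrete, here is a minimal sketch of the kind of mapping-quality filter I have in mind. It assumes the alignments are available as plain SAM text lines; the function name `filter_by_mapq` and the threshold of 20 are my own placeholders, not a recommendation. It relies only on the SAM column layout (FLAG in column 2, MAPQ in column 5, flag bit 0x4 meaning unmapped).

```python
from typing import Iterable, Iterator


def filter_by_mapq(sam_lines: Iterable[str], min_mapq: int = 20) -> Iterator[str]:
    """Yield header lines and mapped reads whose MAPQ is at least min_mapq.

    min_mapq=20 is an arbitrary placeholder threshold (assumption).
    """
    for line in sam_lines:
        # Header lines start with '@' and are passed through unchanged.
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid SAM alignment line has at least 11 mandatory columns.
        if len(fields) < 11:
            continue
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality
        # Bit 0x4 marks an unmapped read; skip those.
        if flag & 0x4:
            continue
        if mapq >= min_mapq:
            yield line
```

Whether such a filter actually removes reads that were misplaced because their true origin was masked is exactly what I am unsure about: a read can only get a low mapping quality if the aligner sees a competing location, and masking hides that location.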

Alternatively, I thought about using the non-masked genome but removing the scaffolds that consist only of repeats (in other words, scaffolds that would be 100% masked in the masked genome).
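As a rough sketch of how I would identify those scaffolds, assuming the hard-masked genome is a FASTA file in which masked bases are written as N: the helper names (`read_fasta`, `masked_fraction`, `fully_masked_scaffolds`) are hypothetical, and note that assembly gaps are also N, so they count as masked here.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (scaffold_name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-separated token of the header as the name.
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masked_fraction(seq: str) -> float:
    """Fraction of bases that are N (hard-masked or assembly gap)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "Nn") / len(seq)


def fully_masked_scaffolds(lines: Iterable[str], threshold: float = 1.0) -> List[str]:
    """Names of non-empty scaffolds whose masked fraction is >= threshold."""
    return [
        name
        for name, seq in read_fasta(lines)
        if seq and masked_fraction(seq) >= threshold
    ]
```

With a real file this would be called as `fully_masked_scaffolds(open("masked.fa"))`, where `masked.fa` is a placeholder path, and the returned names could then be dropped from the non-masked reference.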

I would appreciate your feedback. What are your thoughts? Which approach would be accepted by journals?

Have a great evening. I am looking forward to your ideas and arguments.

Julia
