Hello everyone!

My concern relates to the bioinformatic analysis of GBS or RAD-seq data derived from the F2 progeny of two inbred plant lines with contrasting phenotypes.

The most important things to account for here are, first, that the sequencing data come from an experimental (non-natural) population, unlike in the GWAS case. We therefore expect a high proportion of heterozygous calls (~50%), since genotypes should segregate 1:2:1. Second, GBS/RAD-seq are usually low-coverage techniques (about 5x), so calling heterozygous positions can be quite problematic, and we may observe either overestimation (as with GATK) or underestimation of heterozygous calls.
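Both points can be checked with a little arithmetic. The sketch below, under idealized assumptions (each read at a true heterozygote samples either allele with probability 0.5, no sequencing error, Poisson-distributed depth), tests observed F2 genotype counts at one marker against the expected 1:2:1 ratio and estimates how often a true heterozygote shows reads from only one allele. The function names are hypothetical helpers, not part of any tool.

```python
import math
from typing import Tuple


def chi2_f2_segregation(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit of F2 genotype counts against AA:AB:BB = 1:2:1.

    Returns (statistic, p-value). Three genotype classes give 2 degrees of
    freedom, and the chi-square survival function with 2 df is exactly
    exp(-x / 2), so no SciPy is needed.
    """
    n = n_aa + n_ab + n_bb
    expected = (n / 4, n / 2, n / 4)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.exp(-stat / 2)


def p_het_looks_hom(depth: int) -> float:
    """P(all reads at a true heterozygote come from one allele) at a fixed depth >= 1.

    Assumes each read samples either allele with probability 0.5 and no sequencing error.
    """
    return 2 * 0.5 ** depth


def mean_het_dropout(mean_depth: float) -> float:
    """Expected fraction of covered (depth >= 1) true heterozygotes that look
    homozygous when depth ~ Poisson(mean_depth).

    Sum over d >= 1 of Poisson(d) * 2 * 0.5**d = 2 * (exp(-lam / 2) - exp(-lam)),
    divided by P(depth >= 1) = 1 - exp(-lam).
    """
    lam = mean_depth
    return 2 * (math.exp(-lam / 2) - math.exp(-lam)) / (1 - math.exp(-lam))


if __name__ == "__main__":
    print(chi2_f2_segregation(40, 40, 20))  # statistic 12.0, p = exp(-6)
    print(p_het_looks_hom(5))               # 0.0625
    print(round(mean_het_dropout(5), 3))    # 0.152
```

Under this idealized model, roughly 15% of covered true heterozygotes at a mean depth of 5x would show only one allele, which is one mechanism behind under-calling of hets; real GBS/RAD depth is usually more uneven than Poisson. A marker with counts 40:40:20 in 100 F2 plants gives a statistic of 12 and p = exp(-6) ≈ 0.0025, flagging possible segregation distortion or genotyping error at that marker.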

So I would like you to share your experience or thoughts on how to obtain reliable genotype calls that reflect the true genotypes. I see at least three points to discuss:

  • How should the data be pre-processed for this kind of analysis? Which specific tools and parameters should be used for preprocessing, i.e., read filtering and alignment?
  • Which SNP callers can be used for this specific task? What parameters should be set in the most common callers (e.g., GATK, samtools, FreeBayes) to avoid false-positive genotypes in an F2 population?
  • Which software and parameters (if any) should be used for imputation to recover some of the missing genotypes?
  • Please share your experience if you have done such an analysis (QTL mapping with GBS/RAD-seq data), or share your thoughts if you work on SNP calling and genotype data analysis.
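As one illustration of how the F2 design could be built into genotype calling and imputation, the sketch below computes genotype posteriors from ref/alt read counts using binomial likelihoods and a 1:2:1 prior, leaves low-confidence calls missing instead of forcing them, and then fills gaps only where the flanking called markers agree. This is a toy model, not how GATK, samtools or FreeBayes call genotypes; the per-read error rate (0.01), the posterior threshold (0.95) and all function names are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional

# Expected F2 genotype frequencies for a cross of two inbred lines: AA : AB : BB = 1 : 2 : 1
F2_PRIOR = {"AA": 0.25, "AB": 0.50, "BB": 0.25}


def genotype_posteriors(ref: int, alt: int, err: float = 0.01) -> Dict[str, float]:
    """Posterior P(genotype | reads) from ref/alt read counts at one site.

    Binomial model: an alt read is seen with probability err under AA,
    0.5 under AB and 1 - err under BB. `err` is an assumed per-read error rate.
    """
    p_alt = {"AA": err, "AB": 0.5, "BB": 1.0 - err}
    # Log prior + log likelihood; the binomial coefficient is identical for all
    # genotypes, so it cancels when normalising.
    log_post = {
        g: math.log(F2_PRIOR[g]) + alt * math.log(p) + ref * math.log(1.0 - p)
        for g, p in p_alt.items()
    }
    top = max(log_post.values())  # subtract the maximum for numerical stability
    unnorm = {g: math.exp(v - top) for g, v in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


def call_genotype(ref: int, alt: int, min_post: float = 0.95, err: float = 0.01) -> Optional[str]:
    """Return the most probable genotype, or None (missing) if its posterior < min_post."""
    if ref + alt == 0:
        return None  # no reads: leave missing
    post = genotype_posteriors(ref, alt, err)
    best = max(post, key=post.get)
    return best if post[best] >= min_post else None


def fill_from_flanks(calls: List[Optional[str]]) -> List[Optional[str]]:
    """Fill missing calls only where the nearest called markers on both sides agree.

    `calls` holds one F2 individual's genotypes at markers sorted by position on a
    single chromosome. Crude heuristic: ignores marker distance and double crossovers.
    """
    filled = list(calls)
    called = [i for i, g in enumerate(calls) if g is not None]
    for left, right in zip(called, called[1:]):
        if calls[left] == calls[right]:
            for i in range(left + 1, right):
                filled[i] = calls[left]
    return filled


if __name__ == "__main__":
    site_reads = [(6, 0), (5, 0), (1, 0), (7, 0), (3, 3), (0, 7)]  # (ref, alt) per marker
    calls = [call_genotype(r, a) for r, a in site_reads]
    print(calls)                    # ['AA', None, None, 'AA', 'AB', 'BB']
    print(fill_from_flanks(calls))  # ['AA', 'AA', 'AA', 'AA', 'AB', 'BB']
```

With these settings, 6 reference-only reads give a confident AA call, while 5 reference-only reads stay missing (posterior about 0.94), because with a 1:2:1 prior a heterozygote remains a plausible explanation at low depth. The flank-based fill is deliberately conservative; an HMM over marker order, for example, would model recombination between markers explicitly.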

Rim Gubaev