Dear ResearchGate community, I need your input.
I am currently working on a linkage mapping project using genotyping-by-sequencing SNPs. I have 3 families from 2 species (2 for species X, 1 for species Y). The two are sister species that diverged recently (~2-4 kya). Each family comprises 2 parents (male and female) and 150-300 F1 offspring. The study species are two Australian rainforest frogs.
I start with ~10,000 potentially informative SNPs per family, with only ~10% overlap between any two families (I know, very low). After quality checks (call rate, reproducibility, read counts, MAF, etc.) and a Mendelian inheritance (MI) check, I am left with 3,000-5,000 SNPs per family that match Mendelian expectations and are of good quality on the above metrics.
I then remove all ABxAB SNPs, which leaves 2,000-4,000 SNPs per family.*
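To make the filtering step concrete, here is a minimal sketch of how SNPs can be labelled by parental cross type and the ABxAB class dropped. It is an illustration only, not the actual pipeline: the function name `cross_type`, the two-character genotype strings, and the example marker names are all assumptions made for this sketch.

```python
def cross_type(mother: str, father: str) -> str:
    """Label a SNP by parental zygosity (hypothetical helper).

    Genotypes are assumed to be two-character strings such as "AG".
    """
    mother_het = mother[0] != mother[1]
    father_het = father[0] != father[1]
    if mother_het and father_het:
        return "ABxAB"          # both parents heterozygous
    if mother_het:
        return "ABxAA"          # informative in the mother only
    if father_het:
        return "AAxAB"          # informative in the father only
    return "uninformative"      # both parents homozygous


# Made-up parental genotypes: marker -> (mother, father)
parents = {
    "snp1": ("AG", "AA"),
    "snp2": ("CC", "CT"),
    "snp3": ("AG", "AG"),
    "snp4": ("TT", "TT"),
}

# Keep only markers that segregate from exactly one parent.
kept = {
    marker: cross_type(m, f)
    for marker, (m, f) in parents.items()
    if cross_type(m, f) in ("ABxAA", "AAxAB")
}
print(kept)
```

Keeping the cross-type label alongside each retained marker makes it possible to check later whether a linkage group is made up of maternally or paternally informative SNPs.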
Without the ABxAB SNPs, grouping within each family finds the expected number of linkage groups (26) in three independent software packages (JoinMap v5, CarthaGene, TMAP), with a 100% match across packages.
Unfortunately, when I compare the grouping between families, something odd happens: markers that belong to one linkage group in family A are split evenly between two linkage groups in family B, and vice versa. This holds for every linkage group identified, whether I compare the two families from the same species or families from different species.
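The between-family comparison described here can be summarised as a cross-tabulation of linkage group assignments for the SNPs shared by two families. The sketch below is illustrative only: the function name `cross_tabulate`, the marker names, and the linkage group numbers are invented for the example and do not come from the real dataset.

```python
from collections import Counter


def cross_tabulate(groups_a: dict, groups_b: dict) -> Counter:
    """Count shared markers for each (LG in family A, LG in family B) pair."""
    shared = groups_a.keys() & groups_b.keys()
    return Counter((groups_a[m], groups_b[m]) for m in shared)


# Made-up assignments: marker -> linkage group number
family_a = {"snp1": 1, "snp2": 1, "snp3": 1, "snp4": 1, "snp5": 2}
family_b = {"snp1": 7, "snp2": 7, "snp3": 9, "snp4": 9, "snp6": 3}

table = cross_tabulate(family_a, family_b)
for (lg_a, lg_b), n in sorted(table.items()):
    print(f"family A LG{lg_a} / family B LG{lg_b}: {n} shared markers")
```

In this toy example the four shared markers of family A's group 1 fall half into group 7 and half into group 9 of family B, which is the kind of even split I am seeing.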
What could be causing this pattern? Any ideas or suggestions on how to untangle this issue? Note that I have checked the dataset fed into the linkage mapping software against the raw dataset and, apart from genotypes that I explicitly corrected (i.e. silenced if not well supported), the two match. I am therefore confident that this is not a data manipulation issue, especially since I automated most of the process from the raw data to the linkage-map-encoded data.
*NOTE: I removed the ABxAB SNPs because, when I include them from the start, every linkage mapping package I used finds only 1 linkage group, even at extremely high LOD thresholds.