Hi everyone,
I'm working on pooled RAD-seq data from several species, but with only one pool per species! (The RAD data are only meant to be used for SNP calling, as a separate genotyping procedure will follow afterwards.)
Although we tried to make pools that are representative of the geographic variability of each species (we have no prior information about spatial genetic structure), we could not be as exhaustive as we hoped and only have ~15 individuals per pool (i.e. per species). Also, my species do not have a reference genome.
I therefore expect to struggle to distinguish sequencing errors from genuine alleles at low coverage when processing my data, and to end up with a rather high ascertainment bias.
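To give an idea of the problem, here is a rough back-of-the-envelope sketch. It assumes diploid individuals contributing equally to the pool, a per-base sequencing error rate of 1% spread evenly over the three wrong bases, and a simple binomial sampling of reads. All of these are assumptions I made for illustration, not measured values, and the `min_reads` threshold is hypothetical.

```python
from math import comb

def binom_tail(n, p, k):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_individuals = 15          # from my pools (~15 individuals per species)
ploidy = 2                  # assumption: diploid species
n_chromosomes = n_individuals * ploidy
singleton_freq = 1 / n_chromosomes   # allele carried by a single chromosome

error_rate = 0.01           # assumption: per-base sequencing error rate
specific_error = error_rate / 3      # error producing one particular wrong base

min_reads = 2               # hypothetical threshold: reads needed to trust an allele

results = []
for depth in (30, 60, 120):
    # Chance that a true singleton allele is seen in at least min_reads reads
    p_true = binom_tail(depth, singleton_freq, min_reads)
    # Chance that errors alone give at least min_reads reads of the same wrong base
    p_err = binom_tail(depth, specific_error, min_reads)
    results.append((depth, p_true, p_err))
    print(f"depth={depth}: P(true singleton seen)={p_true:.3f}, "
          f"P(error mimics it)={p_err:.3f}")
```

Under these assumptions a rare allele sits at about 3% in the pool, which is uncomfortably close to the error rate, hence my worry.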
To minimise these issues as much as possible, I plan to build a pre-catalogue using Stacks and then realign my reads against this pre-catalogue using BWA or GATK (as Stacks is not well suited to pooled RAD data). I have heard about PyRAD, but apparently it does not run as smoothly as Stacks, even for pooled data. I will also need to be very cautious with my SNP calling criteria.
Now my question is: what software would you recommend for SNP calling? I was mainly thinking of using Snape, from Raineri et al. 2012 (https://www.researchgate.net/publication/230884099_SNP_calling_by_sequencing_pooled_sample), but I am totally new to this, so I would be very interested to hear from anyone who has already used it and could give me some advice :)
- How do you set your prior information?
- Did it actually work better than other software such as VarScan or SAMtools? Fracassetti et al. (2015) (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0140462) found that Snape performed better, but they were working on an Arabidopsis species, while I have very little information about the genomes of my species.
Thank you very much in advance for your answers :) Have a very nice day!
Chrys