Hi folks,

We are aiming to kick off a genome sequencing project on an anole (lizard) species, Anolis distichus (genome size somewhere between 1.8 and 2.5 Gbp). We've been researching "best practices", and the general consensus I am coming across is that a combined coverage of about 100X (across short- and long-insert libraries), long-insert mate-pair libraries (~10-20 kb+), and SOAP or ALLPATHS as the assembler seem to lead to a decent assembly. However, we've recently come across DISCOVAR de novo and Platanus as potential assembler options as well.

Obviously, choice of assembler is going to affect our library prep, so I wanted to canvass the community to see if anyone had any updated thoughts on current best genome sequencing practices (most of the posts/genomes I found were initiated in 2014 or before).

Resources:

-- A 'relatively' inbred individual for sequencing, and high quality DNA extracted from it

-- One of its congeners, A. carolinensis, has been sequenced

-- We'll have a transcriptome for the individual we sequence

-- $5-7k for whole genome sequencing costs

Our current feeling is that aiming for 2x250 bp reads from ~450 bp insert libraries (Illumina) will allow us to use DISCOVAR de novo. If that, along with the transcriptome, doesn't give us a "pretty enough" assembly (we are looking for a very high-quality draft, as we are interested in specific chromosomal regions involved in divergence across the genus), we could then add mate-pair (and potentially PacBio?) data if needed, and try ALLPATHS.
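For a rough sense of the sequencing volume involved, here is a back-of-the-envelope sketch using mean coverage = (read pairs x 2 x read length) / genome size. The 1.8-2.5 Gbp range and 2x250 bp reads are from above. The 100X target is just the combined-coverage figure mentioned earlier, used as a placeholder, and the function name is made up for illustration. A single-library DISCOVAR de novo run may well call for a different target.

```python
def read_pairs_needed(genome_size_bp, coverage, read_len_bp=250):
    """Read pairs needed for a target mean coverage.

    Uses coverage = (pairs * 2 * read_len_bp) / genome_size_bp and
    rounds up. Ignores duplicates, adapter/quality trimming and
    uneven coverage, so treat the result as a lower bound.
    """
    bases_needed = genome_size_bp * coverage
    bases_per_pair = 2 * read_len_bp
    # Integer ceiling division, avoids float rounding on large counts.
    return -(-bases_needed // bases_per_pair)


if __name__ == "__main__":
    target_coverage = 100  # assumption: placeholder target, adjust as needed
    for genome_size_bp in (1_800_000_000, 2_500_000_000):
        pairs = read_pairs_needed(genome_size_bp, target_coverage)
        total_gbp = genome_size_bp * target_coverage / 1e9
        print(
            f"{genome_size_bp / 1e9:.1f} Gbp genome at {target_coverage}X: "
            f"{total_gbp:.0f} Gbp of sequence, about {pairs / 1e6:.0f} M read pairs"
        )
```

Changing target_coverage shows how quickly the required read count (and so the cost) scales across the genome size range, which matters given the uncertainty in the size estimate.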

Do people have strong thoughts on how they would attack the project with the same resources? We would love to hear from you if so. Full disclosure: I'm also asking this question on SEQanswers, so if I get any answers there that I think people here would like to hear, I'll make sure to share.

Cheers!
