I want to do an RNA-seq analysis of Papilio.glaucus, and we now have the sequences from 12 samples. I need a reference if I want to use Cufflinks/TopHat2. Should I merge my own transcriptome assemblies or use published genome data? And how do I evaluate a genome? I'm just starting out, so these are some basic questions.
The contig N50 values of my own transcriptome assemblies are 969, 1009, and so forth.
Below is the info from the published assemblies:
Papilio.polytes: scaffolds: 3,874 contigs: 14,375 N50: 47,768 L50: 1,129
Papilio.glaucus: scaffolds: 0 contigs: 92,145 N50: 12,225 L50: 8,335 (The species I'm working on)
Papilio.xuthus: scaffolds: 5,572 contigs: 10,777 N50: 128,246 L50: 528
Papilio.polyxenes densovirus: scaffolds: 1 contigs: 1 N50: 5,053 L50: 1
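For context, this is how I understand the N50 and L50 figures above: sort the contig lengths from longest to shortest, add them up until the running total reaches half of the total assembly length, then report the length of the contig at that point (N50) and how many contigs it took (L50). Below is a minimal Python sketch of that definition. The function name `n50_l50` and the toy lengths are my own, for illustration only; the lengths are not taken from any of the assemblies listed.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    # Longest contigs first.
    ordered = sorted(lengths, reverse=True)
    # Half of the total assembly length is the threshold to reach.
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50, 'count' is the L50.
            return length, count


if __name__ == "__main__":
    # Hypothetical toy lengths, not real data.
    toy = [2, 2, 2, 3, 3, 4, 8, 8]
    n50, l50 = n50_l50(toy)
    print(f"N50 = {n50}, L50 = {l50}")
```

If that understanding is right, a higher N50 together with a lower L50 means a more contiguous assembly.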
Thank you for any valuable suggestions!