Has anyone compared Trinity with Abyss-trans for de novo assembly of the same data sets? I'd like to see the stats of both assemblies as well as any other assessment measures. Which algorithms deal with the isoforms and repeats issues better?
Hi Magdy, there is another article in BMC Bioinformatics comparing several methods for de novo transcriptome assembly: http://www.biomedcentral.com/1471-2105/12/S14/S2
For me Trinity is the better option. I used it to assemble transcriptome data of two different datasets: one of a fungus and another one of a plant.
There is an article in BMC Genomics where the authors compare Trinity and ABySS-Trans for assembling wheat Illumina reads (http://www.biomedcentral.com/1471-2164/13/392).
I don't know about Trinity, but I have tried out Abyss for a short time and it does the job. What I do like (and I don't work for the company) is CLC Genome Workbench. It does all the assembly etc. in a fraction of the time, but alas it's not freeware. You can customize your workflows and design your own species-specific plugins using their SDK.
Hi Magdy, I agree with the previous comments: read the recommended BMC papers. As for isoforms and repeated sequences, it depends on your species. I personally work with species that have several whole-genome duplication events in their evolutionary history, so I'm interested in paralogues. For paralogue searches none of the assemblers is very good, and finding paralogues is not easy. Also, depending on your species' evolutionary history, you can have several duplications and repeated sequences that in general are difficult to resolve with de novo assemblies.
But, if it helps, for me Trinity is a bit better than ABySS-Trans.
The authors concluded that Abyss-Trans did not perform as well as Trinity or SOAPdenovo-trans and excluded it from further analyses. It might be worth taking a look at the paper. The study also suggests a mapping approach to identify gene models using a related species (divergence < 15%).
From my own experience, Trinity works better than Abyss-Trans, but this might be biased by the data sets I used.
Thanks everyone! I have used Abyss-trans to assemble two different transcriptomes, one for a plant without a sequenced genome and one for an insect with a fully sequenced genome. In the insect case, by comparing my assembly to the published assembly of another insect belonging to the same genus, I found that the Abyss-trans assembly is really good. Same thing with the plant transcriptome. What I found interesting about Abyss-trans is that it merges the assemblies from different k-mers. I found this particularly good as I am convinced that there is no single magic k-mer (see the attachment).
Also, Abyss-trans performed well in detecting isoforms, compared to published transcriptomes from closely related species, in both cases.
I haven't used Trinity at all but I am going to. I will do the comparison on my species and will update you guys in case there is anything that isn't already published in those papers. Cheers everyone!
Each assembler will have some pros and cons, and results will depend on the tuning, as you stated, Magdy. I would also like to point to Velvet/Oases, which is also well known. Why not run the assembly with the different assemblers and then pool the results with CAP3? I have seen several studies doing that.
I would second Olivier's idea. All models are wrong, and different models are wrong in different ways, so if you can make a consensus of models you can reasonably expect it to be better than any model individually.
I have tried Trinity on my Ion Torrent data and I think I get very good assemblies, but I have not figured out how to compare them to others (how to find out how many contigs are in agreement), because Trinity reports different splice variants, or whatever one calls them. There are several contigs coming from the same region, or from similar reads.
I would also like to know how you made the graph you attached. I would like to make more sense of my Trinity assemblies, so that I can see more than the number of contigs and calculate the N50 and mean length.
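For the basic numbers mentioned here (contig count, mean length, N50), a minimal Python sketch could look like the following. It assumes a plain FASTA file as input; the function names are made up for illustration and are not part of Trinity or any other assembler.

```python
def read_fasta_lengths(lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the contig length at which the running sum of the
    length-sorted contigs (longest first) reaches half the total bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def assembly_stats(lines):
    """Summarise an assembly: contig count, total bases, mean length, N50."""
    lengths = list(read_fasta_lengths(lines))
    if not lengths:
        return {"contigs": 0, "total_bases": 0, "mean_length": 0.0, "n50": 0}
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "n50": n50(lengths),
    }
```

To use it, open the assembly file (whatever your assembler's output FASTA is called) and pass the file handle to `assembly_stats`. Keep in mind that for transcriptomes N50 is inflated by multiple isoforms of the same gene, so it is only a rough guide when comparing assemblers.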
I was thinking about doing something like that, but couldn't one introduce quite a lot of bias into the assemblies, since the assembler more or less has to choose some bases for the contigs? Sometimes it can be 50:50 for a base, and then fusing different assemblies would introduce more and more mistakes. I guess it depends a lot on the coverage...
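To make the 50:50 worry concrete, here is a toy Python sketch of a per-column majority vote that masks a position with N instead of picking a base when support is weak. This is not how CAP3 or any of the assemblers actually builds its consensus; the function names and the `min_fraction` threshold are assumptions for illustration only.

```python
from collections import Counter


def consensus_base(bases, min_fraction=0.6):
    """Return the majority base of one alignment column, or 'N' when
    no base is supported by at least min_fraction of the A/C/G/T calls."""
    counts = Counter(b.upper() for b in bases if b.upper() in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return "N"
    base, count = counts.most_common(1)[0]
    return base if count / total >= min_fraction else "N"


def consensus(columns, min_fraction=0.6):
    """Build a consensus string from an iterable of alignment columns."""
    return "".join(consensus_base(col, min_fraction) for col in columns)
```

With only two assemblies disagreeing at a position, any threshold above 0.5 gives N, so the more assemblies (or the more read coverage) behind each column, the less often a coin-flip base ends up in the merged contigs.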