We are dealing with Capsicum and soon also with sunflower transcriptomes. I am not up to date on the newest assemblers. We will try Trinity and Newbler (if I can get it to run on my Mac!). Does anyone have suggestions?
Thank you Vladimir; in our case we will be using both a previous assembly of 454 reads and new Illumina (paired-end) reads. We will try MIRA and post the results.
My personal experience is that Newbler (and not Trinity) works really well for 454 reads, while Trinity does a good job with Illumina ones. Mixing the two read types does not improve the results with Trinity.
While it is pricey and only moderately tweakable, we have had really good luck with CLC Genomics Workbench. It has worked well assembling Illumina, 454, and Ion Torrent sequence data as discrete or hybrid assemblies. We have compared its assembly results to those of other assemblers; it is as good as MIRA and better than the others. It is also VERY fast. Cost is the biggest drawback.
For hybrid assembly (454 & Illumina single-end), we assemble the 454 reads with MIRA and the Illumina reads with Velvet/Oases. We perform several Velvet/Oases runs over a range of k-mer values and add the cleaned 454 reads to each run to improve the assembly. We then merge the 454 MIRA assembly and all the Velvet/Oases assemblies into a unified super-assembly with CAP3.
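In case it helps, here is a minimal bash sketch of that pipeline. File names (illumina.fq, 454_clean.fq, and a finished MIRA 454 assembly in mira_454_contigs.fa) are hypothetical, and the Velvet/Oases/CAP3 options are just illustrative; check your versions' documentation.

    # one Velvet/Oases run per k-mer value, feeding the cleaned 454
    # reads in as long reads alongside the Illumina short reads
    for k in 21 25 29 33; do
        velveth oases_k$k $k -fastq -short illumina.fq -long 454_clean.fq
        velvetg oases_k$k -read_trkg yes   # read tracking is required by Oases
        oases oases_k$k                    # writes oases_k$k/transcripts.fa
    done
    # pool the MIRA assembly with every Oases run and merge with CAP3
    cat mira_454_contigs.fa oases_k*/transcripts.fa > pooled.fa
    cap3 pooled.fa > cap3.log   # merged contigs end up in pooled.fa.cap.contigs

Running several k values and only then collapsing everything with CAP3 is what produces the "super-assembly" mentioned above.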
Thank you very much to all the colleagues who answered; we will try different strategies and then post a brief summary. Our first attempts will be on an iMac with a 3.4 GHz Intel Core i7 and 16 GB of 1333 MHz DDR3 RAM; if that is not enough, we will use Langebio's cluster. Is anybody using similar hardware (an iMac)? I am having problems running Newbler…
I think it will depend on the tools chosen and the amount of sequence, but my guess is that RAM will most likely be the limiting factor, and 16 GB may be too little, depending on the depth.
An important consideration for de novo assembly is pre-filtering the reads before attempting an assembly.
Reads containing errors can be identified because they tend to be singletons. Duplicate reads can also be eliminated, since they add no information for the assembler.
The khmer tool and digital normalization approach developed by C. Titus Brown perform these tasks very well.
Links for paper and code here:
http://ged.msu.edu/papers/2012-diginorm/
Applying these pre-filters helps to limit memory usage and reduces the redundancy in the final transcriptome assembly.
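For what it's worth, a rough sketch of what this looks like on the command line with khmer (the script names are from khmer's CLI, but exact flags and any hash-table sizing options differ between khmer versions; reads.fq is a placeholder):

    # cap per-locus coverage at ~20x; highly redundant/duplicate reads are
    # discarded, and -s saves the k-mer counting table for the next step
    normalize-by-median.py -k 20 -C 20 -s counts.ct reads.fq
    # drop low-abundance k-mers, i.e. the likely sequencing errors that show
    # up as singletons; input is the .keep file written by the step above
    filter-abund.py counts.ct reads.fq.keep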
Btw, MIRA comes with the miramem tool, which roughly estimates the amount of RAM needed for assembly. You will just have to answer a few questions (type of reads, their number, etc.)
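For example (the prompts are paraphrased from memory, so take them loosely):

    $ miramem
    # ...then answer the interactive questions (sequencing technology,
    # number of reads, average read length, and so on) and it prints a
    # rough RAM estimate for the assembly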
I second Olivier Armant's recommendation of Oases/Velvet, but I have also heard that CLC bio's assembly pipeline is REALLY easy and works pretty well.
Any of these assemblers, though, will require lots of memory. If this is a huge data set (e.g., an Illumina run), you can significantly reduce the memory requirements for any of them using the "digital normalization" technique, implemented in a Python software package called khmer. There's a tutorial here:
and a paper discussing the theory and practical results of this diginorm pre-processing step here:
http://arxiv.org/abs/1203.4802
We've had some huge successes using this pre-processing step prior to assembling several Illumina HiSeq lanes combined into one data set (~300-400 million reads, 100 bp each) both for metagenomics and transcriptomics.
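To give an idea of what that can look like for paired HiSeq lanes (the lane file names are hypothetical, and some flag names have shifted between khmer versions):

    # interleave each paired-end lane, then normalize all lanes together
    for lane in lane1 lane2 lane3; do
        interleave-reads.py ${lane}_R1.fastq ${lane}_R2.fastq > ${lane}.pe.fq
    done
    # -p keeps read pairs together during normalization; kept reads are
    # written to *.keep files, which then go into the assembler
    normalize-by-median.py -p -k 20 -C 20 lane*.pe.fq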
To add my two cents... your hardware makes things really tricky. Try to get access to a computer with more RAM; how much you need depends on the amount of reads and the technology used, but still. We have run a dozen or so independent transcriptome assemblies here in the last few years, mostly on Illumina data. Oases/Velvet is pretty good, and so is Trinity. Our mainstay is CLC, especially since version 5.5 - it usually yields the best results, is the fastest, and uses the least resources. If I remember correctly there is a trial version available, so if you only have this one dataset you can give it a shot without buying the (expensive) software.

But the best advice? Try several assemblers. There is no "one solution for all" software, at least not yet. Assemble with various programs, using several options, then assess the quality using length distribution plots and by randomly checking a few dozen contigs, both by BLAST and by looking at the distribution of reads within them. This approach will cost a few days or weeks of time, but you will save tremendously downstream.
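On the length-distribution point: you can pull contig lengths and an N50 out of any assembly FASTA with a couple of awk one-liners (contigs.fa is a placeholder for whichever assembler's output you are checking):

    # contig lengths, sorted longest-first (handles multi-line FASTA records)
    awk '/^>/{if(len)print len; len=0; next}{len+=length($0)}END{if(len)print len}' \
        contigs.fa | sort -rn > lengths.txt
    # N50: the length of the contig at which the running total first
    # passes half of the total assembly size
    awk '{tot+=$1; l[NR]=$1}
         END{half=tot/2; run=0;
             for(i=1;i<=NR;i++){run+=l[i]; if(run>=half){print "N50:", l[i]; exit}}}' lengths.txt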
For prokaryote genomes we have used Velvet/Oases, and it is really good! As Paul McGettigan says, applying a filter (quality filtering and trimming the sequences) is very important. For quality filtering of Roche 454 and Ion Torrent reads I recommend https://sourceforge.net/projects/qualevaluato/files/Quality%20Long%20Reads/