Does anybody have experience finishing bacterial genomes?

Katleen Vranckx

How did you sequence your genome: Illumina, PacBio, or something else? What are your read length and coverage?

You cannot expect to get a (reliably) closed genome with short reads alone. No assembly algorithm can resolve repeat regions and/or insertion regions that the reads are too short to span. If you are using Illumina, you can also sequence with PacBio, which produces longer reads and gives you a better chance of spanning low-complexity regions in a single read. You can also try to create a whole-genome map (OpGen) to map your reads against. If you are close to closing the genome with your current strategy (only a few contigs left), you can design primers against the edges of the contigs and use good old Sanger sequencing to extend the edges. However, if you still have a lot of contigs, this is not really feasible.
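For the primer route, a first step is simply pulling out the sequence at each contig edge so it can be pasted into a primer design tool. Here is a minimal Python sketch of that step only. The function names, the demo contigs, and the flank length are my own assumptions for illustration; it does not design primers or check whether an edge falls inside a repeat.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {contig_name: sequence}."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def contig_ends(contigs: Dict[str, str], flank: int = 500) -> Dict[str, Tuple[str, str]]:
    """Return the first and last `flank` bases of every contig.

    Contigs shorter than `flank` are returned whole for both ends.
    """
    if flank <= 0:
        raise ValueError("flank must be a positive number of bases")
    return {name: (seq[:flank], seq[-flank:]) for name, seq in contigs.items()}


# Tiny made-up example; real input would be the lines of the scaffold FASTA file.
demo = [">contig_1 demo", "ACGTACGTAC", "GGGTTTAAAC", ">contig_2", "ttgacc"]
ends = contig_ends(read_fasta(demo), flank=4)
for contig, (left, right) in ends.items():
    print(contig, left, right)
```

With a real assembly you would pass an open file handle to `read_fasta` and pick a flank long enough to place a primer well inside reliable sequence.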

Eduardo Zavala

Before I joined the project, the genome was sequenced using a combination of Illumina and Roche 454. The output was combined using SeqMan Pro and produced 29 scaffolds. From my understanding so far, the contig lengths range from ~800 bp to ~350 kb, with an average coverage of about 3x. I have aligned the contigs to a reference genome, and the alignment shows a lot of gaps. My confusion is that many of the large contigs only partially align to the reference, and two contigs do not align at all. I really appreciate your response, by the way; maybe just telling somebody about this problem will help me figure it out.
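When describing an assembly like this, it can help to report a few summary numbers alongside the length range. The sketch below computes contig count, total size, shortest and longest contig, and N50 from a list of lengths. The function name and the demo lengths are invented for illustration and are not the actual values from this assembly.

```python
from typing import Dict, Sequence


def assembly_stats(lengths: Sequence[int]) -> Dict[str, int]:
    """Summarise contig lengths: count, total, min, max and N50.

    N50 is the length of the contig at which the running total of
    lengths, sorted from longest to shortest, first reaches half of
    the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_bp": total,
        "min_bp": ordered[-1],
        "max_bp": ordered[0],
        "n50_bp": n50,
    }


# Made-up lengths, only to show the call.
stats = assembly_stats([350_000, 200_000, 100_000, 50_000, 800])
print(stats)
```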

Brian Thomas Foley

Three-fold coverage sounds far too low to expect closure. Are you sure that is right? I think it is more typical to have 100-fold or better coverage before attempting to complete a bacterial genome. Depending on the species, it may or may not be reasonable to expect your isolate to share a large amount of synteny (same gene order) over large regions with any given isolate from the same species.

Steven Robbins

Brian is right. 3x average genome coverage is too low to get full closure. All the same, how closely related is this reference organism to yours? If it is the same strain, these gaps and unaligned contigs are troubling. If only the same genus...meh, might be real, or might be misassembly or user error...to me, it's always better to start from the beginning. Redo the assembly. That way you know how all the data was processed.

Also, how sure are you that you sequenced a pure strain? Maybe you have some contigs from another organism and that's why a few don't align? Or maybe your organism has a plasmid the reference doesn't have? What are the genes on those contigs--plasmid genes? Or, can you classify those contigs to see if they're from the same organism?

For gap filling you could use Abyss-sealer to close gaps and FinishM to try and join contigs together (https://github.com/wwood/finishm). However, FinishM will change your contig names. Both programs try to take the reads and walk from one end of a gap to the other.

Eduardo Zavala

Steven and Brian, both of you mentioned some good ideas and things I had not thought of. I did suspect that 3x is low in comparison to other data sets that I have seen. As for the purity of the organism, I do not think contamination is likely, since more than 90% of the contigs aligned to the reference. The contigs are from a pathogenic strain of S. dysgalactiae and are being aligned to a couple of references found in NCBI. That raises another question: the alignment differs noticeably depending on which reference I use. I have been focusing on the reference with the closest alignment, but I have also aligned the contigs against the two references at the same time, and I have been tempted to design primers based on that, though I am not sure how that would work out.

Steve, I have thought that I would understand the data much better if I started over, but this is an undergraduate project I am choosing to do on the side, and I don't have access to the equipment yet. However, I will learn more about Abyss-sealer and the other program you mentioned.

Do you know of a website or program for checking whether the large contig that doesn't align contains genes important for pathogenicity?

Steven Robbins

Ah, I see. An undergrad project is a different story. As for the purity of the dataset, who knows. It is more common than people think to have microbial contamination in reagents and especially in cultures. Multiple strains are often seen in metagenomic datasets...it's an interesting topic for assembly, as similar DNA sequences with small differences will confuse the crap out of assemblers. This is often where abyss-sealer is useful. The assembler may have died because there were too many possibilities for extending the contig, so it chooses to die instead of creating a chimera. What abyss-sealer does, depending on the parameters you set, is create a consensus of the multiple possible sequences it finds, so it can walk further than the assembler wanted to. So closed gaps may not represent one organism...this is in the case of multiple strains being present, of course.

Anyway, I'm not a clinical microbiologist, so I'm not aware of databases or programs to identify pathogens. A quick but adequate solution would be to BLAST those contigs against NCBI and see where their best hits are. Hopefully they're to your bug! You could also use the program CheckM, which looks for single-copy marker genes present in a given genome and gives you a report of the "contamination" of your genome. Multiple copies of those marker genes indicate contamination. If you have redundant marker genes on those contigs that didn't align, you have some evidence that they don't belong in that genome.

For an undergrad project, you just have to decide what's worth the effort :-). Do the BLAST thing first.
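To make the "see where their best hits are" step concrete: if the BLAST results are saved in the standard 12-column tabular format (what BLAST+ writes with `-outfmt 6`), picking the top hit per contig takes a few lines of Python. This is only a sketch. The function name is mine, and the contig and subject names in the demo rows are invented, not real accessions or real results.

```python
from typing import Dict, Iterable, Tuple


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, Tuple[str, float, float]]:
    """Keep the highest-bitscore hit for each query contig.

    Expects the default 12 tab-separated columns of BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore. Returns {query: (subject, percent_identity, bitscore)}.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        bitscore = float(fields[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return best


# Invented rows for illustration only.
demo_rows = [
    "contig_28\tsubject_A\t91.2\t5000\t400\t12\t1\t5000\t100\t5100\t0.0\t6200",
    "contig_28\tsubject_B\t98.7\t12000\t150\t3\t1\t12000\t40\t12040\t0.0\t21000",
    "contig_29\tsubject_C\t85.0\t900\t130\t5\t10\t910\t7\t905\t1e-120\t800",
]
hits = best_hits(demo_rows)
for contig, (subject, identity, score) in hits.items():
    print(contig, subject, identity, score)
```

If the best hit for an unaligned contig is a different species, or a plasmid, that already tells you a lot before trying anything heavier.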

Is your project specifically to produce a closed assembly? Or is there another scientific question you want answered? In a lot of cases, you do not need a closed assembly to answer your scientific question, so it would be ridiculous to go above and beyond to produce one. For instance, if you want to know how closely your isolate is related to others, a closed assembly is nice but not necessary. There are more than enough techniques to do this without a closed assembly.

As mentioned by Brian and Steven, three-fold coverage is extremely low. Is this a coverage you derived from the assembly, from the alignment to the reference, or from the raw read file?
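For the raw-read version of that number, the usual rough estimate is total sequenced bases divided by genome size. Below is a small Python sketch, assuming the reads are available as plain FASTQ with four lines per record (no wrapped sequence lines). The function names and the toy reads and genome size are made up for illustration; for a real estimate you would plug in the size of the reference you are aligning to.

```python
from typing import Iterable


def total_bases_fastq(lines: Iterable[str]) -> int:
    """Sum read lengths from FASTQ lines.

    Assumes strict 4-line records: header, sequence, '+', qualities.
    """
    total = 0
    for index, line in enumerate(lines):
        if index % 4 == 1:  # the second line of each record is the sequence
            total += len(line.strip())
    return total


def mean_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Rough average coverage: sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


# Toy data: two reads (8 and 12 bases) against a pretend 10 bp genome.
demo_fastq = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGTACGTACGT", "+", "IIIIIIIIIIII",
]
bases = total_bases_fastq(demo_fastq)
coverage = mean_coverage(bases, genome_size_bp=10)
print(bases, coverage)
```

This counts every base, including low-quality and unmapped reads, so it will usually come out higher than a coverage figure taken from the assembly or from the alignment to the reference.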