How much depth of coverage do I need for a bacterial genome?

Arthur Pightling

According to my study (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0104579), 50x coverage (Illumina) is necessary for SNP detection using reference-guided assembly. For de novo assembly, we calculated that 60x coverage is necessary for accurate SNP detection. Generally, I would say aim for 75x.

Alejandro Sanchez-Flores

Hi Peter,

It depends on what you are trying to do. If you have a reference genome and you are resequencing, even 20x coverage will be enough to get an idea of the variation in your sample. However, if you are assembling the genome, then between 50x and 100x will be more than enough. All of this assumes you are talking about Illumina reads.

Again, you have to tell us what your goal is. If you are using other technologies, for example 454 or PacBio, then 20x and 70x, respectively, will be enough.

So, unless you tell us more, it will be difficult to answer your question.

Hilary G Morrison

I would add that whatever the coverage is, try assembling different amounts of the data, because I have seen 454/Illumina assemblies get worse (more and shorter contigs) as coverage increases. More is not always better.
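One way to try this is to thin the read set to several target coverages and assemble each subset. The sketch below is only an illustration: the function names, the 200x starting coverage, and the 50x/75x/100x targets are assumptions, not values from this thread.

```python
import random


def subsample_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep to go from current to target mean coverage."""
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    # Never keep more than all of the reads.
    return min(1.0, target_coverage / current_coverage)


def subsample_pairs(pairs, fraction, seed=0):
    """Keep each read pair independently with probability `fraction`.

    `pairs` is any sequence of read pairs (hypothetical in-memory records);
    a fixed seed makes the subset reproducible.
    """
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]


# Hypothetical example: a 200x data set thinned to 50x, 75x and 100x.
fractions = {target: subsample_fraction(200, target) for target in (50, 75, 100)}
```

Each subset can then be assembled separately and the contig counts compared.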

Le Wang

It depends on your research goals, the nature of your bacterial genome, and the sequencing technology.

Adam Merritt

What they said ^ but keep in mind that evenness or consistency of coverage is also important. Data with X mean coverage and a small standard deviation produces different results from data with X mean coverage and a larger standard deviation, especially in de novo work. On some platforms, PCR-free prep methods improve results by removing amplification bias and evening out coverage. More even coverage can mean that a lower depth of coverage becomes suitable for a given analysis. So for an amplified library 100X may be needed to minimize zero-coverage regions, while 50X would be fine for a PCR-free library.
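To make the point about evenness concrete, here is a minimal sketch that summarises a list of per-base depths. The function name and the two toy depth lists are invented for illustration; real depths would come from whatever tool you use to compute per-position coverage.

```python
from statistics import mean, pstdev


def coverage_summary(depths):
    """Summarise per-base depth: mean, SD, coefficient of variation, zero-coverage fraction."""
    if not depths:
        raise ValueError("depths must not be empty")
    avg = mean(depths)
    sd = pstdev(depths)  # population standard deviation over all positions
    return {
        "mean": avg,
        "sd": sd,
        "cv": sd / avg if avg else float("inf"),
        "zero_fraction": sum(1 for d in depths if d == 0) / len(depths),
    }


# Two toy data sets with the same mean depth but very different evenness.
even = coverage_summary([48, 50, 52, 50, 49, 51, 50, 50])
uneven = coverage_summary([0, 5, 120, 90, 0, 10, 150, 25])
```

Both toy lists have the same mean depth, but only the second has positions with no coverage at all, which is the situation described above.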

Héctor Candela

The chosen sequencing depth (or coverage) should be high enough to minimize the size of unsequenced regions, but within limits. This is because most programs use a de Bruijn graph representation of the data to assemble the genome sequence. Considering the high error rates of current NGS technologies, using too many reads will make the graph unnecessarily complex (e.g. with more bubbles, more tips, and more erroneous connections), requiring more computational resources (e.g. memory, time) and making the assembly significantly more difficult.
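As a rough way to think about unsequenced regions, one can use the simple Poisson (Lander-Waterman style) approximation, in which the expected fraction of bases with no read at mean coverage c is e^(-c). This model is an assumption added here, not something stated in the answer, and it assumes reads land uniformly at random, which real libraries with amplification bias do not satisfy. The function names and the 5 Mbp genome size are illustrative.

```python
import math


def expected_uncovered_fraction(coverage):
    """Expected fraction of bases with zero reads under a Poisson model."""
    return math.exp(-coverage)


def expected_uncovered_bases(genome_size_bp, coverage):
    """Expected number of uncovered bases for a genome of the given size."""
    return genome_size_bp * expected_uncovered_fraction(coverage)


# Illustrative 5 Mbp genome at two depths (idealised, uniform sampling assumed).
gaps_10x = expected_uncovered_bases(5_000_000, 10)
gaps_20x = expected_uncovered_bases(5_000_000, 20)
```

Under this idealised model the uncovered fraction shrinks very quickly with depth, so in practice coverage bias, repeats, and sequencing errors tend to matter more than raw depth.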

Lionel Moulin

To sequence bacterial genomes de novo (6 to 10 Mb genomes) with Illumina HiSeq (2 x 100 bp, paired end), we have tried multiplexing 3 to 6 genomes, and from the assemblies we estimated that we can multiplex up to 9 while still getting our full genomes in

Naseer Sangwan

You must consider the G+C content of your organism before selecting the sequencing depth and/or the assemblers and assembly parameters. For example, on a de Bruijn graph-based assembler, 1000X Illumina data from a 45% G+C bacterium will behave completely differently from data of the same depth from a 75% G+C organism (like Actinomycetes).

Patrick Degnan

As noted, it depends on your goal: genome mapping for SNPs, finding novel gene clusters, or de novo sequencing, assembly and closure. We sequenced 42 bacterial genomes (5-6 Mbp) with Nextera libraries in a single Illumina HiSeq 2 x 150 run. We got ~160 million reads (80 million pairs), which resulted in genome assemblies with 100-200X coverage and 50-100 scaffolds. The two biggest factors we have noticed affecting our genome assemblies are 1.) the average insert size of the Nextera library (smaller, 150 nt, is bad; 250-300 is better) and 2.) the size and number of repeats (in our case IS elements). If you want to close genomes like ours, a complementary approach (PacBio) or TruSeq libraries with significantly larger insert sizes are necessary.
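For planning a multiplexed run like this, mean coverage can be estimated as (number of reads x read length) / (genome size x number of genomes). The sketch below plugs in the run figures quoted above. It assumes reads are split equally across genomes and that every base is usable, so it is an idealised estimate and not a reproduction of the reported per-genome coverage. The function name is illustrative.

```python
def mean_coverage(total_reads, read_length_bp, genome_size_bp, n_genomes=1):
    """Idealised mean coverage per genome, assuming reads are split equally."""
    total_bases = total_reads * read_length_bp
    return total_bases / (genome_size_bp * n_genomes)


# ~160 million reads of 150 bp shared by 42 genomes of 5-6 Mbp.
low = mean_coverage(160_000_000, 150, 6_000_000, n_genomes=42)
high = mean_coverage(160_000_000, 150, 5_000_000, n_genomes=42)
```

Rearranging the same formula gives the number of genomes that can share a run for a chosen target coverage.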

Patrick, I would be happy to have an idea of the insert-size distribution of the read pairs in your genome assemblies. Was it very variable from one genome to another, and what were the mean pair size and the range of the most frequent sizes? Cheers.

Peter William Cook

Thank you to everyone who has answered. I guess part of my problem was the simplicity of the question. I am not able to answer many of the additional questions asked, but these answers (and questions) will be helpful in guiding me towards the right coverage depth for my current project.

Thanks! Peter

Lionel, using the Nextera XT kit, the library distributions going into the Illumina sequencer were pretty broad (150 bp-1 kb). This distribution was based on the fragment profile from the BioAnalyzer chip. However, the output (sequenced) fragment sizes seemed pretty tight. I don't have the stats handy at the moment, but I'd estimate that >90% of inserts were within +/-50 nt of the mean (150-250 nt). I'm guessing this is an artifact of amplification bias during colony formation? If you need firmer numbers I can talk to my student.

Thank you, Patrick. I'm very worried about the insert sizes you gave me! With TruSeq it was much better, as we could get inserts with a mean of 700 bp by size-selecting the fragments on a gel, and the assemblies were very good! But now Illumina has discontinued this kit and we have to use the Nextera... !

Ralf Koebnik

Hi, to summarize and add a few points: The quality of the genome assembly, e.g. measured by contig number or N50 value, depends on

- coverage

- simple reads versus paired-ends/mate-pairs

- read length (HiSeq vs. MiSeq vs. 454 vs. PacBio vs. ...)

- the genome structure itself.

We have used 96x multiplexing of HiSeq, 2 x 100 nt reads. The first experiment used insert sizes of 250 - 300 bp, the second experiment used the Nextera XT with insert sizes from 250 to 1500 bp. We did not see any significant change due to the larger insert sizes (although I did not confirm them - this statement is just based on the company's information).

We targeted bacterial genomes of about 5 Mbp. The coverage was typically between 50x and 100x, resulting in ca. 200 contigs for enterobacteria, between 250 and 1000 contigs for Xanthomonas (depending on the species) (e.g. BioProjects PRJNA266384, PRJNA266386, PRJNA266603, PRJNA266604, PRJNA267193, PRJNA266578), and more than 1000 contigs for one strain of Burkholderia (genome size 6.6 Mbp, BioProject PRJNA267193). We tried Edena and Velvet for assembly. Depending on the species, one or the other assembly program performed better, and Edena seemed to be better for repetitive regions. Watch out for our Genome Announcements in issue 3 (1), in press.
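For anyone comparing assemblies by contig number and N50 as described above, here is a minimal sketch of the usual N50 definition: the contig length L such that contigs of length L or longer contain at least half of the assembled bases. The function name and the toy contig lengths are illustrative, not taken from our assemblies.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half of the bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy assembly: total 300, so N50 is reached at the second-longest contig.
toy_n50 = n50([100, 80, 60, 40, 20])
contig_count = len([100, 80, 60, 40, 20])
```

Fewer contigs and a larger N50 generally indicate a more contiguous assembly, though neither number says anything about correctness.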

Good luck!

Ralf

Ayo Ajayi

Hello friends. Please, I urgently need to learn about genome assembly, annotation and interpretation. Thanks

Sajjad Sarikhan

Hello everybody!

Is there anybody here who has used PacBio for bacterial genome sequencing with no more than 10x coverage? What was the result?

I mean, given that the new PacBio kits can produce reads with a mean length of more than 10 kbp, is it essential to have more coverage for de novo genome assembly? I think that with reads of mean length 10 kbp, a lower coverage (10x) is enough.

Any better ideas?

Best wishes!