What could be the reason for this type of error in a genome annotation?

Angelika Voronova

A megablast run is currently in progress to estimate what proportion of the genes is affected by this mistake. Some gene transcripts are correct and match their annotated genomic sequence. Even with a smaller data set, though, the computation is intensive.

Matej Lexa

I am afraid we don't have enough information to pinpoint the source of your problem. I imagine such discrepancies could be caused by i) using a different genome assembly; ii) alternative transcripts with intron inclusion, an alternative promoter, or termination somewhere other than where you would expect; iii) human or computer errors in annotation. Anything else?

Xabier Vázquez Campos

Ab initio gene predictors use gene models in their predictions. To build them, they take a set of known genes and create a model that is then used to predict additional genes. Repeats, especially complex repeats, are usually of viral origin and often include genes that are compositionally and structurally different from the rest of the genome. Because of this, gene models trained on many of these "foreign" genes don't perform as expected: they both underperform and mispredict many protein-coding genes.

I am pretty sure this is something that has come up on the MAKER mailing list.

For the coordinates issue, are you using the annotation file that corresponds to your version of the genome? Annotation coordinates are specific to the genome assembly they were generated for.
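One rough way to test whether an annotation file belongs to a given assembly is to check that every annotated feature fits inside the scaffold it names. The sketch below is only an illustration under a few assumptions: the annotation is a tab-separated GFF-style file with the sequence ID in column 1 and 1-based start and end in columns 4 and 5, the genome is plain FASTA, and the function names and toy data are invented for this example. Passing the check does not prove the files match; failing it strongly suggests they do not.

```python
def fasta_lengths(lines):
    """Return {sequence_id: length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Sequence ID is the first word of the header line.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def check_gff_against_genome(gff_lines, lengths):
    """Return a list of (line_number, problem) for features that cannot fit."""
    problems = []
    for n, line in enumerate(gff_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            problems.append((n, "fewer than 9 columns"))
            continue
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        if seqid not in lengths:
            problems.append((n, f"unknown sequence {seqid}"))
        elif start < 1 or start > end or end > lengths[seqid]:
            problems.append(
                (n, f"{start}-{end} does not fit {seqid} (length {lengths[seqid]})")
            )
    return problems


# Toy data (hypothetical): two scaffolds and three annotated genes.
fasta = [">scaffold_1 toy", "ACGTACGTAC", "ACGT", ">scaffold_2", "GGGG"]
gff = [
    "##gff-version 3",
    "scaffold_1\tAUGUSTUS\tgene\t1\t14\t.\t+\t.\tID=g1",
    "scaffold_1\tAUGUSTUS\tgene\t10\t20\t.\t+\t.\tID=g2",
    "scaffold_9\tAUGUSTUS\tgene\t1\t3\t.\t-\t.\tID=g3",
]

lengths = fasta_lengths(fasta)
problems = check_gff_against_genome(gff, lengths)
for line_number, message in problems:
    print(f"line {line_number}: {message}")
```

With real files, the same functions can be fed open file handles instead of lists, for example `fasta_lengths(open("genome.fa"))`, where the file name is a placeholder.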

Thank you for your answers; they give me something to think about.

A different genome assembly should not be the cause: I downloaded the new assembly together with its annotation files from a single folder of a public repository, and this non-model plant genome has not been sequenced elsewhere. At least, there was no other annotation file. Furthermore, this second version of the assembly uses a new gene ID naming scheme that is shared by the genes in all of the v.2 files. Some genes still correspond perfectly to their genome coordinates and to their transcripts.

I downloaded this genome version a year after it was published. I was completely new to whole-genome bioinformatics, so some experienced researchers helped me to download and process all the data. I did not expect errors in the initial files. However, another 10 months later, the owner of this genome has changed all the annotation files and transcripts, and they have been completely renamed again. It looks like we took an incomplete annotation...

How could one test a genome annotation for accuracy?

As far as I can see in the annotation files, the genes that were incorrect are marked with 'AUGUSTUS', whereas the correct genes are marked with 'GmapIndex_5000bpSoftMaskedNew'.
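To see how many genes come from each of these two labels, the annotation file can be tallied by its source field. This is a minimal sketch that assumes the labels sit in column 2 (the source column) of a tab-separated GFF-style file and that gene features have the type `gene` in column 3. The function name and the toy lines are made up for illustration.

```python
from collections import Counter


def count_features_by_source(gff_lines, feature_type="gene"):
    """Count features of one type per value of the source column (column 2)."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == feature_type:
            counts[cols[1]] += 1
    return counts


# Toy annotation lines (hypothetical IDs and coordinates).
gff = [
    "##gff-version 3",
    "scaffold_1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tID=g1",
    "scaffold_1\tAUGUSTUS\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1",
    "scaffold_2\tGmapIndex_5000bpSoftMaskedNew\tgene\t50\t700\t.\t-\t.\tID=g2",
    "scaffold_3\tAUGUSTUS\tgene\t10\t400\t.\t+\t.\tID=g3",
]

counts = count_features_by_source(gff)
for source, n in counts.most_common():
    print(source, n)
```

If the mismatching genes turn out to be concentrated in one source, that points at the prediction step rather than at the assembly itself.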

Is gene prediction always performed in one direction, from transcripts to genome scaffolds, or do reverse approaches exist? I could imagine that some TE transcripts were defined as genes. If repeats from one family share 80% similarity, but genes are separated at a similarity level of 98%, then each repeat from that family could receive its own false gene ID and corresponding genome coordinates.