How to recover the aligned sequences, and only them, from a BLAST job on the NCBI web server?

Asked by Thomas Guiraud, 10 October 2022

Hi there,

I would like to share with you a problem that I commonly encounter on NCBI's BLAST web server:

I would like to recover the homologous nucleotide sequences of a gene of interest (presumed single-copy) within a microbial taxon (most often a genus).

As a query I use a protein sequence rather than a nucleotide sequence: if I used a DNA sequence from one species of the genus, I would risk missing the homologous genes in the most distant species of the genus. So I run TBLASTN, which searches a protein query against a translated nucleotide database.

When I run this TBLASTN on the NCBI BLAST server, I adjust the parameters to target the taxon studied and to recover as many sequences as possible (1,000 or even 5,000 hits, depending on the situation). I also use a fairly stringent e-value threshold, chosen after preliminary tests, which ranges from 1e-50 to 1e-100 or even lower.
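One way to get fewer, cleaner hits is to run the same search with the BLAST+ command-line tools, which can submit it to NCBI's servers with `-remote`. The sketch below only assembles the argument list; it assumes BLAST+ is installed locally, and the genus (`Pseudomonas`), database name (`nt`) and output file name are placeholders to adapt. The option most relevant here is `-max_hsps 1`, which limits the report to one HSP per subject sequence; it is worth checking `tblastn -help` for your BLAST+ version before relying on a given option combination.

```python
import shlex


def build_tblastn_command(
    query_fasta: str,
    genus: str,
    db: str = "nt",          # placeholder: assumed remote nucleotide database name
    evalue: float = 1e-50,
    max_targets: int = 5000,
    out: str = "hits.tsv",   # placeholder output file name
) -> list[str]:
    """Build (but do not run) a remote tblastn command restricted to one genus."""
    return [
        "tblastn",
        "-query", query_fasta,                  # protein query in FASTA format
        "-db", db,                              # nucleotide database, translated by tblastn
        "-remote",                              # run the search on NCBI servers
        "-entrez_query", f"{genus}[Organism]",  # restrict hits to the taxon
        "-evalue", str(evalue),                 # stringent e-value cutoff
        "-max_target_seqs", str(max_targets),
        "-max_hsps", "1",                       # at most one HSP per subject sequence
        "-outfmt", "6 qaccver saccver pident length evalue bitscore sstart send",
        "-out", out,
    ]


if __name__ == "__main__":
    cmd = build_tblastn_command("query_protein.fasta", "Pseudomonas")
    print(shlex.join(cmd))  # shell-quoted command, ready to paste
```

The printed command can be pasted into a shell, or the list can be passed directly to `subprocess.run`.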

As the database, I use either nr or WGS depending on the situation. I mention this in case it comes up, but it makes little difference to the problem.

The results displayed by the server usually look right: I see the list of strains fully sequenced to date in which the algorithm found a sequence homologous to my sequence of interest.

However, when I download the FASTA file of the aligned sequences, I get far more sequences than were displayed. Tests have shown me that this number depends on the stringency set in the parameters. For example, with the default e-value threshold (0.05), I can recover over 100,000 sequences, sometimes more than 100 per accession!
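The gap between the displayed list and the FASTA download is easier to see if you download the hit table instead and count HSPs per subject accession. This sketch assumes the usual 12-column tab-separated layout (subject accession in column 2, e-value in column 11, lines starting with `#` as comments); check the header of your own download before using it.

```python
from collections import Counter
from typing import Iterable


def hsps_per_subject(hit_table: Iterable[str], max_evalue: float) -> Counter:
    """Count HSPs per subject accession that pass an e-value cutoff.

    Assumes a 12-column tab-separated hit table (BLAST outfmt 6/7 layout):
    column 2 = subject accession, column 11 = e-value.
    """
    counts: Counter = Counter()
    for line in hit_table:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue:
            counts[subject] += 1
    return counts
```

Running it on the same table with `0.05` and then with a stricter cutoff such as `1e-50` shows how many of the extra sequences are weaker secondary HSPs on accessions that already have a strong hit.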

First, if you have a trick for recovering only the most similar sequences, I'm interested. But my main problem is this:
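For recovering only the most similar sequences, one option is to keep just the best HSP per subject accession from the hit table and sort the result by e-value. A minimal sketch, assuming the same 12-column tab-separated layout (subject start/end in columns 9 and 10, e-value in 11, bit score in 12); "best" is defined here as lowest e-value with ties broken by highest bit score, which is an illustrative choice rather than anything BLAST enforces.

```python
from typing import Iterable

Hit = tuple[str, float, float, int, int]  # (subject, evalue, bitscore, sstart, send)


def best_hit_per_subject(hit_table: Iterable[str]) -> list[Hit]:
    """Return the best HSP of each subject accession, most significant first.

    Assumes a 12-column tab-separated hit table (BLAST outfmt 6/7 layout).
    """
    best: dict[str, tuple[tuple[float, float], Hit]] = {}
    for line in hit_table:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        subject = f[1]
        sstart, send = int(f[8]), int(f[9])
        evalue, bitscore = float(f[10]), float(f[11])
        key = (evalue, -bitscore)  # smaller key = better hit
        if subject not in best or key < best[subject][0]:
            best[subject] = (key, (subject, evalue, bitscore, sstart, send))
    # Sort by key: ascending e-value, then descending bit score
    return [hit for _, hit in sorted(best.values())]
```

Passing an open file handle (`with open("hits.txt") as fh: best_hit_per_subject(fh)`) gives one line per accession, which can then be used to fetch only those regions.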

In the downloaded FASTA file, the sequences are ordered not by e-value or similarity but by their position in the genome (or contig). If they were ordered by e-value, I could keep only the first sequence for each accession in a few clicks; as it is, sorting them is too laborious.
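If you have a ranked list of `(accession, start, end)` for the hits you want, for instance taken from the hit table sorted by e-value, it can be used to pull the matching records out of the aligned-sequences FASTA in that order instead of genome order. This sketch assumes each header begins with `ACCESSION:start-end` (with an optional `c` before the coordinates for minus-strand hits); that header layout is an assumption, so inspect your file and adjust the hypothetical `HEADER_RE` pattern if it differs. Coordinates are compared as (min, max) so strand orientation does not matter.

```python
import re

# Assumed header layout: ">ACCESSION:start-end description",
# with an optional "c" before the coordinates for minus-strand hits.
HEADER_RE = re.compile(r"^>(\S+):c?(\d+)-(\d+)")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, seq = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line, []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def select_regions(fasta_text: str, ranked_regions) -> list[tuple[str, str]]:
    """Return the FASTA records matching ranked_regions, in ranked order.

    ranked_regions: list of (accession, start, end), best hit first.
    """
    index = {}
    for header, seq in parse_fasta(fasta_text):
        m = HEADER_RE.match(header)
        if m is None:
            continue  # header not in the assumed ACCESSION:start-end form
        acc, a, b = m.group(1), int(m.group(2)), int(m.group(3))
        index[(acc, min(a, b), max(a, b))] = (header, seq)
    selected = []
    for acc, start, end in ranked_regions:
        rec = index.get((acc, min(start, end), max(start, end)))
        if rec is not None:
            selected.append(rec)
    return selected


def to_fasta(records: list[tuple[str, str]]) -> str:
    """Serialize (header, sequence) pairs back to FASTA text."""
    return "".join(f"{h}\n{s}\n" for h, s in records)
```

Records whose header does not match the assumed pattern are skipped silently, so it is worth comparing the number of selected records with the length of the ranked list.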

Have you faced this problem and, if so, how did you solve it (if you did)?

Thank you for your attention.

Abhijeet Singh

With this question, I think you need to slow down a bit and look carefully at the options available in BLAST before using it haphazardly. BLAST is one of the most customizable, fast and user-friendly services out there, and the results can be downloaded in many formats.

Many people do not face this kind of problem because they know the whats and hows of BLAST. Please explore it carefully.
