How to remove redundant and incomplete protein sequences during multiple sequence alignment and phylogenetic analysis?

Salvador Ramirez-Flandes

I think you might want to cluster your sequences and keep only one representative sequence per cluster. You will need to decide in advance on an identity threshold that suits your case. Tools such as CD-HIT or USEARCH can do this. For example, with CD-HIT:

cd-hit -i input.fasta -o input -c 0.9

You will then have a file (named "input" here) containing the representative sequences at 90% identity. CD-HIT also writes the cluster memberships to "input.clstr".
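To check which sequences were folded into which representative, the cluster file that CD-HIT writes next to its output can be read back. The following is only a minimal sketch, assuming the usual .clstr layout (a ">Cluster N" line followed by member lines, with the representative marked by "*"). The function name parse_clstr is hypothetical and not part of CD-HIT.

```python
import re

# Matches a member line such as: "1\t348aa, >seqB... at 98.85%"
# (assumed layout of CD-HIT .clstr member lines)
MEMBER_RE = re.compile(r"^\d+\s+(\d+)(?:aa|nt),\s+>(.+?)\.\.\.\s+(\*|at\s+.+)$")


def parse_clstr(text):
    """Return {representative_id: [other_member_ids]} from .clstr text."""
    clusters = []              # list of (representative, all_members)
    rep, members = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # A new cluster starts: store the previous one, if any.
            if members:
                clusters.append((rep, members))
            rep, members = None, []
            continue
        match = MEMBER_RE.match(line)
        if match is None:
            continue           # skip lines that do not look like members
        _length, seq_id, status = match.groups()
        members.append(seq_id)
        if status == "*":      # "*" marks the cluster representative
            rep = seq_id
    if members:
        clusters.append((rep, members))
    return {r: [m for m in ms if m != r] for r, ms in clusters}
```

Calling parse_clstr(open("input.clstr").read()) would then list, for each retained representative, the sequences that were removed as redundant. Note that the identifiers in the .clstr file may be truncated relative to the FASTA headers.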

Best regards.

Salvador Ramirez-Flandes

Guillaume Blanchet

Hi Sandeep,

I can tell you how I personally handle such cases, but whether it works for you will depend on the number of sequences you have... and their overall quality.

I do this manually in MEGA. I first align my sequences with MUSCLE using a large gap opening penalty (around -5 to -10); this makes sequences that are shorter, incomplete, or too distant easy to spot. Remove them with Ctrl+X.
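If there are too many sequences to inspect by eye, the "remove the short ones" step can be approximated with a length filter. This is a rough sketch rather than what MEGA does: it flags sequences whose ungapped length falls below a fraction of the median length. The 0.8 cutoff and the function names read_fasta and drop_short are assumptions chosen for illustration.

```python
from statistics import median


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_short(records, min_fraction=0.8):
    """Split records into (kept, dropped_headers) by ungapped length.

    A record is kept when its length, ignoring '-' gap characters, is at
    least min_fraction times the median length of all records.
    """
    if not records:
        return [], []
    lengths = [len(seq.replace("-", "")) for _, seq in records]
    cutoff = min_fraction * median(lengths)
    kept = [rec for rec, n in zip(records, lengths) if n >= cutoff]
    dropped = [rec[0] for rec, n in zip(records, lengths) if n < cutoff]
    return kept, dropped
```

A length cutoff cannot tell a truncated entry from a genuinely short isoform, so the dropped headers are best reviewed before they are discarded.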

To remove redundant proteins, I build a quick NJ tree using p-distance as the model (so that the tree is computed quickly). If two sequences are identical, they will branch together with no (horizontal) distance between them.
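The same "zero distance" check can be sketched without drawing a tree. The example below computes the p-distance (the proportion of differing sites) for every pair of aligned sequences and reports pairs at distance 0. It assumes the sequences are already aligned to equal length and that columns with a gap in either sequence are skipped, which may differ from the gap treatment selected in MEGA. The function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are ignored. Returns None
    when no column can be compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else None


def zero_distance_pairs(aligned):
    """Return (name1, name2) pairs whose p-distance is exactly 0.

    `aligned` maps sequence names to aligned sequences.
    """
    return [
        (n1, n2)
        for (n1, s1), (n2, s2) in combinations(aligned.items(), 2)
        if p_distance(s1, s2) == 0
    ]
```

Because gapped columns are skipped, a fragment that matches part of a longer sequence also shows up at distance 0, so this check flags incomplete copies as well as exact duplicates.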

I usually try to use UniProtKB for this kind of work. It also has a BLAST option, with the difference that you can see whether a protein is reviewed or not. It may also be easier to get information such as provenance (mRNA or protein) and then decide whether a sequence is a truncated, incomplete form or just a naturally short isoform. It all depends on your sample size and diversity...

Hope it helps,

Best,

Guillaume

Rajesh Kumar Gazara

You can also try BLASTCLUST to remove redundant sequences.