Remove redundancy in protein dataset?

Muhammad Ali

Regarding your query, I suggest following this discussion thread: https://www.researchgate.net/post/How_can_I_remove_redundancy_in_sequence_data_sets

Also: https://star-protocols.cell.com/protocols/1439

https://www.uniprot.org/help/proteome_redundancy

Abhijeet Singh

The question is described incorrectly.

Removing redundancy is not the same as clustering sequences.

Also, clustering sequences at 25% identity does not make logical sense.
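To illustrate the distinction drawn above, removing redundancy in the narrowest sense means dropping sequences that are exact copies of one already kept, with no identity threshold involved. The sketch below is a minimal example of that, assuming the sequences have already been read into (ID, sequence) pairs; the function name and the example IDs are hypothetical.

```python
def remove_exact_duplicates(records):
    """Keep the first ID seen for each distinct sequence.

    records: iterable of (seq_id, sequence) pairs (assumed input format).
    Returns (kept, dropped): kept maps each retained ID to its sequence;
    dropped maps each discarded ID to the retained ID it duplicates.
    """
    seen = {}     # normalized sequence -> first ID that carried it
    kept = {}
    dropped = {}
    for seq_id, seq in records:
        key = seq.upper().replace(" ", "")  # ignore case and stray spaces
        if key in seen:
            dropped[seq_id] = seen[key]
        else:
            seen[key] = seq_id
            kept[seq_id] = seq
    return kept, dropped


# Hypothetical example IDs and sequences
records = [("1ABC_A", "MKTAYIAKQR"), ("2XYZ_A", "MKTAYIAKQR"), ("3DEF_B", "MKTAYIAKQL")]
kept, dropped = remove_exact_duplicates(records)
print(sorted(kept))  # ['1ABC_A', '3DEF_B']
print(dropped)       # {'2XYZ_A': '1ABC_A'}
```

Note that 3DEF_B survives even though it differs from 1ABC_A at a single position: exact deduplication alone does not catch near-identical variants.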

Bondeepa Saikia

When we download thousands of sequences from databases, we often get multiple copies of the same sequence, submitted by different groups or sometimes from different strains. I want to remove the PDB IDs that have almost the same protein sequence (differing by only two or three mutations).
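The filtering described above can be sketched as a greedy pass: each sequence is kept only if it differs from every already-kept sequence by more than a set number of substitutions. This is a minimal illustration under stated assumptions, not a standard tool: it assumes the sequences are already available as (ID, sequence) pairs, treats "two or three mutations" as point substitutions between equal-length sequences (insertions and deletions would need a proper alignment and are not handled), and uses a threshold of 3 taken from the example in the question. The function names and IDs are hypothetical. The first sequence encountered becomes the representative, so input order matters, and the all-against-kept comparison is quadratic in the worst case.

```python
def mismatches(a, b):
    """Count substitutions between two equal-length sequences (Hamming distance)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def remove_near_duplicates(records, max_mismatches=3):
    """Greedy filter over (seq_id, sequence) pairs (assumed input format).

    A sequence is dropped if some already-kept sequence of the same length
    differs from it by at most max_mismatches substitutions. Sequences of
    different lengths are never compared (no indel handling).
    Returns (kept, removed): kept is a list of (seq_id, sequence);
    removed maps each dropped ID to the kept ID it resembles.
    """
    kept = []
    removed = {}
    for seq_id, seq in records:
        match = None
        for kept_id, kept_seq in kept:
            if len(kept_seq) == len(seq) and mismatches(kept_seq, seq) <= max_mismatches:
                match = kept_id
                break
        if match is None:
            kept.append((seq_id, seq))
        else:
            removed[seq_id] = match
    return kept, removed


# Hypothetical example IDs and sequences
records = [
    ("1ABC_A", "MKTAYIAKQR"),
    ("2XYZ_A", "MKTAYIAKQL"),   # 1 substitution vs 1ABC_A -> removed
    ("3DEF_B", "MKSAWIGKHL"),   # 5 substitutions vs 1ABC_A -> kept
    ("4GHI_C", "GSHMLEDPVKK"),  # different length -> kept
]
kept, removed = remove_near_duplicates(records)
print([seq_id for seq_id, _ in kept])  # ['1ABC_A', '3DEF_B', '4GHI_C']
print(removed)                         # {'2XYZ_A': '1ABC_A'}
```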