How to filter protein sequences (from UniProt or InterPro) using databases such as Representative Proteomes or Reference Proteomes?

Susanta Roy

There are several ways to filter protein sequences from UniProt or InterPro using databases such as Representative Proteomes or Reference Proteomes. Here are a few options:

UniProt Reference Proteomes: UniProt maintains a collection of Reference Proteomes, a set of proteomes chosen to provide a representative sampling of the diversity of the tree of life. These proteomes are selected based on phylogenetic diversity, completeness, and annotation quality. To download sequences from the Reference Proteomes, you can use the UniProt Proteome ID for the organism of interest. For example, the proteome ID for Escherichia coli (strain K-12) is UP000000625. You can use this ID to download a FASTA file of all protein sequences in that proteome. You can find more information about UniProt Reference Proteomes here: https://www.uniprot.org/help/reference_proteomes.
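As a rough sketch of that download step, the snippet below builds a query URL for UniProt's REST interface and streams the FASTA file to disk. The endpoint `https://rest.uniprot.org/uniprotkb/stream` and the `proteome:` query field are assumptions based on the REST API as it stood at the time of writing, so check them against the current UniProt API documentation. The function names are illustrative only.

```python
import urllib.parse
import urllib.request

# Assumed endpoint of the UniProt REST API; verify against the current docs.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def proteome_fasta_url(proteome_id: str) -> str:
    """Build a URL requesting every UniProtKB entry of one proteome as FASTA."""
    params = {"query": f"proteome:{proteome_id}", "format": "fasta"}
    return f"{UNIPROT_STREAM}?{urllib.parse.urlencode(params)}"


def download_proteome(proteome_id: str, out_path: str) -> None:
    """Stream the FASTA file to disk in 64 KiB chunks (needs network access)."""
    url = proteome_fasta_url(proteome_id)
    with urllib.request.urlopen(url) as resp, open(out_path, "wb") as out:
        for chunk in iter(lambda: resp.read(1 << 16), b""):
            out.write(chunk)
```

Calling `download_proteome("UP000000625", "UP000000625.fasta")` would then fetch the E. coli proteome mentioned above, assuming the endpoint behaves as described.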

InterProScan: InterProScan is a tool that scans protein sequences against the signatures of the InterPro member databases, such as Pfam and other protein domain and family databases. It annotates whatever sequences you give it, so to get hits only from a representative set of proteomes, restrict the input to sequences from the UniProt Reference Proteomes before running the analysis. You can find more information about InterProScan here: https://www.ebi.ac.uk/interpro/interproscan.html.
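One way to restrict a sequence set to reference-proteome members, before or after an InterProScan run, is to filter the FASTA file by accession. The sketch below assumes UniProt-style headers (`db|ACCESSION|ENTRY_NAME ...`) and a set of accessions you have already collected from the reference proteomes. The function names and the toy identifiers in the usage note are hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession(header: str) -> str:
    """Return the accession from 'db|ACCESSION|NAME ...', else the first token."""
    fields = header.split()
    first = fields[0] if fields else ""
    parts = first.split("|")
    return parts[1] if len(parts) >= 3 else first


def keep_reference(records: Iterable[Record], reference: Set[str]) -> List[Record]:
    """Keep only the records whose accession occurs in the reference set."""
    return [(h, s) for h, s in records if accession(h) in reference]
```

For example, `keep_reference(read_fasta(open("hits.fasta")), {"P00001"})` would keep only the record with the toy accession `P00001`; the file name here is a placeholder.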

CD-HIT: CD-HIT is a clustering program that can be used to reduce sequence redundancy in large datasets. You can input a set of protein sequences and CD-HIT will group them into clusters based on sequence similarity. The output of CD-HIT includes a set of representative sequences from each cluster, which can be used to filter out redundant sequences. CD-HIT is available as a standalone program or as a web server. You can find more information about CD-HIT here: http://cd-hit.org/.
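A minimal sketch of driving CD-HIT from a script follows. It assumes the `cd-hit` executable is installed and on the PATH, and uses its `-i`, `-o`, `-c`, and `-n` options. The word-size table follows the ranges given in the CD-HIT user guide for protein clustering. The 0.9 default identity is only an illustrative choice, and the helper names are made up for this example.

```python
import subprocess
from typing import List


def word_size(identity: float) -> int:
    """Word length (-n) matching a protein identity threshold (-c)."""
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    if identity >= 0.4:
        return 2
    raise ValueError("cd-hit does not cluster proteins below 0.4 identity")


def cd_hit_command(fasta_in: str, fasta_out: str, identity: float = 0.9) -> List[str]:
    """Assemble the cd-hit argument list without running anything."""
    return ["cd-hit", "-i", fasta_in, "-o", fasta_out,
            "-c", str(identity), "-n", str(word_size(identity))]


def run_cd_hit(fasta_in: str, fasta_out: str, identity: float = 0.9) -> None:
    """Run cd-hit; raises CalledProcessError if the program exits with an error."""
    subprocess.run(cd_hit_command(fasta_in, fasta_out, identity), check=True)
```

The output FASTA then holds one representative per cluster, and cd-hit also writes a `.clstr` file listing the members of each cluster.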

OrthoDB: OrthoDB is a database of orthologous protein groups across multiple species. It includes sequences from UniProt and other databases, and provides computationally inferred groups of orthologs at different taxonomic levels. You can search for protein sequences or groups of sequences in OrthoDB and download the corresponding orthologous groups. This can be a useful way to identify a representative set of sequences from multiple species. You can find more information about OrthoDB here: https://www.orthodb.org/.

Al Azim

Filtering protein sequences from UniProt or InterPro using databases such as Representative Proteomes or Reference Proteomes is a common practice in bioinformatics research. Here are the general steps to filter protein sequences using these databases:

Obtain the list of protein sequences of interest from UniProt or InterPro.

Download the Reference Proteomes database from the UniProt website, or the Representative Proteomes database from the Protein Information Resource (PIR) website.

Use a sequence search tool (e.g., BLAST or HMMER) to compare the protein sequences of interest against the database.

Use a threshold value (e.g., percent identity or E-value) to filter the aligned sequences.

Extract the filtered sequences for further analysis.

Here are some additional details for each step:

Obtain the list of protein sequences of interest: This can be done by searching for a specific protein or by using keywords to retrieve a list of proteins from UniProt or InterPro.

Download the Representative Proteomes or Reference Proteomes database: Reference Proteomes are distributed by UniProt, while Representative Proteomes are computed by the Protein Information Resource (PIR), which groups similar proteomes and picks one representative per group. Both are subsets chosen to cover the diversity of sequenced organisms with less redundancy, and each can be downloaded from its provider's website.

Use a sequence search tool: BLAST or HMMER can be used to search the protein sequences of interest against the Representative Proteomes or Reference Proteomes database. This will identify sequences in the database that are similar to the input sequences.

Use a threshold value: After the search, apply a threshold to the hits, for example a minimum percent identity or a maximum E-value. Hits that pass the threshold are considered significant and can be analyzed further.

Extract the filtered sequences: The filtered sequences can be extracted from the alignment output file and used for downstream analysis such as phylogenetic analysis or protein structure prediction.
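The thresholding and extraction steps can be sketched as follows, assuming the search was run with BLAST and saved in its default tabular layout (`-outfmt 6`), where column 1 is the query ID, column 3 the percent identity, and column 11 the E-value. The cutoffs of 30% identity and 1e-5 are placeholders rather than recommendations, and `passing_queries` is a made-up name.

```python
from typing import Iterable, Set


def passing_queries(lines: Iterable[str],
                    min_identity: float = 30.0,
                    max_evalue: float = 1e-5) -> Set[str]:
    """Return IDs of query sequences with at least one hit passing both cutoffs."""
    kept: Set[str] = set()
    for line in lines:
        # Skip blank lines and comment lines (present in '-outfmt 7' output).
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, pident, evalue = cols[0], float(cols[2]), float(cols[10])
        if pident >= min_identity and evalue <= max_evalue:
            kept.add(query)
    return kept
```

The returned IDs can then be used to pull the matching records out of the original FASTA file for downstream analysis.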

In summary, filtering protein sequences using Representative Proteomes or Reference Proteomes databases involves aligning the sequences of interest against the database, setting a threshold value to filter the aligned sequences, and extracting the filtered sequences for further analysis.