What software can be used to predict the structure of a protein?

Dear Ashish

I would start by running a BLAST search, e.g. at Uniprot.org. That will tell you whether your protein is similar to anything in the database, and if so, the relevant catalytic site is likely to be annotated for the similar proteins. Alternatively (or subsequently), you could attempt a template-based model with e.g. SwissModel, but the resulting model may not have a satisfactory fit. If a good template exists, you can overlay the structures of the model and the template to identify the catalytic site. Which protein do you wish to predict the structure of?

George Minasov

Hi Ashish,

So, I think the situation is as follows. You have a sequence and you have no idea what that sequence represents.

1. First, as Simon recommended, find the name of your protein, if your sequence is a protein sequence.

It could be that you have a string of letters that does not represent any protein.

2. Second, once you have a positive result from step 1, run your sequence against "paper blast" (http://papers.genomics.lbl.gov/cgi-bin/litSearch.cgi). It searches for publications related to your sequence or to proteins with similar sequences.

Find the publication that describes the function of the protein with the highest sequence similarity to yours. Read it carefully and make a list of the residues in the active and/or binding sites.

You may also get hits (publications) that report crystal structures. Look for the structures with the highest sequence identity to your protein.

3. If your search in "paper blast" did not give any information about crystal structures of similar proteins, I would recommend repeating the search at the Protein Data Bank, just in case. In the advanced search you can find structures based on sequence similarity (BLAST).

4. If you find a sequence or structure with more than 75% sequence identity, your protein most likely does the same thing. You can align the two sequences and check whether the positions of the active/binding site residues match.

If the sequence identity is between 40% and 75%, there is some probability that your protein has a similar function. In this case I would run a multiple sequence alignment, just to make sure that you are checking the correct patterns.

If the sequence identity is below 40%, a function predicted from sequence alone is not reliable. You would need the structure of your protein to compare it with the homologous structure (structural alignment).

5. If the BLAST search gives you the result "Unknown function", there is not much you can do.

6. If you did not get any matches in the PDB for your sequence, you have a unique protein with "Unknown function" and "Unknown structure". The only way to get more information about your protein would be to solve its crystal structure (but that is a different story)...
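
The identity thresholds in step 4 can be written down as a small Python sketch. It assumes you already have a pairwise alignment of your query against the reference (for example copied from the BLAST output, with "-" for gaps), and it uses percent identity over ungapped columns as the measure. The function names and the returned labels are made up for illustration, and the 40%/75% cut-offs are only the rules of thumb given above, not hard limits.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for q, r in zip(aligned_query.upper(), aligned_ref.upper()):
        if q == "-" or r == "-":
            continue  # skip gapped columns
        compared += 1
        if q == r:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def interpret_identity(pct: float) -> str:
    """Map percent identity to the rule-of-thumb categories from step 4."""
    if pct > 75:
        return "likely_same_function"
    if pct >= 40:
        return "possibly_similar_function"
    return "unreliable_from_sequence"


def site_conserved(aligned_query: str, aligned_ref: str, ref_positions):
    """For each 1-based residue number in the (ungapped) reference,
    report whether the query has the identical residue in that column."""
    wanted = set(ref_positions)
    result = {}
    ref_index = 0  # residue counter along the reference, ignoring gaps
    for q, r in zip(aligned_query.upper(), aligned_ref.upper()):
        if r == "-":
            continue  # insertion in the query, no reference residue here
        ref_index += 1
        if ref_index in wanted:
            result[ref_index] = (q == r)
    return result


if __name__ == "__main__":
    # Toy alignment, not real data.
    query = "MKT-AL"
    ref = "MKTGAV"
    pct = percent_identity(query, ref)
    print(pct, interpret_identity(pct))
    print(site_conserved(query, ref, [2, 4, 6]))
```

The `ref_positions` would be the active/binding site residue numbers you collected from the publication in step 2. An identical residue at every listed position is encouraging, but for identities in the 40% to 75% range a multiple sequence alignment is still the better check.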

Hope this will help you to identify your sequence.

Good luck!!!

George.

Ashish Chauhan

Thank you everyone.

I would like to go through the protocol suggested by Dr. Simon and Dr. George, and I will try Modeller as well.

The name of the protein is RuBisCO.

Simon Gregersen Echers

Dear Ashish Chauhan

RuBisCO is probably the most abundant protein on Earth, so there are tons of resources available (in fact, I am also working a little with RuBisCO, primarily from spinach, alfalfa and sugar beet). You should have no problem finding useful information on it. It has a large and a small chain that, in plants, assemble in vivo into a hexadecamer of eight large and eight small subunits. Here, for instance, is the UniProt accession for the spinach large chain:

https://www.uniprot.org/uniprot/P00875
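
If you want the sequence itself for BLAST or modelling, a minimal Python sketch for downloading it could look like the following. It assumes the UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/<accession>.fasta` and uses only the standard library; the helper names are made up for illustration, and `fetch_sequence` needs network access.

```python
from urllib.request import urlopen

# Assumed URL pattern of the UniProt REST service for a single FASTA record.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession: str) -> str:
    """Build the FASTA download URL for one UniProt accession."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip().upper())


def parse_fasta(text: str) -> tuple[str, str]:
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    header = lines[0][1:]          # drop the leading '>'
    sequence = "".join(lines[1:])  # sequence may be wrapped over many lines
    return header, sequence


def fetch_sequence(accession: str, timeout: float = 30.0) -> tuple[str, str]:
    """Download and parse one UniProt entry (requires network access)."""
    with urlopen(fasta_url(accession), timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))


if __name__ == "__main__":
    header, sequence = fetch_sequence("P00875")
    print(header)
    print(len(sequence), "residues")
```

The resulting sequence can be pasted into BLAST or used as the target for a templated model in SwissModel or Modeller.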
