I'm working on a protein with 13 cysteines among its 500 amino acids. It aggregates easily, and I want to mutate one or more cysteines to alleviate this. How should I choose which cysteines to mutate, and how should I evaluate the result? Thank you!
What information do you have on your protein - do you have a structure or a homology model? The first thing you want to investigate is whether any of the cysteines form disulfide bonds in the native protein - you do not want to mutate these, as they may contribute to the stability of the folded protein. (Check the annotations in UniProt http://www.expasy.org for your protein and its murine and human homologs, and do a BLAST search of the PDB http://www.rcsb.org to look for structural information.) For the unpaired cysteines, you would like to make an educated guess as to whether they are buried in a hydrophobic environment - in that case, a mutation to Ala (slightly too small) or Val (slightly too large) is the best replacement. For solvent-exposed cysteines, Ser would be the isosteric replacement; Ala and Thr might be viable alternatives.
The protein has no structural information available, and a BLAST search of the PDB gives only 24% identity. Maybe this is too low for homology modeling to give useful structural information?
I was thinking that if I could identify the disulfide bonds in the protein, the intermolecular ones might turn out to be the cause of the aggregation, since the aggregated protein showed only the monomer molecular weight on SDS-PAGE. Could this work?
You can still make some guesses about surface vs. core location using sequence-based methods (hydropathy profiles vs. secondary structure prediction, sequence conservation between different organisms). Highly conserved regions are more likely to be located in the core, variable regions on the surface. Hydrophilic loop and turn regions are likely on the surface, and amphipathic helices and sheets may be recognizable by their pattern of hydrophobic and hydrophilic residues. If you have some properly folded material available, the reactivity of the Cys residues can give you a hint as to which ones are accessible to reagents, and mass spectrometry of proteolytic digests can indicate which peptides are covalently crosslinked by a disulfide bond.
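As a rough illustration of the hydropathy-profile idea, here is a minimal Python sketch that averages Kyte-Doolittle hydropathy values in a sliding window and reports the score around each cysteine. The function names and the window size of 9 are my own choices for illustration, and a high score is only a hint that a residue sits in a hydrophobic stretch, not evidence that it is buried.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy in a window centred on each residue.

    Windows are truncated at the sequence ends, so the profile
    has the same length as the sequence.
    """
    half = window // 2
    scores = [KD[aa] for aa in seq.upper()]
    profile = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        profile.append(sum(scores[lo:hi]) / (hi - lo))
    return profile


def cysteine_hydropathy(seq, window=9):
    """Map each cysteine position (1-based) to its local mean hydropathy."""
    profile = hydropathy_profile(seq, window)
    return {
        i + 1: round(profile[i], 2)
        for i, aa in enumerate(seq.upper())
        if aa == "C"
    }
```

Calling `cysteine_hydropathy(your_sequence)` gives one number per cysteine; comparing these with a secondary structure prediction and the conservation pattern is more informative than any single score.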
There is some hope for 24% identity, but only if the alignment extends over the whole length of the protein. (More important than % identity is the E-value: if it is less than 0.01, you can be confident it is a true homolog.) Then you can do as Annemarie suggested and find the cysteines that are not paired. If this is not an option, you can try prediction algorithms such as DiANNA or Disulfind.
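To check how much of the query the significant alignments actually cover, something like the following sketch may help. It assumes you have already extracted each hit's query start, query end, and E-value from the BLAST output yourself; the tuple layout and the function name are hypothetical, not part of any BLAST tool.

```python
def query_coverage(query_len, hits, max_evalue=0.01):
    """Fraction of the query covered by hits with E-value below the cutoff.

    hits: iterable of (q_start, q_end, evalue), with 1-based inclusive
    query coordinates. Overlapping hits are counted only once.
    """
    spans = sorted((s, e) for s, e, ev in hits if ev < max_evalue)
    covered = 0
    cur_end = 0  # last query position already counted
    for start, end in spans:
        start = max(start, cur_end + 1)  # skip the part already covered
        if end >= start:
            covered += end - start + 1
            cur_end = end
    return covered / query_len
```

For a 500-residue query, a result well below 1.0 means that part of the sequence has no template, and a homology model says nothing reliable about the cysteines in that part.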
The E-value met your criterion, but none of the alignments covered a stretch of about 150 amino acids. Is a homology model still reliable in that case? I was thinking that only the surface cysteines, not the core ones, would affect the aggregation state of the protein. Is this right? If so, do I only need to try changing the surface ones?
The DiANNA prediction differed from the result based on the structure.
Your alignment suggests that you have a structural core with a large insertion. The template is a multidomain protein, and the insertion may itself be another folded domain. None of the cysteines in the template are paired, and it is a cytoplasmic protein.
Unpaired Cys residues in cytoplasmic proteins are usually not highly conserved unless they play a role in the active site of the protein. Comparing the sequence of homologous proteins in a variety of organisms, you can assess the sequence variability of the different Cys positions and learn that other amino acid residues are tolerated in that local environment.
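The comparison across homologs can be sketched in Python as follows. This assumes you already have a multiple sequence alignment as a list of equal-length strings with `-` for gaps and your protein as the first entry; the function name and input layout are illustrative only.

```python
from collections import Counter


def cysteine_variability(aligned):
    """Tally what the homologs have at each cysteine of the query.

    aligned: list of aligned sequences of equal length ('-' = gap);
    the first entry is the query. Returns a dict mapping each query
    cysteine position (1-based, gaps not counted) to a Counter of the
    residues found in that alignment column of the other sequences.
    """
    query, homologs = aligned[0], aligned[1:]
    if any(len(s) != len(query) for s in homologs):
        raise ValueError("all aligned sequences must have the same length")
    report = {}
    pos = 0  # residue number in the ungapped query
    for col, aa in enumerate(query):
        if aa == "-":
            continue
        pos += 1
        if aa.upper() == "C":
            report[pos] = Counter(s[col].upper() for s in homologs)
    return report
```

A cysteine whose column shows Ser, Ala, or Val in several homologs is a natural first candidate for mutation, and the observed residues suggest which replacement to try.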
If keeping your protein under reducing conditions throughout purification and characterization prevents aggregation, getting rid of cysteines may be the solution (I assume that you actually tested for the presence of disulfide bonds in the aggregates by comparing SDS gel runs of reduced and non-reduced samples). However, multidomain proteins can also aggregate by domain swapping, which may bring to the surface cysteine residues that would be buried in the monomeric protein.
I ran a BLAST search of the roughly 150-amino-acid stretch against the PDB, and no significant similarity was found. I think this part may just be a loop or some other region without a defined structure. One of the models matched this assumption (top; the yellow part represents the 150 amino acids), even though another model suggested that the 150 amino acids are structured (bottom). Could I express the single-domain part (with the 150 amino acids removed) and test whether it retains activity and avoids aggregation?