Several proteins appear to share some similarity, in terms of domains, with other primitive/ancestral proteins. Is it therefore fair to say that the majority of proteins have been derived or evolved from a few primitive or ancestral proteins?
Abstract: The trypsin family of serine proteases is one of the most studied protein families, with a wealth of amino acid sequence information available in public databases. Since trypsin-like enzymes are widely distributed in living organisms in nature, likely evolutionary scenarios have been proposed. A novel methodology for Fourier transformation of biological sequences (FOTOBIS) is presented. The methodology is well suited for the identification of the size and extent of short repeats in protein sequences. In the present paper the trypsin family of enzymes is analyzed with FOTOBIS and strong evidence for tandem gene duplication is found. A likely evolutionary path for the development of present-day trypsins involved an intrinsic extensive tandem gene duplication of a small DNA fragment of 15-18 nucleotides, corresponding to five or six amino acids. This ancestral trypsin gene was subsequently duplicated, leading to the earliest version of a full-sized trypsin, from which the contemporary trypsins have developed.
We did a study some years back, 'On the evolution of Trypsin', where we were able to show that trypsin was constructed by repeated copying and concatenation of small gene segments coding for 5-6 amino acids. We also argued that at some stage a large duplication, corresponding to around 100-120 amino acids, had taken place. This analysis involved trypsins from very diverse organisms and, in an evolutionary perspective, probably covered about 1 billion years.
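To make the idea of finding short repeats with a Fourier transform more concrete, here is a minimal sketch. It is not the FOTOBIS method from the abstract, whose details are not given here; it is a generic periodogram under my own assumptions: each residue type is turned into a 0/1 indicator signal, and the function name `periodicity_power` and the toy sequence are hypothetical.

```python
import cmath


def periodicity_power(seq, period):
    """Fourier power of a sequence at frequency 1/period.

    Assumption: each residue type becomes a 0/1 indicator signal
    (mean removed), and squared DFT magnitudes are summed over types.
    """
    n = len(seq)
    total = 0.0
    for aa in set(seq):
        mean = seq.count(aa) / n
        coeff = sum(
            ((1.0 if c == aa else 0.0) - mean)
            * cmath.exp(-2j * cmath.pi * t / period)
            for t, c in enumerate(seq)
        )
        total += abs(coeff) ** 2
    return total / n


# Toy input (not real data): a 6-residue motif repeated 10 times.
toy = "GDSGGP" * 10
spectrum = {p: periodicity_power(toy, p) for p in range(2, 11)}
for p, power in spectrum.items():
    print(f"period {p:2d}: power {power:8.3f}")
```

On a perfectly repeated toy sequence like this, power concentrates at the repeat length and at its harmonics (here periods 3 and 2 as well as 6), so peaks have to be read together rather than one at a time. Real sequences that have diverged over long times would give much weaker and noisier signals.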
Look at the work of Christian Zmasek (Sanford-Burnham Medical Research Institute), who has spent a significant amount of effort answering this kind of question.
Most proteins are derived from combinations of domains which come from ancestral proteins (to put it in your words). In that sense you are correct. There are a limited number of domains, most of which are ancient, and these make up the building blocks of the much larger number of whole, often multi-domain proteins. You may cite this review to support your statement:
Chothia, C. and Gough, J. (2009) Genomic and Structural Aspects of Protein Evolution. Biochem. J. 419(1), 15-28.
What Steffen is talking about is one possible mechanism by which one kind of novel domain may have come about. This is not an answer to your question. De novo evolution of a new domain is exceptionally rare, and this is not how the majority of protein-coding sequence is generated in the genome. Duplication and recombination (domain shuffling) create most of the protein content you observe.
Yes, this is a theory called the Domain Shuffling Theory. According to it, all new proteins are formed by shuffling domains from existing functional proteins. In addition, only a small fraction of all the possible amino acid sequences is actually seen in nature. Why this is so, and what happened to the remaining combinations, is a major question in protein structural studies.
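To give a feel for how large the space of possible sequences is, here is a small back-of-the-envelope sketch. It assumes only the 20 standard amino acids and a hypothetical chain length of 100 residues; the function name `sequence_space` is made up for illustration.

```python
# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = 20


def sequence_space(length):
    """Number of distinct sequences of the given length."""
    return AMINO_ACIDS ** length


# Hypothetical chain length of 100 residues.
n_sequences = sequence_space(100)
print(f"20^100 has {len(str(n_sequences))} decimal digits")
```

Whatever the exact number of natural protein sequences is, it can only be a vanishing fraction of a space this size.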
It is generally true that functionally important structural fragments are reshuffled and reused many times, in many contexts, both within the same protein and across different proteins. It is also true that nature reinvents folds and the same units multiple times, and that random reshuffling of small fragments creates new proteins.
However, we should not forget about a regularly happening mutational drift that has at least two effects. One is that the proteins can suddenly switch their structural and functional designation. The second is that longer fragments are created without a significant structural and functional designation.
The third element we should remember is that proteins, by combining solid-like and liquid-like characteristics, can exist in multiple forms at the same time. Serpins and prions are examples.
Moreover, there exists a whole stream of research concerning ambivalent sequences. I myself published two pieces of work that help clarify some aspects of these sequences. In my review about thionins I noted that alpha and gamma thionins have significantly different folds (alpha has helices, gamma only beta sheets) but still share 25% sequence identity over the overlapping sequence length. The other example is provided by the crystal structure of T. maritima inositol monophosphatase. In that paper we noticed that, within the same crystal structure, the same 24 amino acid fragment has an alpha-helical conformation in one subunit and an all-beta conformation in the other, identical subunit. We also noticed that this element is the least conserved in the entire family (which may suggest fast evolution) and has the least tendency to adopt a particular secondary structure, yet it determines the function of the individual representatives in the family.
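For readers unfamiliar with how a figure like "25% sequence identity over the overlapping length" is obtained, here is a minimal sketch. It assumes the two sequences are already aligned, and it counts identity only over columns where neither sequence has a gap, which is one of several conventions in use. The sequences below are made up and are not real thionins.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither side has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Made-up aligned sequences for illustration only.
seq_a = "ACDE-FGHK"
seq_b = "ACQWLYTNM"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity over the ungapped overlap")
```

The point of the thionin example is that a value in this range does not guarantee a shared fold, so identity alone is a weak guide to structure at this level.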
All these examples clearly suggest that proteins have multiple ways of creating new folds and new functions, and that they may, but do not need to, utilize previously invented domains. Again, I urge you to read the papers by Zmasek. Proteins are not like Lego blocks, but sometimes they behave like them.
Although that statement might be true for most proteins, we should not forget the existence of convergent evolution at the molecular level. There are examples that show the importance of natural selection in shaping proteins that perform similar functions even though they have different evolutionary origins.