You can build trees with any sequence you want. But I guess the active site will be very conserved, so it might be difficult to build a good tree. It all depends on what exactly you want to do; if it's evolution you're after, you had better take the whole sequence, I think.
The above suggestions are good. If you want to go further, I would do a multiple alignment or BLAST first and see if you can identify the active site residues in your sequence. Then you can reduce the sequences you are using for your tree to the active site sequences only. Of course, your active site may be made of several motifs scattered through your sequence. If so, there are a number of ways you might choose to go. You could use the most conserved motif ... You would need to take care as to which program you use to build your ML tree if you are using very short sequences.
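To make the "reduce the sequences to the active site only" step concrete, here is a minimal sketch in plain Python. It assumes you already have a multiple alignment and have decided which alignment columns form the active site. The function name `extract_columns`, the taxon labels, the toy sequences, and the column ranges are all made up for illustration; they are not real tubulin or FtsZ data.

```python
def extract_columns(alignment, ranges):
    """Keep only the chosen alignment columns and join them into one block.

    alignment: dict mapping a sequence name to its aligned sequence
    ranges:    list of (start, end) column ranges, 0-based, end exclusive
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    # Concatenate the scattered motifs in the order the ranges are given.
    return {
        name: "".join(seq[start:end] for start, end in ranges)
        for name, seq in alignment.items()
    }


# Toy alignment (hypothetical): two motifs at columns 0-2 and 5-6.
toy_alignment = {
    "taxon_a": "ABCDEFGH",
    "taxon_b": "ABCXEFGZ",
}
motifs_only = extract_columns(toy_alignment, [(0, 3), (5, 7)])
for name, seq in motifs_only.items():
    print(f">{name}\n{seq}")
```

The output is FASTA-style text that could be passed to a tree-building program. Note that a block this short carries very few informative sites, which is exactly why the choice of program matters for short sequences.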
As Joke mentioned above, of course you can build it. You can do that with any set of sequences. The real question however is whether the tree is valid for your purposes. There are going to be very different evolutionary constraints on the active site region than on other regions of the protein. So you need to carefully consider the biological question you are trying to ask and whether or not this approach will be valid and appropriate for your question or not.
First you need to test for the evolutionary model that best fits the data set, using PAUP* v.4.0b10. Afterwards you can also translate the sequences, using BioEdit for example. With both datasets in hand you can perform two ML analyses, supplying the appropriate model parameters for each analysis. A final step could be to compare the topologies and draw your conclusions. Normally, in an MP analysis, those active sites (codon-based) would surely be non-informative, as Joke said.
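For the "compare the topologies" step, one simple way to put a number on the difference is to count the groupings found in one tree but not in the other. The sketch below does this for rooted trees written as nested tuples; it is a simplified, rooted relative of the Robinson-Foulds idea, not what PAUP* or any other package actually computes. The names `clades` and `clade_distance` and the four-taxon trees are hypothetical.

```python
def clades(tree):
    """Return the set of clades (as frozensets of leaf names) in a rooted tree.

    A tree is a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)  # every internal node defines one clade
        return leaves

    walk(tree)
    return found


def clade_distance(tree_a, tree_b):
    """Count clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical topologies, e.g. one from nucleotides and one from amino acids.
tree_nt = ((("A", "B"), "C"), "D")
tree_aa = ((("A", "C"), "B"), "D")
print(clade_distance(tree_nt, tree_aa))
```

A distance of zero means the two analyses recovered the same groupings; larger values mean more groupings are unique to one tree. Both trees must contain the same set of taxa for the number to be meaningful.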
You could also use this active site or conserved domain to infer phylogenetic relationships if you have distantly related species (for closely related ones I think it doesn't work so well). For example, in this paper they reconstruct the relationships among basal metazoans and their close relatives using protein domains.
-Guifré Torruella, Romain Derelle, Jordi Paps, B. Franz Lang, Andrew J Roger, Kamran Shalchian-Tabrizi and Iñaki Ruiz-Trillo. Phylogenetic relationships within the Opisthokonta based on phylogenomic analyses of conserved single copy protein domains. Molecular Biology and Evolution 29(2): 531-544.
Thank you! Yes, it makes a good deal of difference how diverged the organisms are; in other words, are we looking at deep time or speciation? I assumed the original question was dealing with deep time, since tubulins are such ancient proteins. Also, are we asking questions about the relationships between proteins or between organisms? If it is about relationships between organisms, one would be obliged to look at a large number of proteins. I would really like to know more about the question, since we may all be correct in our advice but just not agreeing on what the question is.
Before choosing to use whole sequences, active sites, or conserved domains, you should ask yourself whether recombination has an important role in the evolution of your protein family. If recombination is important in your family, you should make your trees with "recombination units", usually conserved domains, but you can use any pattern that fits that task.
Datamonkey.org is one of the tools that can easily tell you whether your sequences are recombinant or not. Just as importantly, you can screen your sequences using the DEPS method, which is useful when you want to detect convergent evolution or selective sweeps.
@Douglas, I wish to compare FtsZ (prokaryotic tubulin homologs) with tubulins and see the divergence (if any) in the active site (nucleotide-binding site) across species.
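If the goal is to see divergence in the nucleotide-binding site across species, a tree is not the only option: a per-column conservation score over the aligned site shows directly which positions vary. Below is a minimal sketch, assuming the binding-site columns have already been cut out of an alignment. The function name `column_conservation` and the toy sequences are hypothetical, not real FtsZ or tubulin residues.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """For each column, return the fraction of non-gap residues that match
    the most common residue in that column (1.0 = fully conserved)."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in aligned_seqs if seq[i] != "-"]
        if not column:  # column made of gaps only
            scores.append(0.0)
            continue
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


# Toy binding-site block (hypothetical), one row per species.
site_block = ["GGGT", "GGAT", "GG-T", "GCAT"]
for position, score in enumerate(column_conservation(site_block)):
    print(position, round(score, 2))
```

Columns scoring close to 1.0 are shared by nearly all sequences; lower scores mark the positions where FtsZ and the tubulins differ. This ignores how similar the substituted residues are, so treat it as a first look rather than a substitute for a proper model-based analysis.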