I am currently working on a phylogenetic analysis of a protein superfamily. I've obtained high-quality alignments and maximum likelihood trees based on amino acid sequences and the structural information of some members' crystal structures. I was wondering whether obtaining alignments and trees based on the corresponding DNA sequences would be of additional value. For instance, one analysis I would like to do is to estimate the age of the old ancestor(s) that gave rise to modern members. I'm no expert in this field, but programs like MrBayes can do this based mostly on DNA sequence alignments. Is it possible to draw that kind of information from amino acid sequence alignments? Any help will be greatly appreciated.
Javier, I must disagree with the general consensus of the above answers that DNA is the more informative -- at least in the case of analyzing a protein superfamily and its fold. The proteins you are looking at in a superfamily are very widely diverged, and over that span of evolutionary time there is far too much likelihood of multiple substitutions at a single DNA base (and thus more unobservable intermediate mutations, causing the divergence time to be underestimated); thus, using the DNA data to generate the alignments is very likely to lead to alignments that do not reflect the actual mutational history and thus incorrect phylogenetic trees. The protein sequence is under selective constraint for protein function and protein structure, and these are conserved over much longer periods than the individual codon choices. This is precisely the reason that BLAST searches for distant protein homologs, which are likely the evidence on which your superfamily of proteins was defined in the first place, use protein sequence comparisons instead of DNA sequence comparisons. Even when comparing aligned DNA sequences, unless the organisms are closely related the alignment should be constructed at the protein level, and then the amino acids should be replaced by their codon sequences to generate the DNA sequence alignment.
To put a finer point on that, the primary reason to align the DNA sequence instead of the protein sequence is that there may have been an insertion or deletion at the DNA level that could shift the reading frame of the c-terminal segment of the protein, or an insertion-deletion pair that shifts the reading frame through a (short) segment of the protein sequence. In these cases, a single event (or pair of events) can change a (short) segment of the amino acid sequence, leading the protein sequence alignment to overestimate the divergence time between the two proteins. However, (a) this is a very rare event because (b) the protein sequence is generally intolerant of that much change. If that much change can be accepted at that position in the protein, then that part of the protein must not be under very strong selection, and will not be terribly relevant to the structure of the protein.
In general I think it will be very difficult to estimate divergence times for organisms that diverged as long ago as, say, a bacterium and a mammal, both of which contribute a member to the same protein superfamily. The rate of evolution of two organisms that are that different is very likely to be quite different, and thus the "molecular clock" assumption that underlies using phylogenies to determine divergence times is not valid on such timescales. This is why divergence time estimation programs generally use DNA sequences: the model is only valid for short divergence times, over which protein sequences don't vary enough to provide enough information, but DNA sequences do.
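To make the multiple-substitution argument above concrete, here is a minimal sketch of the Jukes-Cantor (JC69) correction for nucleotide distances. JC69 is swapped in purely for illustration (it assumes equal base frequencies and equal substitution rates, which real data rarely satisfy), and the function names are hypothetical rather than taken from any package mentioned in this thread.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p):
    """JC69-corrected distance in substitutions per site.

    Returns infinity once the observed difference reaches 0.75, the level
    expected between two unrelated random nucleotide sequences.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Compare observed and corrected distances for a few illustrative values.
for observed in (0.05, 0.30, 0.60, 0.70):
    print(f"observed {observed:.2f} -> corrected {jukes_cantor(observed):.2f}")
```

The corrected distance tracks the observed one closely for small differences and grows without bound as the observed difference approaches 0.75, which is why divergence estimated from raw DNA differences is too low for widely diverged sequences.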
An analysis based on DNA sequences would likely be more sensitive - some mutations in DNA do not change the protein. These changes in the DNA would be invisible to the protein analysis, but they might offer some extra data to the DNA analysis to resolve the relationships between the sequences.
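As a small illustration of changes that are visible in DNA but not in protein, the sketch below counts codons that differ silently versus codons that change the amino acid. It assumes the standard genetic code, two in-frame coding sequences of equal length with no gaps, and made-up toy sequences; the function name is hypothetical. It is a simple codon-by-codon tally, not a dN/dS estimator.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_codon_differences(cds_a, cds_b):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(cds_a), 3):
        codon_a, codon_b = cds_a[i:i + 3], cds_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # invisible at the protein level
        else:
            nonsynonymous += 1   # visible at both levels
    return synonymous, nonsynonymous

# Toy sequences (invented): Met-Leu-Ser-Gly versus Met-Leu-Ser-Ala.
seq_a = "ATGCTTTCAGGT"
seq_b = "ATGCTGTCGGCT"
print(classify_codon_differences(seq_a, seq_b))  # (2, 1)
```

Here three codons differ at the DNA level, but a protein alignment would register only one of those differences.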
In my opinion, for closely related species, or even within the same species (subspecies), the tree based on DNA sequences is generally more informative; for wide-ranging alignments across many distantly related species, the tree based on amino acid sequences is more informative. By "informative" I mean closer to the truth. In fact, the DNA sequence carries richer information than the amino acid sequence, because the path from DNA to protein is not a simple one-to-one translation: modifications, deletions, alterations and other changes occur along the way, so there is no doubt that DNA sequences hold more information than amino acid sequences. But if you need a more truthful tree, you should control for the irrelevant variation caused by the degeneracy of synonymous codons. Every species has its own codon usage bias, and in general, the more closely related the species, the smaller the difference in codon usage. So closely related species can be analyzed with DNA sequences, and wide-ranging species should be analyzed with protein sequences. In addition, a superfamily should also be analyzed with protein sequences: because a superfamily includes many kinds of protein that may share only a common domain, the DNA sequences (which may include untranslated regions and also differ in codon bias) give lower accuracy than the corresponding amino acid sequences.
I hope it can help you.
Protein phylogeny may take into account amino acid substitutions that are conservative in function, which a DNA sequence phylogeny could miss. I guess it really depends on your purpose. For age estimation, though, DNA phylogeny seems better.
Although the results can be similar, the two analyses have different purposes. In my opinion, the amino acid analysis will give you information about similarity but will fail to give you information about evolutionary processes, because it lacks information that DNA carries (such as synonymous mutations or invariant codon sites).
To estimate the age of ancestors with Bayesian analysis you can use software such as BEAST with DNA sequences, for the reasons above, but I recommend finding someone who knows how to refine the analysis, because Bayesian software can be very tricky and one wrong setting can give you a lot of trouble.
Sorry for my bad English. If you need any help, you can contact me.
I think the best thing to do in your case is to align them according to the amino acids, but then perform the phylogenetic analysis on the underlying nucleotides. I'm suggesting doing it this way assuming that it's a diverse family; if they align well by nucleotide, then just do that.
A protein alignment technically contains more information per site because there are 20 amino acids compared to 4 nucleotides. The trade-off is that you lose all information about synonymous mutations, which could create a situation where you have two different nucleotide sequences producing the same protein sequence. The best of both worlds is a codon-aligned nucleotide sequence alignment, i.e., align based on protein sequence and convert back to nucleotide. This is typically what is used for evolutionary analyses such as those implemented in PAML. For quick and dirty phylogenies though, protein is probably best.
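The "align on protein, convert back to nucleotide" step can be sketched roughly as follows. This is a minimal illustration that assumes each coding sequence is ungapped, in frame, free of a trailing stop codon, and exactly three times the length of its protein; the function name and toy sequences are hypothetical. Dedicated tools such as PAL2NAL handle the real-world complications (mismatches, frameshifts) that this sketch ignores.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map an ungapped coding sequence onto its gapped protein alignment row.

    Each residue is replaced by the next codon from the CDS and each gap
    by three gap characters. The translation itself is NOT verified here.
    """
    n_residues = sum(ch != gap for ch in aligned_protein)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be 3x the number of residues")
    codons = iter(cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(gap * 3 if ch == gap else next(codons)
                   for ch in aligned_protein)

# Toy protein alignment and matching coding sequences (invented).
protein_alignment = {"seqA": "MK-LV", "seqB": "MKTLV"}
coding_sequences = {"seqA": "ATGAAACTTGTT", "seqB": "ATGAAGACCCTGGTG"}

codon_alignment = {
    name: thread_codons(row, coding_sequences[name])
    for name, row in protein_alignment.items()
}
for name, row in codon_alignment.items():
    print(name, row)
```

Because gaps are always inserted in multiples of three, the reading frame is preserved in every row, which is what codon-based analyses require.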
Chris Hemme, I have a doubt about your answer. I agree that, technically, the AA sequence carries more information per site (20 states versus 4), but the evolutionary information that a DNA analysis can use to give a phylogenetic answer about the dataset outweighs that numerical advantage. With the right model, the range of changes and inferences that a DNA analysis can capture is greater than the diversity that the AA analysis gives. What do you guys think?
Amino acid sequences are valuable in determining which portions of the protein's sequence are most critical to function (active sites, etc.). Gene duplication within a given evolutionary line complicates both DNA and protein analyses. Protein analysis may allow the identification of orthologues and of the divergence of homologues with new functions within a given evolutionary line. If you want to look at overall phylogeny, I would recommend DNA analysis, but you should look at multiple genes to determine divergence times for various species, genera and higher taxa. This tends to reduce the problems caused by gene duplication and other genomic changes unrelated to the mutational clock.
Bottom line: the question you are asking (in some detail) is the important thing to determine before deciding what type of analysis you should choose.
I would concur with the opinions offered above: the purpose will determine the methods. Studying a fold family and focusing on areas of structural conservation would suggest that you should be using the amino acid sequence. Using amino acid sequences will usually produce a neater alignment for exactly the reasons that they are not as good for gauging evolutionary change: substitutions in nucleotide sequences may not change the AA sequence (as Jonathan mentioned), but could indicate areas of evolutionary divergence (or could simply be errors in the sequencing method). You mentioned that you are interested in dating ancestors, though, so doing a DNA sequence alignment would likely be beneficial to you.
The standard practice when comparing protein families, folds, or super families has been to use the amino acid sequences. (See Zelensky & Gready's 2003 & 2005 papers on the analysis of the C-type lectin-like domain fold). It's important to note that one of the most crucial steps in creating a phylo tree is the alignment. Choose the best alignment algorithm for your purposes. For amino acid alignment, while many people use CLUSTALW, CLUSTAL is not as sensitive or as accurate as MUSCLE, ProbCons or T-COFFEE.
Usually a tree based on DNA is more informative: there are 3 times more bases than amino acids. I remember some methods for protein phylogenetic alignments; they can be quite tricky, with scores for substituting one amino acid for another. Protein can have an advantage only if there are post-transcriptionally modified amino acids (a rare case).
As you are working with a superfamily, I think that you may find several copies of your proteins/genes. I am sorry to tell you that trees are not very useful here. You should try a dot plot to see conserved domains (on proteins); you can try a BLOSUM matrix and other tools. Alignment is not the best way to study your question.
Rafael Alves: I think we're saying the same thing. As others have said, it's ultimately about what question you're trying to answer. If all you care about is the relationship between the sequences and if your sample represents a broad taxonomic range, then protein alignment is probably sufficient. If you're dealing with closely related sequences such as in an MLST-type experiment, nucleotide sequences are better. If you want to conduct in-depth evolutionary analyses over a relatively narrow taxonomic range, codon alignments are best. Also, on the information side, the amount of information lost will depend on the taxonomic breadth of the sample sequences. For closely related sequences, translating to protein will lose synonymous mutation information, resulting in lower-resolution trees. But beyond a certain taxonomic range, the nucleotide sequences are going to lose information anyway due to repeated mutations at a given site, indels, codon bias, GC skew, etc. Beyond that range, you probably won't be able to perform any meaningful nucleotide-based evolutionary analyses anyway, so protein alignments are probably best.
Chris Hemme: Got it, I agree with you. I work mainly with RNA viruses, and saturation of information in the codons is the major problem. When the dataset contains sequences close in time the problems are minimized, but when comparing different species of Baculovirus, for instance, which is a DNA virus whose species separated millions of years ago, the protein alignment needs to be done before the nucleotide or codon analysis. After I did that analysis I started to use the protein alignment as a complementary analysis to compare with the nucleotide information, even when dealing with close-range datasets.
I agree with Chris. AA alignments are generally better in cases of homoplasy or distantly related proteins.
Thank you all for your replies. After carefully reading your comments, I see there is a general consensus that phylogenetic information is contained mainly in DNA sequence alignments of closely related proteins (i.e., where one can safely assume that mutational rates remained constant, there haven't been processes like horizontal gene transfer, etc.). This is certainly not the case in the superfamily I'm trying to analyze, where all that is preserved at present is the protein fold. In fact, amino acid sequence alignments generated with MUSCLE and T-COFFEE gave rise to bootstrapped maximum-likelihood trees (PhyML) with negligible branch support for earlier nodes. When I managed to extend a structural alignment of structures from the PDB (obtained with the DALI server) using MAFFT, the trees improved drastically. However, they are not "perfect" in the sense that the ancestor protein(s) still cannot be identified unequivocally. Maybe the ugly truth is just that the phylogenetic information for the whole superfamily is lost. Perhaps the general approach would be to use amino acid sequences to identify clades or branches in the tree suggesting that some phylogenetic information is still contained among those leaves, and then proceed to analyze the DNA sequences of that subset, in search of phylogenetic relationships among more closely related proteins.
Javier González: When dealing with genes that separated a long time ago, there are several problems in reconstructing the phylogenetic relationships. I don't know if I understood your objective correctly, but in my opinion, in your case, the ancestor protein will never really be identified, or at least not with likelihood or Bayesian analyses. The coalescent methods that we use now are good at inferring phylogenetic events but not at predicting early structural states, and any assumption in that area will be subject to a large margin of error. What you can do to infer something, while discarding transfers or large polymorphic regions, is to identify the most conserved sites in all your sequences, with an entropy algorithm for instance, and reconstruct the evolutionary changes from regions that maintained the same characteristics through all that time.
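One common reading of "an entropy algorithm" is per-column Shannon entropy over the alignment, where low values flag conserved sites. The sketch below is only one way to do this: it ignores gaps, uses no sequence weighting or background frequencies, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gaps.

    Returns None for a column that contains only gaps.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

def alignment_entropy(rows):
    """Entropy of every column in a list of equal-length aligned sequences."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return [column_entropy(col) for col in zip(*rows)]

# Toy protein alignment (invented) with conserved and variable columns.
rows = ["MKVLA", "MRVIA", "MKVLG", "MQV-A"]
entropies = alignment_entropy(rows)
conserved = [i for i, h in enumerate(entropies) if h is not None and h == 0.0]
print(entropies)
print("fully conserved columns (0-based):", conserved)
```

A threshold above zero is usually more practical than demanding perfect conservation; where to set it is a judgment call that depends on the depth and size of the alignment.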
DNA will usually be more informative, since you can unambiguously deduce the AA sequence from DNA, but not vice versa. But if sequences are very divergent, there will be more homoplasy/noise in the DNA than the AA, so the latter may give you better-supported nodes. In practice though, this can also be achieved by allowing different nucleotides to evolve in different ways/rates in the DNA data, as most current model-based methods do.
If you have the protein structure, however, then a third level of analysis that includes spatial information for residues can be employed. This could reveal compensatory mutations, as found in RNA stems for example.
I agree that the DNA level gives you more information about the evolution of the investigated protein set, but an amino acid level comparison is more useful if you want to make a comparison on the functional level.
Much depends on your data, namely the divergence. We found cases where, for very large divergences, amino acid models should be used rather than DNA or codon models. But more often DNA (or rather, codon data) is the most informative. Ideally, for protein-coding genes one should work on the codon level, and not on the level of DNA or amino acids. Nowadays you can even align codon sequences directly without doing the amino acid alignment first (for example, Prank and ProGraph). We have written a review about various codon models and what they can be used for (Anisimova and Kosiol 2009, in Mol. Biol. Evol., and Kosiol and Anisimova 2012 in the book "Evolutionary genomics: statistical and computational methods"). Also, we have recently developed software for phylogeny search under codon models (manuscript under review). If anyone would like to try it, it's available from http://sourceforge.net/projects/codonphyml
We would be happy to receive any feedback.
If you are trying to determine whether one protein is related by structure and function to another, the protein sequence is most telling, and it will also allow you to place the different proteins into related groups. Here you would be looking for conserved domains which determine these characteristics. My colleagues and I wrote a paper (years ago) on the unification of the eukaryotic and prokaryotic family of ferritins: "The unification of the ferritin family of proteins. 1992. Proceedings of the National Academy of Sciences, 89:2419-2423". DNA trees will give you phylogenetic relationships as well, but may miss structure-function relatedness: many of the amino acids can change while the same related structure and function are retained, but the key residues will be conserved both in their chemical characteristics (either identical or conservative substitutions) and in their location in the sequence.
I agree that the question in mind will be the primary determinant of the design of a study and will therefore dictate the level at which DNA sequences should be analyzed (AA, DNA or codons). For studies of structure and function, it is important to remember that structural and functional homology will not always coincide with evolutionary homology, as structure also changes over time (although not as fast as the sequence itself).
Be careful with bifunctional genes; check this paper, especially the Candeias reference:
When one is better than two: RNA with dual functions.
Ulveling D, Francastel C, Hubé F.
Biochimie. 2011 Apr;93(4):633-44. Epub 2010 Nov 24. Review.
PMID: 21111023
It all depends on the purpose for which the phylogeny is being done. A phylogeny based on DNA gives the evolutionary trend in the sequence or gene, whereas one based on protein suggests the functional significance, if any, of changes arising during evolution through mutation or otherwise.
Replying to Siva: if the DNA is coding, then it is always better to use codon models, and so protein-coding alignments, to account for (1) the structure of the genetic code, (2) unequal biases at the three codon positions, and (3) selection on the protein, all of which are modeled explicitly by codon models; see Anisimova and Kosiol (2009) for details.
In time-scaled phylogenies, DNA sequence analysis will turn out to be more informative. In this case, if you are interested in finding the MRCA of your protein superfamily, then DNA is the way to go. Given that both synonymous and nonsynonymous mutations occur, the issue is how many substitutions there have been.
I think it's better to do both: first the amino acid sequences and then the corresponding DNA sequences.
I agree with Max, DNA sequence alignment is more informative than protein alignment
Actually, Hanan, I said that protein superfamilies have typically diverged over such a long time that the DNA sequence cannot be reliably aligned, and the protein sequence alignment is more reliable. So in this case, the protein alignment is more informative. Over much shorter evolutionary times -- 100 million years or less, perhaps -- coding DNA sequences _may_ be alignable, depending on how rapidly the organisms (and their proteins) are evolving. For example, mouse and human coding sequences (and some non-coding sequences) are alignable at the DNA level. However, the coding sequences of a single gene from two strains of E. coli that have diverged for considerably less than 100 million years are unlikely to be reliably alignable as DNA sequences (because E. coli has a very short generation time, and thus a very rapid evolutionary rate), but may still be reliably alignable as protein sequences; whether they are or not depends on how much purifying selection the protein is under.
In a DNA sequence, the number of characters (nucleotides) is higher than in the amino acid sequence. Thus, DNA might provide alternative branching patterns for certain nodes because of the larger character set. I am working on two closely related species: the protein sequences differ in only one character (amino acid), whereas the cDNA sequences of the corresponding alignment differ in 5 characters. I prefer the former topology, since it gives me high accuracy in testing my hypothesis.
Vera Hemleben
You have to consider several points for the DNA sequence: Is there RNA editing? Is there a tendency toward higher (or lower) GC content in the organisms compared? How many silent mutations are there? Are you looking at a paralogous gene? And, of course, with the amino acids you get the functional conservation, which might be more informative.
see: Dressel, A. and Hemleben, V.: Transparent Testa Glabra 1 (TTG1) and TTG1-like genes in Matthiola incana R. Br. and related Brassicaceae and mutation in the WD-40 motif. Plant Biol. 11: 204-12 (2009)
Maybe making a tree with DNA is more informative because it can include neutral evolution events (Kimura), which cannot be done with proteins.
I think that in the first line of your question you answer yourself: "I am currently working on a phylogenetic analysis of a protein super family." If you want to do a phylogenetic analysis of a protein superfamily, you want to use the protein.
The back-and-forth among the different answers tells you that you might get different things from analyzing the DNA or the protein, and I think this is true, but sticking to what you stated: definitely PROTEIN.
Thanks Jose, analyzing protein sequence alignments was my first guess too, but what programs are available to allow obtaining reliable age estimates of as many ancestors as possible for a given protein sequence alignment and/or tree?
As a side note, I've found a program called PHYRN (http://www.ncbi.nlm.nih.gov/pubmed?term=22514627), has anybody applied it successfully?
You can see it in your alignment: if the protein sequence is very conserved, go for nucleotides; if not, go for amino acids. You have to choose whatever will give you enough information. If you have enough information in the protein sequence, then definitely choose the protein sequence for analysis. From my point of view, if I were you I would do both (compute the phylogeny in NUC and AA) and then decide which data give the better topology and statistical support.
By the way, you can even combine AA and NUC into one matrix and run it in MrBayes. Post your alignment, or better a FASTA file (without your sequence, of course); it will be easier to decide this way. Alternatively, I can compute the whole thing for you if you award me an authorship in your publication.
A DNA sequence-based phylogeny may give an idea of the gene family and evolutionary trends, whereas one based on amino acid sequences tells you what functional variation, if any, can be attributed to the changes, which in my view is more meaningful than the former.
@ Siva:
The two "evolutionary trends" and the "functional variation" are not independent. In fact they are very tightly interrelated. Therefore, protein-coding DNA analyses can be very informative and better reflect biological reality (despite the fact that no model is ever true). Have a look at my papers on codon models and natural selection: http://people.inf.ethz.ch/anmaria/publications.html
Even when you are worried about sequences being too divergent, codon models will never be worse than amino acid models (for example see work of Seo and Kishino 2008; 2009 published in Mol. Biol. Evol.). It is possible to do dating of speciation/duplication events using codon models. However, this is also possible with amino acid sequences.
@Maria
I agree with you that evolutionary trends and functional variation are not independent. However, nucleotide divergence from silent mutations need not, and often will not, be reflected in protein function. A protein variation assumes importance only when the amino acid variation has really resulted in a functional variation. In this connection, many orthologous sequences have negligible functional variation. As you rightly put it, speciation due to duplication occurs with divergence. So the phylogeny based on nucleotide sequences may be an indication of the evolutionary trend, which will finally be reflected in the functional variation.
@ Siva:
That's right, just two more comments:
(1) if you do use DNA data for protein-coding genes, you'd better account for the structure of the genetic code (codon triplets and different nucleotide patterns at the three codon positions). Markov DNA models don't do that, while codon models do (although they are more expensive computationally, it's worth trying them).
(2) it is premature to conclude that synonymous substitutions are always silent or neutral. There is a growing body of evidence suggesting that such "silent" mutations can influence the protein - splicing, expression, abundance, structure, and function!
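To see the "different nucleotide patterns at the three codon positions" from point (1) in your own data, a quick tally of differences by codon position is enough. This is a minimal sketch assuming two in-frame, ungapped coding sequences of equal length; the function name and toy sequences are hypothetical, and it is a descriptive count rather than a codon model.

```python
def differences_by_codon_position(cds_a, cds_b):
    """Count nucleotide differences at 1st, 2nd and 3rd codon positions."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    counts = [0, 0, 0]
    for index, (a, b) in enumerate(zip(cds_a, cds_b)):
        if a != b:
            counts[index % 3] += 1  # 0 = first, 1 = second, 2 = third position
    return counts

# Toy sequences (invented): Met-Leu-Ser-Gly versus Met-Leu-Ser-Ala.
cds_one = "ATGCTTTCAGGT"
cds_two = "ATGCTGTCGGCT"
first, second, third = differences_by_codon_position(cds_one, cds_two)
print(f"1st: {first}, 2nd: {second}, 3rd: {third}")  # 1st: 0, 2nd: 1, 3rd: 2
```

In real protein-coding alignments the third position typically accumulates the most differences, which is one reason partitioning by codon position or using a codon model tends to fit better than a single nucleotide model applied to all sites.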
The protein alignment does NOT contain more information - despite having more potential states (20), the same DNA corresponding to one amino acid will have 4x4x4 potential states (yes, I know it's fiddly because certain combinations *won't* be there, but play along) for a total of 64 combinations. Now add this complexity - mutation and evolution occur at the level of the DNA, but the actual selection occurs at the level of the protein! So while you probably get a more evolutionarily informative tree from a DNA parsimony/Bayesian analysis, there may be something to be learned from *also* doing a protein-alignment based tree. Do both, compare the trees. You'll likely see strong correlations, but maybe you'll be lucky enough to see a significant *difference* - which may be telling all on its own.
What I would like to say is that the phylogeny based on the aa sequence has more functional significance in terms of its biology than just a DNA sequence, in which some point mutations go unnoticed and have no relevance to the protein. It may be significant from an evolutionary point of view, but not immediately. The question here is not which phylogeny is great but which is relevant to the biology we are trying to understand.
Depends on your area of interest:
1. multiple alignment of nucleotide sequences:
useful for determining variants / motifs and for getting information about the family. The variant information is useful only if the gene in question is expressed and its phenotype is known.
2. multiple alignment of protein sequences:
used for determining motifs/ domains and their hierarchy and a phylogenetic tree. This information is used only when protein functional details are necessary.
So multiple sequence alignment is a technique, and the choice depends on your area of interest.
It totally depends on what you are looking for. DNA-based alignments are more sensitive. Suppose two different sets of sequences have entirely different alignments or sit far apart in a tree. You would think they are different, but both sets may translate to the same set of proteins. A protein alignment would show that both sets are very similar and very close to each other. If you want to study mutation at the DNA level or do an evolutionary study, go for DNA alignment; otherwise you need a protein-level study. There is plenty of other information you can incorporate in a protein-level study, e.g. structure information, domain information, protein family studies, etc. In your case I would go for a protein-level study.
It all depends if you are interested in deep nodes or recent divergences, or both
Between sequences that diverged recently, most mutations will be silent (synonymous), i.e. no change in the aa sequence. Go for DNA.
Between distantly related sequences:
- homoplasy will accumulate in positions not affecting the aa sequence => noise increases.
- when comparing aa sequences, you can use a weight matrix that weights different aa at the same position according to how well they preserve, for example, the function of a protein (you need to experiment to find which weight matrix is best for you; read the documentation). This is a clear advantage over DNA/DNA comparison where, apart from weighting transitions and transversions differently, there is no weight matrix.
best
Hello, for your analysis in MrBayes, you can back-translate the amino acids to the respective codons and use this to determine the ancestral state. Amino acids are better for MSA of distantly related sequences, but DNA is better for MSA of sequences that are evolving at a rapid rate. Thanks
I think that might be difficult, because you don't know the codon usage of the organism you are working with.
DNA sequences are more informative than amino acid sequences for studying phylogeny, as some of the researchers here explain.
Of course DNA sequences are very important in phylogeny, and when we look at speciation the end result has to be judged in terms of changes in amino acids: whether these are purposeful or just random with no consequences for gene function.
While DNA contains the genetic information, that information is written in a degenerate code. Therefore, from both an evolutionary and a functional point of view, it is the amino acid sequence that is most important to phylogeny. Furthermore, if a nucleotide sequence undergoes a single base mutation, the result may be meaningless because of wobble as well as codon multiplicity. Indeed, a coding sequence itself is not the consequence of a one-time event but something that arises through multiple trial-and-error steps. Here, I don't mean to make a judgement about 'trial and error', but simply to describe it as part of a process whose final outcome is a translatable ORF. Therefore, it is the amino acid sequence that is more relevant to phylogeny and phylogenetic relationships.
This is what I do: I construct both trees and choose. It may happen that the aa tree has little information and is therefore poorly resolved. In this case use the nuc tree. The aa tree is, however, more robust. I would not trust a nuc tree when I can see a nicely resolved and supported aa tree. Since it looks like you work with very diverged proteins, it makes no sense to make nuc trees, but you can try. It's always good to "know" the data you are working with: know how they behave under different conditions, and know how robust your phylogenetic inference is. Beware of compositional bias and fast-evolving sequences (long branches); they ruin your phylogeny.
The relative value of AAs and nucs for resolving relationships depends on the evolutionary distances between the gene/protein sequences, and their lengths. This is important in virus comparisons where the sequences of particular genes may be easier to obtain than those of the full genomes, so the twigs of the tree may be best determined by nt sequences, but the base of the tree may involve sequencing whole genomes. To be sure, it's best to use the same method for comparisons, such as ML, and then use bootstrapping to check which nodes can be relied on in each of the trees. Kazusato Ohshima and I explained this for potyviruses in "Potyviruses and the digital revolution" Ann Rev Phytopath 2010, 48, 205-223. I believe there is a copy in ResearchGate, otherwise email me. Good luck
It depends on your material and the genes you are working with:
For nuc phylogenies with mitochondrial genes you should be aware of RNA editing.