I know of a couple of measures of signal-to-noise ratio in phylogenetic analyses, such as the consistency index (CI) and the retention index (RI), but most phylogenetic reconstruction software packages do not compute these values or include them with the tree output.  Bootstrap values can be somewhat related to phylogenetic signal, but more often they reflect the ratio of informative to noninformative (invariant within a clade) sites in the data set.

With different data sets, I would expect the consistency index or retention index value for good data to be different.  For example, with perfect sequencing (no sequencing errors at all) of human, chimpanzee, gorilla and orangutan, I would expect CI values close to 1.  But for perfect sequencing of the same DNA region from a mammal, a frog, a salamander, a fish and an insect, the best value expected might be closer to 0.6 or so.  And it might be different again if we have 10 widely diverse samples from each group (10 mammals, 10 frogs, etc.) than if we have just one of each.
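To make the two indices concrete, here is a minimal Python sketch that computes the ensemble CI and RI for a toy alignment on a fixed tree. It assumes the usual definitions for unordered characters: for each site, m is the minimum conceivable number of changes (number of states minus 1), s is the number of changes required on the tree (counted here with Fitch parsimony), and g is the maximum number of changes on any tree (number of taxa minus the count of the most common state); then CI = sum(m) / sum(s) and RI = (sum(g) - sum(s)) / (sum(g) - sum(m)). The function names, the tree and the four-site alignment are invented for illustration and are not taken from any package.

```python
from collections import Counter


def fitch_steps(tree, states):
    """Count the changes one site needs on a rooted binary tree (Fitch parsimony).

    tree   : nested 2-tuples of taxon names, e.g. (("a", "b"), "c")
    states : dict mapping taxon name -> character state at this site
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):          # leaf: its observed state
            return {states[node]}
        left = visit(node[0])
        right = visit(node[1])
        common = left & right
        if common:                         # children agree: no change needed
            return common
        steps += 1                         # children disagree: one change
        return left | right

    visit(tree)
    return steps


def ci_ri(tree, alignment):
    """Return ensemble (CI, RI) over all sites; None where undefined."""
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    m_sum = s_sum = g_sum = 0
    for i in range(n_sites):
        column = {t: alignment[t][i] for t in taxa}
        counts = Counter(column.values())
        m_sum += len(counts) - 1                   # minimum conceivable steps
        g_sum += len(taxa) - max(counts.values())  # maximum steps on any tree
        s_sum += fitch_steps(tree, column)         # steps on this tree
    ci = m_sum / s_sum if s_sum else None
    ri = (g_sum - s_sum) / (g_sum - m_sum) if g_sum != m_sum else None
    return ci, ri


# Hypothetical toy data: the third site is homoplastic on this tree.
tree = ((("human", "chimp"), "gorilla"), "orangutan")
alignment = {
    "human":     "AAGT",
    "chimp":     "AATT",
    "gorilla":   "ACGT",
    "orangutan": "CCTT",
}
ci, ri = ci_ri(tree, alignment)
print(ci, ri)
```

Note that this sketch counts every site, including constant and parsimony-uninformative ones; CI is often reported with uninformative characters excluded, which lowers the value.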

In most phylogenetic discussions I see here on ResearchGate and in many other places, the focus is on the models and methods of analysis.  But in my experience, when phylogenetic reconstruction gives impossible results, the cause is essentially always a problem in the data rather than in the model or method.  Good data will produce a reasonable tree with any method, but it is frighteningly easy to end up with a bad data set, for example due to misannotation of genes or organisms in GenBank.  And even with "perfect data" there can be problems with saturation of variable sites, for example if we compare sequences from all lentiviruses, or DNA polymerase from bacteria, plants, animals and fungi.
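As a rough illustration of the kind of data check I have in mind, the Python sketch below classifies alignment columns as constant, variable but parsimony-uninformative, or parsimony-informative, and reports the mean uncorrected pairwise distance (p-distance). A high mean p-distance is only a crude hint that variable sites may be saturated, not a formal saturation test. The function names and the toy alignment are hypothetical, and gaps or ambiguity codes are simply ignored.

```python
from collections import Counter
from itertools import combinations

BASES = set("ACGT")


def classify_sites(seqs):
    """Count constant, uninformative and parsimony-informative columns."""
    summary = Counter(constant=0, uninformative=0, informative=0)
    for column in zip(*seqs):
        counts = Counter(c.upper() for c in column if c.upper() in BASES)
        if len(counts) <= 1:
            summary["constant"] += 1
        elif sum(1 for n in counts.values() if n >= 2) >= 2:
            # at least two states, each present in at least two sequences
            summary["informative"] += 1
        else:
            summary["uninformative"] += 1
    return dict(summary)


def mean_p_distance(seqs):
    """Mean proportion of differing sites over all sequence pairs.

    Only positions where both sequences have A, C, G or T are compared.
    """
    distances = []
    for a, b in combinations(seqs, 2):
        pairs = [(x.upper(), y.upper()) for x, y in zip(a, b)
                 if x.upper() in BASES and y.upper() in BASES]
        if pairs:
            diffs = sum(1 for x, y in pairs if x != y)
            distances.append(diffs / len(pairs))
    return sum(distances) / len(distances) if distances else None


# Hypothetical toy alignment with one gap in the first sequence.
toy = ["AAGT-", "AATTA", "ACGTA", "CCTTA"]
sites = classify_sites(toy)
mean_p = mean_p_distance(toy)
print(sites, mean_p)
```

On real data the same counts could be compared between subsets of taxa, for example within one genus versus across kingdoms, to see how the share of informative sites and the typical divergence shift.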

So, if you know of good review articles covering this topic, or you know of software that is useful for checking multiple sequence alignments for various types of quality, please post them here in the answers.
