A similar question would be: "How can you tell if a history book is accurate?" We cannot replay history, nor time-travel, so direct observations cannot be made. Our history book will be biased in two fundamental ways: the information being interpreted may be incorrect (or, strictly, does not describe the past), and that information (correct or not) may be interpreted incorrectly (or, strictly, interpreted in a way which does not infer the past). These, in a phylogenetic context, are signal errors and systematic errors. With these errors in mind, we must admit that we can never tell if a phylogeny is accurate (except in special cases where we have made direct observations on the past, for example when culturing bacterial strains in the lab).
All is not lost! We need to rephrase our question: "How can you tell if a history book is rational?" We cannot tell if the book is true, but we can support why we rationally believe it to have a certain amount of truth. This support might come from how well independent lines of evidence agree; how we identify uninformative information; and how we assess the realism of our evolutionary simulations. Essentially, we make observations on the present to infer the past, while cautiously leaning on the assumption that things proceeded in the past in a similar way to how they proceed in the present.
Stopping short of the philosophical question of what rationality is: we cannot tell if a phylogeny is accurate; we can only tell if it is rational to believe that it is accurate.
At least you can say that a tree is a good fit to the data, for a given evolutionary model. Usually you can also say that certain trees provide a better fit to the data than others under almost any realistic model of character change. You can also say whether different datasets, or different partitions of a dataset, support the same tree, and whether this is consistent with other information such as fossils or well-dated historical events that would have impacted these taxa. Independent verification is the gold standard for assessing support of any hypothesis, isn't it?
In a strict sense, support metrics can only assess the precision or 'fit' of the data to the tree; evaluating the accuracy of any particular tree against the true evolutionary tree is not possible. Simulation studies, on the other hand, can provide some insight into how well tree search algorithms recover the correct topology from an artificial dataset with a known 'true' tree.
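To make the simulation idea concrete: once you have a known 'true' tree and a tree recovered by some method, one common way to score the recovery is to count the splits (bipartitions of the taxa) that appear in one tree but not the other, which is the Robinson-Foulds distance. The sketch below is only an illustration. The nested-tuple tree format, the function names and the five-taxon example are my own assumptions, not taken from any particular package.

```python
def leaf_sets(node):
    """Return (leaves under node, leaf sets of all internal nodes at or below it)."""
    if isinstance(node, str):          # a leaf is just a taxon name
        return frozenset([node]), []
    leaves = frozenset()
    clades = []
    for child in node:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def splits(tree):
    """Non-trivial bipartitions of an (unrooted) tree given as nested tuples."""
    all_leaves, clades = leaf_sets(tree)
    anchor = min(all_leaves)           # fixed reference taxon to orient each split
    result = set()
    for clade in clades:
        # Always keep the side of the split that does NOT contain the anchor.
        side = all_leaves - clade if anchor in clade else clade
        if 1 < len(side) < len(all_leaves) - 1:   # drop trivial splits
            result.add(side)
    return result


def rf_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical example: a 'true' tree and a tree returned by some method.
true_tree = (((("human", "chimp"), "gorilla"), "orangutan"), "gibbon")
inferred = (((("human", "gorilla"), "chimp"), "orangutan"), "gibbon")
print(rf_distance(true_tree, inferred))
```

A distance of zero means the two topologies are identical as unrooted trees. Note that this compares topology only and ignores branch lengths.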
In the real world I'd share the total evidence sentiment that if multiple independent datasets (morphology, fossils, development, molecular) agree on a topology, it is more likely to be a correct one.
Of course, a well-supported tree could fail to reflect the "true" phylogenetic relationships. This is the main concern of those worried about the gene tree-species tree problem, long branch attraction, etc. In order to recognize these phenomena in the first place, one must have some sort of a priori idea about what the truth is - or at least a pattern of incongruent topologies from different data partitions. The former of these represents a correspondence theory of truth, which seems to be the position of people who talk about "realistic models" and that sort of thing (when you say something is "realistic," presumably you mean corresponding to reality in some way). The latter (congruence) represents a coherence theory of truth - "truth" is a confection of the agreement of evidence, and the more evidence that agrees, the more plausible the result is - but still no guarantee that it corresponds to the Kantian "thing-in-itself."
There are thousands of data sets available for which the "true tree" is known very well from many independent sources of data. For example, in the primates there is fossil data, morphological data, etc., to show that chimp, human and gorilla shared a common ancestor more recently than any of those three did with orangutan. For HIV-1 we have many data sets with known history. With deeper branchings, we know that amphibians preceded marsupials which preceded mammals, etc.
There are also many ways to generate artificial sequences with known histories, and to retain true ancestors as the sequences are being artificially evolved. And then we can test phylogenetic reconstruction methods on that data.
Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis.
Leitner T, Escanilla D, Franzén C, Uhlén M, Albert J.
Proc Natl Acad Sci U S A. 1996 Oct 1;93(20):10864-9.
PMID: 8855273
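As a rough illustration of artificially evolving sequences while retaining the true ancestors, here is a toy sketch. Everything in it is an assumption made for illustration: the nested-tuple tree, the node labels, a single substitution probability per site per branch, and equal chances of changing to any other base (a Jukes-Cantor-like simplification). Real simulators offer branch lengths, rate variation, indels and recombination.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Copy seq; each site changes to a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def evolve(node, seq, rate, rng, record, label):
    """Walk the tree, storing the sequence at every node (ancestors included)."""
    if isinstance(node, str):          # leaf: store under the taxon name
        record[node] = seq
        return
    record[label] = seq                # internal node: keep the true ancestor
    for i, child in enumerate(node):
        evolve(child, mutate(seq, rate, rng), rate, rng, record, f"{label}.{i}")


def simulate(tree, length=200, rate=0.05, seed=0):
    """Return {node label: sequence} for one simulated history on `tree`."""
    rng = random.Random(seed)
    root_seq = "".join(rng.choices(BASES, k=length))
    record = {}
    evolve(tree, root_seq, rate, rng, record, "root")
    return record


# Hypothetical four-taxon history; the labels A-D are placeholders.
known_tree = ((("A", "B"), "C"), "D")
history = simulate(known_tree)
for name, seq in history.items():
    print(name, seq[:30])
```

The leaf sequences can then be handed to any reconstruction method, and the resulting tree compared with `known_tree`, while the stored internal sequences let you check inferred ancestral states as well.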
Those types of studies can give us a very good idea about which types of data, and which phylogenies built from them, are very likely to be correct, and which are drifting into guessing territory. For example, although we can very accurately reconstruct HIV-1 transmission histories on a one- to two-decade timescale, when we look at data from the HIV-1 M group overall, which has evolved over 80+ years, there are many recombination events and other problems, such that the gene trees no longer coincide with trees built from subgenomic regions for many of the lineages.
With human/chimp/gorilla/orangutan type data, we can see that complete mitochondrial genomes tell one story, while some other data sets have different histories due to incomplete lineage sorting and other issues. Studies of differential evolution of the X chromosome vs autosomes, for example, provide a bit of a window into sexual selection.
Great ape genetic diversity and population history.
Prado-Martinez J, Sudmant PH, Kidd JM, et al.
Nature. 2013 Jul 25;499(7459):471-5.
doi: 10.1038/nature12228. Epub 2013 Jul 3.
PMID: 23823723
Evolution of gene function on the X chromosome versus the autosomes.
Often, for species where we do not have a solid fossil record or other information to support the molecular phylogenies, such as evolution within one group of insects (as opposed to determining only whether a new insect is closer to ants, bees, flies or beetles), it can be useful to compare the data we have to another, closely comparable set of data (same range of pairwise distances, etc.) from another group that does have a fossil record. For example, we can see that comparing HIV-1 M group to HIV-1 O group (no fossils) is more like comparing mammals to snakes, turtles, lizards, amphibians and fish than it is like comparing organisms within the mammals (distances are far beyond saturation of silent sites).
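The point about pairwise distances and saturation can be illustrated with the simplest textbook correction, the Jukes-Cantor (JC69) distance d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. This is only a sketch. The function names are my own, and applying JC69 to all sites is an assumption made for brevity, whereas real analyses of silent sites use codon-aware methods.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal length and non-empty")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def jc69_distance(p):
    """Jukes-Cantor corrected distance; infinite once p reaches saturation (0.75)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


# The correction is small for similar sequences and explodes near saturation.
for p in (0.05, 0.30, 0.60, 0.74):
    print(p, jc69_distance(p))
```

Under this model two unrelated random sequences are expected to differ at about 75% of sites, so as p approaches 0.75 the observed differences carry almost no information about how much change really occurred. Comparing the range of such distances between two groups is one way to judge whether they are "equivalent" in the sense described above.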
Each group of organisms is going to have its own unique biology, which dictates the types of problems that might be encountered in phylogenetic work. Bacteria don't have diploid genomes and sexual selection like many eukaryotes have. Some fungi have hundreds of sexes (mating types) rather than male/female. Most bacteria are subject to massive amounts of horizontal gene transfer via phages, plasmids and random uptake of DNA. Organisms that fly or swim or send spores or pollen great distances are less likely to experience the physical barriers that create subspecies and species than organisms that cannot get across a river or mountain range.
If the tree from mitochondrial DNA in your set differs from the tree from most nuclear genes, the question is less often about which tree is "correct" and more often about what those differences can tell us about the evolutionary history. For a good example in that area, see the work on Neanderthal/Denisova/Human/Chimpanzee mitochondria vs X, Y and autosomes evolution.
Analysis of human accelerated DNA regions using archaic hominin genomes.
Burbano HA, Green RE, Maricic T, Lalueza-Fox C, de la Rasilla M, Rosas A, Kelso J, Pollard KS, Lachmann M, Pääbo S.
PLoS One. 2012;7(3):e32877.
doi: 10.1371/journal.pone.0032877. Epub 2012 Mar 7.
PMID: 22412940
Neanderthal and Denisova genetic affinities with contemporary humans: introgression versus common ancestral polymorphisms.
I guess that this depends on the definition of accuracy... In theory, the best way would be a comparison against the PHYLOGENY (the single, actual one), which for now is not possible. But let's wait a couple of years!
In my experience, there is no "precise tree" in phylogenetic analysis. All you can obtain is a tree that is supported by most of the evidence, and a tree that you think supports your hypothesis based on morphological, distributional or whatever other characters you consider outside the molecular data. However, a tree can be considered "the most accurate" if the same tree is obtained from different analyses, such as different gene regions, different loci, and different DNA and/or protein sequences, and from different tree-building methods, such as MP, ML, UPGMA, NJ, etc.
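One simple way to put a number on "the same tree is obtained from different analyses" is to count, for each grouping of taxa, the fraction of trees in which it appears. The sketch below does this for rooted trees written as nested tuples. The three example trees are made-up placeholders standing in for trees from different loci or methods, not real results, and the function names are my own.

```python
from collections import Counter


def clades(tree):
    """Set of leaf sets, one per internal node of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):      # leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def clade_support(trees):
    """Fraction of the given trees in which each clade appears."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    return {clade: n / len(trees) for clade, n in counts.items()}


# Placeholder trees, e.g. one per locus or per tree-building method.
trees = [
    ((("human", "chimp"), "gorilla"), "orangutan"),
    ((("human", "chimp"), "gorilla"), "orangutan"),
    ((("human", "gorilla"), "chimp"), "orangutan"),
]
for clade, support in sorted(clade_support(trees).items(), key=lambda kv: -kv[1]):
    print(sorted(clade), round(support, 2))
```

Groupings recovered by every analysis are the ones it is most rational to believe; groupings that appear in only some trees flag where the datasets or methods disagree.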