After several years of searching for good phylogenetic hypotheses, I'm convinced that the bootstrap is not a good test for phylogenetic trees. Instead, I prefer to compute posterior probabilities through Bayesian analysis.
Bootstrap values tell you something very specific (which is different to the posterior probability), and for that purpose they are exactly what you want. The problem arises when people interpret them to mean something else. You need to decide what question you are trying to answer before you choose which measure is the most relevant.
Bootstrap values give you a measure of how robust to *noise in the data* a given edge of your tree is *under a given method*. They do not tell you how reliable your tree or your method is. It's also worth noting that bootstrap values only measure how robust the edge is to the kind of noise that you apply.
Ok, I agree with you... But is this robustness to "noise in the data" really important? I'll think about it in my next work. In fact, I use the test only to avoid criticism from reviewers...
There is another factor to consider: when you use very long sequences (such as complete genomes of some bacteria; I have done this kind of analysis), your bootstrap values are 100% in ML, MP and NJ. In my case, using 8 different groups of complete genomes, the topologies obtained with the different methods (with 100% bootstrap at every node) were not identical... Does this 100% bootstrap mean that the noise is lower than the phylogenetic signal in my data, or that with very large datasets the pseudo-sampling is not enough to detect this noise? (In ML it is impossible to do more than 1000 replicates with 100,000 bp, at least on my computer.)
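A toy simulation may help with the first half of that question. The sketch below is not a phylogenetic analysis. It assumes each site simply "votes" for one of two competing groupings, that 52% of sites favour grouping A, and that the "method" picks whichever grouping the majority of resampled sites favours. The function name, the 52% figure and the replicate count are all made up for illustration.

```python
import random

def majority_support(n_sites, frac_a=0.52, replicates=200, seed=1):
    """Bootstrap support for grouping A in a toy 'majority vote' method.

    Each site votes 1 (grouping A) or 0 (grouping B). This is an
    illustrative assumption, not a real tree-building method.
    """
    rng = random.Random(seed)
    n_a = round(n_sites * frac_a)
    votes = [1] * n_a + [0] * (n_sites - n_a)
    wins = 0
    for _ in range(replicates):
        # Resample sites with replacement, same size as the original data.
        resampled = rng.choices(votes, k=n_sites)
        # Grouping A "wins" the replicate if it holds a strict majority.
        if sum(resampled) * 2 > n_sites:
            wins += 1
    return wins / replicates

if __name__ == "__main__":
    for n in (100, 1000, 20000):
        print(n, majority_support(n))
```

In this toy setting the same small excess of supporting sites gives modest support for short data and support close to 100% for long data, because the resampling variance shrinks as the number of sites grows. It says nothing about whether the favoured grouping is actually correct.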
Hmm, and in my case I use it (basically also to avoid criticism) with morphological characters, which make up a small dataset compared to sequences. My bootstrap values never get higher than 70% (50% is the average for my clades), although I have obtained 99% for one very well supported group with high Bremer support. But it is really sad for me to see my low bootstrap numbers compared to analyses with molecular data.
Nice answers... As Julian noted, BS values give a measure of how likely given taxa are to be recovered in a particular clade. But Yokotos's point about whether BS is really important is also worth considering. In this case SplitsTree, in which network reticulations are used to display the noise or uncertainty in the data, might be suggested as an alternative methodology to phylogenetic trees for assessing robustness.
Once a multiple sequence alignment is in hand, a phylogenetic reconstruction method (or methods) can be chosen, e.g., NJ, MP, ML or UPGMA. If the right types of data are used, distance methods (NJ and UPGMA) can be very powerful tools in phylogenetic analysis. One limitation of both the distance and parsimony methods is that, although they may select one tree over another on the basis of some criterion, they cannot say how much more probable one tree is than another. Likelihood and Bayesian methods were designed to provide such a statistical framework for phylogenetic reconstruction. To understand how likelihood-based phylogenetic methods work, it is useful to consider the concept of likelihood in a statistical sense.
On the other hand, one way to assess how well a tree represents all of the data is to resample the data repeatedly and repeat the phylogenetic analysis to see how often the same result is obtained from these resampled (and nonidentical) datasets. Resampling can be done by bootstrapping, in which the characters (e.g., alignment columns) are resampled with replacement, or by jackknifing, in which a subset of the characters is drawn without replacement. Generally, 1000 of these resampled datasets are generated and a phylogenetic tree is built from each of them. The new trees are then compared to determine in what fraction of the trees particular evolutionary groupings are found. It is very important to realize that these tests do not determine how accurate a tree is, just how well it reflects the underlying data. If the data are biased in some way (e.g., there has been significant convergent evolution), the result can be high bootstrap or jackknife support for an incorrect tree.
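The resampling procedure described above can be sketched in a few lines. This is only a minimal illustration under loud assumptions: the four-taxon alignment is invented, and instead of a real tree-building method (NJ, MP, ML) the "analysis" just asks which two taxa have the smallest p-distance. The names `resample_columns`, `closest_pair` and `support` are hypothetical helpers, not functions from any phylogenetics package.

```python
import random

def resample_columns(alignment, rng, scheme="bootstrap"):
    """Return a pseudo-replicate alignment built from resampled columns."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    if scheme == "bootstrap":
        # Same number of columns, drawn with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    else:
        # Delete-half jackknife: half the columns, drawn without replacement.
        cols = rng.sample(range(n_sites), n_sites // 2)
    return {name: "".join(alignment[name][c] for c in cols) for name in names}

def p_distance(a, b):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def closest_pair(alignment):
    """Toy stand-in for tree building: the pair with the smallest p-distance."""
    names = sorted(alignment)
    pairs = [(p_distance(alignment[a], alignment[b]), a, b)
             for i, a in enumerate(names) for b in names[i + 1:]]
    _, a, b = min(pairs)  # ties are broken alphabetically by taxon name
    return frozenset((a, b))

def support(alignment, grouping, replicates=1000, scheme="bootstrap", seed=0):
    """Fraction of pseudo-replicates in which `grouping` is recovered."""
    rng = random.Random(seed)
    target = frozenset(grouping)
    hits = sum(closest_pair(resample_columns(alignment, rng, scheme)) == target
               for _ in range(replicates))
    return hits / replicates

# Invented example data, for illustration only.
toy_alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "TTGTACCTAC",
    "D": "TTGAACCTGC",
}

if __name__ == "__main__":
    print("bootstrap:", support(toy_alignment, {"A", "B"}))
    print("jackknife:", support(toy_alignment, {"A", "B"}, scheme="jackknife"))
```

Note that the support value only reports how often the grouping reappears when the same data are resampled and the same method is applied. If the columns themselves are misleading, the value can be high for a wrong grouping.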
All phylogenetic methods make assumptions about the evolutionary processes that underlie the character changes being studied. Because the accuracy of these assumptions is not always known, methods are also evaluated by comparing their degree of dependency on these assumptions (i.e., their robustness). Each method used to construct phylogenetic trees has its advantages and disadvantages. Some researchers favor one method over another on principle. Some criteria for evaluating a method are efficiency (i.e., how fast the method performs), consistency (i.e., whether the method converges on the correct tree as more data are added), the method's power, robustness and falsifiability (i.e., whether or not the results produced will allow us to determine if the underlying evolutionary assumptions have been violated). Falsifiability is especially important for methods that are not very robust.
Well, another problem that I see with the bootstrap is the resampling method: it assumes that the aligned sites are independent. Everybody knows that they are not independent. A DNA sequence depends on the order in which its nucleotides are arranged, so each replicate is just an aberration.
Ok, if you think of the bootstrap as a way to determine how well the tree reflects the underlying data (following Bhattacharjee), does that mean the bootstrap measures the homogeneity of the data? Whenever I think about it, I see no reason to believe that a sequence actually has to evolve homogeneously. The active site of a protein, for example, may be encoded by a more conserved sequence than the globular region. Is it reasonable to test a topology by the homogeneity of the sequence that generated it?
In this case, did Amanda Mendes find low bootstrap values because her morphological traits did not evolve homogeneously? Why should one take that into consideration, knowing that different characters actually obey distinct evolutionary mechanisms?
I am asking about the bootstrap because I really want to learn more about it, in order to understand why phylogeneticists (reviewers and editors of many reputable journals) insist on bootstrap tests.