Hi everybody,
After building some very simple trees based on 16S rRNA for a publication and reading about bootstrapping, I am getting a bit confused about the proper interpretation of bootstrap values.
Bootstrapping creates a set of trees from newly made alignments, each built by randomly sampling columns (with replacement) from the original alignment. This basically answers the question: is my tree supported by the whole alignment or only by some part of it? So why is that linked to the question: can I trust my tree?
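To make the resampling step concrete, here is a minimal Python sketch of how I understand it, using a toy alignment of five two-column sequences. The function name `bootstrap_alignment` is made up for illustration, and the tree-building step that a real program would run on each replicate is left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement.

    Returns the replicate and the list of original column indices picked.
    """
    n_cols = len(alignment[0])
    # Draw as many column indices as the alignment has columns;
    # the same column may be drawn more than once.
    picked = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Every sequence is rebuilt from the same picked columns,
    # so each resampled column stays intact across sequences.
    replicate = ["".join(seq[i] for i in picked) for seq in alignment]
    return replicate, picked

# Toy alignment (an assumption for illustration): 5 sequences, 2 columns.
alignment = ["AA", "CA", "AA", "GA", "TT"]
rng = random.Random(0)  # fixed seed only to make the run repeatable

replicate, picked = bootstrap_alignment(alignment, rng)
print("picked columns:", picked)
print("replicate:", replicate)
```

Note that in this sketch the replicate has the same number of sequences and columns as the original, and only the choice and order of columns changes.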
Many reviewers complain about bootstrap values under 70. Why? If the value is low, it does not mean that the tree is wrong, simply that only part of the alignment shapes the tree. No? What is the problem with that?
On top of this, as far as I understand, bootstrapping completely ignores the weight put on different parts of the sequences (e.g., gamma corrections) and neglects evolutionary models.
In this example of 5 short sequences:
AA
CA
AA
GA
TT
In the second column, you can trust that sequence 5 is different from the 4 others, and that an A mutated to a T.
However, in the first column, where there is obviously a high mutation rate, it is impossible to say whether sequences 1 and 3 are the same because they are phylogenetically close or because, just by luck, both ended up with an A after several mutations.
This has an influence on the tree, and I don't think bootstrapping cares about it.
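As a small illustration of why the two columns of the five-sequence example feel so different, this Python sketch counts the character states in each column. The helper name `column_counts` is hypothetical, and the count of distinct states is only a crude stand-in for variability, not a real rate estimate.

```python
from collections import Counter

# The five example sequences, one per row.
alignment = ["AA", "CA", "AA", "GA", "TT"]

def column_counts(alignment):
    """Return one Counter of character states per alignment column."""
    # zip(*alignment) transposes rows (sequences) into columns.
    return [Counter(col) for col in zip(*alignment)]

counts = column_counts(alignment)
for i, c in enumerate(counts, start=1):
    print(f"column {i}: {len(c)} distinct states, counts = {dict(c)}")
```

The first column shows four different states across five sequences, while the second shows a single T against four As.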
To finish, consider the few bp just before these sequences. If they are conserved, this should make you think that 1 and 3 are indeed the same, as we are obviously in a conserved area. If they are completely different, then it is more likely that the two As are the result of several random mutations. If you randomly pick different columns (bootstrap), I think you lose the information around each column.
So, can somebody help me with this? I am really confused...
Thanks.