When I construct a phylogenetic tree with 1000 bootstrap replicates in MEGA, the tree shows very low bootstrap values (1, 3 and 5) for some branches. What interpretations or reasons can be given for these low values?
As mentioned, the bootstrap value measures the proportion of replicates in which a node is recovered when the alignment sites are re-sampled WITH REPLACEMENT; in any particular re-sample, some sites are left out while others are counted more than once. It is also very important to take into account the number of OTUs in the phylogeny. The number of possible trees grows very fast as a function of the number of OTUs (see Felsenstein, J. (1978). The number of evolutionary trees. Syst. Zool., 27, 27–33.), so at similar distances larger trees will tend to have lower bootstrap values, simply because some nodes are not frequently recovered in the re-samples. You can test this by building a sub-tree from a subset of your data. I also find it useful to run a jackknife analysis; it tells you whether some nodes depend on only one site... Hope it helps.
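To give a feel for how fast the number of possible trees grows, here is a minimal Python sketch. It uses the standard count of unrooted, fully bifurcating trees with labelled tips, (2n-5)!! for n OTUs; the function name is just an illustration and is not taken from MEGA or from the paper cited above.

```python
def num_unrooted_trees(n_otus: int) -> int:
    """Count unrooted, fully bifurcating, labelled trees: (2n-5)!! for n >= 3."""
    if n_otus < 3:
        return 1  # with fewer than 3 tips there is only one possible topology
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (the double factorial).
    for k in range(3, 2 * n_otus - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20, 50):
        print(n, "OTUs:", num_unrooted_trees(n), "possible unrooted trees")
```

With more OTUs there are far more alternative topologies that a re-sampled alignment can favour, which is one reason a given node is recovered less often in large trees.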
Bootstrap values just measure the self-consistency of your data. Low values mean that sampling different columns of the alignment gives you different tree topologies. This can happen if the number of informative sites is too small or if unrelated sequences are aligned together.
Bootstrapping involves site-wise re-sampling of the aligned sequences with replacement: an alignment N sites long is re-sampled N times to generate a new alignment of the same length. A value of 90 or higher for a sub-tree is generally taken as strong statistical support for that sub-tree. By the same logic, I would guess that branches with very low values are poorly supported.
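The re-sampling step described here can be sketched in a few lines of Python. This is only an illustration of drawing columns with replacement; the toy alignment and the function name are made up for the example, and the tree building that a real program runs on every replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: N columns drawn with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw N column indices; some columns repeat, others are never drawn.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Hypothetical toy alignment (4 sequences, 8 sites), for illustration only.
toy_alignment = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTTCGT",
    "seq3": "ACCTTCGA",
    "seq4": "ATCTTCGA",
}

if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    replicate = bootstrap_alignment(toy_alignment, rng)
    for name, seq in replicate.items():
        print(name, seq)
```

A tree is then built from each replicate, and the bootstrap value of a node is the percentage of replicate trees that contain it.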
If the bootstrap values are low it might suggest that the sequences are not related in a tree-like way. In other words, there was recombination or gene flow between different 'branches'.
Branches with low bootstrap values suggest that the members of that branch should not be divided into two separate groups, even though the tree appears to do so. Across the bootstrap iterations, the members of the low-support branch kept jumping to other branches. This can happen because of an insufficient number of informative sites, or because of chimeric genes produced by recombination or gene flow.
Also, don't forget the alignment itself and the distance measures. Recombination can mess things up but I suspect a more common issue is sequence and alignment quality. You might have more joy if you restrict your alignments to regions where the sequences are visually aligning well, otherwise you may well just be modelling noise. Gap treatment can have a big influence too. If some of your sequences have large deletions (e.g. missing termini or exons), then they can get dragged to different parts of the tree depending on which parts of the alignment get re-sampled in the bootstrapping. If it's a distance-based tree, that might also have an influence as you may have saturated your distance calculations. (Or, at the other extreme, there may be almost no changes.)
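The saturation point at the end of this answer can be illustrated with the Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites. JC69 is picked here only because it is the simplest model; the thread does not say which distance model was used, so take this as a sketch under that assumption.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of differences."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if p >= 0.75:
        # Saturated: the sequences look as different as random ones,
        # so the corrected distance is no longer defined.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.74, 0.75):
        print(f"p = {p:.2f} -> d = {jc69_distance(p):.3f}")
```

As p approaches 0.75 the corrected distance blows up, so small sampling differences between bootstrap replicates turn into large differences in distance. At the other extreme, when p is close to 0 there is almost nothing to distinguish the sequences.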
Hi everyone, I ran into this problem before with a phylogenetic analysis in MEGA5. I selected Pairwise deletion instead of complete/partial deletion for the gaps/missing data treatment, and that solved my problem. I can recommend trying this.
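To see why the gap treatment can matter, here is a small Python sketch comparing which sites are used for a p-distance under complete deletion (drop every column with a gap in any sequence) and pairwise deletion (drop a column only for the pair being compared). The toy alignment and function names are invented for the illustration; this shows the general idea and is not MEGA's implementation.

```python
MISSING = {"-", "?"}  # characters treated as gaps/missing data in this sketch


def complete_deletion_sites(alignment):
    """Columns with no gap or missing character in ANY sequence."""
    seqs = list(alignment.values())
    return [i for i in range(len(seqs[0]))
            if all(s[i] not in MISSING for s in seqs)]


def pairwise_deletion_sites(seq_a, seq_b):
    """Columns with no gap or missing character in the two sequences compared."""
    return [i for i in range(len(seq_a))
            if seq_a[i] not in MISSING and seq_b[i] not in MISSING]


def p_distance(seq_a, seq_b, sites):
    """Proportion of differing positions among the given sites."""
    if not sites:
        raise ValueError("no comparable sites")
    return sum(seq_a[i] != seq_b[i] for i in sites) / len(sites)


# Hypothetical toy alignment: seqC is missing its first four sites.
toy = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTTCGTAC",
    "seqC": "----ACGAAC",
}

if __name__ == "__main__":
    a, b = toy["seqA"], toy["seqB"]
    complete = complete_deletion_sites(toy)
    pairwise = pairwise_deletion_sites(a, b)
    print("sites used, complete deletion:", len(complete))
    print("sites used, pairwise deletion:", len(pairwise))
    print("p-distance A-B, complete:", p_distance(a, b, complete))
    print("p-distance A-B, pairwise:", p_distance(a, b, pairwise))
```

One truncated sequence removes four columns from every comparison under complete deletion, so fewer sites are left to carry signal. Pairwise deletion keeps them for the pairs that have data there.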
Hi everyone! I had the same problem, although I am working with mtDNA, and the phylogenetic tree was made for one single species. But the evidence of gene flow is the answer I needed. Could anyone provide some references on that subject: low bootstrap values = evidence of gene flow? Thanks!!
Catarina, I thought that the mitochondrial genome rarely, if ever, recombines as it is (almost) uniquely maternally inherited. Is this no longer the prevailing wisdom? In the absence of other data, I would suspect that a lack of variation, poor alignment and/or homoplasy are far more likely explanations of low bootstrap values than gene flow. If your sequences are all from the same species, it may be down to a lack of variable sites - or informative sites if you are using maximum parsimony.
Hi Richard. Thanks for the answer. I know mtDNA almost never recombines. However individuals can migrate from one region to the other, or may have migrated, and in this way we may have gene flow between populations. Or is it wrong to call it gene flow?
If two of my isolates fall into a single monophyletic branch with a bootstrap value of 99% in the NJ phylogenetic tree, how do I interpret this? Does a 99% bootstrap value mean that the two isolates are closely related, or does it mean that they are distantly related?
@Catarina, it is still gene flow - that bit I was not arguing with - but it will not have any effect on the bootstrap values of your mitochondrial trees. Unless there is recombination WITHIN the sequence that you have aligned and are making a tree from, gene flow will not result in conflicting trees from different parts of your sequence (and thus reduced bootstrap support). Does that make sense?
Low bootstrap values indicate a lack of consistent signal across your alignment. This could be due to different parts of the alignment having different trees but it could also be due to a poor signal:noise ratio (few variable/informative sites and/or poor alignment) and/or homoplasy, i.e. independent shared mutations, which will be more common in mutation hotspots. Because mtDNA is non-recombining, you can essentially rule out gene flow/recombination as the explanation. As all of your sequences are from the same species, there is a fair chance that the sequence diversity is low and your low bootstrap values might be indicative of this lack of information. You might have more joy if you concentrate on the variable regions of the mtDNA - remember that bootstrapping is a random sampling method, and so if the random samples are likely to pull out predominantly invariant sites, the chances of getting the "right" tree are going to be small.
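A quick way to check whether lack of information is the issue is to count the variable and parsimony-informative sites in the alignment. The Python sketch below does that for a made-up toy alignment; a site is counted as parsimony-informative when at least two different states each occur in at least two sequences. The function name and data are illustrative assumptions.

```python
from collections import Counter


def count_sites(alignment):
    """Return (number of variable sites, number of parsimony-informative sites)."""
    variable = 0
    informative = 0
    for column in zip(*alignment.values()):
        # Ignore gaps and missing data when tallying states in this column.
        counts = Counter(ch for ch in column if ch not in "-?")
        if len(counts) >= 2:
            variable += 1
            # Informative: at least two states that each appear twice or more.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


# Hypothetical toy alignment (4 sequences, 5 sites), for illustration only.
toy_sites = {
    "s1": "AAGCT",
    "s2": "AAGCA",
    "s3": "ATGTA",
    "s4": "ATGTA",
}

if __name__ == "__main__":
    n_variable, n_informative = count_sites(toy_sites)
    n_total = len(next(iter(toy_sites.values())))
    print(f"{n_variable} variable and {n_informative} informative sites out of {n_total}")
```

If only a handful of sites are informative, many bootstrap replicates will miss some of them entirely, and low support values are to be expected.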
The other problem is if you have a more "star-like" phylogeny, where the evolutionary time since the last divergence is much higher than the evolutionary time between splits (i.e. short basal branches and long terminal ones). In this scenario, homoplasy is high relative to the number of informative sites and the signal is too weak to get decent bootstrap values. I can dig out a reference for this latter issue if you like?
@Ahmed Hassen, it simply means that your data consistently supports those two isolates grouping together when the NJ algorithm is applied. It says nothing about how closely-related they are. They could be the two most divergent sequences in your tree and all the others are forming a clade with high support, or they might be more closely related to each other than any other isolate. It all depends what the branch lengths look like and where the tree is rooted.