Is it always necessary to use 1000 bootstrap replicates during tree building, or can lower values be used? If so, how can one determine which value to use during tree building?
Typically 100-2000 bootstrap (BS) replicates are used to estimate tree reliability, depending on the time required for each replicate (which in turn depends on your computer and on the program used ;). NJ and parsimony are fast methods, so you can nearly always run a BS analysis on trees constructed by those methods, and you can try 2000 BS replicates. ML can take a very long time to execute 2000 BS replicates, so you may try 100 replicates, the lower limit suggested by Barry G. Hall (2011).
It is important to separate tree building from tree search. NJ builds trees, while MP and ML search for the best tree. In either case, you generate pseudo-replicates of your alignment and build, or search for, the best tree for each pseudo-replicate. This seems to me very complex and unwarranted, mainly in ML, where one defines "a priori" a substitution model which fits the original alignment but does not fit the pseudo-replicate.
I really think that people concerned with good phylogenetic hypotheses should worry about this.
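The pseudo-replication step described above can be sketched in a few lines of Python. This is only a minimal illustration of column resampling with replacement, not the code of any particular phylogenetics program; the function name `bootstrap_alignment`, the taxon names and the toy sequences are all made up for the example, and the tree building or tree search that would follow each pseudo-replicate is left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: alignment columns drawn with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment has sites, with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Rebuild every sequence from the same sampled columns, so columns stay intact.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

# Toy amino acid alignment (hypothetical sequences, for illustration only).
alignment = {
    "taxonA": "MKTAYIAKQR",
    "taxonB": "MKTAHIAKQR",
    "taxonC": "MRTAYLAKQK",
}

rng = random.Random(42)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```

Each pseudo-replicate has the same taxa and the same length as the original alignment; only the set of columns differs, with some columns appearing several times and others not at all.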
Dear Yokoto, I don't think anyone defines an "a priori" substitution model, since what you assume is only "a model". That's why pseudo-replication is necessary: to measure the probability that the taxa placed in a clade really are members of that clade. There is absolutely no guarantee... and (maybe!) there is no original alignment, either...
Bootstrapping is a re-sampling technique, so from a theoretical point of view the optimal number of replicates depends on the length of your alignment. (I assume that you want to do phylogenetics from sequence data.) The exact function is not obvious, which is why 1000 replicates is used as a good rule of thumb. (But the longer your alignment, the more replicates are needed to achieve estimates of the same quality.)
But have a look at this paper:
http://www.ncbi.nlm.nih.gov/pubmed/20377449
There you can read about a practical approach and a solution with which you can calculate the optimal number of replicates for your particular dataset.
Thank you for the replies. I am working on proteins and using amino acid sequences to build trees, mainly with ML. Should the bootstrap values be altered/changed in accordance with the length of the sequence?
The bootstrap _values_ should not depend on the length of the alignment (number of residues in your sequences). ML methods can handle gaps in your alignment (to a certain extent), and poorly aligned terminal regions should not be included in the analysis anyway.
Regarding the number of _replicates_ needed: yes, you should use more replicates for longer alignments, but the best you can do is to apply the software from the linked paper (or a similar one) to estimate the optimal number of replicates.
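To make the difference between bootstrap _values_ and the number of _replicates_ concrete, here is a small sketch. It assumes each replicate tree has already been reduced to the set of clades it contains; the helper `clade_support` and the toy data are hypothetical, and real programs extract the clades from the replicate trees for you.

```python
def clade_support(replicate_trees, clade):
    """Percentage of replicate trees that contain the given clade."""
    clade = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return 100.0 * hits / len(replicate_trees)

# Each replicate tree is represented only by its set of clades (toy data).
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
replicate_trees = [{ab}] * 87 + [{ac}] * 13  # 100 replicates in total

print(clade_support(replicate_trees, {"A", "B"}))  # 87.0
```

The number of replicates is the length of `replicate_trees`; the bootstrap value of a clade is the share of those replicates in which the clade shows up.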
I did some tests using different numbers of concatenated sequences in several groups of bacteria whose complete genomes were available. My conclusion was that the longer the sequence (the more genes concatenated), the larger the bootstrap values... By the way, it was impracticable for me to do more than 1000 bootstrap replicates for each test.
In my case, using highly conserved sequences, the number of BS replicates makes little difference to the inferred phylogenetic relationships: 100 to 2000 BS replicates give almost the same results. Thus, not only the size but also the homology of the sequences should matter in deciding how many BS replicates to use.
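One possible way to see why 100 and 2000 replicates can give similar results is to treat a support value as a proportion estimated from independent replicates and look at its binomial standard error, sqrt(p(1-p)/n). This is only a rough sketch under that independence assumption; it covers the sampling noise from using a finite number of replicates and says nothing about whether the support value itself is a good measure. The function name `support_standard_error` is made up for the example.

```python
import math

def support_standard_error(p, n_replicates):
    """Binomial standard error of a support proportion p estimated from n replicates."""
    return math.sqrt(p * (1.0 - p) / n_replicates)

for n in (100, 1000, 2000):
    for p in (0.70, 0.95):
        se = support_standard_error(p, n)
        print(f"replicates={n:5d} support={p:.0%} +/- {100 * se:.1f} points")
```

Under this assumption, a clade with 70% support carries an uncertainty of roughly 4.6 percentage points at 100 replicates and roughly 1 point at 2000, and the uncertainty shrinks further as support approaches 100%. So for strongly supported clades, extra replicates change the numbers very little.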