Many people spend a lot of time answering this question as a preliminary to other analyses, but I never see the outcome of this work published. Please can someone tell me how much better one model is than another, and whether the choice significantly affects the outcome (tree topology, evolutionary rate or TMRCA)? Is this a real issue, or an issue invented by statisticians with shares in computer companies?
Thanks Adrian, you really got to the core of the problem. There is no clear guidance on this point: most publications choose a model without explaining why, and some use more than one model.
Hi Mariana and Hani. Thanks for responding so helpfully. No, I didn't explain my problem clearly enough; it is like this. Say I go through all the rigmarole using PAUP and am told to use a particular model. How can I be sure that I wouldn't have got an equally correct final set of conclusions by just picking one model at random, and saving myself a lot of angst? In essence, how sensitive is the outcome of subsequent analyses to the model chosen, and is it worth the work involved? I suppose that I should go and check for myself, but if you could point me at some comparisons, I'd be grateful.
The models you are describing relate to the assumed (modeled) rate of change between nucleotides. In likelihood and Bayesian methods, branch lengths are estimated based on these models. Models of sequence evolution are used to correct the estimates of genetic or evolutionary distances. The more diverged a pair of lineages (sequences) is, the more likely it is that they will have accumulated multiple substitutions at their fast-evolving sites, which results in the accumulation of a stochastic signal in the sequences (homoplasies). The way I think about it is that the models can correct for over- or underestimation of genetic distance, e.g. for a gene that has a fast rate of change you may conclude that two related taxa are more distant than they are.
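To make the idea of a distance correction concrete, here is a minimal sketch in Python. It uses the simplest substitution model, Jukes-Cantor (JC69), which is my choice for illustration and is not named in the posts above. The function names are hypothetical. It shows how the corrected distance d = -3/4 ln(1 - 4p/3) grows faster than the observed proportion of differing sites p as sequences diverge and multiple hits pile up.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """JC69-corrected number of substitutions per site for observed proportion p."""
    if not 0.0 <= p < 0.75:
        # At p >= 0.75 the sequences look random under JC69 (saturation).
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    # The correction is small for similar sequences and large for diverged ones.
    for p in (0.05, 0.20, 0.40, 0.60):
        print(f"observed p = {p:.2f}   JC69 d = {jc69_distance(p):.3f}")
```

Richer models (HKY, GTR, with or without gamma rate variation) apply the same kind of correction with more parameters, which is why the choice tends to matter more for branch lengths and dates than for topology.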
The models give probabilistic estimates of rates of change between bases at a site, and you should test which model best fits your alignment. It is usually a short step taken right after getting your alignments right. Often the topology is similar for different models, but the estimated evolutionary distance to a most recent common ancestor can change.
A free, user-friendly program for phylogenetic analysis (and model selection) is described in this article: Koichiro Tamura, Glen Stecher, Daniel Peterson, Alan Filipski, and Sudhir Kumar (2013) MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Molecular Biology and Evolution 30: 2725-2729. http://www.kumarlab.net/publications
See also: http://www.ccg.unam.mx/~vinuesa/Model_fitting_in_phylogenetics.html for a long explanation.
Thanks for that explanation. However, if the two most used criteria for deciding which is the best model to use (hLRT and AIC) often give different answers, then what is a mere mortal like me to do? Perhaps just use the models that are most frequently used by others (say, GTR for nucleotide sequences and BLOSUM62 for amino acids). Provided that choice is clearly stated, others can check. However, going back to my original question, if I use the wrong model, how far out are my conclusions about broad-scale topologies, evolutionary rates and TMRCAs likely to be, given the very wide error indicated by the 95% HPD intervals? It's one of those "can't see the wood for the trees" problems.
Please, I badly need help with molecular clock calibration for Bufo species samples using MEGA6. I don't know the minimum and maximum divergence times, and I want to estimate the TMRCA. I tried to do it with BEAST but could not.
In MEGA, what divergence times do I have to enter as the min and max time?
jModelTest is a good choice. You can download the software from this website (http://darwin.uvigo.es/our-software/). Also consider PartitionFinder (http://www.robertlanfear.com/partitionfinder/), in case you have different loci.
In the jModelTest manual the authors raise an interesting point. You should use some sort of criterion (e.g. the Akaike Information Criterion, AIC) to assess the support for the models. Sometimes two or more models are highly supported (e.g. delta AIC < 2). In such cases it is sensible to estimate the phylogeny under each of these models and then compare the results.
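As a rough illustration of the delta AIC idea, here is a small Python sketch. The log-likelihoods below are made up for the example and do not come from any real dataset. The parameter counts cover only the substitution model, which is an assumption that leaves the delta values unchanged when all models are fitted on the same tree. It computes AIC = 2k - 2 lnL, the difference from the best model, and Akaike weights, then lists the models within delta AIC < 2.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def rank_models(fits):
    """Rank models by AIC.

    fits maps a model name to (lnL, number of free parameters).
    Returns a list of (name, AIC, delta AIC, Akaike weight), best first.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    deltas = {name: score - best for name, score in scores.items()}
    rel_lik = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
    total = sum(rel_lik.values())
    rows = [
        (name, scores[name], deltas[name], rel_lik[name] / total)
        for name in scores
    ]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical values, invented purely for illustration.
fits = {
    "JC": (-5230.4, 0),
    "HKY": (-5105.2, 4),
    "HKY+G": (-5082.0, 5),
    "GTR+G": (-5078.6, 9),
}

if __name__ == "__main__":
    ranked = rank_models(fits)
    for name, score, delta, weight in ranked:
        print(f"{name:6s} AIC = {score:9.1f}  delta = {delta:7.1f}  weight = {weight:.3f}")
    # Models with delta AIC < 2 are candidates for running the phylogeny under each.
    print("within delta AIC < 2:", [row[0] for row in ranked if row[2] < 2])
```

With these invented numbers two models fall inside the threshold, which is the situation where running the tree under both and comparing topology and dates is cheap insurance.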
Danny, do you have any idea how to estimate divergence times for Bufo samples? I have Bufo arabicus samples, but there are no sequences for this species in GenBank. I want to estimate the divergence time, but I don't know which outgroup samples to use. I am using MEGA6.
If you have DNA samples for Bufo arabicus, you can download DNA sequences for other species from GenBank, then use some calibration points (e.g. fossils) and appropriate software (e.g. BEAST) to estimate divergence times. Check these papers on phylogenetic relationships of amphibians:
You can check which model fits your dataset best; models are dependent on the data. You can test them with jModelTest (http://www.molecularevolution.org/software/phylogenetics/jmodeltest).