Technically speaking, it depends on the data that you are analyzing.
If the data have evolved under globally stationary, reversible, and homogeneous conditions, then a program like jModelTest should point you at the substitution model that best fits the data.
If the data have evolved under more general conditions, then, unfortunately, there are only a few models that can be used.
It so happens that I am working in this very area, so check out some of my most recent publications (http://www.csiro.au/people/Lars.Jermiin.aspx). In particular, you might find a book chapter from 2008 useful.
I find that MEGA5 works very well for determining the optimal model. Then use the model parameters to generate the tree. MEGA5 will also take codon position into account based on the list of settings.
Hi Omer, I am unfortunately not an expert in this type of analysis. I am sure you will have had a response from a knowledgeable person by now. Good luck.
jModelTest did not accept my aligned FASTA file. I also did as you said using the example file (primate-mtDNA). How do I interpret the result? Which models are most appropriate? Can you explain using the example?
I agree about MEGA. It has an option to find the best model for your sequences. At the end of the analysis, it lists the models in order, best first. You can then work with the statistics the program reports. It is simple with MEGA, but I don't know about the other programs.
In my group, we developed modelgenerator. This works on both DNA and protein and automatically detects which kind of data is in the datafile. It runs on the command line and will test your data on several dozen models (note: this can take quite a while for large datasets). You can find the program here: http://bioinf.nuim.ie/modelgenerator/
It has a very simple interface: you issue one command and it does the analysis.
In general, we find that the general time reversible (GTR) model plus a gamma distribution of rate variation across sites is almost always selected if there is enough data in the alignment. The sequence length needed depends on the diversity between the sequences. After running ModelTest, or our FindModel implementation of it, you should notice that, for DNA from genes (protein-coding or structural RNA), the difference in likelihood scores between the substitution models (HKY, F84, GTR, etc.) is rather trivial compared with the difference between including and not including rate heterogeneity between sites.
The FindModel page links to information on ModelTest, etc.
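To make the comparison above concrete, here is a minimal Python sketch of the arithmetic. The log-likelihood values are hypothetical placeholders, not output from ModelTest or FindModel; they are chosen only to illustrate the pattern described. The helper name `lr_statistic` is likewise just for illustration.

```python
def lr_statistic(lnl_simple, lnl_complex):
    """Likelihood ratio statistic, 2 * (lnL_complex - lnL_simple), for two nested models."""
    return 2.0 * (lnl_complex - lnl_simple)

# HYPOTHETICAL log-likelihoods for one alignment (made up for illustration only).
lnl = {"HKY": -5230.4, "GTR": -5226.1, "HKY+G": -5105.7}

# Effect of a richer substitution scheme (HKY is nested in GTR).
scheme_gain = lr_statistic(lnl["HKY"], lnl["GTR"])

# Effect of adding gamma-distributed rate heterogeneity (HKY is nested in HKY+G).
rate_gain = lr_statistic(lnl["HKY"], lnl["HKY+G"])

print(f"HKY -> GTR   : {scheme_gain:.1f}")
print(f"HKY -> HKY+G : {rate_gain:.1f}")
```

With numbers of this shape, one extra parameter for rate heterogeneity buys far more likelihood than the four extra exchangeability parameters of GTR. Your own data may of course behave differently, so substitute the scores that your program reports.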
http://molecularevolution.org/software/phylogenetics/modeltest (also PROTTEST for proteins)... it implements statistical tests for choosing the model that best fits the data.
You can use jModelTest (http://darwin.uvigo.es/software/jmodeltest.html). Be careful with the number of substitution schemes that you choose. Sometimes the best-fitting substitution model is not available in other phylogenetic reconstruction programs. Most importantly, be careful with model uncertainty and pay attention to the delta values provided by jModelTest if you base your choice on AIC, AICc or BIC.
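As a rough illustration of what the delta values mean, the sketch below computes AIC, AICc and BIC from a log-likelihood, then the delta of each model relative to the best one and the corresponding Akaike weights. This is not jModelTest's code. The log-likelihoods, the alignment length and the function names are hypothetical, and the parameter counts cover only the substitution model (in practice, branch lengths are usually counted as well).

```python
import math

def information_criteria(lnl, k, n):
    """Return (AIC, AICc, BIC) for log-likelihood lnl, k free parameters, sample size n."""
    aic = 2 * k - 2 * lnl
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2 * lnl
    return aic, aicc, bic

def deltas_and_weights(scores):
    """Map model name -> (delta, weight), given model name -> criterion score."""
    best = min(scores.values())
    deltas = {m: s - best for m, s in scores.items()}
    rel = {m: math.exp(-d / 2) for m, d in deltas.items()}
    total = sum(rel.values())
    return {m: (deltas[m], rel[m] / total) for m in scores}

# HYPOTHETICAL fits: model -> (log-likelihood, substitution-model parameters only).
fits = {
    "HKY": (-5230.4, 4),
    "GTR": (-5226.1, 8),
    "HKY+G": (-5105.7, 5),
    "GTR+G": (-5102.9, 9),
}
n_sites = 1000  # assumed alignment length, used as the sample size

aic_scores = {m: information_criteria(lnl, k, n_sites)[0] for m, (lnl, k) in fits.items()}
table = deltas_and_weights(aic_scores)

# Print models from best (delta = 0) to worst.
for model, (delta, weight) in sorted(table.items(), key=lambda kv: kv[1][0]):
    print(f"{model:6s} dAIC={delta:8.2f} weight={weight:.3f}")
```

A model with a small delta (a common rule of thumb is below about 2 to 3) has nearly as much support as the top-ranked one, so it is worth checking whether your conclusions change if you use it instead.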