Once you have your sequences aligned, model testing is of utmost importance. For protein sequences use ProtTest; for DNA sequences, as mentioned before, use jModelTest or MrModelTest.
For phylogenetic inference I prefer statistically based methods, ML or Bayesian, with the packages mentioned previously. An interesting approach can be to run an ML inference and then use its output (tree topology) as a prior in a Bayesian inference.
Be aware of the problem of gene trees vs species trees. If you have a single locus, depending on the specific gene you have, you may encounter problems in inferring the true relationships among the species. If you can, it would be better to use a multilocus approach.
I would use different phylogenetic approaches (Neighbor Joining, Maximum Parsimony, Maximum Likelihood, Bayesian) and see whether they agree. For the approaches that require specifying a model, you can always run a model test beforehand (e.g. with ModelTest), and there are several models meant for chloroplast sequences. If your sequence is very short you may have problems with low bootstrap values and disagreement between the methods; this would indicate that the short sequence does not contain enough information.
Regarding gap treatment, you can trim highly gapped regions using trimAl (http://trimal.cgenomics.org), but with a short sequence I would not go for a very stringent trimming.
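To make the idea of trimming gapped regions concrete, here is a minimal Python sketch that drops alignment columns whose gap fraction exceeds a threshold. This is not trimAl's algorithm; the function name `trim_gappy_columns`, the `max_gap_fraction` parameter and the toy alignment are assumptions made up for illustration.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-"):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Illustrative only; this does not reproduce trimAl's heuristics.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")

    keep = []
    for i in range(length):
        gaps = sum(1 for seq in seqs if seq[i] in gap_chars)
        if gaps / len(seqs) <= max_gap_fraction:
            keep.append(i)

    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Toy alignment (hypothetical data)
aln = {
    "sp1": "ATG--CGT",
    "sp2": "ATGA-CGT",
    "sp3": "ATG--CGA",
    "sp4": "ATGAACGT",
}

lenient = trim_gappy_columns(aln, max_gap_fraction=0.5)
strict = trim_gappy_columns(aln, max_gap_fraction=0.25)
```

A lower `max_gap_fraction` means more stringent trimming, which is exactly what I would be careful with on a short sequence, since every removed column is lost signal.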
Hi Grigorios, first you should find out how variable the sequence is and how many parsimony-informative sites it has, and look at how the variation is distributed across the first, second and third codon positions. After that you can use two different kinds of phylogenetic reconstruction: 1. Cladistic (parsimony), for which you can use MEGA 5.0 or TNT. 2. Statistical inference, such as ML or Bayesian; I like Bayesian, and you can use MrBayes 3.2 or BEAST. For the model, use jModelTest 2.0, which answers that question using the Akaike or Bayesian criterion. In Bayesian analysis, clade support is given as a posterior probability, which is more relaxed than the bootstrap used in ML; in parsimony, 1000 bootstrap replicates are generally used. As for gaps/indels, I believe BEAST treats the gap as a fifth state (A, T, C, G and gap), but I am not sure whether the same method can be used in parsimony.
The following book is a good guide for getting started with phylogenetic analyses:
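As a rough illustration of the first step above, this Python sketch counts variable and parsimony-informative sites for each codon position. It assumes the alignment is in frame (the hypothetical `frame_start` parameter gives the index of the first codon's first base), ignores gaps and ambiguity codes, and uses the usual definition of a parsimony-informative site: at least two different states, each present in at least two sequences. The function name and the toy data are made up for the example.

```python
from collections import Counter


def site_summary(seqs, frame_start=0):
    """Count variable and parsimony-informative sites per codon position.

    seqs: list of aligned, in-frame nucleotide sequences of equal length.
    Only A, C, G and T are counted; gaps and ambiguity codes are ignored.
    """
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")

    summary = {pos: {"variable": 0, "informative": 0} for pos in (1, 2, 3)}
    for i in range(frame_start, length):
        counts = Counter(s[i].upper() for s in seqs if s[i].upper() in "ACGT")
        codon_pos = (i - frame_start) % 3 + 1
        # Variable: at least two different states observed at this site.
        if len(counts) >= 2:
            summary[codon_pos]["variable"] += 1
        # Informative: at least two states that each occur at least twice.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            summary[codon_pos]["informative"] += 1
    return summary


# Toy in-frame alignment (hypothetical data)
seqs = ["ATGGCA", "ATGGCC", "ATAGCC", "CTAGCA"]
summary = site_summary(seqs)
for pos in (1, 2, 3):
    print(pos, summary[pos])
```

If most of the informative sites sit in third positions, that is worth knowing before choosing a model or deciding whether to partition by codon position.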
Choosing an adequate model is very important for further analysis and depends on the overall variability of your locus. Before using any model you may want to read some of the specialized literature about it. All the programs mentioned by the previous authors are good for solving such problems.
Also, if samples are available, I think it would really be beneficial to generate some comparable sequences from a closely related genus (or two). These would be used to root your tree and would allow you to determine which are the more basal lineages and which are the more derived within your genus, giving an idea of the bigger evolutionary picture.
Bear in mind the cautionary notes (above) about the use of a single locus. If this is all that will be available to you, are there other characters (morphological etc) that could be used to give you an independent assessment of the relationships within the genus?
Apart from a single species from a closely related genus, there are no other sequences from this specific locus, and that species will serve as the root. Also, morphology is not the question here, as I just want to test the relative phylogenetic utility of this locus compared to others used so far, by reproducing a tree as close as I can to the one already proposed for the genus. Thanks anyway!
Given that all your species are within a single genus and you have only one locus, I wouldn't worry too much: your assumptions of rate homogeneity should be pretty robust both along the tree and across sequences. Given it's a coding sequence, use a high gap opening penalty and a low gap extension penalty for your MSA. For your tree reconstruction, I would definitely go with a Maximum Likelihood method (e.g. RAxML with the GTRGAMMA model).
Hi Grigorios, I believe that accurate sequence alignments of the data matrix are required for meaningful tree inferences. So, you can verify your alignments manually before you proceed further. Consider the following articles for help: (i) Löhne C, Borsch T (2005) Molecular Evolution and Phylogenetic Utility of the petD Group II Intron: A Case Study in Basal Angiosperms Mol Biol Evol 22(2): 317-332 and (ii) Morrison DA (2006) Multiple sequence alignment for phylogenetic purposes. Aust Syst Bot 19:479–539.
You can include gaps in your analysis by coding them automatically in a binary matrix using SeqState. The binary data can be appended along with your aligned DNA sequence data and analyzed for phylogenetic tree reconstruction using MrBayes. For further reading, refer to Simmons MP, Ochoterena H (2000) Gaps as Characters in Sequence-Based Phylogenetic Analyses Syst. Biol. 49(2):369–381.
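To show what coding gaps in a binary matrix can look like, here is a simplified Python sketch in the spirit of the simple indel coding described by Simmons and Ochoterena (2000). It is not SeqState's implementation: each distinct gap (defined by its start and end columns) becomes one character, scored `1` where a sequence has exactly that gap, `?` where a longer gap in that sequence completely covers it, and `0` otherwise. As a further simplifying assumption, leading and trailing gaps are treated like any other gap. The function names and the toy alignment are hypothetical.

```python
import re


def gap_intervals(seq):
    """Return the half-open (start, end) interval of each run of '-' in seq."""
    return [match.span() for match in re.finditer(r"-+", seq)]


def simple_indel_coding(alignment):
    """Code gaps as binary characters (simplified simple indel coding).

    alignment: dict mapping sequence name -> aligned sequence.
    Returns (characters, matrix): the sorted list of gap intervals and a
    dict mapping sequence name -> string of '0', '1' and '?' scores.
    """
    names = list(alignment)
    per_seq = {name: gap_intervals(alignment[name]) for name in names}
    characters = sorted({iv for ivs in per_seq.values() for iv in ivs})

    matrix = {}
    for name in names:
        row = []
        for start, end in characters:
            if (start, end) in per_seq[name]:
                row.append("1")  # this exact gap is present
            elif any(s <= start and end <= e for s, e in per_seq[name]):
                row.append("?")  # covered by a longer gap: inapplicable
            else:
                row.append("0")  # gap absent
        matrix[name] = "".join(row)
    return characters, matrix


# Toy alignment (hypothetical data)
aln = {
    "sp1": "ATG---CGTA",
    "sp2": "ATGA--CGTA",
    "sp3": "ATGACTCGTA",
    "sp4": "ATG---CGTA",
}
characters, matrix = simple_indel_coding(aln)
for name, row in matrix.items():
    print(name, row)
```

The resulting rows are the kind of binary block that can be appended to the DNA data as a separate partition; for real analyses I would still rely on SeqState rather than a hand-rolled script.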
I think the best way is to compare trees from different analyses: RAxML, maximum parsimony (MP), Bayesian (using an evolutionary model calculated beforehand with jModelTest), whatever, for each gene. Programs: MrBayes, GARLI, BEAST, etc. You could also run them online at http://www.phylo.org/portal2/login!input.action
or
http://www.bioportal.uio.no/
They are free and fast.
Which method is the best? Impossible to answer, sorry; it depends on the loci used and, of course, on the evolutionary history of your individuals, so you have to check. I also recommend Phylogenetic Trees Made Easy, which is really good support for getting started in phylogenetic analysis.
I can tell you that in my research (at the infraspecific level) I get the best tree topologies with Bayesian analysis, and I complement it with MP.
If you want everyone's extremely helpful answers streamlined into 3 steps, here you go:
1, Alignment: use MAFFT/MUSCLE
2, Model selection: jModelTest (use the AICc criterion, as you have short sequences)
3, Tree building: use a Bayesian/likelihood method with the model selected in the step above. I would highly recommend BEAST for Bayesian; it's far superior to MrBayes in speed and interface. Likelihood: PhyML or PAUP*.
PS: try to include an outgroup, and use neighbour joining at the beginning to get a feel for the analysis before proper model selection.
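On the AICc point in step 2: the correction matters when the sample size is small relative to the number of parameters. Here is a small Python sketch of the standard formulas (AIC = 2k - 2lnL, AICc = AIC + 2k(k+1)/(n-k-1), BIC = k ln(n) - 2lnL). The log-likelihoods and parameter counts below are made-up numbers, and taking n as the alignment length is an assumption (a common convention, but not the only one).

```python
import math


def aic(lnL, k):
    """Akaike information criterion."""
    return 2 * k - 2 * lnL


def aicc(lnL, k, n):
    """AIC with the small-sample correction; n is the sample size."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(lnL, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(lnL, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * lnL


# Hypothetical results: model name -> (log-likelihood, free parameters).
# The values are invented for illustration, not output from any program.
models = {
    "JC": (-2300.0, 0),
    "HKY": (-2250.0, 4),
    "GTR+G": (-2245.0, 9),
}
n = 500  # assumed sample size: alignment length in sites

for name, (lnL, k) in models.items():
    print(name, aic(lnL, k), round(aicc(lnL, k, n), 3), round(bic(lnL, k, n), 3))

best = min(models, key=lambda m: aicc(models[m][0], models[m][1], n))
print("lowest AICc:", best)
```

In this invented example HKY and GTR+G tie on plain AIC, and the AICc penalty is what separates them in favour of the simpler model. The shorter the alignment, the larger that penalty gets.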
A good book that explains everything from sequence evolution -> alignments -> phylogenetic reconstruction -> hypothesis testing is The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing.
Here is the link: http://www.kuleuven.be/aidslab/phylogenybook/home.html
Geneious is really easy to use, and the alignment is normally very fast (it depends on the data set, of course). The problem with this software is the price (very expensive).
Phylogenetic reconstruction: ML, MP and Bayesian analysis.