I'm a bit lost. I have sequences from different samples of chilli species in Mauritius and need to construct a phylogenetic tree. Which program can I use?
First you will need to align the sequences, and you will probably want to also align your sequences to a "reference set" of chili pepper species of known ancestry (I am assuming you want to know something about the heritage of yours, like people sending their dog DNA to find out what breed of dog they have).
Then you can use a phylogeny package such as DAMBE, MEGA, PHYLIP, or PAUP (I would recommend MEGA for a start) to read in the alignment; these offer pull-down menus for various types of phylogenetic reconstruction. The simplest would be neighbor-joining with a simple model of evolution. Selecting the method and model depends on how many sequences you have (a dozen or so vs hundreds or thousands of species) and how diverse the species are (are they all 99% identical to each other, or more like 80%, or a range from 90% to 50% identity?). It also depends on whether you need to infer things like rates of evolution and effective population sizes or just want a rough estimate of who is related to whom.
The third step will be rendering the treefile into a nice tree that makes biological sense. A stand-alone tree rendering program like FigTree is far better than the tree rendering program that is typically "built in" to a package like DAMBE or MEGA.
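To get a feel for how diverse the sequences are (the 99% vs 80% vs 50% question above), a rough sketch like the following can report pairwise percent identity. It assumes the sequences are already aligned to equal length and skips any column where either sequence has a gap. The names and bases are made-up placeholders, not real Capsicum data.

```python
from itertools import combinations

def percent_identity(seq_a, seq_b):
    """Percent of identical bases over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

def pairwise_identities(alignment):
    """Return {(name1, name2): percent identity} for every pair of sequences."""
    return {
        (n1, n2): percent_identity(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }

# Hypothetical toy alignment (placeholder data)
toy_alignment = {
    "Capsicum1": "ATTCAGGCTA",
    "Capsicum2": "ATTCCGGCTA",
    "Outgroup": "ATTTTGG-TA",
}

for pair, pid in pairwise_identities(toy_alignment).items():
    print(pair, f"{pid:.1f}%")
```

If nearly every pair comes out close to 100%, there may be too little signal for a well-resolved tree; if many pairs are far lower, a simple model of evolution is probably not enough.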
I really wish this was a simple question to answer, but it is not. Before asking about programs, you probably need to ask yourself some questions about your data. Does your data seem to manifest any biases, such as significant variation in base/amino acid composition? Are there fast-evolving and slow-evolving sequences in your dataset? Do you have enough variation to construct sensible trees? Or is there too much variation, so that the sequences are saturated for change?
At this point, you should probably read some of the textbooks on this issue, since a small comments section on a website is unlikely to capture the difficulties of the question.
Bottom line: It is not simply a matter of which software to use; it is also an issue of being able to interpret the possible reasons for getting the answers that you are getting.
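As a first look at the composition question raised above, something like this minimal sketch can tabulate base frequencies per sequence. It only counts unambiguous A/C/G/T characters, and the sequences shown are made-up placeholders. Large differences in GC content between sequences are only a hint of compositional bias, not a formal test.

```python
from collections import Counter

def base_composition(seq):
    """Return the fraction of A, C, G and T among unambiguous bases in seq."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return {base: counts[base] / total for base in "ACGT"}

def gc_content(seq):
    """Fraction of G plus C among unambiguous bases."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]

# Hypothetical placeholder sequences
toy_sequences = {
    "Capsicum1": "ATTCAGGCTA",
    "Capsicum2": "GGGCCGGCTA",
}

for name, seq in toy_sequences.items():
    print(name, f"GC = {gc_content(seq):.2f}")
```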
There are a number of available programs and tastes vary. It is my belief that the program MEGA is a good place to start. The program is free and available at (http://www.megasoftware.net).
#1. Save your sequence text file as a .fasta file (e.g. Mauritius.fasta) from a general text program like Wordpad or TextEditor. The .fasta file extension allows MEGA to recognize the file you create.
The format is as follows:
>Capsicum1
ATTCA...
>Capsicum2
ATTCC...
>Distant_Solanaceae_to_root_tree
ATTTT...
#2. Open Mauritius.fasta in MEGA (I prefer to right-click on the file and "open with MEGA").
#3. Select all of your sequences that are to be aligned using the left mouse button or Ctrl+left mouse click (for PC anyway). Align the selected sequences using the "ClustalW" option under the "Alignment" tab in MEGA. Once you have aligned the sequences you need to export the alignment file as a .meg file (e.g. Mauritius.meg)
#4. Then open the Mauritius.meg file in MEGA (I prefer to right-click on the file and "open with MEGA"). Then click on the "Construct/Test Neighbor-Joining Tree" option under the "Phylogeny" tab. Next, select the "bootstrap method" under the "test of phylogeny" tab. Change the "No. of bootstrap replications" to, say, 500. Select the "Maximum Composite Likelihood" option under the "Model/Test" tab. Then click compute.
#5. After the tree is generated you can save the tree file as a .mts (MEGA tree session). This will allow you to go back and make changes to the tree.
This is a very basic introduction to creating a Neighbor-Joining tree using Maximum Composite Likelihood but I think it is the best way to get you started.
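Before opening the file in MEGA, it can help to check that the FASTA text from step #1 is well formed. The following is only a sketch of such a check, not part of MEGA: it parses FASTA text, rejects duplicate names, and reports whether all sequences have the same length (which they should once aligned). The short sequences are placeholders standing in for the real data.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict (insertion-ordered)."""
    records = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("header line without a name")
            name = parts[0]  # use the first word of the header as the name
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name] += line  # sequences may span several lines
    return records

def all_same_length(records):
    """True if every sequence has the same length, as expected after alignment."""
    return len({len(seq) for seq in records.values()}) <= 1

# Placeholder FASTA text using the names from the example above
example_text = """>Capsicum1
ATTCA
>Capsicum2
ATTCC
>Distant_Solanaceae_to_root_tree
ATTTT
"""

records = parse_fasta(example_text)
print(list(records), "same length:", all_same_length(records))
```

To check a real file, the text could be read with open("Mauritius.fasta").read() and passed to parse_fasta.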
I agree with James. You should first check your data and your research purpose, because some methods are not valid or suitable for some research purposes.
www.phylogeny.fr is also a good starting point. The workflow goes from alignment to curation of the alignment (Gblocks) to a maximum-likelihood tree in a glimpse. Just for non-experts...
I'm using Gblocks as well, but not all the time, only for certain data sets, especially gene sequences that cannot easily be aligned, e.g. the variable region of the ITS gene.
Exactly, Maria, that's what I did to resolve some of the phylogenetic relationships within certain genera. Sequence-structure information is always a powerful tool, as is structural information like compensatory base changes, which helps to support the biological species concept.
Just to add to the diversity, I have used PAUP for decades with no complaints, though MEGA is really not bad either. Although PAUP is a bit particular about the format it uses and looks less user-friendly than MEGA, it gives many more opportunities for fine-tuning analysis options. It is also free, with a comprehensive manual.
MrBayes for Bayesian analysis: quick and easy. PAUP for maximum likelihood and maximum parsimony analyses, also MEGA. For preparing the dataset: BioEdit. I can recommend the online tool FaBox for easy editing of datasets you get via BioEdit. FaBox: http://users-birc.au.dk/biopv/php/fabox/
Theoretically, you must first choose an outgroup for your data set. Automatic alignment with BioEdit or SeaView is an easy approach, which sometimes must be completed with manual correction. Then you have to select a DNA substitution model with, for example, MrAIC. You can generate your phylogeny using PhyML (maximum likelihood) and/or MrBayes (Bayesian inference), with bootstrapping (PhyML) or posterior probabilities (MrBayes) to evaluate the strength of node support on your trees. I say "and" because MrBayes is known to give higher support values than bootstrapping. Finally, you can visualise your trees with TreeView or NJplot.
I think these programs (including MrAIC, MrBayes, PhyML) are not easy for beginners to use, and at least one demonstration is needed.
All of this is theoretical, and the range of phylogenetic software on offer is large. You can read recent publications to get an idea of current trends in phylogenetics!
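For the model-selection step mentioned above, tools of the MrAIC kind compare substitution models by an information criterion. The sketch below only illustrates the arithmetic of AIC = 2k - 2 ln L; it is not MrAIC's code. The log-likelihoods are invented for illustration, and k here counts only substitution-model parameters (JC69: 0, HKY85: 4, GTR: 8), on the assumption that all models are fitted on the same tree so branch-length parameters add the same constant to each score.

```python
def aic(log_likelihood, n_free_params):
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_free_params - 2 * log_likelihood

def rank_models(fits):
    """fits maps model name -> (log-likelihood, number of free parameters).

    Returns (name, AIC) pairs sorted from best (lowest AIC) to worst.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Made-up log-likelihoods, for illustration only
hypothetical_fits = {
    "JC69": (-2450.0, 0),
    "HKY85": (-2410.0, 4),
    "GTR": (-2408.5, 8),
}

for name, score in rank_models(hypothetical_fits):
    print(name, score)
```

With these invented numbers the extra parameters of GTR do not buy enough likelihood to beat HKY85, which is the kind of trade-off the criterion is meant to expose.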
Patrice's answer is quite complete. A few additional comments.
For all your analysis the base is PAUP.
But if you want to explore more options in parsimony use TNT.
If you want to use ML: GARLI is much faster than PAUP but has a few algorithm problems and is really not friendly! RAxML seems to be nice, but I have little experience with it.
MEGA....easy to start/friendly.....but FULL of bugs....
For me, Bayesian statistics are not valid for phylogenies (they do not take biological similarities into account). Therefore I can't recommend MrBayes. Downstream analyses (dating) using BEAST are also dangerous, as you get severe flaws between 6 and 3 million years. Bayesian statistics are extremely powerful for population genetics (discriminating populations, gene flow, etc.), but not when searching for evolutionary relationships between species.
As you are working within a single species (Capsicum varieties), I suppose that you have (using genes) few differences/polymorphisms. You may explore parsimony-based networks (Bandelt type). Try the software "Network"; it is rather friendly to use.
Several journals no longer accept MEGA, because it is very "basic" software. If it is only to understand and learn about phylogeny and similar topics, MEGA is okay, but if it is for a paper, I suggest other programs like MetaPIGA or MrBayes, etc...
Alejandro, can you explain a bit more which journals do not allow phylogenies constructed with MEGA and why? I find that very surprising. The authors of MEGA know their stuff and although it can be abused by the ignorant due to its ease of use, I think it is very unfair to describe it as "very basic software".
MEGA6 is wonderful software. I used it to align the sequences and calculate p-distances. I would also use it to construct the NJ tree. But I do not use it to construct the ML tree because the accuracy of the result is debated; I do not know why. The topology and the bootstrap values are similar between MEGA6 and other software (I use PhyML). So I use PhyML to construct the ML tree to avoid argument and disagreement. PhyML and RAxML are the two most cited programs in papers, but PhyML is more user-friendly, and we can easily run it under Windows.
MrBayes is good software for Bayesian inference.
Well done, you have a complete overview of the tools for constructing phylogenetic trees, from PAUP and MEGA through to RAxML. But you must know which kind of phylogenetic tree you want.
Phylogenetic trees can be constructed by one of two methods: either by constructing a distance matrix (calculating the distances from your sequences), or by basing the tree on a preferred outgroup. In the latter case, you need to find the core genes among your species, then calculate the distance between your core genes and those of the outgroup.
Many of the tools mentioned in the answers belong to either the distance matrix method or the outgroup one.
All ML tools, MrBayes, PhyML, and RAxML are for the outgroup method, while SeaView, parsimony, etc. are for the distance matrix method.
Note that with a distance matrix you need to perform a multiple alignment in order to calculate the distances (you will need to use T-Coffee, MUSCLE, ClustalW, ...), or you can simply use the edit distance from a global or local alignment method (such as Needleman-Wunsch, Smith-Waterman, ...).
Finally, the word "best" has no meaning in bioinformatics, because many factors determine whether your tree is good or not.
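The edit distance mentioned above can be sketched with the same kind of dynamic-programming table that global alignment (Needleman-Wunsch) uses. What follows is the plain Levenshtein distance with a unit cost for every substitution, insertion, or deletion, which is an assumption made for simplicity; real alignment tools use substitution matrices and gap penalties instead.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b.

    Every substitution, insertion, or deletion costs 1. Only two rows of
    the dynamic-programming table are kept in memory at a time.
    """
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]

# Placeholder sequences
print(edit_distance("ATTCA", "ATTCC"))
```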
@Bassam, I find your answer very confusing. "Outgroups" are used for rooting, not actual tree construction. (Most likelihood/Bayesian models are symmetrical, I think, and give an unrooted topology that is subsequently rooted by another method, e.g. midpoint or outgroup.) Parsimony is most definitely NOT a distance method and explicitly does not use distance matrices. ALL *molecular* phylogenetics tools start with an alignment. Some generate (and/or can also use) a distance matrix and essentially perform hierarchical clustering whilst others use an internal model of evolution to assess different trees and pick the best. In each case, all-by-all comparisons are used, rather than comparisons versus a specified outgroup.
@Richard, you are right: in all cases sequence alignment should be done with all-by-all comparisons. From these comparisons a distance matrix is built, and a hierarchical clustering algorithm is then used to construct an unrooted phylogenetic tree. To root it, one or more outgroups can be selected.
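To make the "distance matrix, then clustering, then unrooted tree" route concrete, here is a compact sketch of neighbor-joining in plain Python. It is meant as an illustration of the algorithm, not a replacement for MEGA or PHYLIP; the function name, the edge-list output, and the five-taxon distance matrix are all made up for this example. Rooting with an outgroup would be a separate step applied afterwards.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor-joining.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance for every pair.
    Returns a list of (node, neighbor, branch_length) edges.
    """
    d = dict(dist)
    active = list(names)
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        # Sum of distances from each active node to all the others
        r = {i: sum(d[frozenset((i, k))] for k in active if k != i) for i in active}
        # Choose the pair (i, j) with the smallest Q value
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = active[x], active[y]
                q = (n - 2) * d[frozenset((i, j))] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        dij = d[frozenset((i, j))]
        # Branch lengths from the new internal node to i and j
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{next_id}"
        next_id += 1
        edges.append((u, i, li))
        edges.append((u, j, lj))
        # Distances from the new node to every remaining node
        for k in active:
            if k not in (i, j):
                d[frozenset((u, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    edges.append((a, b, d[frozenset((a, b))]))  # join the last two nodes
    return edges

# Made-up toy distance matrix for five taxa
toy_names = ["a", "b", "c", "d", "e"]
toy_pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
toy_dist = {frozenset(pair): float(value) for pair, value in toy_pairs.items()}

for parent, child, length in neighbor_joining(toy_names, toy_dist):
    print(parent, child, length)
```

Note that every pair of taxa enters the calculation; no outgroup is involved until the finished tree is rooted.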
There are many online programs available, but I think Clustal Omega will be better. Clustal Omega is an online server that gives you results within a few minutes.
Clustal Omega is a (good) multiple sequence alignment tool, NOT a phylogenetics program. Under no circumstances should you use the guide tree generated by an alignment tool as a phylogenetic tree.
Different experts prefer different software for phylogeny, and all of them will give you results, but I found MEGA easy. I prefer to use MEGA, and it is also freely available.
My first preference is to use Clustal X for alignment and then open the .dnd file in MEGA 7 to get a phylogenetic tree. I use both of these programs for analysis.