Which program is best to use for phylogeny analysis?

03 March 2013

I'm a bit lost. I have sequences from samples of different chilli species in Mauritius and need to construct a phylogenetic tree. Which program can I use?

Brian Thomas Foley

First you will need to align the sequences, and you will probably want to also align your sequences to a "reference set" of chili pepper species of known ancestry (I am assuming you want to know something about the heritage of yours, like people sending their dog DNA to find out what breed of dog they have).

Then you can use a phylogeny package such as DAMBE, MEGA, PHYLIP, or PAUP (I would recommend MEGA for a start) to read in the alignment and choose among various types of phylogenetic reconstruction from pull-down menus. The simplest would be neighbor-joining with a simple model of evolution. The choice of method and model depends on how many sequences you have (a dozen or so vs. hundreds or thousands of species) and how diverse the species are (are they all 99% identical to each other, or more like 80%, or a range from 90% down to 50% identity?). It also depends on whether you need to infer things like rates of evolution and effective population sizes, or just want a rough estimate of who is related to whom.
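
To get a feel for the diversity question above (99% vs. 80% vs. 50% identity), a rough sketch like the following can be run on an alignment before picking a method. It is only an illustration: the function names and the toy sequences are made up for this example, and columns with a gap are simply skipped, which is one of several possible conventions.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_table(alignment: dict) -> dict:
    """Percent identity for every pair of sequences in the alignment."""
    return {
        (n1, n2): percent_identity(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


# Hypothetical toy alignment, for illustration only.
toy_alignment = {
    "Capsicum1": "ATTCAGGCTA",
    "Capsicum2": "ATTCCGGCTA",
    "Outgroup": "ATTTTGG-TA",
}

for pair, identity in identity_table(toy_alignment).items():
    print(pair, round(identity, 1))
```

For the toy data this gives 90% identity between the two Capsicum samples and about 78% between each of them and the outgroup.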

The third step will be rendering the treefile into a nice tree that makes biological sense. A stand-alone tree rendering program like FigTree is far better than the tree rendering program that is typically "built in" to a package like DAMBE or MEGA.

James McInerney

I really wish this was a simple question to answer, but it is not. Before asking about programs, you probably need to ask yourself some questions about your data. Does your data seem to manifest any biases, such as significant variation in base/amino acid composition? Are there fast-evolving and slow-evolving sequences in your dataset? Do you have enough variation to construct sensible trees? Or is there too much variation, so that the sequences are saturated with change?
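
One quick way to look at the composition question above is to tabulate base frequencies per sequence and compare them. This is a minimal sketch, assuming DNA sequences held as plain strings. The helper names are hypothetical, and a real analysis would use a proper statistical test rather than eyeballing the spread.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Fraction of A, C, G and T among unambiguous bases.

    Gaps and ambiguity codes such as N are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq: str) -> float:
    """Fraction of G plus C among unambiguous bases."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


def gc_spread(sequences: dict) -> float:
    """Difference between the highest and lowest GC content in a data set.

    A large spread is a hint (not proof) of compositional bias.
    """
    values = [gc_content(seq) for seq in sequences.values()]
    return max(values) - min(values)
```

If the spread is large, that is a reason to read up on compositional heterogeneity before trusting a tree built with a simple model.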

At this point, you should probably read some of the textbooks on this issue, since a small comments section on a website is unlikely to capture the difficulties of the question.

Bottom line: it is not simply a matter of which software to use; it is also a matter of being able to interpret the possible reasons for the answers you are getting.

lei Gao

PAUP, MEGA, etc.

Ijad Madisch

Hey,

for your purpose, MEGA is the best software for phylogenetic analysis.

Best

Ijad

Andrew D Winters

There are a number of available programs and tastes vary. It is my belief that the program MEGA is a good place to start. The program is free and available at (http://www.megasoftware.net).

#1. Save your sequence text file as a .fasta file (e.g. Mauritius.fasta) from a general text program like Wordpad or TextEditor. The .fasta file extension allows MEGA to recognize the file you create.

The format is as follows:

>Capsicum1

ATTCA...

>Capsicum2

ATTCC...

>Distant_Solanaceae_to_root_tree

ATTTT...

#2. Open Mauritius.fasta in MEGA (I prefer to right-click on the file and "open with MEGA").

#3. Select all of the sequences that are to be aligned using the left mouse button or Ctrl+left mouse click (on a PC, anyway). Align the selected sequences using the "ClustalW" option under the "Alignment" tab in MEGA. Once you have aligned the sequences, export the alignment as a .meg file (e.g. Mauritius.meg).

#4. Then open the Mauritius.meg file in MEGA (again, I prefer to right-click on the file and "open with MEGA"). Click on the "Construct/Test Neighbor-Joining Tree" option under the "Phylogeny" tab. Next, select the "bootstrap method" under the "test of phylogeny" tab. Change the "No. of bootstrap replications" to, let's say, 500. Select the "Maximum Composite Likelihood" option under the "Model/Test" tab. Then click compute.

#5. After the tree is generated, you can save the tree file as a .mts (MEGA tree session). This will allow you to go back and make changes to the tree.

This is a very basic introduction to creating a Neighbor-Joining tree using Maximum Composite Likelihood but I think it is the best way to get you started.
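
Before loading the file into MEGA, it can help to check that the FASTA file from step #1 is well formed. The following is a minimal sketch in plain Python and is not part of MEGA. The function names are hypothetical, and the short sequences are toy stand-ins rather than real Capsicum data.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {name: sequence}.

    Blank lines are ignored, and a sequence may span several lines.
    Only the first word of each header line is used as the name.
    """
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            name = header[0]
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def is_aligned(records: dict) -> bool:
    """True if all sequences have the same length (as after step #3)."""
    return len({len(seq) for seq in records.values()}) <= 1


# Toy input in the same layout as the example above.
toy_text = """>Capsicum1

ATTCA

>Capsicum2

ATTCC

>Distant_Solanaceae_to_root_tree

ATTTT
"""

toy_records = parse_fasta(toy_text)
print(list(toy_records), is_aligned(toy_records))
```

Unaligned sequences will usually differ in length, so `is_aligned` is mainly useful as a sanity check on the alignment exported in step #3.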

Good luck.

Erta Puri Rosidiani

I agree with James. You should first check your data and your research purpose, because some methods are not valid or suitable for some research purposes.

Maria Valeria Ruggiero

www.phylogeny.fr is also a good starting point. The workflow goes from alignment, to curation of the alignment (Gblocks), to a maximum-likelihood tree in one go. Just for non-experts...

Hong Chang Lim

I use Gblocks as well, though not for every data set: mainly for gene sequences that do not align easily, e.g. the variable region of ITS.

Maria Valeria Ruggiero

Me too, I use Gblocks for ITS, though a secondary-structure-guided alignment could be the best solution.

Hong Chang Lim

Exactly, Maria, that's what I did to resolve some of the phylogenetic relationships within certain genera. Sequence-structure information is always a powerful tool, as is structural information such as compensatory base changes, which helps to support the biological species concept.

Andres Parada

Try TNT for parsimony http://www.zmuc.dk/public/Phylogeny/TNT/

RAxML for maximum likelihood https://sites.google.com/site/raxmlgui/

and BEAST for Bayesian estimation http://beast.bio.ed.ac.uk/Main_Page.

You should also take a look at http://evolution.genetics.washington.edu/phylip/software.html

Mauro Sanna

Mega is the best

Lucy Nongbri

Yes MEGA is the best.

Predrag Simonović

Just to add to the diversity: I have used PAUP for decades with no complaints, though MEGA is really not bad either. Although PAUP is a bit particular about the format it uses and looks less user-friendly than MEGA, it gives many more options for fine-tuning analyses. It is also free, with a comprehensive manual.

Kateřina Rylková

MrBayes for Bayesian analysis: quick and easy. PAUP for maximum likelihood and maximum parsimony analyses, or MEGA. For preparing the dataset, BioEdit. I can also recommend the online tool FaBox for easy editing of the datasets you get from BioEdit. FaBox: http://users-birc.au.dk/biopv/php/fabox/

Gautier Calmin

Theoretically, you must first choose an outgroup for your data set. Automatic alignment with BioEdit or SeaView is an easy approach, though it sometimes has to be completed with manual correction. Then you have to select a DNA substitution model with, for example, MrAIC. You can generate your phylogeny using PhyML (maximum likelihood) and/or MrBayes (Bayesian inference), with bootstrapping to evaluate the strength of node support on your trees. I say "and" because MrBayes is known to overestimate node support (its posterior probabilities tend to be higher than bootstrap values). Finally, you can visualise your trees with TreeView or NJplot.
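
For readers unfamiliar with what bootstrapping does to the data: each replicate is built by sampling alignment columns with replacement, and a tree is then inferred from every replicate. Below is a minimal sketch of the resampling step only, with hypothetical names and a toy alignment. Tree-building programs do this internally, so this is for understanding rather than for production use.

```python
import random


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """Return one bootstrap replicate of an alignment.

    Columns are drawn with replacement, and the same columns are used
    for every sequence so that rows stay aligned.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][c] for c in columns) for name in names
    }


# Toy alignment, for illustration only.
toy_alignment = {"a": "ACGT", "b": "ACGA", "c": "TCGA"}

rng = random.Random(0)  # fixed seed so the run is repeatable
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(3)]
for replicate in replicates:
    print(replicate)
```

The support value on a node is then the fraction of replicate trees in which that grouping appears.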

I think these programs (including MrAIC, MrBayes, PhyML) are not easy for beginners to use, and at least one demonstration is needed.

All of this is theoretical, and the supply of phylogenetic software is large. You can read recent publications to get an idea of what is currently in fashion in phylogenetics!

Kumarakurubaran Selvaraj

I would like to suggest a book at this point: "Phylogenetic Trees Made Easy: A How-To Manual", 4th edn, Barry G. Hall, 2011.

Luc Legal

Dear Taahira

Patrice's answer is quite complete. A few additional comments.

For all your analyses, the base is PAUP.

But if you want to explore more options in parsimony, use TNT.

If you want to use ML: GARLI is much faster than PAUP, but it has a few algorithm problems and is really not user-friendly! RAxML seems nice, but I have little experience with it.

MEGA....easy to start/friendly.....but FULL of bugs....

For me, Bayesian statistics are not valid for phylogenies (they do not take biological similarities into account), so I can't recommend MrBayes. Subsequent treatments (dating) using BEAST are also dangerous, as there are severe flaws between 6 and 3 million years. Bayesian statistics are extremely powerful for population genetics (discriminating populations, gene flow, etc.), but not when searching for evolutionary relationships between species.

As you are working within a single species (Capsicum varieties), I suppose that you have few differences/polymorphisms (using genes). You may explore parsimony-based networks (Bandelt type). Try the software "Network"; it is rather user-friendly.

Cheers from Luc

Alejandro Gonzalez-Treviño

Several journals no longer allow MEGA, because it is very "basic" software. If it is only for understanding and learning about phylogeny and related topics, MEGA is fine, but if it is for a paper, I suggest other programs such as MetaPIGA or MrBayes, etc.

Richard J Edwards

Alejandro, can you explain a bit more which journals do not allow phylogenies constructed with MEGA and why? I find that very surprising. The authors of MEGA know their stuff and although it can be abused by the ignorant due to its ease of use, I think it is very unfair to describe it as "very basic software".

Antony T Vincent

The best is certainly PhyloBayes with the heterogeneous model CATGTR+G4. But RAxML can also do a good job, and it gives you the opportunity to run "rapid bootstraps".

Da-Song Chen

MEGA6 is wonderful software. I used it to align the sequences and calculate p-distances, and I would also use it to construct the NJ tree. But I do not use it to construct the ML tree, because the accuracy of its results is debated. I do not know why: the topology and the bootstrap values are similar between MEGA6 and other software (I use PhyML). So I use PhyML to construct the ML tree, to avoid argument and disagreement. PhyML and RAxML are the two most-cited programs in papers, but PhyML is more user-friendly, as it can easily be run under Windows.
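
For reference, the p-distance mentioned above is just the proportion of compared sites at which two aligned sequences differ. The sketch below computes it and, as an addition that is not from this thread, applies the standard Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3). Function names are hypothetical. Sites with gaps or ambiguous bases are skipped, which may differ from how MEGA handles them.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned DNA sequences.

    Only columns where both sequences have A, C, G or T are compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed p-distance.

    Undefined for p >= 0.75, where the sequences look saturated.
    """
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy sequences, for illustration only.
p = p_distance("ATTCA", "ATTCC")
print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the raw p-distance, and the gap between the two grows as sequences become more divergent.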

MrBayes is good software for Bayesian inference.

Qiang Wei

PhyML is also easy to use, I think.

Bassam Alkindy

Dear Taahirah Goolbar,

Well done, you now have a complete overview of the tools for constructing phylogenetic trees, from PAUP, MEGA, ... through to RAxML. But you must know which kind of phylogenetic tree you want.

Phylogenetic trees can be constructed by one of two methods: either by building a distance matrix (calculating the distances between your sequences), or by basing the tree on a preferred outgroup. In the latter case, you need to find the core genes among your species, then calculate the distance between your core genes and those of the outgroup.

Many of the tools mentioned in the answers belong to either the distance matrix method or the outgroup method.

ML, MrBayes, PhyML, and RAxML are for the outgroup method, while SeaView, parsimony, etc. are for the distance matrix method.

Note that with a distance matrix you need to perform a multiple alignment in order to calculate the distances (you will need to use T-Coffee, MUSCLE, ClustalW, ...), or you can simply use the edit distance from a global or local alignment (e.g. Needleman-Wunsch, Smith-Waterman, ...).

Finally, the word "best" has no meaning in bioinformatics, because many factors determine whether your tree is good or not.

I hope this answers your question.

Richard J Edwards

@Bassam, I find your answer very confusing. "Outgroups" are used for rooting, not actual tree construction. (Most likelihood/Bayesian models are symmetrical, I think, and give an unrooted topology that is subsequently rooted by another method, e.g. midpoint or outgroup.) Parsimony is most definitely NOT a distance method and explicitly does not use distance matrices. ALL *molecular* phylogenetics tools start with an alignment. Some generate (and/or can also use) a distance matrix and essentially perform hierarchical clustering whilst others use an internal model of evolution to assess different trees and pick the best. In each case, all-by-all comparisons are used, rather than comparisons versus a specified outgroup.
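
To illustrate the "distance matrix plus hierarchical clustering" route described above, here is a minimal sketch of UPGMA (average-linkage clustering) in plain Python. UPGMA is chosen only because it is short to write; it is not neighbor-joining, and it assumes roughly clock-like rates. The function name, the input layout, and the toy distances are assumptions made for illustration.

```python
from itertools import combinations


def upgma(names: list, matrix: list) -> tuple:
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names  : taxon labels
    matrix : square list of lists of pairwise distances, in the same
             order as names (only entries above the diagonal are read)
    """
    if len(names) < 2:
        raise ValueError("need at least two taxa")
    # Active clusters: id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): matrix[i][j]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Find the closest pair among all active clusters.
        i, j = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: dist[frozenset(pair)],
        )
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is
        # the size-weighted average of the two old distances.
        for k in clusters:
            d_ik = dist[frozenset((i, k))]
            d_jk = dist[frozenset((j, k))]
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Toy distances, for illustration only.
toy_names = ["Capsicum1", "Capsicum2", "Outgroup"]
toy_matrix = [
    [0.0, 0.1, 0.4],
    [0.1, 0.0, 0.5],
    [0.4, 0.5, 0.0],
]
print(upgma(toy_names, toy_matrix))
```

With the toy matrix this returns `('Outgroup', ('Capsicum1', 'Capsicum2'))`. Every pairwise distance is used, and the outgroup is never singled out in the input.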

Antony T Vincent

If you want to perform a rigorous phylogeny, you can use PhyloBayes or RAxML.

Bassam Alkindy

@Richard, you are right: in all cases the sequence alignment should be done with all-by-all comparisons. From these comparisons a distance matrix is built, and a hierarchical clustering algorithm is then used to construct an unrooted phylogenetic tree. To root it, one or more outgroups can be selected.

Prabodh Ranjan

For phylogeny analysis, multiple sequence alignment is best done with Clustal Omega. Use the following link.

http://www.ebi.ac.uk/Tools/msa/clustalo/

Haiam Aboul-Ela

I am using MEGA for phylogenetic tree construction, but it doesn't support bootstrapping. Which program can I use for this purpose?

THANKS IN ADVANCE!

Mohamed Owis Badry

Hello,

You can use Mega to calculate the bootstrap support too!

Gabriel Augusto Marques Rossi

RapidNJ and MEGA7

Prabodh Ranjan

There are many online programs available, but I think Clustal Omega will be better. Clustal Omega is an online server, which gives you results within a few minutes.

Nancy Mekountchou

Dear Haiam Aboul-Ela, MEGA does support bootstrapping.

Richard J Edwards

Clustal Omega is a (good) multiple sequence alignment tool, NOT a phylogenetics program. Under no circumstances should you use the guide tree generated by an alignment tool as a phylogenetic tree.

Antony T Vincent

IQ-TREE is very easy to use, fast, and gives trees very similar to RAxML.

http://www.iqtree.org

Richard J Edwards

@Antony, ease-of-use is a big benefit in this field. Looks like it has good documentation too!

Asif Naseem

MEGA is a bit tricky, or you could say difficult in a sense, but in my experience it gives more appropriate results...

Ayodeji O. Olarinmoye

I have found MEGA quite uncomplicated, and the output easy to interpret. The available tutorials on YouTube are very easy to understand and apply.

Aashaq Hussain Bhat

Different experts prefer different software for phylogeny, and all of them will give you results, but I found MEGA easy. I prefer to use MEGA, and it is also freely available.

Jamal Othman

SPSS

Umair Hassan Khan

Hi, I think you should go for Clustal.

Kiril M. Dimitrov

I would recommend using RAxML, which is also available on many free HPCs.

Mohammad Malekan

In my opinion, MEGA is user-friendly and very easy to operate.

https://www.megasoftware.net/download_form

Muhammad Fahim

Your best way out would be https://www.phylo.org. There are plenty of useful tools there, including RAxML, with superfast machines for analysing big data.

Said Sajjad Ali shah

MEGA is very simple and easy to understand

Waqar Shafqat

My first preference is to use Clustal X for alignment and then open the .dnd file in MEGA 7 for a phylogenetic tree. I use both of these programs for analysis.

Waqar

Kanad Das

raxmlGUI 1.5 (Silvestro & Michalak 2012) is a good, user-friendly tool for the desired analysis.

Shahid Aziz

MEGA is user-friendly and very easy to operate.
