You can calculate all kinds of indices and metrics in R (http://cran.r-project.org/) these days. I would calculate a dissimilarity index, such as Jaccard's or Sørensen's (also known as Dice's), and then build a UPGMA tree. PCR band data tend to harbor great levels of homoplasy, so I would avoid making phylogenetic statements and use only distance-based measures (no ML, no Bayesian inference, no MP).
Feel free to browse my collection of software links, although not for binary data: http://softlinks.amnh.org
Here's an R script I put together for calculating Sørensen's index and plotting a UPGMA tree to cluster patterns of 0/1 data.
#Calculate Sorensen's index and plot a UPGMA tree
#The dataset has to be in 0/1 pairwise table format with row+column labels and every cell separated with commas, hence the "csv"
#Get the vegan library first.
#The name the arrow points at is what I choose to call files and variables. Feel free to name accordingly.
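A minimal sketch along those lines, not necessarily the original script. It assumes the vegan package is installed; the toy matrix, the file name bands.csv, and the object names are all illustrative. In vegan, vegdist() with method = "bray" and binary = TRUE gives Sørensen dissimilarity, and hclust() with method = "average" is UPGMA.

```r
# Sketch only: toy data, file name, and object names are illustrative assumptions.
library(vegan)  # install.packages("vegan") if it is not installed yet

# Toy 0/1 band matrix: rows = samples, columns = bands.
# With real data you would read the csv instead, e.g.:
#   bands <- read.csv("bands.csv", row.names = 1)   # hypothetical file name
bands <- matrix(
  c(1, 1, 0, 0, 1,
    1, 1, 0, 1, 1,
    0, 0, 1, 1, 0,
    0, 1, 1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3", "S4"), paste0("band", 1:5))
)

# Sorensen dissimilarity = Bray-Curtis computed on presence/absence data
sor <- vegdist(bands, method = "bray", binary = TRUE)

# UPGMA = average-linkage hierarchical clustering on the distance object
upgma <- hclust(sor, method = "average")

# Plot the dendrogram
plot(upgma, main = "UPGMA on Sorensen dissimilarity", xlab = "", sub = "")
```

Swapping method = "bray" for method = "jaccard" (still with binary = TRUE) would give Jaccard dissimilarity instead.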
I would also recommend MEGA. It is easy to use and has a lot of options, although it is a bit slow if your sample size is very large (e.g. tens of thousands of sequences).
I think it depends on what you want to do. MEGA is really good for quick analyses using the neighbor-joining method. However, depending on your data and on the journal you want to publish in, I would recommend doing a Maximum Likelihood and/or Bayesian analysis. RAxML is good, but it implements only a few DNA substitution models (are you working with DNA or AA?). PhyML is also a great program (instead of bootstrapping you can use aLRT with the SH-like interpretation, which is reliable and really quick too; I can post the paper here if you would like more info). For Bayesian inference you can try MrBayes, for example. All these programs are freely available.
If I am not wrong, MrBayes can handle binary data (not 100% sure). In that case you would run a Bayesian inference and get posterior probabilities as branch support.
Not sure if I should start a different post, but I am also working on MSAs and trying to figure out the best approach to use. I am aligning phylum-level fungal rDNA sequences (internal transcribed spacer regions) and am having difficulty generating good trees because of the number of gaps in the ITS regions. I am currently using MEGA5. I started with CLUSTAL to generate the initial MSAs, but am now trying MAFFT to see if it helps. Is anyone else using variable regions with a lot of sequence gaps and nucleotide changes between organisms?
ITS regions have lots of indels, especially above the species level, so it is almost impossible to avoid getting a gap-rich alignment. You can search for an appropriate secondary structure and guide your alignment with that. See http://goo.gl/p6n06 for papers on ITS and rRNA secondary structure in fungi and plants.
MAFFT performs better than Clustal. Do tweak the settings (offset, gap opening penalty), try the Q-INS-i algorithm, which takes RNA secondary structure into account, and see if you can import a structure model from a published study as a guide.
Consult the ITS2 database: http://its2.bioapps.biozentrum.uni-wuerzburg.de/
4SALE (http://4sale.bioapps.biozentrum.uni-wuerzburg.de/) is a nice tool.
Yes, this is great information. But I do not know much about genetics and molecular biology. My knowledge is in the diagnosis and pathogenesis of bacteria, and in antibiotics, not in genes. Thank you.
As many have said, you can use MEGA5 to perform bootstrapping for most of the analyses available in the program, such as NJ, ML, or even MP. You decide how many replicates to run (e.g. 1000). Good luck!
I always use PAUP or R for phylogenetic analysis. If you want to do the analysis in R, you can read Emmanuel Paradis's book, Analysis of Phylogenetics and Evolution with R. R can analyze genetic or binary data, and you can download it for free from the internet. I can send the ebook by email if you want it.
ITS sequences are very unlikely to work well at the phylum level. As suggested above, MAFFT Q-INS-i will probably do the best job of the alignment, but the biggest problem is that you are using sequences that are evolving too fast for what you want to do. SSU, or even better, LSU sequences would work much better at this level.
You can try folding your ITS1-5.8S-ITS2 sequences to get their secondary structures.
These secondary structures are well conserved in eukaryotes, with four main helices in ITS2 and several motifs. The resulting alignment is better for the analyses because it involves not only the sequence information but also the structural information. The alignment can be done in 4SALE. You can then analyse it with ProfDist to get the PNJ (profile neighbor-joining) tree. This method is useful up to the sub-species level.
If not, you can also try T-Coffee for your alignment, followed by Gblocks, which selects the conserved blocks from your alignment file and so eliminates the gap-rich regions. This will also help to reduce the long-branch effect in your tree.