How do you measure phylogenetic signal vs noise in a multiple sequence alignment?

Cristián E Hernández Popular answer

Please do not forget that the consistency index (CI) measures the amount of homoplasy in a cladogram; however, this measure only makes sense in a parsimony framework. Remember that in phylogenetic analyses based on explicit models (ML and Bayesian), all sites contribute to the likelihood given the model of molecular evolution, so the CI makes no sense at all for these methods.
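For readers who want to see what the CI actually counts, here is a minimal sketch, assuming a rooted binary tree written as nested tuples of taxon names and an alignment stored as a dict of equal-length strings. The function names `fitch_steps` and `consistency_index` are hypothetical, and gaps are simply treated as one more character state. It computes the ensemble CI as the minimum possible number of changes divided by the number of changes the tree requires under Fitch parsimony; it is not the implementation used by any particular program.

```python
def fitch_steps(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    tree   : a taxon name (str) or a 2-tuple of subtrees
    states : dict mapping taxon name -> character state
    Returns (candidate_state_set, number_of_changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_steps + right_steps
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(tree, alignment):
    """Ensemble CI = sum(min steps) / sum(observed steps) over variable sites."""
    length = len(next(iter(alignment.values())))
    min_steps = 0
    observed_steps = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        n_states = len(set(column.values()))
        if n_states < 2:
            continue  # constant sites add nothing to either sum
        min_steps += n_states - 1
        observed_steps += fitch_steps(tree, column)[1]
    return min_steps / observed_steps if observed_steps else 1.0


# Toy example (made-up data): site 0 fits the tree, site 1 is homoplastic.
toy_tree = (("A", "B"), ("C", "D"))
toy_alignment = {"A": "AA", "B": "AC", "C": "CA", "D": "CC"}
toy_ci = consistency_index(toy_tree, toy_alignment)
```

A CI of 1 means no homoplasy on that tree; values fall toward 0 as more characters require extra changes.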

Brian Thomas Foley

I like the DAMBE function (under the Graphics menu) for plotting transitions and transversions vs pairwise distances, to check a data set for saturation of silent sites. Transitions (purine ↔ purine, pyrimidine ↔ pyrimidine; G ↔ A and C ↔ T) outnumber transversions because of the biochemical properties of DNA damage repair and replication error. But there are twice as many possible transversions (G ↔ C, G ↔ T, A ↔ C, A ↔ T), so saturated sequences could in theory have up to 2-fold more transversions than transitions. I am attaching a couple of plots which illustrate this.
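If you want the numbers behind such a plot without DAMBE, the following is a rough sketch, assuming the alignment is already loaded as a dict of equal-length DNA strings. The function names `count_ts_tv` and `pairwise_ts_tv` are hypothetical, and the x-axis here is the uncorrected p-distance, a simplification and not necessarily the distance DAMBE uses. Sites with gaps or ambiguity codes are skipped.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Return (transitions, transversions, compared_sites) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # any purine<->pyrimidine change
    return ts, tv, sites


def pairwise_ts_tv(alignment):
    """Return rows of (name1, name2, p_distance, ts_per_site, tv_per_site)."""
    rows = []
    for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
        ts, tv, sites = count_ts_tv(s1, s2)
        if sites == 0:
            continue  # no comparable sites for this pair
        rows.append((n1, n2, (ts + tv) / sites, ts / sites, tv / sites))
    return rows
```

Plotting the transition and transversion columns against the distance column (for example with matplotlib) gives the kind of saturation plot described above: while transitions rise roughly linearly with distance the data are unsaturated, and transversions catching up with or overtaking transitions suggests saturation.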

Cristián E Hernández

Given that the diversification of some taxa is very ancient, the selected molecular markers could be saturated (carrying no phylogenetic signal) and yield a spurious phylogeny. Therefore I suggest evaluating whether the sequences are saturated, and thus whether they are still useful for phylogenetic analysis, for example using Xia's test implemented in DAMBE (http://dambe.bio.uottawa.ca/dambe.asp). You need to test for the potential effect of substitution saturation before performing the phylogenetic analyses.

Xia X, Xie Z, Salemi M, Chen L, Wang Y (2003) An index of substitution saturation and its application. Mol Phylogenet Evol 26: 1–7.

Xia X, Xie Z (2001) DAMBE: data analysis in molecular biology and evolution. J Hered 92: 371–373.

Diversity is not the only cause of "noise" in data. Other sources can be recombination, where one part of a sequence has one history while other parts have another history. It is possible to have data in the very nice, linear, nonsaturated range of the Ts and Tv vs distance plot that has a very poor consistency index. I don't think there is any SINGLE answer to how to detect good data, but I'd like to hear about many different methods that people are using.

Maximiliano Manuel Maronna

AliGROOVE – visualization of heterogeneous sequence divergence within multiple sequence alignments and detection of inflated branch support

http://www.biomedcentral.com/1471-2105/15/294