Alternative amino acid substitution matrices (BLOSUM): why use blocks?

08 August 2014

I'm working on an "alternative" amino acid substitution matrix in the style of BLOSUM (i.e., the focus is on functional and structural alignment between remote homologs).

I know the original BLOSUM matrix was built from blocks of conserved residues extracted from pairwise (or MSA?) local alignments (i.e., "local, ungapped alignments").
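For reference, the core BLOSUM computation is short once ungapped blocks are available. (In the original work the blocks were ungapped multiple alignments of conserved regions taken from the BLOCKS database.) Within each column every pair of residues is counted; the observed pair frequencies q_ij are compared with the frequencies e_ij expected from the background residue frequencies p_i, and the score is a rounded log-odds ratio, 2·log2(q_ij/e_ij), i.e. half-bit units as in BLOSUM62. This is a minimal sketch, assuming blocks are given as lists of equal-length, gap-free strings; it deliberately omits the per-block clustering/weighting step, and the function name and `scale` parameter are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_from_blocks(blocks, scale=2.0):
    """BLOSUM-style log-odds scores from ungapped blocks (no clustering/weighting).

    blocks: list of blocks; each block is a list of equal-length, gap-free strings.
    scale: scores are in units of 1/scale bits (scale=2 gives half-bits).
    """
    pair_counts = Counter()
    for block in blocks:
        for col in range(len(block[0])):
            column = [seq[col] for seq in block]
            # every unordered pair of residues in the column is one observation
            for a, b in combinations(column, 2):
                pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed q_ij

    # background frequency: p_i = q_ii + sum over j != i of q_ij / 2
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    scores = {}
    for (a, b), f in q.items():
        # expected frequency: p_i^2 for identities, 2 * p_i * p_j otherwise
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        s = round(scale * math.log2(f / expected))
        scores[(a, b)] = scores[(b, a)] = s  # matrix is symmetric
    return scores


print(blosum_from_blocks([["AAC", "AAC", "AGC"]]))
```

Pairs never observed in any block get no score here; with limited data a real matrix would need pseudocounts or many more blocks.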

I don't understand what would be the best way to work with this in the following context:

1) I have MSAs for the various sequences and families that I want to analyze in order to get the statistics for building my new AA substitution matrix.

Should I be looking at pairwise alignments? What does "no gaps" mean? What's the minimal length of such an alignment?
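One simple reading of "no gaps" for an existing MSA: keep only runs of columns in which no sequence has a gap character, and discard runs shorter than some minimum width. This is not how Henikoff & Henikoff built BLOCKS (those blocks were found as conserved, ungapped motifs in the first place); it is a minimal sketch assuming each MSA is a list of equal-length strings with `-` or `.` as gap characters. The function `ungapped_blocks` and its `min_width` cut-off are hypothetical choices that would need tuning.

```python
def ungapped_blocks(msa, min_width=10, gap_chars="-."):
    """Split an MSA into maximal runs of columns in which no sequence has a gap.

    msa: list of equal-length aligned strings.
    min_width: hypothetical minimum block width; shorter runs are dropped.
    Returns a list of (start, end, block) tuples, end exclusive.
    """
    if not msa:
        return []
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")

    # True for columns where every sequence has a residue
    gap_free = [all(s[i] not in gap_chars for s in msa) for i in range(length)]

    blocks = []
    start = None
    for i, ok in enumerate(gap_free + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_width:
                blocks.append((start, i, [s[start:i] for s in msa]))
            start = None
    return blocks


msa = ["ACDE-FGHI", "ACDEKFGH-", "ACDQKFGHI"]
print(ungapped_blocks(msa, min_width=3))
```

A drawback of this column-wise rule is that a single gappy sequence removes the column for everyone; an alternative is to drop that sequence from the block instead.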

2) What are the best modern tools for extracting these ungapped blocks?

(The BLOCKS+ database and the tools Henikoff used for it are no longer maintained. I'm also unsure whether just extracting local alignments is enough. I've seen something similar in Gblocks, but that seems aimed at phylogenetic analysis.)

3) To reduce "sequence similarity" (e.g., clustering at 62% identity gives BLOSUM62, etc.), should I simply cluster/filter all the sequences in my database using CD-HIT/UniRef? Or should the clustering be applied at the level of the individual protein blocks? Or somehow after alignment?
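As I understand the BLOSUM procedure, redundancy was handled per block rather than on the whole database: within each block, segments at or above the identity threshold (62% for BLOSUM62) are grouped by single linkage, pairs are counted only between different clusters, and each such pair is weighted by 1/(|A|·|B|) so that every cluster counts as one sequence. The sketch below assumes that scheme; filtering whole sequences with CD-HIT beforehand is a separate, complementary step. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions between two equal-length ungapped segments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_block(block, threshold=0.62):
    """Single-linkage clustering of a block's segments at >= threshold identity.

    Returns a list of clusters, each a list of segment indices.
    """
    parent = list(range(len(block)))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(block)), 2):
        if identity(block[i], block[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(block)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def weighted_pair_counts(block, threshold=0.62):
    """Residue-pair counts between clusters, each pair weighted 1/(|A|*|B|)."""
    counts = Counter()
    clusters = cluster_block(block, threshold)
    width = len(block[0])
    for ca, cb in combinations(clusters, 2):  # pairs within a cluster are skipped
        w = 1.0 / (len(ca) * len(cb))
        for i in ca:
            for j in cb:
                for col in range(width):
                    counts[tuple(sorted((block[i][col], block[j][col])))] += w
    return counts


print(weighted_pair_counts(["AAAA", "AAAC", "GGGG"]))
```

Summing these weighted counts over all blocks gives the pair frequencies from which the log-odds scores are computed.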

4) I recall that in the original paper, Henikoff & Henikoff didn't use any existing matrices (PAM) to get their blocks/aligned motifs. I'm confused as to how to do that with existing methods, and whether I even should (i.e., motif extraction vs. pairwise alignment vs. MSA for block extraction).

[Disclaimer: I have no previous background in MSA work and the like.]

Thank you very much!!

Guy Bottu

There are alignment methods that do not rely on an existing AA comparison matrix, because they sort of generate one on the fly from the data themselves. I have not followed the developments of recent years, so maybe someone has a better proposal, but I think a good and still-maintained tool is MEME (http://meme.nbcr.net/meme/).

Dan Ofer

I've seen MEME, but I don't know which of its various tools extracts blocks.

Give the MEME program a set of unaligned sequences (optionally filtered to avoid bias caused by subsets of similar sequences), ask for plain-text output, and then extract the blocks with a small program written in Perl.
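The extraction step could look like this, sketched in Python rather than Perl. It assumes that MEME's plain-text output contains per-motif sections "in BLOCKS format", each opened by a `BL   MOTIF ...` header line, listing one site per line as `name ( start) SITE ...`, and closed by `//`. That layout is an assumption to check against the output of your own MEME version (MEME also writes an XML results file, which may be sturdier to parse). The function `parse_meme_blocks` is hypothetical. Each motif's sites are equal-length, gap-free segments, i.e. one block.

```python
import re

# assumed site-line layout: "name   (  start) SITE  weight"
SITE_LINE = re.compile(r"^(\S+)\s+\(\s*(\d+)\)\s+([A-Za-z]+)")


def parse_meme_blocks(text):
    """Collect motif sites from BLOCKS-format sections of MEME text output.

    Returns {motif_header: [(seq_name, start, site), ...]}.
    Lines outside a 'BL ... //' section are ignored.
    """
    blocks = {}
    current = None
    for line in text.splitlines():
        if line.startswith("BL "):
            current = line[2:].strip()  # e.g. "MOTIF 1 width=5 seqs=2"
            blocks[current] = []
        elif line.startswith("//"):
            current = None
        elif current is not None:
            m = SITE_LINE.match(line)
            if m:
                name, start, site = m.groups()
                blocks[current].append((name, int(start), site))
    return blocks


sample = (
    "BL   MOTIF 1 width=5 seqs=2\n"
    "seqA                     (   12) ACDEF  1\n"
    "seqB                     (    3) ACDQF  1\n"
    "//\n"
)
print(parse_meme_blocks(sample))
```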

Leon Kaplan

It is not my cup of tea.
