Could anybody please let me know of offline or online tools and software, other than BLAST, for predicting the % homology among 100 nucleotide sequences?
Well, first, you should say % similarity, not homology. Homology is yes or no.
BLAST is the worst tool to use here, because it uses local alignments (HSPs, see the documentation), whereas what you want is a global alignment.
With only 100 sequences, this is a trivial problem that you can solve even on a laptop.
Now the real question is: why do you want to compute similarities?
Sequence similarity is often meaningless, because there is more than one way to compute a distance between two sequences.
The best (easiest) way to go is probably:
- Do an MSA (multiple sequence alignment), using Clustal, MAFFT, MUSCLE, or ...
- Open the resulting file (for example with SeaView) and manually check that there are no obvious errors.
- Compute a tree (with SeaView) and reorder the sequences according to this tree.
- Check the alignment again.
- Choose the option to output a distance matrix, and you get your measures.
Another approach would be to download a Needleman-Wunsch implementation and compute pairwise distances.
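For illustration only, here is a minimal sketch of what a Needleman-Wunsch global alignment followed by a percent-identity calculation might look like. The scoring values (match=1, mismatch=-1, gap=-1) and the function names are assumptions chosen for the example, not taken from any particular program, and dividing by the alignment length (gaps included) is just one of several possible conventions. For 100 real sequences an established tool is preferable to this quadratic-memory toy.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    matches = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)


aln_a, aln_b = needleman_wunsch("ACGT", "AGT")
print(aln_a, aln_b, percent_identity(aln_a, aln_b))
```

Changing the gap or mismatch penalties can change the alignment and therefore the percentage, which is one reason a single "% similarity" figure should be reported together with the method used.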
Now:
- Do you want to use a distance correction method or not (advised for building a tree)?
- Do you want to exclude badly aligned positions (if some sequences are distantly related)? In this case use the "Site" menu to exclude these positions.
- Again, sequence similarities are not very informative, because there is no molecular clock!
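As a rough sketch of the MSA route described above, the uncorrected percent identity between every pair of aligned sequences could be computed like this. It assumes the alignment has already been read into a dictionary mapping sequence names to equal-length gapped strings (the names and sequences below are made up), and it ignores any column where either sequence has a gap, which is only one of several conventions.

```python
def pairwise_identity(s1, s2):
    """Percent identity over columns where neither sequence has a gap."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(s1.upper(), s2.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else float("nan")


def identity_matrix(aligned):
    """Return (names, square matrix of percent identities)."""
    names = list(aligned)
    matrix = [[pairwise_identity(aligned[a], aligned[b]) for b in names]
              for a in names]
    return names, matrix


# Hypothetical three-sequence alignment, for illustration only.
aligned = {"s1": "ACGT-A", "s2": "ACGTTA", "s3": "AG-TTA"}
names, matrix = identity_matrix(aligned)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{v:6.1f}" for v in row))
```

Excluding badly aligned positions, as suggested in the list above, would amount to deleting those columns from every string before calling `identity_matrix`.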
@richard christen: I used MUSCLE for the MSA, and then MEGA to compute pairwise distances, but I don't know how to interpret the result. It is given in matrix form: for example, between the 5th and 6th sequences, [5,6], the pairwise distance is 2.457, and likewise for all sequences from 1 to 100. So does 2.457 indicate the number of base substitutions per site? And how can I get the % similarity, which gives the homology among the sequences?
As Richard said, percentages are meaningless here (IMHO often useful for presentations to MDs, though ;-). As the MUSCLE papers say, distances from unaligned pairwise comparisons are k-mer distances; otherwise the Kimura distance is given (just google it; Wikipedia has a nice entry).
Yes, the pairwise distance of nucleotide sequences is usually measured in nucleotide substitutions per site. However, most programs use correction models such as Jukes & Cantor. These models correct for the fact that, over time, several substitutions may occur at the same site in two related sequences, so a simple observed change from A -> C may in fact be the result of A -> G -> A -> G -> C. If you just want to calculate the % identity, most alignment viewers should have an option for this. Or you could use this website: http://imed.med.ucm.es/Tools/sias.html (disregard the numbers for amino acid similarity).
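To make the correction concrete, here is a small sketch of the Jukes & Cantor formula, d = -3/4 ln(1 - 4p/3), where p is the observed proportion of differing sites, together with its inverse. The function names are made up for the example, and the thread does not say which model produced the 2.457 value, so applying the inverse to it is purely a what-if.

```python
import math


def jukes_cantor_distance(p):
    """Corrected distance (substitutions per site) from observed proportion p."""
    if p < 0:
        raise ValueError("p must be non-negative")
    if p >= 0.75:
        return float("inf")  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def expected_p_from_distance(d):
    """Inverse of the formula: expected proportion of differing sites."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# The corrected distance is always at least as large as the observed p.
print(jukes_cantor_distance(0.10))

# What-if: IF a distance of 2.457 came from this model, the implied
# observed difference would be close to the 0.75 saturation limit.
print(expected_p_from_distance(2.457))
```

Under this model a distance above 1 is perfectly possible, because it counts estimated substitutions per site (including repeated hits at the same site), not a fraction of differing sites; this is why a corrected distance cannot be read directly as a percentage.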
The term homology should not be used in this context. Homology means that two sequences have the same evolutionary origin (both have evolved from the same ancestral sequence). If sequences are sufficiently similar you may assume that this is due to homology, but you cannot calculate a percentage value for the degree of homology. As Richard said, sequences either are homologous or they are not.
Richard, Janus and Christian are right: most of the time we estimate % similarity, not homology. Homology cannot be quantified. As for how to interpret the distance values, check the parameters you used to calculate the distance matrix and look up the details of the algorithm/model used by the program, either in the software help file, the manual, or the original paper. Programs use different models for calculating genetic distances, and each has a different basis. It would be advisable to go through these things before actually using the software and generating the data.
I guess that you intend to trace the evolutionary relationships of your sequences. I recommend using MEGAN, which can help with that if you already have the multiple sequence alignment. You can run the analysis with the parameters mentioned by Ajay Saini, such as the substitution model, and you can also build a cladogram for a more descriptive analysis. A tip: as a general rule of thumb, if you have a pairwise alignment over the complete gene sequence, you can say that the 2 genes are homologous if they have at least 70% identity (in nucleotide sequences).
First of all, homology cannot be measured. It exists or it does not. As a percentage, you can only measure sequence similarity, identity and gaps in an alignment. To measure these parameters, I use GeneDoc (http://www.nrbsc.org/downloads/gd322700.exe ) and Blast...
Check the tool at imed.med.ucm.es/Tools/sias.html. It takes an MSA and returns a square matrix with the similarities and identities between the sequences. I believe that is what you wanted.
@pedro: The tool at the link given above is only applicable to amino acid sequences, not nucleotide sequences. Can you suggest another tool? I am interested in predicting the % similarity and identity of nucleotide sequences.
@Priyanka: I do not know a tool like that for DNA, but I will make sias work with DNA. Actually, I think that for identities it should work as it is. Please give it a try.
For percentage similarity I think the JalView software is a good option. You can generate protein structures, run an MSA and find sequence similarity in one tool.