What kind of research data or question makes it necessary to run a protein or nucleotide BLAST? Why do we run BLAST? What do we want to get from it?
BLAST has many uses. Its primary function is to find sequence matches and measure sequence identity, whether the query is a nucleotide or an amino acid sequence. The tool compares the sequence of interest with the sequences available in the chosen database (for example, the public nucleotide, protein, and genome databases). This helps us identify sequence similarity across genomes, choose suitable primers for research work, and see whether the sequence of interest carries any mutations, to name a few uses.
BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases, regardless of whether the query is protein or DNA (Altschul et al., 1990).
I have attached the BLAST procedure and a presentation on the uses of BLAST for your information.
In our lab, we use BLAST to find sequence similarity, with the aim of identifying biological species. We use sequencing technology to obtain a DNA fragment (sequence) from a specimen, BLAST that sequence against a database such as BOLD Systems or NCBI, and get as a result the species whose sequence it is most similar to. If the similarity is 98% or higher, we assume that the individual we analyzed belongs to that species.
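To make the 98% rule concrete, here is a minimal Python sketch. It assumes the two sequences have already been aligned (gaps written as "-") and that identity is counted as matching columns divided by alignment length. The function names percent_identity and same_species are hypothetical and are not part of any BLAST or BOLD tool; the 98% cutoff is simply the one mentioned above.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumes both strings come from one pairwise alignment, so they have
    equal length and use '-' for gaps. A gap column never counts as a match.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        return 0.0
    pairs = zip(aligned_query.upper(), aligned_subject.upper())
    matches = sum(1 for q, s in pairs if q == s and q != "-")
    return 100.0 * matches / len(aligned_query)


def same_species(identity: float, threshold: float = 98.0) -> bool:
    """Apply the lab's rule of thumb: 98% identity or more means same species."""
    return identity >= threshold


# Toy alignment (made-up sequences, for illustration only).
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
print(f"{identity:.1f}% identity, same species: {same_species(identity)}")
```

In practice the identity value would be read from the BLAST or BOLD report rather than recomputed, and the right cutoff depends on the marker and the taxonomic group.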
One reason to use BLAST is to investigate homology between aligned sequences, as well as to search for conserved domains in the case of protein-protein BLAST.
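You might ask why we use protein-protein BLAST at all instead of just comparing the corresponding nucleotide sequences. The reason is the degeneracy of the genetic code: most amino acids are encoded by more than one codon, so two genes can differ at the nucleotide level and still encode the same or a very similar protein.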
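A small Python sketch of codon degeneracy, using the standard genetic code. The two DNA sequences are made up for illustration, and the helper names translate and identical_positions are hypothetical. The point is only that sequences which look quite different at the nucleotide level can translate to exactly the same protein.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA sequence in frame 1; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identical_positions(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


# Two made-up coding sequences that use different codons for Leu, Ser and Arg.
dna_a = "ATGCTTTCTCGT"
dna_b = "ATGTTGAGCAGA"

print("nucleotide matches:", identical_positions(dna_a, dna_b), "of", len(dna_a))
print("protein A:", translate(dna_a))
print("protein B:", translate(dna_b))
```

A nucleotide comparison of these two sequences would suggest they are only distantly related, while a protein comparison sees them as identical.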
Why do we use a tool like BLAST, which is a database search tool? The basic need for a database search is as follows:
Biological data in the form of DNA or protein sequences are extremely complex and practically impossible to understand ab initio. When we do not understand something, the best available option is to ask the question "Is anything similar to this already known?". To answer this question, we need to compare the new sequence with already known information (sequences). Databases are collections of that existing information in digital format.
So, when a new sequence is identified, since we cannot characterize it ab initio, we compare it with the sequences available so far in the databases. Many database search tools based on pairwise alignment are available for this comparison, and they are broadly of two types: 1. Rigorous algorithms, which compare the new sequence to every sequence in the database to find homologous sequences (e.g. the Needleman-Wunsch and Smith-Waterman algorithms). 2. Heuristic algorithms, which use a shortcut to pick out the few hundred database sequences that are likely to be related to the query and compare the query (new) sequence only to those. Because of this, heuristic algorithms can find homologous database sequences extremely fast. BLAST is one such heuristic algorithm and is used extensively for database searches.
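The difference between the two types can be sketched in Python. This is a simplified illustration, not the actual BLAST implementation: the rigorous search scores the query against every database entry with a Smith-Waterman local alignment, while the heuristic search first keeps only entries that share at least one short word (k-mer) with the query and then scores just those. Real BLAST also extends its word hits and reports statistics such as E-values, which are left out here. The scoring values, the word size k=4, the function names, and the toy database are all assumptions made for the example.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def kmers(seq: str, k: int) -> set:
    """All overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rigorous_search(query: str, database: dict) -> list:
    """Score the query against every database sequence, best hit first."""
    hits = [(smith_waterman_score(query, seq), name)
            for name, seq in database.items()]
    return sorted(hits, reverse=True)


def heuristic_search(query: str, database: dict, k: int = 4) -> list:
    """Score only the sequences that share at least one k-mer with the query."""
    query_words = kmers(query, k)
    candidates = {name: seq for name, seq in database.items()
                  if query_words & kmers(seq, k)}
    return rigorous_search(query, candidates)


# Toy database with made-up sequences.
toy_database = {
    "related": "TTTACGTACGTTT",
    "unrelated": "GGGGGGGGGGGG",
}
toy_query = "ACGTACGT"

print("rigorous: ", rigorous_search(toy_query, toy_database))
print("heuristic:", heuristic_search(toy_query, toy_database))
```

The heuristic version never aligns the query against the unrelated entry, which is where the speed comes from on a large database. The price is that a distant homolog sharing no exact word with the query would be missed.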