I want to know which organisms are phylogenetically close to C. floridanus. How can I access this information? (C. floridanus was sequenced recently.)
"Following a simplified version of the method described by Ciccarelli and Bork (science, 2006), we will be able to compare all organisms across all kingdoms using the 31 genes present in all organisms appointed by Ciccarelli et al as being more resistant to horizontal gene transfer (HGT) events. Table 1 represents the 31 gene names and their corresponding COG identification number."
"Each collected gene will be aligned separately and concatenated per organism. The final alignment will be submitted to PhyML for reconstruction of the tree of life."
COG0533 Metal-dep. proteases with chaperone activity
Perform an rpsblast search with those COGs to find species that hit all of them. Then, for each species that had at least one hit to every COG, concatenate the best-hit sequences. Once this is done for all species, create a multiple sequence alignment with MAFFT. Once aligned, convert the format from FASTA to PHYLIP and feed it into PhyML or RAxML to reconstruct the tree. Once the tree is built, you can easily see which species are phylogenetically close. Ask me if you need more clarification; I'm going for a short reply with a reference or two to get you started.
I misspoke a little in the above. One wants to create a multiple sequence alignment for each COG first, using the sequence from each species that hit that COG, and then concatenate the alignments afterwards. In the above, I mentioned the concatenation first, followed by the MSA. It should be MSA followed by concatenation. Anyway, I wasn't being too careful with language. The best thing to do is read the Ciccarelli and Bork (Science, 2006) paper.
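To make the corrected order concrete (align each COG separately, then concatenate per species), here is a minimal Python sketch. It assumes each per-COG alignment has already been produced (for example by MAFFT) and is available as FASTA text whose headers are species names. The function names parse_fasta and concatenate_alignments, the require_all switch, and the toy sequences are all made up for illustration; this is not the toolchain mentioned elsewhere in the thread.

```python
def parse_fasta(text):
    """Parse FASTA text into a {name: sequence} dict (first word of each header is the name)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def concatenate_alignments(alignments, require_all=True):
    """Join per-gene alignments (a list of {species: aligned_seq}) end to end.

    require_all=True keeps only species present in every alignment, as in the
    procedure described above. require_all=False keeps every species and pads
    missing genes with gaps (an assumption of this sketch, not of the thread).
    """
    if not alignments:
        raise ValueError("need at least one alignment")
    species = []
    for aln in alignments:
        for sp in aln:
            if sp not in species:
                species.append(sp)
    if require_all:
        species = [sp for sp in species if all(sp in aln for aln in alignments)]
    merged = {sp: [] for sp in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for sp in species:
            merged[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in merged.items()}


# Toy example with two already-aligned "genes" (sequences are invented).
cog_a = parse_fasta(">ant\nMK-L\n>bee\nMKAL\n")
cog_b = parse_fasta(">ant\nGGT\n>bee\nG-T\n>wasp\nGAT\n")
strict = concatenate_alignments([cog_a, cog_b])
padded = concatenate_alignments([cog_a, cog_b], require_all=False)
print(strict)
print(padded)
```

With require_all=True the wasp is dropped because it has no hit in the first alignment; with require_all=False it is kept and padded with gaps.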
You can use the MEGA software, version 5.0, which will be very helpful for finding phylogenetically similar sequences, and you can compute the genetic distance to related species using your nucleotide sequences. You can download it free of charge.
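For anyone wondering what a "genetic distance" between two aligned nucleotide sequences means in the simplest case, here is a small Python sketch of the p-distance (the proportion of compared sites that differ). This is only the most basic measure, not a reimplementation of anything MEGA does; the function names, the choice of which characters to skip, and the toy sequences are assumptions made for illustration.

```python
def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_table(aligned):
    """Pairwise p-distances for a {name: aligned_sequence} dict."""
    names = list(aligned)
    return {
        (a, b): p_distance(aligned[a], aligned[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented toy alignment, for illustration only.
toy = {"sp1": "ACGTACGT", "sp2": "ACGTTCGT", "sp3": "ACGT-CGA"}
table = distance_table(toy)
for pair, dist in table.items():
    print(pair, round(dist, 3))
```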
To keep this thread going, I pose the following question. How much computing horsepower will be required to concatenate and apply similarity algorithms (e.g. PhyML) to the alignments?
First, you can simply run BLAST to get an idea, and then compare your sequence, if you have one, with sequences from close organisms using CLUSTALW. Of course, NCBI is the best resource for this. I also agree with what the other people are saying.
"How much computing horsepower will be required to concatenate and apply similarity algorithms i.e phyML etc to the alignments?"
It can be quite intense, depending on how many species/sequences you are interested in and how many COGs you want to use. With that said, I can run rpsblast and MAFFT, concatenate, switch from FASTA to PHYLIP format, and then use RAxML to build an entire phylogeny from raw protein sequences in NCBI's RefSeq database in under a day on a very fast workstation...
But, I have created a toolchain to automate some of the work, and that does cut down on some of the time. Doing things the more manual way I describe may not be the best option, just the one I understand the best. Some other people's suggestions may be easier and faster to do.
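The FASTA-to-PHYLIP step mentioned above is simple enough to script. Below is a minimal Python sketch, not the toolchain referred to in this reply. It assumes the input is an already-aligned FASTA file and writes a sequential, "relaxed" PHYLIP layout (full-length names, then whitespace, then the sequence), which to my knowledge both PhyML and RAxML accept; check the manual of your version before relying on it. The function name fasta_to_phylip and the toy sequences are made up for illustration.

```python
def fasta_to_phylip(fasta_text):
    """Convert aligned FASTA text to sequential relaxed PHYLIP text."""
    names, chunks = [], []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without a name")
            names.append(header[0])  # PHYLIP names must not contain whitespace
            chunks.append([])
        elif chunks:
            chunks[-1].append(line)
    seqs = ["".join(parts) for parts in chunks]
    if not seqs:
        raise ValueError("no FASTA records found")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences differ in length; align them first")
    if len(set(names)) != len(names):
        raise ValueError("duplicate sequence names")
    width = max(len(n) for n in names)
    lines = [f"{len(seqs)} {len(seqs[0])}"]  # header: number of taxa, alignment length
    for name, seq in zip(names, seqs):
        lines.append(f"{name.ljust(width)}  {seq}")
    return "\n".join(lines) + "\n"


# Invented toy alignment, for illustration only.
phylip_text = fasta_to_phylip(">ant\nMK-L\n>wasp\nMKAL\n")
print(phylip_text, end="")
```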
Mr. Saurabh is right, but the question is that he wanted to know the homologues. The best option is Geneious, because it will automatically search NCBI, and then a multiple alignment can be performed to assess the proximity.
Hi, first you should find similar organisms in GenBank (NCBI), then select appropriate and comparable genes from them in FASTA format, especially their rRNA genes, to determine their similarities. For this, copy the FASTA sequences into an MS Word document, then use the MEGA4 program to make multiple alignments and run a phylogenetic analysis to obtain a phylogenetic tree and see the similarity.
First use BLAST (the native program on NCBI's site) at NCBI against a few databases such as ESTs, nucleotides and some others. It is a very simple, user-friendly procedure. Then you can download the chosen sequences and try to build a phylogeny. MEGA, as was mentioned above, is very good. Other choices: a trial of Geneious if you have a large number of sequences. It depends on your will and skills. :) Personally I like MEGA for its sequence-editing possibilities and Geneious for its annotation feature.
MEGA5 is the best answer, but I agree with Agnieszka that doing BLAST first would help you identify which organisms to take ahead in this analysis, and then you can proceed with MEGA.
I also wish to know: once we download sequences (FASTA format) from NCBI, say of COI from different species, the lengths are different for different species. How do we ensure the sequences contain only COI? Is there any need to edit them before we go for multiple alignment, tree preparation and analysis? I am using MEGA5.
Orthologues can be searched for on the KEGG-SSDB server (http://ssdb.genome.jp) and at the EuPathDB Bioinformatics Resource Center (eupathdb.org). Align these orthologues using CLUSTALW, and a bootstrapped phylogenetic tree can be generated using Unipro UGENE: Integrated Bioinformatics Tools (ugene.unipro.ru). To analyze the evolutionary direction, submit these sequences to the SPRING server (http://algorithm.cs.nthu.edu.tw/tools/SPRING/).
@Arvinder Singh, for your question, I would proceed with the protein sequences of COI, since they have certain conserved domains that I could relate to when considering COI sequences of other species. If there is a domain hit, then they are similar COIs (although the length of the protein would not vary so much). Once I have an idea of which COI is of interest, proceeding ahead is easy. I hope you got my point.
This is because you can get domain information from protein sequences. Personally I trust proteins more, since they carry more information. If you already have a nucleotide sequence from a particular organism, BLAST it and see which species it matches (you can add a filter to check specific species). When you get the result, download it in tab-delimited format, copy it into Excel and filter by identity (say 35%). Whatever is less than that you can omit; whatever is left are your probable COIs in different species. Then proceed ahead, or else you can proceed according to Manmeet Rawat's suggestion. It's a very good option too.
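If you would rather script the identity filter than do it in Excel, here is a minimal Python sketch. It assumes the hits were saved in BLAST's standard 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and it skips lines starting with "#". The function name filter_hits and the example hit lines are invented for illustration, and 35% is just the example cutoff from the post above.

```python
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, min_identity=35.0):
    """Keep BLAST tabular hits whose percent identity is at least min_identity."""
    kept = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} tab-separated columns: {line!r}")
        hit = dict(zip(COLUMNS, fields))
        hit["pident"] = float(hit["pident"])
        if hit["pident"] >= min_identity:
            kept.append(hit)
    return kept


# Invented hit table, for illustration only (not real BLAST output).
example = (
    "# Fields: query, subject, % identity, ...\n"
    "query1\tsubjA\t98.50\t650\t10\t0\t1\t650\t1\t650\t0.0\t1150\n"
    "query1\tsubjB\t30.20\t120\t80\t4\t5\t124\t10\t129\t2e-05\t45.1\n"
)
hits = filter_hits(example, min_identity=35.0)
for hit in hits:
    print(hit["sseqid"], hit["pident"])
```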
@sonalji, my question still stands: when we download sequences for COI from different species, their lengths are different. Is there any need for editing before going for multiple alignment?
I agree with Sonal. The problem arises when some sequences contain the whole PCR product and are longer than the gene. It is good to translate them to proteins using MEGA, check the domains and coding sequence, and then convert back to DNA again. MEGA does this correctly because it remembers the DNA triplets coding each amino acid. In a few older sequences from NCBI I found an added end with noncoding sequence, or with parts of different genes. It happens.
@Arvinder Singh: the best way is to be selective about sequences of different lengths, and then check these "different sequences" using protein sequences.
There are surely thousands of species of ants that are closely related to this species, but at the moment there are only a few hundred with gene sequences deposited in GenBank:
In order to compare them, you need the same gene from each species; it is not possible to compare a DNA polymerase gene from one species to a ribosomal RNA gene from another species. There are 56 "population Sets" in GenBank, each of which is a multiple sequence alignment of the same gene from many different species of carpenter ants:
It may be useful to include the same gene from one or more species outside the carpenter ant genus as an "outgroup" when building a phylogeny of these ants.
Aaron Vose has a great suggestion for using many genes, but it is rather unlikely that all of those genes have been sequenced for many ant species. If you are interested in the other ants, you may need to settle for fewer genes and more species, rather than many genes but few species.
1. To identify the close organisms, you have to BLAST your sequence, so that you will know which organisms are probably close to your query sequence.
2. To study the phylogenetic relationships, many software packages (e.g. PHYLIP, MEGA, PhyML) can be used. I recommend MEGA, as it is very handy.