I am performing a metagenomic analysis and found a fraction of sequences that are quite different from those in the database (40% identity and 70% coverage). Could they be considered non-homologous? Are they potential new genes?
I performed shotgun sequencing, and the target sequences are CAZymes (carbohydrate-active enzymes) in microbial communities from soil.
Carbohydrate-active enzymes (CAZymes) are responsible for the synthesis and breakdown of glycoconjugates, oligo- and polysaccharides. They typically correspond to 1-5% of the genes of a living organism. The functional annotation of CAZymes in genomes is challenging for non-specialists, due to the varying modularity of these enzymes and the grouping of enzymes with different substrate specificity in the same sequence-based families.
The CAZy classification is widely used by the scientific community through tools for unambiguous high-throughput modular and functional annotation of CAZymes in sequences issued from genomic and metagenomic efforts.
As I mentioned previously, I found a number of divergent sequences that were nevertheless assigned to CAZymes deposited in the CAZy database.
I hope this information helps in finding an answer.
I'm looking for CAZymes in my metagenomes (shotgun library). The search was performed using the HMMER software, which assigns sequences based on conserved domains. The domain "signatures" (built from multiple alignments of sequences available in the CAZy database) are used as "probes" to find the CAZymes in my samples. I used the Illumina platform for sequencing. The sequences were not assembled.
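For anyone wanting to reproduce this kind of filtering, here is a minimal sketch of how the domain hits could be screened. It assumes the search was run with hmmsearch and the --domtblout option, so that the profile is the "query" and the read or protein is the "target" (with hmmscan the two roles are swapped). The function name, the tuple fields, and both cutoffs (1e-5 and 0.3) are illustrative assumptions, not values used in my analysis. Note that HMMER reports E-values and alignment coordinates, not percent identity.

```python
from typing import Iterable, Iterator, NamedTuple


class DomainHit(NamedTuple):
    sequence: str        # target name (read or protein)
    family: str          # query name (profile, e.g. a CAZy family)
    i_evalue: float      # independent E-value of this domain
    hmm_coverage: float  # fraction of the profile covered by the alignment


def parse_domtblout(
    lines: Iterable[str],
    max_i_evalue: float = 1e-5,      # illustrative cutoff (assumption)
    min_hmm_coverage: float = 0.3,   # illustrative cutoff (assumption)
) -> Iterator[DomainHit]:
    """Yield domain hits from hmmsearch --domtblout lines that pass both cutoffs."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split(maxsplit=22)  # field 22 is the free-text description
        if len(fields) < 22:
            continue  # malformed or truncated line
        hmm_length = int(fields[5])                       # qlen: profile length
        i_evalue = float(fields[12])                      # per-domain i-Evalue
        hmm_from, hmm_to = int(fields[15]), int(fields[16])
        coverage = (hmm_to - hmm_from + 1) / hmm_length
        if i_evalue <= max_i_evalue and coverage >= min_hmm_coverage:
            yield DomainHit(fields[0], fields[3], i_evalue, coverage)
```

Typical use would be to open the table file and pass the file object straight to parse_domtblout, since it only needs an iterable of lines. With unassembled short reads the profile coverage will often be low simply because a read spans only part of a domain, so the coverage cutoff needs to be chosen with the read length in mind.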
Hi guys, I am no expert in NGS sequence analysis but I have experience in the phylogenetic analysis of genes and some in bacteria.
As I have learned, different genes have different patterns of divergence, so, as Artur said, it is a case-by-case analysis. Another problem with bacteria is that they are promiscuous, acquiring genes by horizontal transfer, so some genes will show high identity because they came from the same ancestor, yet they do not reflect the evolutionary history of the genome in which they are located. So you have to be careful with the analysis.
Nir was right to suggest the use of 16S rRNA, since those genes are not transmitted horizontally. But if you cannot do it, then you have to stick with the data you have.
What I would recommend is to try to define which orthologs you have, discerning the different members of gene families as far as you can. Then do a phylogenetic analysis of each gene and include the genes from the closest species you can find, but use several species so that you can determine the minimum and the average amount of divergence expected between different species for that particular gene. Then compare all the genes to see if they follow the same pattern. Ideally, you will use the same species in all comparisons, but I know sometimes that is not possible, so do your best to keep them the same.
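To make the "minimum and average divergence" idea concrete, here is a rough sketch for one gene. It assumes you already have an alignment of that gene from several known species, and it uses the uncorrected p-distance (fraction of differing sites, ignoring gapped columns) as a simple stand-in for the model-based distances a real phylogenetic analysis would use. The function names and the input layout (a dict from species name to aligned sequence) are my own assumptions.

```python
from itertools import combinations
from statistics import mean

GAP = "-"


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == GAP or b == GAP:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def interspecies_divergence(aligned: dict[str, str]) -> tuple[float, float]:
    """Return (minimum, mean) pairwise p-distance among the species of one gene."""
    distances = [
        p_distance(aligned[s], aligned[t])
        for s, t in combinations(sorted(aligned), 2)
    ]
    if not distances:
        raise ValueError("need at least two species")
    return min(distances), mean(distances)
```

You would run interspecies_divergence once per gene on the known species, then compute p_distance between your unknown sequence and each known one. If the unknown is further from every reference than the minimum seen between recognized species, that gene at least is consistent with a different species.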
Genes that show a different pattern have to be treated with caution, since they could be the result of gene transfer. Those with a similar phylogenetic pattern can be used in a combined phylogenetic analysis to determine the total divergence between the known species and your unknown species.
Without the 16S, it is hard to get conclusive results, but this way you should have a good idea of the phylogenetic relationships of the species you are working with.
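As a very crude first pass at spotting genes with a different pattern, one could check whether every gene points to the same closest reference species. This is only a heuristic sketch and not a replacement for comparing the actual tree topologies. It assumes you have, for each gene, the distance from the unknown sequence to each reference species. The function name and the nested-dict input are hypothetical.

```python
from collections import Counter


def flag_discordant_genes(
    distances: dict[str, dict[str, float]],
) -> tuple[str, list[str], list[str]]:
    """Split genes by whether their nearest reference species matches the majority.

    distances maps gene -> {reference species -> distance from the unknown}.
    Returns (consensus species, concordant genes, discordant genes).
    """
    # Nearest reference species for each gene (smallest distance wins).
    nearest = {gene: min(d, key=d.get) for gene, d in distances.items()}
    # The species most often chosen as nearest is taken as the consensus.
    consensus, _ = Counter(nearest.values()).most_common(1)[0]
    concordant = sorted(g for g, sp in nearest.items() if sp == consensus)
    discordant = sorted(g for g, sp in nearest.items() if sp != consensus)
    return consensus, concordant, discordant
```

Genes in the discordant list are the candidates for horizontal transfer that deserve a closer look, and the concordant ones are the candidates for the combined analysis.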