If you want to do taxonomic profiling of your NGS data, mainly comprising the variable regions of 16S (V3-V4), which database would you prefer, and why?
It does depend somewhat on what you need from your study. According to Pat Schloss (who authored mothur), the alignments in SILVA are done with the most precision, so that would be my first choice. Because I work for a company, access to SILVA is not free for me, so I use Greengenes, which works fine for my needs.
@Frank R Burns, I used RDP for my V3-V4 data, and on average ~30% of the sequences were assigned to unknown phyla of bacterial origin. To refine this, I tried the other two databases and found similar results with a few variations. As for the variations, Greengenes showed some uncultivated candidate phyla such as TM6, TM7, OP3, and WS2.
Hi Saurav, that seems pretty high at the phylum level. Although they make up 30% of the reads, how many different taxa do they represent? I suspect that mitochondrial or plastid DNA from a eukaryote might be getting amplified and seen as bacterial, but not classified further. If it's relatively few taxa, I would try BLASTing the sequences against GenBank to see if that provides any insight. Good luck!
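To check how many taxa those unclassified reads actually represent, a rough Python sketch like the one below might help. It is only an illustration: the record layout `(otu_id, read_count, phylum)`, the function name `summarize_unclassified`, and the set of labels treated as "unclassified" are all assumptions, so adjust them to whatever your classifier actually writes out.

```python
from collections import Counter

# Labels treated as "no phylum assigned". This set is an assumption;
# change it to match the output of your own classifier.
UNCLASSIFIED = {"", "unclassified", "unknown", "bacteria_unclassified"}


def summarize_unclassified(records):
    """Summarize reads lacking a phylum assignment.

    records: iterable of (otu_id, read_count, phylum) tuples
             (a hypothetical layout, not a specific tool's format).
    """
    total_reads = 0
    unclassified = Counter()
    for otu_id, read_count, phylum in records:
        total_reads += read_count
        if phylum.strip().lower() in UNCLASSIFIED:
            unclassified[otu_id] += read_count

    unclassified_reads = sum(unclassified.values())
    fraction = unclassified_reads / total_reads if total_reads else 0.0
    return {
        "fraction_unclassified": fraction,
        "n_unclassified_otus": len(unclassified),
        # The most abundant unclassified OTUs are the first BLAST candidates.
        "top_otus": unclassified.most_common(5),
    }


# Made-up numbers, purely for illustration.
example = [
    ("OTU1", 500, "Proteobacteria"),
    ("OTU2", 280, "unclassified"),
    ("OTU3", 200, "Firmicutes"),
    ("OTU4", 20, "unclassified"),
]
summary = summarize_unclassified(example)
print(summary)
```

If most of the unclassified reads fall into only a handful of OTUs, as in the made-up example, BLASTing the representative sequences of those few OTUs is quick and should show whether they are mitochondrial or plastid in origin.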