First of all, 16S rRNA genes are used for classifying and identifying bacteria. Eukaryotes have rRNAs of different sizes, but these are also used for the same purposes. There are a few reasons why they are used: these genes are universally present in all bacteria, and they have a conserved, essential function, so they evolve slowly and the number of random mutations can be correlated with the evolutionary distance between species.
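To make "number of mutations correlates with evolutionary distance" concrete, here is a minimal sketch that assumes two already-aligned sequences. It computes the simple p-distance (fraction of differing sites) and the Jukes-Cantor (JC69) correction, a standard textbook model chosen here for illustration only; the sequences are made-up toy fragments, not real 16S data.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable (gap-free) sites")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """JC69 distance: corrects p for multiple substitutions at the same site."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Made-up aligned fragments (not real 16S data) differing at 2 of 20 sites
s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTACGAACGTACGTTCGT"
p = p_distance(s1, s2)
print(f"p-distance = {p:.3f}, JC69 distance = {jukes_cantor(p):.4f}")
```

The corrected distance grows faster than p as sequences diverge, because it accounts for sites that changed more than once; this is why raw mutation counts track evolutionary distance well only over a limited range.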
Actually, it is not only 16S that is used, but also other genes. The reason 16S is quite 'successful' is simply that the primers are very conserved, so the method can easily be applied to various organisms. Another advantage is that 16S is located in the mitochondrial genome and is therefore available in numerous copies, so it is easy to get sequences even from old or poorly stored material. Finally, because mitochondria are almost exclusively inherited from the mother, there should usually be only one version in the genome of an organism (haplotype). In nuclear genes, you may have two different alleles, which means that you need to use cloning to separate them. This was not done during the early years of molecular phylogenetics. Meanwhile, most researchers study several different genes, combining mitochondrial and nuclear genes, and it will probably not take long until studying complete genomes becomes common practice.
The more general term is 'small subunit ribosomal RNA' (SSU rRNA). This covers both 18S (eukaryotes) and 16S (bacteria and archaea). SSU is universal and highly conserved; it has no codon structure and can be aligned unambiguously. Early uses were by Norm Pace; his papers provide the logic.
As has already been answered, 16S is present in all bacteria. In eukaryotic organisms, the 18S gene can be used for the same aim. It encodes the RNA of the small ribosomal subunit and is therefore highly conserved (it does not evolve as quickly as other genes that are "less essential" for the cell's machinery). The other rRNA genes that make up the ribosome can also be used to study phylogenetic relationships, but 18S is the most common one, because its moderate length is enough to contain conserved regions, which allow relationships above the species level to be studied, as well as more variable regions, which allow species to be distinguished.
The other advantage of this gene is that, when describing a new species, it is mandatory to sequence it and deposit the sequence in a public database (such as NCBI), so you have access to the sequences of all described species and can use them to identify any bacterium you are interested in.
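Identification against deposited sequences boils down to finding the most similar reference. This is a minimal sketch, assuming the query and references are already aligned to the same length; the reference names and sequences are hypothetical placeholders, and real searches would use an aligner or BLAST against NCBI or a curated SSU database instead of this direct comparison.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(query)


def best_hit(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return (name, identity) of the reference most similar to the query."""
    scored = ((name, percent_identity(query, seq)) for name, seq in references.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical mini "database" of aligned reference fragments (made-up data)
refs = {
    "Species_A": "ACGTTGCAACGTTGCA",
    "Species_B": "ACGTTGCAACGATGCT",
    "Species_C": "TCGATGCTACGATGCT",
}
name, identity = best_hit("ACGTTGCAACGTTGCC", refs)
print(name, f"{identity:.2f}%")
```

A best hit is only as good as the database: if the true species is not deposited, the query is still assigned to its nearest neighbour, which is why complete reference coverage matters so much.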
1/ The term to use is not "16S rRNA" but "SSU rRNA gene sequences"!
Indeed, rRNA itself is not often used, because it is unstable and we do not have tools for it as good as those we have for DNA. Thus what is most studied is:
- either amplicons derived from PCR on DNA using universal or specific primers,
- or amplicons of cDNA molecules obtained from rRNA using reverse transcriptase (RVT).
The latter is thought to be more representative of active bacteria and less likely to reflect the presence of dead bacteria.
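Because rRNA is unstable, the RNA-based route first copies it into cDNA with reverse transcriptase. The sketch below shows only the base-pairing bookkeeping of that step, assuming a clean A/C/G/U template written 5'->3'; it does not model priming or enzyme behaviour, and the function names and template are hypothetical.

```python
# Base that reverse transcriptase pairs with each RNA template base
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """First-strand cDNA (written 5'->3') for an RNA template written 5'->3'."""
    rna = rna.upper()
    invalid = set(rna) - set(RNA_TO_CDNA)
    if invalid:
        raise ValueError(f"unexpected template bases: {sorted(invalid)}")
    # The new strand is antiparallel: complement each base and read in reverse order
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))


def gene_sense_dna(rna: str) -> str:
    """DNA with the same sense as the rRNA (i.e. the gene sequence): U becomes T."""
    return rna.upper().replace("U", "T")


template = "AUGGCUAAC"  # made-up rRNA fragment
print(first_strand_cdna(template))
print(gene_sense_dna(template))
```

After PCR on the cDNA the amplicon is double-stranded, so its sequence can be compared directly with DNA-derived SSU rRNA gene sequences.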
2/ SSU rRNA gene sequences can be derived from:
- Bacteria and Archaea
- Eukaryota
- Mitochondria
- Chloroplast
- and several other organelles, not well known by most biologists.
- Note that a given organism can also host one or several symbionts.
3/ Why rRNA gene sequences?
- Because this is the only gene for which primers with broad taxonomic coverage can be designed; with any other housekeeping gene, no primer can be designed that amplifies more broadly than a single phylum (often coverage is worse).
- Because no other gene can be found in the public databases with sequences covering so many clades and species.
- For prokaryotes, registering a new species name (new isolate) requires the SSU rRNA gene sequence to be deposited.
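"Broad taxonomic coverage" of a primer can be estimated by checking against which target sequences it matches. This is a minimal sketch using IUPAC degenerate codes and exact matching only; real evaluations usually tolerate some mismatches (especially away from the 3' end), which this does not. The primer is the commonly cited bacterial 27F; the target fragments are made up.

```python
# IUPAC nucleotide codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"  # commonly cited bacterial forward primer


def matches_at(primer: str, target: str, start: int) -> bool:
    """True if every primer position is compatible with the target base at that offset."""
    window = target[start:start + len(primer)]
    return len(window) == len(primer) and all(
        base in IUPAC[code] for code, base in zip(primer, window)
    )


def binds(primer: str, target: str) -> bool:
    """True if the primer matches the target with zero mismatches at any offset."""
    return any(matches_at(primer, target, i) for i in range(len(target) - len(primer) + 1))


def coverage(primer: str, targets: dict[str, str]) -> float:
    """Fraction of target sequences the primer binds without mismatches."""
    return sum(binds(primer, seq) for seq in targets.values()) / len(targets)


# Made-up target fragments, written 5'->3' on the same strand as the primer
targets = {
    "taxon_1": "TTAGAGTTTGATCCTGGCTCAGGA",  # M matches C
    "taxon_2": "AGAGTTTGATCATGGCTCAGATT",   # M matches A
    "taxon_3": "AGAGTTTGATTCTGGCTCAG",      # mismatch at primer position 11
    "taxon_4": "GGGGCCCCAAAATTTT",          # no binding site at all
}
print(f"27F coverage of toy set: {coverage(PRIMER_27F, targets):.0%}")
```

Degenerate positions such as M are how one primer covers many clades; for less conserved genes, the number of degenerate positions needed quickly becomes impractical.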
4/ Why SSU rather than LSU?
For historical reasons. In the good old days, sequencing was difficult. For a while there was competition between SSU and LSU, when sequences were derived directly from rRNA, using RVT instead of DNA polymerase for Sanger sequencing of the molecules (no cloning required).
Then, with PCR and automated sequencers, it became easy to sequence the entire SSU gene (mostly 16S) but not the LSU. SSU won, despite the fact that LSU has a better taxonomic and phylogenetic signal.
5/ rRNA gene sequences are not always that good.
For Fungi, for example, the signal is not very good, and people rely on the ITS1-5.8S-ITS2 sequences to identify fungi at the species level.
However, ITS domains have a good taxonomic signal but almost no phylogenetic signal ==> if the sequence obtained is not in your database, the only thing you know is "nothing": this fungus could be anything.
This leads to a second problem: it is almost impossible to "curate" public ITS sequences to remove bad annotations (as is done by Silva, RDP or PR2 for SSU sequences).
Even for bacteria, SSU rRNA gene sequences can be poorly informative. For example, in Enterobacteria it is very difficult to use a sequence for identification at the species level.
6/ Contrary to what someone said here, rRNA genes are most of the time present in multiple copies in a genome. Homogeneity among the copies is maintained by "gene conversion".
A few slow-growing bacteria (mostly Mycobacteria) have a single copy, E. coli has seven, and many bacteria can have ten or more. Most animals have hundreds, which can be spread over two or more chromosomes, although they are generally repeated in tandem.
==> Using rRNA gene sequences, it is difficult to assess how many cells are present in a given sample when you don't know how many rRNA gene copies (operons) there are in each genome...
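The copy-number problem is easy to see with a little arithmetic. This minimal sketch assumes the rRNA operon copy number of each taxon is known, which in practice is often not the case; the read counts and "taxon_X" are hypothetical, while the copy numbers for a mycobacterium (one) and E. coli (seven) follow the examples above.

```python
def relative_abundance(values: dict[str, float]) -> dict[str, float]:
    """Normalise non-negative values so they sum to 1."""
    total = sum(values.values())
    return {taxon: v / total for taxon, v in values.items()}


def copy_number_corrected(reads: dict[str, int], copies: dict[str, int]) -> dict[str, float]:
    """Estimate relative cell abundance by dividing reads by rRNA operon copy number."""
    return relative_abundance({taxon: reads[taxon] / copies[taxon] for taxon in reads})


# Hypothetical read counts; copy numbers: 1 (mycobacterium), 7 (E. coli),
# and an assumed 10 for a hypothetical taxon_X
reads = {"Mycobacterium sp.": 100, "E. coli": 700, "taxon_X": 1000}
copies = {"Mycobacterium sp.": 1, "E. coli": 7, "taxon_X": 10}

raw = relative_abundance(reads)
corrected = copy_number_corrected(reads, copies)
for taxon in reads:
    print(f"{taxon:18s} raw {raw[taxon]:.2f}  corrected {corrected[taxon]:.2f}")
```

Here the three taxa are present in equal numbers of cells, yet the raw read fractions differ roughly tenfold; without the copy numbers you could not tell.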
Finally, for Viridiplantae or Metazoa, other gene sequences are usually used. They are more discriminative than rRNA gene sequences, and unlike those working on bacteria, archaea and protists, scientists interested in Viridiplantae or Metazoa would rarely ask the question "how many different metazoa are present in this sample?" using molecular tools. Rather, they would ask, for example, "how many different nematode species are present in this sample?", and in such a case you only need primers with good coverage for nematodes and a gene with very good resolution at the species level for nematodes.
And there is much more to write on that subject.
Next time, please consider doing some bibliographic research, or even reading a textbook, before asking a trivial question. There are definitely too many naive questions on this site :-(
I would like to suggest that the main reason people use these genes is "historical constraint." Back in the days before PCR, it was easier to extract enough rRNA via differential centrifugation to get sequenceable or RFLP-able samples (this is the same reason mtDNA became popular). Once there is an established set of primers and a lot of sequences in GenBank to compare, there is a sort of snowball effect where people tend to use the same markers.
Contrary to some statements above, I think that rRNA genes are generally more difficult to align than protein-coding genes, and while it is true that they have highly variable and largely invariant regions, the variation often includes chaotic length variation that renders the sequences impossible to align plausibly. This has been the driving impetus behind the development of the direct optimization software POY.
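To see why length variation makes alignment ambiguous, consider a plain global alignment. This is a textbook Needleman-Wunsch sketch with arbitrary scores (match +1, mismatch -1, linear gap -2); it is not what POY's direct optimization does, and the toy sequences only mimic a variable-length loop.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Ties are broken in favour of
    match/mismatch, then gaps in `b`, so only one of possibly several
    optimal alignments is reported.
    """
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to rebuild one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy fragments mimicking a variable-length loop: a run of 5 A's vs a run of 2 A's
score, top, bottom = needleman_wunsch("ACGTAAAAAGGCT", "ACGTAAGGCT")
print(score)
print(top)
print(bottom)
```

The two A's of the shorter sequence can be placed against any two of the five A's for the same score, so the gap position is arbitrary. With real loops of very different lengths and little similarity, this kind of ambiguity is what makes rRNA alignment hard.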
Last, unless I am mistaken, the bacterial 12S rRNA is homologous with the eukaryotic 18S, and the bacterial 16S is homologous with the eukaryotic 28S.
"Last, unless I am mistaken, the bacterial 12S rRNA is homologous with the eukaryotic 18S, and the bacterial 16S is homologous with the eukaryotic 28S."
Yes, you are mistaken.
There is no bacterial 12S, strictly speaking, but there is a mitochondrial 12S (SSU) of bacterial (Proteobacteria) origin.
Prokaryotic 16S and eukaryotic 18S are SSU rRNA.
Prokaryotic 23S and eukaryotic 28S (or 26-28S) are LSU rRNA.
12S is from organelles and is SSU rRNA.
Now, in some species a particular gene may be interrupted. Example:
The LSU genes of Salmonella typhimurium LT2 are known to carry intervening sequences (IVSs) at two sites, helix-25 and helix-45, which are excised by RNase III during rRNA maturation, resulting in rRNA which is fragmented but nevertheless functional.
Thanks for clarifying that. I was thinking of the mtDNA ribosomal genes. So am I correct in stating that the mitochondrial 12S is homologous with the bacterial SSU and the nuclear 18S of eukaryotes, and that the mtDNA 16S is not homologous with the bacterial 16S but with the bacterial 23S and the eukaryotic 28S? The discussion above does not seem entirely clear about whether nuclear or mitochondrial genes are being discussed (in eukaryotes).
I would like to add to Richard Christen's point: "Because no other gene can be found in the public databases with sequences covering so many clades and species." Although we are talking about bacteria here, given that the sequence is 16S rRNA, is rRNA rigorous enough for the taxonomic classification of eukaryotic species? If it is not, I would go for the cytochrome oxidase 1 (COI) gene, since it is widely sequenced for many species and well deposited in databases like NCBI GenBank and BOLD. Moreover, compared with COI (where the cutoff ranges between 1-3%), isn't the percentage cutoff for assigning specimens to species relatively arbitrary when rRNA is used as a marker? Hence I am wondering which of these genes (COI or rRNA) is superior in terms of representation of sequences in the databases and accuracy of taxonomic classification.
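On the arbitrariness of cutoffs: the number of groups you get depends directly on the threshold chosen. This minimal sketch uses greedy centroid clustering (similar in spirit to OTU picking, not any specific tool's algorithm) with a simple p-distance on made-up 100 bp sequences; the 1%, 3% and 5% cutoffs are illustrative only.

```python
SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}


def mutate(seq: str, positions: list[int]) -> str:
    """Return a copy of seq with a different base at each given position (toy-data helper)."""
    chars = list(seq)
    for pos in positions:
        chars[pos] = SWAP[chars[pos]]
    return "".join(chars)


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between equal-length, gap-free sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[str], cutoff: float) -> list[int]:
    """Assign each sequence to the first centroid within `cutoff`, else start a new cluster.

    Returns the cluster index of every input sequence.
    """
    centroids: list[str] = []
    labels: list[int] = []
    for seq in seqs:
        for idx, centroid in enumerate(centroids):
            if p_distance(seq, centroid) <= cutoff + 1e-9:  # tolerance for float rounding
                labels.append(idx)
                break
        else:
            centroids.append(seq)
            labels.append(len(centroids) - 1)
    return labels


base = "ACGT" * 25  # made-up 100 bp sequence
toy = [base, mutate(base, [10]), mutate(base, [10, 20]), mutate(base, [30, 40, 50, 60])]
for cutoff in (0.01, 0.03, 0.05):
    print(f"cutoff {cutoff:.0%}: {len(set(greedy_cluster(toy, cutoff)))} clusters")
```

The same four sequences give three, two or one "species" depending on the cutoff, and greedy results also depend on input order; the choice of threshold is a convention, whatever the marker.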
For Bacteria, SSU rRNA gene sequences are the gold standard.
For Archaea, SSU rRNA gene sequences are the gold standard.
For both, when a new strain is isolated, the SSU rRNA sequence is required for validation of a name. see http://www.bacterio.net.
For Eukaryota:
For unicellular Eukaryota (protists), SSU rRNA gene sequences are the gold standard (see http://ssu-rrna.org).
For Fungi, Animals and Viridiplantae, SSU rRNA gene sequences often lack resolution. Other genes are often used, but there are no universal primers that amplify every clade (as is almost the case for prokaryotes and protists, although Foraminifera, for example, are an exception). The exception is ITS for Fungi, but ITS have a very poor phylogenetic signal, even though they allow identification at the species level or better (I can detail that if needed).
Finally:
- For Bacteria, SSU rRNA gene sequences always allow taxonomy to be assigned down to the genus level, but not always down to the species level. Some different species may have exactly the same sequence (two mycobacteria, for example), and for Enterobacteriaceae (another example), the situation may be problematic when the entire sequence is not available (as in NGS analyses). In these cases, it is best to use a housekeeping gene (tuf, rpoB, ...).
- When there are several operons in a bacterial genome, they may have different sequences, and some may be misleading regarding taxonomy; see PubMed ID 8742634.
- The definitive identification of Bacteria down to the species level or better is now based on a combination of several housekeeping genes (see for example PubMed ID 24409173). The best gene combination may depend on the genus analyzed, and remember that you need different primers for different genera!
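The multi-gene idea can be sketched as concatenating aligned housekeeping-gene fragments per strain and comparing the concatenates, in the spirit of multilocus sequence analysis (MLSA). This is a minimal sketch assuming each locus is already aligned to the same length in every strain; the strains and sequences are made up, and the gene names (tuf, rpoB) are just the examples mentioned above.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("each locus must be aligned to the same length in every strain")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def concatenate(profile: dict[str, str], loci: list[str]) -> str:
    """Join the aligned locus sequences of one strain in a fixed locus order."""
    return "".join(profile[locus] for locus in loci)


# Made-up aligned fragments for two hypothetical strains (not real sequences)
strains = {
    "strain_1": {"16S": "ACGTACGTAC", "tuf": "GGCATTGACA", "rpoB": "TTGACCGTAA"},
    "strain_2": {"16S": "ACGTACGTAC", "tuf": "GGCATCGACA", "rpoB": "TTGGCCGTAT"},
}
housekeeping = ["tuf", "rpoB"]

s1, s2 = strains["strain_1"], strains["strain_2"]
print("16S distance:", p_distance(s1["16S"], s2["16S"]))
for locus in housekeeping:
    print(f"{locus} distance:", p_distance(s1[locus], s2[locus]))
print("concatenated housekeeping distance:",
      p_distance(concatenate(s1, housekeeping), concatenate(s2, housekeeping)))
```

In this toy case the two strains are indistinguishable by 16S but separated by the housekeeping genes, which is the situation described above for closely related species.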