Hi,

I would like to find out which KEGG pathways and/or Gene Ontology (GO) terms are present in a microbial community from shotgun metagenomic data. I understand this can be done without first assembling the reads, and I would prefer such an approach, although advice on working with assembled scaffolds is also welcome. The environment I am studying (plant aerial surfaces) is poorly characterized, i.e., few reference genomes are available.

What I have:

- Illumina shotgun metagenomic reads (2x150 bp) from three biologically independent samples

- Access to a large computer cluster, although online tools would be preferable, since installing software on the cluster is complicated.

What I need:

- A list of the KEGG IDs present in the samples (possibly above an abundance threshold).

- A list of the GO terms present in the samples (possibly above an abundance threshold).
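Once reads have KO assignments (e.g., from GhostKOALA), applying an abundance threshold comes down to counting. The Python sketch below assumes a hypothetical two-column, tab-separated input of query ID and KO number, with an empty KO column for unassigned queries; check the actual layout of your tool's download before relying on it. Abundance here is the fraction of assigned queries, and ko_abundances, shared_kos, and min_fraction are illustrative names, not part of any tool.

```python
from collections import Counter


def ko_abundances(lines, min_fraction=0.0):
    """Return {KO: fraction of assigned queries} for KOs at or above min_fraction.

    Assumes (hypothetically) lines of the form "query_id<TAB>KO", where the
    KO column is missing or empty for queries that received no assignment.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Count only queries that actually carry a KO number.
        if len(fields) >= 2 and fields[1]:
            counts[fields[1]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # Relative abundance is computed over assigned queries only.
    return {ko: n / total for ko, n in counts.most_common() if n / total >= min_fraction}


def shared_kos(per_sample_lines, min_fraction=0.0):
    """Return the KOs that pass the cutoff in every sample."""
    passing = [set(ko_abundances(lines, min_fraction)) for lines in per_sample_lines]
    return set.intersection(*passing) if passing else set()
```

Calling shared_kos with the lines of the three per-sample files and, say, min_fraction=0.001 would return the KOs making up at least 0.1% of assigned reads in every sample; that cutoff is an arbitrary example, not a recommendation.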

What for? To know what metabolic capabilities are present in the microbial communities.

What I have considered:

- GhostKOALA. Advantages: Best option so far; well integrated with KEGG. Disadvantages: Because of file-size limits, I would have to submit a subsample of my reads (not too bad, right?). Only KEGG, no GO (I can live with that).

- Blast2GO. Advantages: Seems to be just what I need for GO. Disadvantages: Very expensive; the only free option is a one-week trial.

- KAAS. Advantages: Easy to use; accepts clean FASTA files directly. Disadvantages: It only searches against a small number of reference organisms.

- BLAST against nr or another database (Swiss-Prot?). Advantages: Searches a very large and diverse reference database. Disadvantages: Collecting and parsing the results seems very difficult, and would yield a messy collection of reference sequence names.
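Parsing BLAST results is less painful if BLAST writes tabular output (-outfmt 6), whose 12 default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. The Python sketch below keeps the highest-scoring hit per query under an E-value cutoff; the 1e-5 default is an arbitrary assumption. It does not solve the naming problem: the subject accessions would still have to be mapped to KO or GO terms through a separate ID-mapping table, which is not shown here.

```python
def best_hits(lines, max_evalue=1e-5):
    """Return {query_id: (subject_id, evalue, bitscore)}, keeping the top-scoring hit per query.

    Assumes BLAST tabular output (-outfmt 6) with the 12 default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in lines:
        # Skip blank lines and comment lines (present if -outfmt 7 was used instead).
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue
        # Replace the stored hit only if this one has a higher bit score.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

The result gives at most one reference accession per read, which is a much smaller and cleaner list to map to functional categories than the raw hit table.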

So far, my plan is this: clean the reads with Trimmomatic, merge the forward and reverse reads of each pair with VSEARCH, translate each merged read into amino acids in all six possible reading frames (three per strand, since a read can come from either strand of a gene), keep only the frames without a stop codon, take a random subsample of 1 to 3 million reads, and submit each biological sample separately to GhostKOALA. Does this sound right? Are there better options I have not considered?
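A minimal Python sketch of the translation, filtering, and subsampling steps of this plan, assuming the merged reads are available as FASTA. It translates each read in all six frames (three on each strand), keeps reads that have at least one stop-free frame, and draws a uniform random subsample with reservoir sampling, so the full input never has to be held in memory. The standard genetic code is hard-coded; min_length (default 30 residues) is an arbitrary, hypothetical filter against very short fragments, and all function names are illustrative. Trimming and pair merging are left to Trimmomatic and VSEARCH.

```python
import random

# Standard genetic code (NCBI translation table 1); codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + n]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for n, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines):
    """Yield (read_id, sequence) pairs from FASTA-formatted lines."""
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            # Keep only the first word of the header as the read ID.
            read_id, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if read_id is not None:
        yield read_id, "".join(chunks)


def translate(dna):
    """Translate from position 0; drops an incomplete last codon, ambiguous codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Yield (frame_label, protein) for the three forward and three reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    for strand, sign in ((dna, "+"), (reverse, "-")):
        for offset in range(3):
            yield f"{sign}{offset + 1}", translate(strand[offset:])


def stop_free_frames(dna, min_length=30):
    """Return frames with no stop codon ('*') that are at least min_length residues long."""
    return [(label, prot) for label, prot in six_frames(dna)
            if "*" not in prot and len(prot) >= min_length]


def reservoir_sample(items, k, seed=None):
    """Uniform random sample of k items from an iterable of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for index, item in enumerate(items):
        if index < k:
            sample.append(item)
        else:
            slot = rng.randint(0, index)  # inclusive bounds
            if slot < k:
                sample[slot] = item
    return sample


def candidate_reads(fasta_lines, min_length):
    """Yield (read_id, frames) for reads with at least one stop-free frame."""
    for read_id, dna in read_fasta(fasta_lines):
        frames = stop_free_frames(dna, min_length)
        if frames:
            yield read_id, frames


def prepare_reads(fasta_lines, k, seed=None, min_length=30):
    """Subsample k qualifying reads and return their stop-free frames as protein FASTA records."""
    records = []
    for read_id, frames in reservoir_sample(candidate_reads(fasta_lines, min_length), k, seed):
        for label, prot in frames:
            records.append(f">{read_id}_{label}\n{prot}")
    return records
```

Running prepare_reads on each sample's open FASTA file with k set between 1 and 3 million and a fixed seed makes the subsample reproducible. A read can keep more than one stop-free frame, so the number of protein records may slightly exceed k; keeping only one frame per read (for example, the longest) would be a reasonable variation.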

Thanks for your time.
