Hi all,
this one is, I guess, a tricky question...
Discovering RNA viruses in metagenome/metatranscriptome datasets (especially from environmental samples) is particularly difficult because their genome sequences are VERY DIVERGENT and show little similarity to what is available in reference sequence databases.
Can you recommend a "typical" protocol for this?
I have found 2 "versions" so far:
**#FIRST PROTOCOL#**
- Assemble reads with Trinity or metaSPAdes.
- Run tBLASTn with a set of RNA virus proteins (ssRNA and dsRNA viruses) as queries against the generated contigs/scaffolds (tBLASTn takes protein queries and searches a translated nucleotide database). Use an e-value cutoff of