Is there any bioinformatics pipeline or publication for identifying microbial sequences in plant RNA-seq data?

Aligning your reads to a database/index of microbial sequences is a quick way to identify potential sequences of microbial origin if you have some idea what kind of microbes to expect in your sample. However, if the microbial sequences in your RNA-seq data don't match well with what's in current databases, you may miss things. There are also some issues with BLASTing very short sequences (e.g. reads). How short are your reads?

Another approach would be to map reads to the genome of your plant of interest and keep the reads that do not map. You can do this fairly quickly with Bowtie (--un option). Use these unmapped reads for de novo transcript assembly with a program like Trinity. Now you have longer sequences to use as BLAST queries, with the added benefit of having already filtered out most of the data that maps well to your plant. To be even more thorough, you can use these sequences to build phylogenies, which will give you a good idea of whether they are more closely related to sequences from microbes or from plant lineages. Take a look here for some phylogenetic methods used to look for lateral gene transfers, which is somewhat similar to what you're trying to do: http://www.cell.com/current-biology/fulltext/S0960-9822(17)30138-0
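The two steps above (keep unmapped reads, then assemble them) can be sketched as a pair of command lines. This is only a sketch under assumptions: it uses Bowtie 2 rather than the original Bowtie, single-end reads, and hypothetical file names (`plant_index`, `reads.fq`, `unmapped.fq`). The Python below only builds and prints the commands; it does not run them.

```python
import shlex


def bowtie2_unmapped_cmd(index_prefix, reads_fq, unmapped_fq,
                         sam_out="plant_aligned.sam", threads=4):
    """Build a Bowtie 2 command that maps single-end reads to the plant
    index and writes the reads that fail to align to `unmapped_fq`."""
    return [
        "bowtie2",
        "-p", str(threads),      # number of alignment threads
        "-x", index_prefix,      # prefix of an index built with bowtie2-build
        "-U", reads_fq,          # unpaired (single-end) reads
        "--un", unmapped_fq,     # unpaired reads that did not align
        "-S", sam_out,           # SAM output for the reads that did align
    ]


def trinity_cmd(unmapped_fq, out_dir="trinity_unmapped",
                max_memory="16G", cpu=4):
    """Build a Trinity command for de novo assembly of the unmapped reads.
    The output directory name contains 'trinity', which Trinity expects."""
    return [
        "Trinity",
        "--seqType", "fq",
        "--single", unmapped_fq,
        "--max_memory", max_memory,
        "--CPU", str(cpu),
        "--output", out_dir,
    ]


if __name__ == "__main__":
    # File names here are placeholders, not taken from the thread.
    for cmd in (bowtie2_unmapped_cmd("plant_index", "reads.fq", "unmapped.fq"),
                trinity_cmd("unmapped.fq")):
        print(shlex.join(cmd))
```

Bowtie 2 is not splice-aware, so some genuine plant reads that span introns may land in the unmapped set; they should drop out again at the BLAST or phylogeny step. For paired-end data the corresponding Bowtie 2 option is `--un-conc`, and Trinity takes `--left`/`--right` instead of `--single`.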

Rajesh Kumar Gazara

Hello Iqbal,

First, you can download microbial sequence data and create a BLAST database, then BLAST your reads against it.

Second, you can download microbial sequences and build an index from them, then map your reads to that index using any mapping tool. I suggest STAR (very fast).

Or you can use both approaches to be sure.
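Once reads have been mapped to a microbial index, a quick way to see which references attract reads is to count primary alignments per reference in the SAM output. The following is a minimal sketch, assuming a standard tab-delimited SAM file; the function name and the example reference names are made up for illustration.

```python
from collections import Counter


def count_hits_per_reference(sam_lines, min_mapq=0):
    """Count primary alignments per reference sequence in SAM records.

    Skips header lines, unmapped reads (FLAG bit 0x4), and secondary or
    supplementary alignments (FLAG bits 0x100 and 0x800), so each mapped
    read is counted at most once per mate.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname = fields[2]
        mapq = int(fields[4])
        if flag & 0x4:
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        if mapq < min_mapq:
            continue  # below the requested mapping quality
        counts[rname] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up example; in practice pass an open SAM file handle.
    example = [
        "@SQ\tSN:virusA\tLN:5000",
        "r1\t0\tvirusA\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "r3\t16\tbacteriumB\t99\t3\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    for ref, n in count_hits_per_reference(example).most_common():
        print(ref, n)
```

Raising `min_mapq` is one way to discount reads that map ambiguously to several closely related microbial references, though the meaning of MAPQ values differs between aligners.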

Good luck.

Cameron J Grisdale

Salvatore Camiolo

I agree with Cameron's suggestion. Mapping the whole read dataset to the microbial genomes also has an additional issue. During evolution, part of a microbe's genome can become integrated into the genome of the host organism. I do not know whether the microbial coding sequences would be expressed in that case, but if so, you may end up reporting a microbe in your plant that is not actually present. I used Cameron's approach (with SPAdes instead of Trinity to assemble transcripts from the reads that did not map to the grapevine reference genome) and it works very well. I also tried aligning the RNA-seq data to the available virus sequences, and this approach also proved extremely informative about the viruses infecting the plants I am working on.

Muhammad Shahzad Iqbal

Thank you all. It's really starting to make sense how to do this task. I have traced the viruses back to their corresponding samples and now want to take it one step further.

To be clearer about this work: I have RNA-seq data that shows some viral sequence contamination, but these viral sequences are only partial. I want to search with the contigs, so that the sequences upstream and downstream of the viral sequences can indicate the potential microbial host of that specific virus. I don't know how accurate or feasible this strategy is, so please let me know if this kind of work can be done in a specific way.
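As a rough illustration of the flanking-sequence idea, the sketch below pulls out the sequence upstream and downstream of a viral hit on a contig, given the hit coordinates on that contig (for example, from a BLAST tabular report). The function name, the 500 bp default flank, and the 1-based inclusive coordinate convention are assumptions made for this example, not something stated in the thread.

```python
def flanking_regions(contig_seq, hit_start, hit_end, flank=500):
    """Return (upstream, downstream) sequence around a hit on a contig.

    Coordinates are 1-based and inclusive. The start may be larger than
    the end (a hit on the reverse strand); the two are reordered first.
    Flanks are truncated at the contig ends.
    """
    lo, hi = sorted((hit_start, hit_end))
    if lo < 1 or hi > len(contig_seq):
        raise ValueError("hit coordinates fall outside the contig")
    upstream = contig_seq[max(0, lo - 1 - flank):lo - 1]
    downstream = contig_seq[hi:hi + flank]
    return upstream, downstream


if __name__ == "__main__":
    # Toy contig: the "viral" part is the run of C's at positions 6-10.
    contig = "AAAAACCCCCGGGGG"
    up, down = flanking_regions(contig, 6, 10, flank=3)
    print(up, down)
```

The flanks could then be used as BLAST queries against microbial databases. Whether they say anything about a host depends on the contig actually extending beyond the viral region and on the assembly not being chimeric, so any hit would need independent confirmation.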
