I am currently using miRDeep2 to identify miRNAs in a non-model organism (a butterfly). After running mapper.pl to map the small RNA reads to the genome, I used miRDeep2.pl to predict novel miRNAs of the species. Here are some of my questions:
I have 8 samples with 2 conditions (4 replicates for each condition). Do I have to run miRDeep2.pl for each sample individually and select the common miRNAs as the novel miRNAs?
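A minimal sketch of the "select the common miRNAs" idea, assuming each sample was run separately and its predicted mature sequences have already been pulled into a plain Python list. The function name `recurrent_candidates`, the sample labels, the toy sequences and the `min_samples` cutoff are all made up for illustration; nothing here is part of miRDeep2 itself.

```python
from collections import Counter

def recurrent_candidates(per_sample, min_samples):
    """Return mature sequences predicted in at least `min_samples` samples."""
    counts = Counter()
    for seqs in per_sample.values():
        counts.update(set(seqs))  # count each sequence at most once per sample
    return {seq for seq, n in counts.items() if n >= min_samples}

# Toy data (hypothetical): sample name -> mature sequences predicted in it.
per_sample = {
    "cond1_rep1": ["ACGUACGUACGUACGUACGUAC", "UUGGCCAAUUGGCCAAUUGGCC"],
    "cond1_rep2": ["ACGUACGUACGUACGUACGUAC"],
    "cond2_rep1": ["ACGUACGUACGUACGUACGUAC", "UUGGCCAAUUGGCCAAUUGGCC"],
}

shared_in_two = recurrent_candidates(per_sample, min_samples=2)
shared_in_all = recurrent_candidates(per_sample, min_samples=len(per_sample))
```

Requiring presence in every sample is the strictest choice; a lower `min_samples` keeps candidates that are expressed in only one condition.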
There are no previously reported mature miRNAs or hairpins in miRBase for my species. When I set them as 'none' and include the mature sequences of related species (all Lepidoptera + Drosophila), the outputs are all classified as 'novel'. But in this case, I can't figure out how many of them are conserved (or share homology) in other species. I then tried supplying all metazoan miRNAs and hairpins, or all Lepidoptera miRNAs and hairpins, as the 'reference miRNAs and hairpins' of my species (which of course they are not). I got different numbers of predicted miRNAs and known miRNAs in each case (mapped to either metazoa or Lepidoptera).
What are the criteria for selecting true-positive miRNAs from all predicted miRNAs? As I understand it, I should choose those with a significant randfold p-value (labeled 'yes') and a high miRDeep2 score. The miRDeep2 score is said to range from -10 to 10, but I got many extremely high scores, up to 1.8e+6. Why is that? Also, some miRNAs have very high miRDeep2 scores and read counts, but their randfold p-values are not significant. Should I consider them true positives as well?
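The two criteria mentioned above (score and randfold flag) can be written as a simple filter. This is only a sketch under stated assumptions: the field names `id`, `score` and `randfold`, the toy rows, and the cutoff of 4.0 are hypothetical placeholders, not recommended values or the actual miRDeep2 output format.

```python
def filter_candidates(rows, min_score, require_randfold=True):
    """Keep candidates at or above `min_score`, optionally requiring randfold == 'yes'."""
    kept = []
    for row in rows:
        if row["score"] < min_score:
            continue  # score below the chosen cutoff
        if require_randfold and row["randfold"] != "yes":
            continue  # randfold p-value not flagged as significant
        kept.append(row)
    return kept

# Toy rows (hypothetical field names and values).
rows = [
    {"id": "cand_1", "score": 5.2, "randfold": "yes"},
    {"id": "cand_2", "score": 1.8e6, "randfold": "no"},
    {"id": "cand_3", "score": 0.4, "randfold": "yes"},
]

strict = filter_candidates(rows, min_score=4.0)
lenient = filter_candidates(rows, min_score=4.0, require_randfold=False)
```

Comparing `strict` with `lenient` lists exactly the high-score, non-significant-randfold candidates in question, so they can be inspected by hand rather than accepted or dropped wholesale.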
How should I deal with precursors showing substantial sequence redundancy? There are many identical miRNA loci at different chromosomal locations. Since I will do differential expression analysis of mature sequences, I need to exclude the extra loci from the downstream analysis. Which of those loci should I choose as the representative? Do I have to look for the redundant loci manually and modify them?
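Finding the redundant loci does not have to be manual. Below is a rough sketch that groups loci by identical mature sequence and keeps one representative per group. Picking the highest-scoring locus is just one possible convention, assumed here for illustration; the field names `locus`, `mature` and `score` and the toy rows are likewise hypothetical.

```python
from collections import defaultdict

def collapse_loci(rows):
    """Group loci by mature sequence; keep the highest-scoring locus as representative."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["mature"]].append(row)

    collapsed = {}
    for mature, members in groups.items():
        best = max(members, key=lambda r: r["score"])  # ties: first one seen wins
        collapsed[mature] = {
            "representative": best["locus"],
            "loci": sorted(m["locus"] for m in members),  # keep all loci for the record
        }
    return collapsed

# Toy rows (hypothetical): two loci share the same mature sequence.
rows = [
    {"locus": "chr1_100", "mature": "ACGUACGUACGUACGUACGUAC", "score": 3.1},
    {"locus": "chr7_250", "mature": "ACGUACGUACGUACGUACGUAC", "score": 6.4},
    {"locus": "chr2_900", "mature": "UUGGCCAAUUGGCCAAUUGGCC", "score": 2.0},
]

collapsed = collapse_loci(rows)
```

Keeping the full `loci` list alongside the representative means the other genomic copies can still be reported, even though only one entry per mature sequence goes into the differential expression table.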
I am working on a similar project in a non-model organism, and I found that some other researchers working on non-model organisms simply used whole miRNA sets for their research. Maybe we could discuss it here.
Here is the paper I am referring to: Identification and Expression Profiling of MicroRNAs in the ...
Hi Yuelong Chen, I don't think the whole-miRNA-set method you mention is very reliable, because small RNA sequences that map to known miRNAs are not necessarily true miRNA candidates. For instance, there may be no corresponding hairpin structure in the genome, or no star sequence in the sequencing data. I have successfully predicted miRNAs in my species by following the valuable suggestions of the first author of the paper that introduced miRDeep2. You may also try it, or an alternative, miRCat2.