I'm new to RNA-seq analysis and differential gene expression (DGE), and I'm hoping to get some good ideas for moving forward.
For context: I have 15 samples, 3 replicates each of 5 sample types. They're mRNA samples obtained from bacterial cultures enriched on minimal media with a different carbon source for each sample type, using a starting inoculum of field soil. We've aligned the reads back to a metagenome taken from the same soil, and I now have a dataset consisting of about 74,000 transcripts. I've used DESeq2 to normalize the dataset and have been exploring the patterns in it using R.
Question 1: how (if at all) should I filter the initial dataset to remove low-abundance, low-information transcripts? One method I've tried is to remove transcripts that have more than 1 (or more than 3) zero counts across all 15 samples, but I'd like to know if there's a better approach, as I think this will help with my confidence in the downstream results.
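To make the filter concrete, here is a minimal sketch of the zero-count rule described above. It's in plain Python purely for illustration (the analysis itself is in R), and the transcript IDs, counts, and the `max_zeros` parameter are made up, not taken from my data.

```python
def filter_by_zero_count(counts, max_zeros):
    """Keep transcripts with at most `max_zeros` zero counts across all samples."""
    return {
        transcript: values
        for transcript, values in counts.items()
        if sum(1 for v in values if v == 0) <= max_zeros
    }


# Toy data (hypothetical): 3 transcripts x 15 samples,
# ordered as 5 sample types x 3 replicates each.
counts = {
    "tx_A": [12, 9, 15, 0, 3, 4, 22, 18, 25, 7, 5, 6, 10, 11, 9],  # 1 zero
    "tx_B": [0, 0, 0, 0, 0, 0, 40, 35, 52, 0, 0, 0, 0, 0, 0],      # 12 zeros
    "tx_C": [5, 0, 6, 0, 2, 0, 8, 7, 9, 1, 1, 2, 3, 0, 4],         # 4 zeros
}

kept_strict = filter_by_zero_count(counts, max_zeros=1)
kept_loose = filter_by_zero_count(counts, max_zeros=3)

print(sorted(kept_strict))
print(sorted(kept_loose))
```

With this toy data, both thresholds keep only `tx_A`. `tx_B`, which is expressed in just one sample type, is dropped under either cutoff, even though it is exactly the kind of transcript I'd want to keep for Question 2.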
Question 2: how should I conduct differential gene expression analysis given my specific design? I want to determine which transcripts are significantly enriched (or depleted) in each of my 5 sample types, and in particular whether some transcripts (or KEGG categories/subcategories, as I have that information as well) are specifically enriched in one or two sample types relative to the others. One method I've tried is to compare the transcript expression values for each sample type to the mean expression (i.e. making a new column of mean values in the DESeq object and calling results() on it using the contrast of my sample type over the mean). However, even after DESeq2 normalization, there appears to be some variation in sampling depth between my sample types that is biasing what I'm finding. If I perform a variance-stabilizing transformation on the dataset and look at expression patterns on a heatmap, I can clearly see clusters of transcripts that are more associated with one or two sample types than with the others, but I'd like to identify those transcripts more precisely and with statistical significance.
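To illustrate the "sample type over the mean" comparison, here is a rough sketch for a single transcript. It is plain Python rather than my actual DESeq2 code, it only computes a log2 ratio and does no significance testing, and the sample-type labels, counts, and pseudocount are all hypothetical.

```python
import math


def log2_ratio_to_mean(values, groups, pseudocount=1.0):
    """For one transcript, return log2(group mean / mean of all group means)."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)

    # Mean normalized count within each sample type.
    group_means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    # Reference level: the average of the per-type means.
    grand_mean = sum(group_means.values()) / len(group_means)

    # The pseudocount avoids taking the log of zero.
    return {
        g: math.log2((m + pseudocount) / (grand_mean + pseudocount))
        for g, m in group_means.items()
    }


# Hypothetical labels: 5 sample types x 3 replicates.
groups = [f"type{i}" for i in range(1, 6) for _ in range(3)]
# Toy normalized counts for one transcript, present only in type1.
values = [14, 15, 16] + [0] * 12

ratios = log2_ratio_to_mean(values, groups)
for group, ratio in ratios.items():
    print(group, round(ratio, 2))
```

In this toy case the transcript comes out at a log2 ratio of 2 for `type1` and -2 for every other type, which is the kind of pattern I see on the heatmap but would like to back up with a proper test.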
Thanks so much for any help you can give me! If there's any more clarification I can provide, I'm happy to.