Dear Researchers,
I'm a veterinarian who has ended up in the bioinformatics world: I am analyzing RNA-Seq data from colon samples of 2 groups of dogs. The control group is 3 beagles, from which I sent 3 colonic samples with histopathology of normal colon mucosa. The disease group is 3 dogs, from each of which I sent 1 pair of samples, one diseased and one normal, with histopathology of diseased and normal colon mucosa respectively. So I have 3 replicates for each of 3 conditions across 2 groups of dogs (normal colon from a normal dog, diseased colon from a diseased dog, normal colon from a diseased dog), making 9 samples in total. As I understand it, this warrants the use of edgeR since I have fewer than 12 samples.
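To make the design concrete, here is a minimal sketch of the sample sheet as I picture it. It is written in Python purely for illustration (the real analysis would be in R), the sample and dog labels are hypothetical, and it assumes one sample per control dog.

```python
def build_sample_sheet():
    """Build the 9-sample design described above.

    All labels (ctrl1, dis1, ...) are hypothetical placeholders,
    not the real sample names.
    """
    samples = []
    # Control group: 3 beagles, assumed one normal colon sample per dog.
    for i in range(1, 4):
        dog = f"ctrl{i}"
        samples.append({"sample": f"{dog}_normal",
                        "dog": dog,
                        "condition": "normal_dog_normal_colon"})
    # Disease group: 3 dogs, each with a paired diseased and normal sample.
    for i in range(1, 4):
        dog = f"dis{i}"
        samples.append({"sample": f"{dog}_diseased",
                        "dog": dog,
                        "condition": "diseased_dog_diseased_colon"})
        samples.append({"sample": f"{dog}_normal",
                        "dog": dog,
                        "condition": "diseased_dog_normal_colon"})
    return samples


if __name__ == "__main__":
    for row in build_sample_sheet():
        print(row["sample"], row["dog"], row["condition"], sep="\t")
```

The "dog" column is kept because the diseased and normal samples from the same diseased dog are paired, which may matter for how the comparison is set up.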
The colonic RNA samples were sent to another DNA research lab for RNA sequencing, and I am honestly not sure which protocol they used. The information I received was that the sequences were processed using CLC Genomics Workbench 12 to produce an Excel file with total count, RPKM, TPM and CPM.
The machine used was an Illumina HiSeq 2500 with a single-read, 50 bp library, sequenced to 15 million reads, from which the FASTQ files were retrieved.
The processing workflow was as follows:
1. The Illumina adapters were removed from the FASTQ reads.
2. Unreliable bases were removed from the FASTQ reads.
3. The reads were mapped to the CanFam reference.
4. RPKM was calculated.
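
To check my own understanding of how the normalized columns relate to the total count column, here is a minimal sketch using the standard textbook definitions of CPM, RPKM and TPM. It is in Python purely for illustration, the counts and gene lengths are made-up numbers, and I am assuming the library size is simply the sum of the gene counts; I do not know whether CLC Genomics Workbench uses exactly these definitions.

```python
def cpm(counts):
    """Counts per million: scale each count by the library size."""
    total = sum(counts)  # assumed library size = sum of gene counts
    return [c * 1e6 / total for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million reads in the library."""
    total = sum(counts)
    return [c * 1e9 / (total * length)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r * 1e6 / rate_sum for r in rates]


if __name__ == "__main__":
    # Made-up example: three genes in one sample.
    counts = [100, 300, 600]
    lengths_bp = [1000, 2000, 3000]
    print("CPM :", cpm(counts))
    print("RPKM:", rpkm(counts, lengths_bp))
    print("TPM :", tpm(counts, lengths_bp))
```

All three values are derived from the counts by dividing out the library size (and, for RPKM and TPM, the gene length), which is why they are not the same thing as raw counts.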
I am now in a dilemma over whether I should re-process the raw FASTQ files myself or just use the current Excel file, as I do not have access to supercomputers in my facility.
I am trying to analyze the Excel output in RStudio with the edgeR package for normalization and, hopefully, DGE analysis, but I am unsure what stage of processing my current Excel data represents. Will I need to normalize it, or should I skip the Excel file and start from the raw FASTQ files instead (the edgeR user manual says not to use FPKM, but raw data)? Will the total count and total read numbers suffice as raw data for edgeR?
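One sanity check I could run on the total count column is sketched below: raw read counts should be non-negative whole numbers, whereas RPKM, TPM and CPM are generally fractional. This is Python for illustration only, the helper name and the example values are hypothetical, and passing the check would only mean the column is consistent with raw counts, not prove that it is.

```python
def looks_like_raw_counts(values, tol=1e-9):
    """Return True if every value is a non-negative whole number.

    Hypothetical helper: a column that fails this check is unlikely
    to be raw read counts.
    """
    return all(v >= 0 and abs(v - round(v)) <= tol for v in values)


if __name__ == "__main__":
    # Made-up example columns, not values from my actual Excel file.
    total_count = [0, 12, 3507, 88]
    rpkm_column = [0.0, 1.73, 250.4, 6.1]
    print("total count looks raw:", looks_like_raw_counts(total_count))
    print("RPKM looks raw:", looks_like_raw_counts(rpkm_column))
```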
I have up to 20,000 genes to analyze, and I wonder whether I have the time to start from the raw data.
Thanks everyone!