I have 40 cancer cell lines that were treated with a specific drug, and drug response was quantified with an appropriate numeric score ranging from 0 to 1. I have gene expression data (untreated) for all 40 cell lines, and I want to build a random forest regression model in order to detect features (genes) associated with drug sensitivity. For feature selection I wanted to use RFE, Boruta or similar methods. Basically, a classic scenario.

Here is my problem:

For all the cell lines, I have both RNA-Seq (Illumina) and Microarray (AFFY HGU133P2) gene expression data. The RNASeq data have been properly processed and normalized for library size (STAR + Cuffquant + Cuffnorm) so that they can be used for random forest.

SCENARIO 1: R SQUARED ASSOCIATED WITH SCORE

Separately for RNASeq and Microarray, I measured the correlation between each gene/probe's expression and the response score, and calculated r squared and the p-value associated with r.
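For reference, the per-gene correlation step can be sketched as below. This is a minimal sketch, assuming an expression matrix with one row per cell line and one column per gene or probe; the function name `correlate_genes_with_score` and the toy data are hypothetical. It computes Pearson r for all genes at once and derives the two-sided p-value from the usual t statistic with n - 2 degrees of freedom.

```python
import numpy as np
from scipy import stats


def correlate_genes_with_score(expr, score):
    """Per-gene Pearson r, r squared and two-sided p-value against the score.

    expr:  (n_cell_lines, n_genes) expression matrix, one column per gene/probe.
    score: (n_cell_lines,) drug-response score.
    Genes with zero variance give nan and should be filtered out beforehand.
    """
    expr = np.asarray(expr, dtype=float)
    score = np.asarray(score, dtype=float)
    n = score.shape[0]

    # Centre every gene and the score, then get all correlations in one pass.
    xc = expr - expr.mean(axis=0)
    yc = score - score.mean()
    r = (xc * yc[:, None]).sum(axis=0) / np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = np.clip(r, -1.0, 1.0)  # guard against rounding just outside [-1, 1]

    # t-test for H0: rho = 0, with n - 2 degrees of freedom (|r| = 1 gives t = inf, p = 0).
    with np.errstate(divide="ignore"):
        t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    p = 2 * stats.t.sf(np.abs(t), df=n - 2)
    return r, r ** 2, p


# Toy data: 40 cell lines, 3 genes; gene 0 tracks the score, genes 1-2 are noise.
rng = np.random.default_rng(0)
score = rng.uniform(0, 1, size=40)
expr = rng.normal(size=(40, 3))
expr[:, 0] = 2 * score + rng.normal(scale=0.1, size=40)
r, r2, p = correlate_genes_with_score(expr, score)
```

The vectorized form gives the same r and p-value as calling `scipy.stats.pearsonr` gene by gene, but avoids a Python loop over tens of thousands of probes.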

1883 genes have p […] (60%+) between RNASeq and Microarray, which is more than I expected. Great.
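An overlap percentage depends on its denominator, so it can help to state which one is used. A minimal sketch computing two common versions, assuming that microarray probes have already been collapsed to gene symbols (for example, one probe per gene); the gene symbols below are placeholders, not results.

```python
def overlap_summary(genes_a, genes_b):
    """Summarise the overlap between two lists of significant genes."""
    a, b = set(genes_a), set(genes_b)
    shared = a & b
    return {
        "shared": len(shared),
        # Share of the shorter list that is also found on the other platform.
        "frac_of_smaller": len(shared) / min(len(a), len(b)),
        # Shared genes over all genes significant on either platform.
        "jaccard": len(shared) / len(a | b),
    }


# Hypothetical gene symbols; microarray probes are assumed to be collapsed to genes already.
rnaseq_hits = ["TP53", "MYC", "EGFR", "KRAS"]
array_hits = ["MYC", "EGFR", "KRAS", "BRAF", "PTEN"]
summary = overlap_summary(rnaseq_hits, array_hits)
```

Here 3 shared genes give 75% of the smaller list but a Jaccard index of only 50%, which is why the same two lists can look more or less concordant depending on the metric.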

SCENARIO 2: EXTRACTING FEATURES WITH RANDOM FOREST

When using Boruta for feature extraction, or other more relaxed feature selection methods, I end up with 2 different signatures that share only 10% of their genes, despite the very good overlap mentioned above.
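The core idea behind Boruta, comparing each real feature with randomly permuted "shadow" copies, can be sketched as follows. This is a simplified illustration, not the Boruta package: it only counts "hits" (runs in which a feature beats the best shadow) and omits Boruta's statistical test and iterative rejection. The function `shadow_hits` is hypothetical, and it uses scikit-learn's impurity-based `feature_importances_`, which may differ from the importance measure your Boruta run used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def shadow_hits(X, y, n_iter=10, seed=0):
    """Count, per feature, how often its importance beats the best shadow feature.

    Simplified Boruta-style sketch: the real Boruta algorithm adds a statistical
    test on the hit counts and iteratively removes rejected features.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_iter):
        # Shadow copies: every column shuffled independently, so any link to y is destroyed.
        shadow = np.column_stack([rng.permutation(col) for col in X.T])
        rf = RandomForestRegressor(n_estimators=100, random_state=i)
        rf.fit(np.hstack([X, shadow]), y)
        imp = rf.feature_importances_
        # A "hit" means the real feature beat every shadow feature in this run.
        hits += imp[:n_features] > imp[n_features:].max()
    return hits


# Toy data: 40 cell lines, feature 0 tracks the score, features 1-9 are noise.
rng = np.random.default_rng(1)
y = rng.uniform(0, 1, size=40)
X = rng.normal(size=(40, 10))
X[:, 0] = y + rng.normal(scale=0.05, size=40)
hits = shadow_hits(X, y)
```

Because every run reshuffles the shadows and refits the forest, features near the decision boundary can flip between selected and rejected, which is one reason why two platforms measuring similar biology can still yield different signatures.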

For example, the gene with the highest variable importance in the Microarray signature was validated in vitro as associated with drug response using siRNAs. However, this gene is not part of the RNASeq signature, even though it has a high r squared value (expression/score correlation) in both RNASeq and Microarray and was part of the "intersection" mentioned in point 1.
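One possible explanation, not established by the data above, is that random forest importance is shared among correlated predictors: if other genes in one platform closely track the validated gene, splits can go to them instead and its importance is diluted. A minimal toy illustration on synthetic data (the helper `rf_importances` and all variables are hypothetical), fitting the same forest with and without a near-duplicate of the informative gene:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_importances(X, y):
    # max_features=1: each split considers one randomly chosen feature,
    # so near-duplicate features end up being picked interchangeably.
    rf = RandomForestRegressor(n_estimators=300, max_features=1, random_state=0)
    rf.fit(X, y)
    return rf.feature_importances_


rng = np.random.default_rng(2)
z = rng.normal(size=40)                      # latent signal driving the response
y = z + rng.normal(scale=0.1, size=40)       # toy response score (not bounded to [0, 1])
gene_a = z + rng.normal(scale=0.1, size=40)
gene_b = z + rng.normal(scale=0.1, size=40)  # strongly correlated with gene_a
noise = rng.normal(size=40)

imp_alone = rf_importances(np.column_stack([gene_a, noise]), y)
imp_with_twin = rf_importances(np.column_stack([gene_a, gene_b, noise]), y)
```

The signal carried by `gene_a` does not disappear when `gene_b` is added; it is split between the two, so a ranking cut-off can keep one and drop the other almost arbitrarily.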

I would like to combine the information from both RNASeq and Microarray to select the final genes in the signature. I was thinking about using only the overlapping genes from point 1 to build the random forest model, but this way I could lose important transcripts measured only by RNASeq and not by the microarray. The opposite might also be true in certain cases.
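One way to combine the platforms without restricting to the intersection is a rank-based consensus. A minimal sketch, assuming a separate model and importance score per platform; the function `consensus_rank`, the gene names and the importance values are all hypothetical. Genes measured on both platforms get the mean of their normalised ranks, while platform-specific genes keep their single-platform rank instead of being discarded.

```python
def consensus_rank(importance_by_platform):
    """Merge per-platform importance scores into one ranked gene list.

    importance_by_platform: {platform_name: {gene: importance}}.
    Genes measured on only one platform keep their single-platform rank
    instead of being discarded, unlike an intersection-only approach.
    """
    normalised_ranks = {}
    for scores in importance_by_platform.values():
        # Rank genes within a platform, most important first.
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, gene in enumerate(ordered, start=1):
            # Divide by list length so platforms with different gene counts are comparable.
            normalised_ranks.setdefault(gene, []).append(position / len(ordered))
    # Average the normalised ranks over the platforms where each gene was measured.
    merged = {gene: sum(r) / len(r) for gene, r in normalised_ranks.items()}
    return sorted(merged, key=lambda gene: (merged[gene], gene))


# Hypothetical importances (e.g. Boruta or RF output per platform); LNC_X is RNASeq-only.
importances = {
    "rnaseq": {"GENE_A": 0.30, "GENE_B": 0.10, "GENE_C": 0.05, "LNC_X": 0.20},
    "microarray": {"GENE_A": 0.25, "GENE_B": 0.20, "GENE_C": 0.02},
}
ranking = consensus_rank(importances)
```

Ranks are used rather than raw importances because importances from two separately trained forests are not on a common scale. Note that a platform-specific gene is judged on a single vote, so it may be worth flagging such genes separately in the final signature.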

How should I proceed? What would you do?

Marco Bolis