I am working in R and have microarray expression matrices from Agilent in log10 format. These matrices have the following columns:

Structure: BrightCorner, E1A, Structural, and blank

Probe: (-)3xSLv1, (+)E1A_..., A_..., DarkCorner, DCP_..., ERCC-..., ETG..., and GE_BrightCorner.

And then Gene Symbol.

Plotting the expression of the first 20 subjects (attached boxplots.png file) shows that the data are probably not normalized.

Since I am used to working with data that has already been properly processed, I am not sure how I should process this information to obtain a proper expression matrix. A very naive approach would be to simply ignore any row that wasn't mapped to a gene symbol, collapse expression by gene symbol, anti-log the values, normalize between arrays, and take log2. But I don't know what kind of information is stored in all the other rows or how I should use it.
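
To make the naive approach concrete, here is a minimal base R sketch of those steps on toy data. Everything in it is an assumption for illustration: the object names (`expr_log10`, `gene_symbol`), the toy values, the choice of the mean for collapsing probes, and the choice of quantile normalization for the between-array step. With real data, limma's `avereps` and `normalizeBetweenArrays` would be the more usual tools for the collapsing and normalization steps.

```r
# Toy data standing in for the real matrix (names and values are hypothetical).
set.seed(1)
expr_log10 <- matrix(rnorm(6 * 3, mean = 2, sd = 0.5), nrow = 6,
                     dimnames = list(NULL, c("S1", "S2", "S3")))
gene_symbol <- c("TP53", "TP53", "", "BRCA1", NA, "EGFR")

# 1. Drop rows that were not mapped to a gene symbol.
keep <- !is.na(gene_symbol) & gene_symbol != ""
expr_log10 <- expr_log10[keep, , drop = FALSE]
gene_symbol <- gene_symbol[keep]

# 2. Collapse probes by gene symbol (assumption: mean of the log10 values).
sums <- rowsum(expr_log10, group = gene_symbol)
counts <- as.vector(table(gene_symbol)[rownames(sums)])
collapsed <- sums / counts

# 3. Anti-log back to the intensity scale.
intensity <- 10^collapsed

# 4. Normalize between arrays (assumption: plain quantile normalization,
#    which forces every array onto the same empirical distribution).
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}
normalized <- quantile_normalize(intensity)

# 5. Move to the log2 scale.
expr_log2 <- log2(normalized)
```

This sketch deliberately discards the control rows, so it does not address what they contain or whether they should inform the processing.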
