10 October 2016

Hi everyone,

    Here is the situation: we recently submitted a paper on a tumor prognostic biomarker, and a reviewer asked us to validate our findings in a specific microarray dataset from ArrayExpress (http://www.ebi.ac.uk/arrayexpress).

    I downloaded the raw data (.CEL files) and opened them in R, which I had never used before. The only manuals I could find describe a full-scale expression analysis, which took hours on my computer. After that, I still have to pick out the expression of that one gene, combine it with the survival data, and run the validation, which is quite time-consuming.

    Is there a way to extract only the expression data for that specific gene (e.g. probe set 284065_at on the HG-U133 Plus 2 array) from the raw data files (here, the .CEL files) and get the results I need much faster?

    Also, does anyone know how I can combine that expression data directly with the survival data (or, better, process the data online without running R on my computer)? If you know a way to improve my workflow (the code I used is below), please share it :)
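
For the second question, here is a minimal sketch of joining one probe set's expression values to survival data with the survival package (a recommended package that ships with standard R installations). The expression vector, the sample names (GSM1.CEL and so on), the follow-up times and the event codes are all made-up placeholders; in practice the sample-to-file mapping and clinical variables would come from the dataset's sample annotation (on ArrayExpress, usually the SDRF file). The median split is only one common choice of cutoff; the cutoff should match whatever the original analysis used.

```r
library(survival)  # recommended package, included with standard R installations

# Hypothetical log2 expression of probe set 284065_at, one value per CEL file.
# All numbers below are made up for illustration only.
gene.expr <- c(GSM1.CEL = 9.1, GSM2.CEL = 7.2, GSM3.CEL = 8.8, GSM4.CEL = 6.5,
               GSM5.CEL = 9.4, GSM6.CEL = 7.0, GSM7.CEL = 8.1, GSM8.CEL = 6.9)

# Hypothetical clinical table: follow-up time (months) and event status
# (1 = event, 0 = censored), keyed by the same CEL file names.
clinical <- data.frame(
  sample = c("GSM8.CEL", "GSM7.CEL", "GSM6.CEL", "GSM5.CEL",
             "GSM4.CEL", "GSM3.CEL", "GSM2.CEL", "GSM1.CEL"),
  time   = c(15, 35, 25, 8, 40, 18, 30, 12),
  status = c(0, 1, 1, 1, 0, 0, 1, 1)
)

# Join expression to clinical data by sample name (row order does not matter).
expr.df <- data.frame(sample = names(gene.expr), expr = unname(gene.expr))
merged  <- merge(clinical, expr.df, by = "sample")

# Dichotomize at the median expression into "high" and "low" groups.
merged$group <- ifelse(merged$expr > median(merged$expr), "high", "low")

# Kaplan-Meier fit, log-rank test, and a Cox model on continuous expression.
km  <- survfit(Surv(time, status) ~ group, data = merged)
lr  <- survdiff(Surv(time, status) ~ group, data = merged)
cox <- coxph(Surv(time, status) ~ expr, data = merged)

print(km)
print(lr)
print(summary(cox))
```

Calling plot(km) would draw the two Kaplan-Meier curves; the Cox model avoids choosing a cutoff by using expression as a continuous covariate.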

P.S. The R code I am using comes from this site: http://jura.wi.mit.edu/bio/education/bioinfo2007/arrays/array_exercises_1R.html

I first installed Bioconductor, then:

1. Use the pull-down menu (File >> Change dir [Windows]) to go to the directory where you put the raw data (CEL files).
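
The menu route is Windows-specific; on any platform the same can be done from the R console with setwd(). A minimal sketch: the folder is a placeholder (tempdir() is used only so the snippet runs as-is), and listing the files first confirms what ReadAffy() will pick up, since with no arguments it reads the CEL files in the working directory.

```r
# Placeholder: replace tempdir() with the folder holding your .CEL files,
# e.g. setwd("C:/data/arrayexpress") (hypothetical path).
cel.dir <- tempdir()
setwd(cel.dir)

# Check which CEL files R can see in the working directory.
cel.files <- list.files(pattern = "\\.cel$", ignore.case = TRUE)
print(getwd())
print(cel.files)
```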

2. Load the library (R package) that contains the Affymetrix microarray code we want to use, with the command

library(affy)

3. Read the CEL files (first command below), then summarize and normalize them with MAS5 (second command below). This could take a few minutes.

affy.data = ReadAffy()

eset.mas5 = mas5(affy.data)

4. The variable 'eset.mas5' contains normalized expression values for all probe sets, along with other information. Let's continue by getting the expression matrix (probe sets/genes in rows, chips in columns).

exprSet.nologs = exprs(eset.mas5)

# List the column (chip) names

colnames(exprSet.nologs)
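
As far as I understand it, MAS5 rescales each chip using the signal of all its probe sets, so the full CEL files still have to be processed; once the expression matrix exists, though, the single probe set from the question can be pulled out by its row name. A minimal sketch with a made-up stand-in for exprs(eset.mas5) (the values and chip names are hypothetical):

```r
# Made-up stand-in for exprs(eset.mas5): probe sets in rows, chips in columns.
exprSet.nologs <- matrix(
  c(120, 340,  95,
    800, 760, 910,
     45,  60,  30),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1007_s_at", "284065_at", "1053_at"),
                  c("chip1.CEL", "chip2.CEL", "chip3.CEL"))
)

# Select the probe set of interest by its ID (row name).
probe <- "284065_at"
gene.expr <- exprSet.nologs[probe, ]                # named vector, one value per chip
gene.row  <- exprSet.nologs[probe, , drop = FALSE]  # 1-row matrix, handy for write.table()
print(gene.expr)
```

If the ID is misspelled, the subscript fails with a "subscript out of bounds" error, which makes typos easy to catch.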

5. Log-transform the values to base 2 (I'm not quite sure what this step is for):

exprSet = log(exprSet.nologs, 2)
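
On the step 5 question: log(x, 2) takes the base-2 logarithm of every value in the matrix (log2(x) gives the same result). MAS5 signals are on a raw intensity scale with a long right tail; after log2, a difference of 1 corresponds to a twofold change, which is the scale commonly used for downstream analysis. A small demonstration on made-up values:

```r
# Made-up MAS5-style signal values, each twice the previous one.
x <- c(100, 200, 400, 800)

log2.x <- log(x, 2)                 # base-2 logarithm, element by element
print(log2.x)
print(diff(log2.x))                 # each doubling adds 1 on the log2 scale
print(all.equal(log2.x, log2(x)))   # log(x, 2) and log2(x) agree
```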

6. To write our expression matrix to a file (as with most data), we can use a command like

write.table(exprSet, file="RESULTS_mas5_matrix.txt", quote=FALSE, sep="\t")

to get a tab-delimited file that we could view in Excel or a text editor.
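
One caveat with this command: when row names are written, the header line has one field fewer than the data lines, so in Excel the chip names appear shifted one column to the left. Passing col.names = NA adds an empty first header cell and keeps the columns aligned. A sketch on a made-up two-probe-set matrix, writing to a temporary file instead of RESULTS_mas5_matrix.txt:

```r
# Made-up log2 expression matrix: probe sets in rows, chips in columns.
exprSet <- matrix(c(9.6, 9.3, 6.9, 8.2), nrow = 2,
                  dimnames = list(c("1007_s_at", "284065_at"),
                                  c("chip1.CEL", "chip2.CEL")))

out.file <- tempfile(fileext = ".txt")  # stand-in for "RESULTS_mas5_matrix.txt"
write.table(exprSet, file = out.file, quote = FALSE, sep = "\t", col.names = NA)

# The header now starts with an empty cell, so it lines up with the data rows.
cat(readLines(out.file), sep = "\n")

# Reading the file back restores the probe set IDs as row names.
back <- read.delim(out.file, row.names = 1)
print(back)
```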

7. While we're doing Affymetrix-specific preprocessing, let's calculate an Absent/Present call for each probe set.

# Run the Affy A/P call algorithm on the CEL files we processed above

data.mas5calls = mas5calls(affy.data)

# Get the actual A/P calls

data.mas5calls.calls = exprs(data.mas5calls)

# Print the calls as a matrix

write.table(data.mas5calls.calls, file="Su_mas5calls.txt", quote=FALSE, sep="\t")
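
For a single-gene validation, it may be worth checking whether the probe set is detected at all before relating it to survival, since a probe set called Absent on most chips mostly reflects background. exprs(data.mas5calls) holds the calls as a character matrix of "P" (present), "M" (marginal) and "A" (absent); the sketch below uses a made-up stand-in for it and tabulates the calls for one probe set.

```r
# Made-up stand-in for exprs(mas5calls(affy.data)):
# "P" = present, "M" = marginal, "A" = absent.
data.mas5calls.calls <- matrix(
  c("P", "A", "P", "M",
    "P", "P", "P", "A"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("1007_s_at", "284065_at"),
                  c("chip1.CEL", "chip2.CEL", "chip3.CEL", "chip4.CEL"))
)

probe.calls <- data.mas5calls.calls["284065_at", ]
print(table(probe.calls))            # how many chips get each call

frac.present <- mean(probe.calls == "P")
print(frac.present)                  # share of chips calling it present: 0.75
```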

P.P.S. These microarray datasets are really amazing; I wish I understood more of the tools for using them~
