Microarray validation

Thomas Lettner

Hello Saeid,

You did not specify which samples you analyzed, so I will assume they were human.

If your microarray covered the whole genome, I would suggest that you first analyze your data with the online bioinformatics tool DAVID (http://david.abcc.ncifcrf.gov/). When you have several hundred or more up- and down-regulated genes, you cannot possibly confirm all of them with real-time PCR. It would cost too much, and many of the regulated genes will probably not be relevant to the biological problem you are investigating.

I assume you have your processed microarray results as an Excel file. Sort your data by fold change (or signal log ratio), with the most strongly up-regulated gene at the top and the values descending from there, so that the most down-regulated genes end up at the bottom.
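If you export the spreadsheet as gene/log-ratio pairs, the sorting step looks like the minimal sketch below. It assumes the log ratio is base 2 (as Affymetrix's signal log ratio is), and the gene names and values are made up purely for illustration. Sorting on the log ratio rather than on a linear fold change keeps up- and down-regulated genes on one symmetric scale.

```python
# Hypothetical (gene, log2 ratio) pairs; the names and values are placeholders,
# not real microarray results.
rows = [
    ("GENE_A", -1.8),
    ("GENE_B", 3.2),
    ("GENE_C", 0.4),
    ("GENE_D", 1.1),
    ("GENE_E", -2.6),
]


def sort_by_log_ratio(records):
    """Sort from the most up-regulated gene (top) to the most down-regulated (bottom)."""
    return sorted(records, key=lambda rec: rec[1], reverse=True)


ranked = sort_by_log_ratio(rows)
for gene, log_ratio in ranked:
    print(f"{gene}\t{log_ratio:+.2f}")
```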

You should also set a threshold. I personally consider a 2-fold up- or down-regulation as significant, but the appropriate cut-off can vary with the biological question being addressed.
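On a base-2 log scale, a 2-fold cut-off is simply |log2 ratio| >= 1. The sketch below applies that cut-off and splits the genes into up- and down-regulated lists, each with the strongest change first. It assumes base-2 log ratios, and the data are hypothetical placeholders.

```python
import math

# 2-fold cut-off as in the text; a 2-fold change corresponds to |log2 ratio| >= 1.
FOLD_THRESHOLD = 2.0
LOG2_THRESHOLD = math.log2(FOLD_THRESHOLD)

# Hypothetical (gene, log2 ratio) pairs, for illustration only.
rows = [
    ("GENE_A", -1.8), ("GENE_B", 3.2), ("GENE_C", 0.4),
    ("GENE_D", 1.1), ("GENE_E", -2.6), ("GENE_F", -0.9),
]


def split_by_threshold(records, log2_cutoff):
    """Return (up, down) lists of genes passing the cut-off, strongest change first."""
    up = sorted((r for r in records if r[1] >= log2_cutoff), key=lambda r: r[1], reverse=True)
    down = sorted((r for r in records if r[1] <= -log2_cutoff), key=lambda r: r[1])
    return up, down


up, down = split_by_threshold(rows, LOG2_THRESHOLD)
print("up:", up)
print("down:", down)
```

Changing `FOLD_THRESHOLD` (for example to 1.5) is all it takes to adapt the cut-off to a different biological question.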

Then copy and paste your data into the DAVID online form. Here you have to decide how you are going to submit your data, since there are several options. For example, you could paste the mRNA accession numbers, the probe-set IDs (if it is an Affymetrix array), or any other gene identifier into the form. I think you can enter up to 3000 genes at once. After you submit the list, DAVID sorts your genes into functional groups. (Instructions on how to use DAVID are on its homepage.)
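Before pasting, it can help to clean the identifier column so that no duplicates or stray spaces end up in the form. This is a minimal sketch assuming the form takes one identifier per line. The 3000 limit is the figure quoted above ("I think"), so check DAVID's own documentation for the current limit. `build_paste_list` is a hypothetical helper, and the probe-set IDs are illustrative only.

```python
# Limit quoted in the text ("I think"); verify against DAVID's current documentation.
MAX_IDS = 3000


def build_paste_list(identifiers, max_ids=MAX_IDS):
    """Join unique, trimmed identifiers one per line, keeping their original order."""
    seen = set()
    unique = []
    for ident in identifiers:
        ident = ident.strip()
        if ident and ident not in seen:
            seen.add(ident)
            unique.append(ident)
    if len(unique) > max_ids:
        raise ValueError(f"{len(unique)} identifiers exceed the limit of {max_ids}")
    return "\n".join(unique)


# Hypothetical Affymetrix-style probe-set IDs, used only for illustration.
probe_sets = ["1007_s_at", "1053_at", "117_at", "1053_at", " 121_at "]
print(build_paste_list(probe_sets))
```

Raising an error instead of silently truncating makes sure you notice when a list is too long and can decide yourself which genes to drop.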

Example: you paste the 1000 most highly up-regulated genes (or as many as you want, up to 3000) into the DAVID form and submit them. DAVID will then sort genes with similar functions into separate groups: all transcription factors in one group, cytoskeleton genes in another, genes belonging to the RAS pathway in another, genes encoding metabolic enzymes in another, and so on. You should do this several times, varying the number of genes you enter: for example, first with the top 100 genes, then with the top 500, etc. And of course you have to do the same with the down-regulated genes.
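You can prepare the nested "top N" lists for both directions in one go. The sketch below assumes (gene, log2 ratio) pairs and uses hypothetical data and tiny list sizes so the result is easy to check. In practice you would use sizes such as 100, 500 and 1000, as suggested above.

```python
# Hypothetical (gene, log2 ratio) pairs; placeholders, not real data.
rows = [
    ("GENE_A", -1.8), ("GENE_B", 3.2), ("GENE_C", 0.4),
    ("GENE_D", 1.1), ("GENE_E", -2.6), ("GENE_F", -0.9),
]


def ranked_ids(records, direction):
    """Return gene IDs for one direction, most strongly changed gene first."""
    if direction == "up":
        kept = sorted((r for r in records if r[1] > 0), key=lambda r: r[1], reverse=True)
    elif direction == "down":
        kept = sorted((r for r in records if r[1] < 0), key=lambda r: r[1])
    else:
        raise ValueError("direction must be 'up' or 'down'")
    return [gene for gene, _ in kept]


def top_n_lists(ids, sizes):
    """Map each list size n to the first n IDs (a too-large n just returns all IDs)."""
    return {n: ids[:n] for n in sizes}


# The text suggests sizes such as 100, 500 and 1000; tiny sizes suit this toy data.
up_lists = top_n_lists(ranked_ids(rows, "up"), sizes=(1, 2))
down_lists = top_n_lists(ranked_ids(rows, "down"), sizes=(1, 2))
print(up_lists)
print(down_lists)
```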

Once DAVID has sorted your genes into functional groups, the really hard work begins: you have to decide which groups are relevant to your biological problem. For example, if you are working on tumor cells and DAVID identified a group of up-regulated RAS genes, that group is relevant. If you are working with immune cells and DAVID identified interleukins as a group of up-regulated genes, that group will be relevant. You will have to read a lot of papers to distinguish relevant from non-relevant groups, and the more you know about the biological problem, the better you can make that distinction. But be careful, because some genes will certainly be relevant even if you find nothing about them in the scientific literature. The decision is up to you and your know-how.

Once you have selected interesting groups, you can start by analyzing the most strongly up- or down-regulated members of these groups with real-time PCR. If you have a group of five up-regulated RAS genes and a group of ten down-regulated transcription factors, and so on, then confirm the most up-regulated RAS gene and the most down-regulated transcription factor with real-time PCR. By analyzing one representative of every group, you reduce the number of real-time PCR assays you have to run.
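The selection rule above can be sketched as: within each selected group, take the gene with the largest absolute log ratio. For a group of up-regulated genes that is the most up-regulated one, and for a group of down-regulated genes the most down-regulated one. The group names, gene names and values below are hypothetical, and `pick_representatives` is an illustrative helper.

```python
def pick_representatives(groups):
    """Return {group name: (gene, log2 ratio)} with the largest |log2 ratio| per group."""
    return {
        name: max(members, key=lambda m: abs(m[1]))
        for name, members in groups.items()
        if members  # skip empty groups
    }


# Hypothetical groups and values, mirroring the RAS / transcription factor example.
groups = {
    "RAS pathway (up)": [("GENE_R1", 1.4), ("GENE_R2", 2.9), ("GENE_R3", 1.1)],
    "Transcription factors (down)": [("GENE_T1", -1.2), ("GENE_T2", -3.5), ("GENE_T3", -2.0)],
}
reps = pick_representatives(groups)
for name, (gene, log_ratio) in reps.items():
    print(f"{name}: {gene} ({log_ratio:+.1f})")
```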

So in short:

1) Analyze your microarray data with DAVID.

2) Select relevant groups of genes.

3) Confirm representatives of each group with real-time PCR.

Once you have confirmed a representative gene from every group, you can still decide to confirm more genes from these groups.

I performed microarrays for my PhD as well. I analyzed about 3000 up- and 1000 down-regulated genes with DAVID and ended up with about 40-50 genes that I confirmed with real-time PCR.

I hope I could help you a little.

Good luck!

Thomas Lettner

Luke Tregilgas

I would add to Thomas's great answer that there is a distinction between validating the original expression changes found in the array data and validating DAVID's identification of a differentially regulated process. Using the array probe sequences in RT-qPCR would provide evidence supporting the original array data, while using primers that target sequences separate from the array probes would give better coverage of the gene and corroborate that the chosen gene was indeed affected. Note also that this does not necessarily validate that the downstream process is actually affected, since transcriptome changes are not a direct indicator of changes in the proteome (not all RNA is translated into protein). Proteomics experiments and functional assays would then be needed to confirm the downstream effects of the identified gene expression changes.
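Whichever primers are used, validation ultimately means comparing the qPCR result with the array result for the same gene. A common way to turn Ct values into a fold change is the comparative Ct (2^-ddCt, Livak) method. Neither answer above describes it, so treat this as a swapped-in illustration. The sketch assumes roughly 100% amplification efficiency for target and reference genes and uses hypothetical mean Ct values. Replicate handling and efficiency correction (for example the Pfaffl method) are left out.

```python
def ddct_log2_fold_change(ct_target_treated, ct_ref_treated,
                          ct_target_control, ct_ref_control):
    """log2 fold change by the comparative Ct (2^-ddCt) method.

    Assumes roughly 100% amplification efficiency for target and reference.
    """
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_treated - dct_control
    return -ddct  # log2(2 ** -ddCt) == -ddCt


def same_direction(array_log2, qpcr_log2):
    """True when the array and qPCR both report up- or both report down-regulation."""
    return array_log2 * qpcr_log2 > 0


# Hypothetical mean Ct values and array log2 ratio, for illustration only.
qpcr_log2 = ddct_log2_fold_change(22.0, 18.0, 25.0, 18.5)
array_log2 = 2.1
print(f"qPCR log2 FC = {qpcr_log2:.2f} ({2 ** qpcr_log2:.1f}-fold)")
print("direction confirmed:", same_direction(array_log2, qpcr_log2))
```

Agreement in direction is the minimum requirement. Array and qPCR fold changes often differ in magnitude, so an exact numerical match is not expected.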

One common problem with microarray analysis is the focus on the extremes of fold change, while equally biologically relevant expression changes may be more subtle (lower fold change). Fold change is a useful indicator, but the genes with the largest fold changes are not necessarily the most significantly affected ones (as judged by statistics such as the q-value and the B-statistic). Applying appropriate thresholds to the q-value (the false-discovery-rate-adjusted p-value) can reveal significant yet relatively subtle expression changes that would otherwise be missed.
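One widely used way to obtain FDR-adjusted p-values is the Benjamini-Hochberg procedure. The sketch below assumes you already have one raw p-value per gene from some per-gene test. The genes, log ratios, p-values and the 0.05 cut-off are all hypothetical. With these toy numbers, two genes with modest fold changes pass the q-value cut-off, while the gene with the largest fold change does not.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2_ratios = [0.6, 3.1, 2.4, -0.7]
p_values = [0.0004, 0.20, 0.03, 0.001]

q_values = benjamini_hochberg(p_values)
Q_CUTOFF = 0.05  # hypothetical FDR threshold
significant = [
    (g, lr, q) for g, lr, q in zip(genes, log2_ratios, q_values) if q <= Q_CUTOFF
]
for gene, lr, q in significant:
    print(f"{gene}\tlog2 ratio {lr:+.1f}\tq = {q:.4f}")
```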

Best of luck.