On a given microarray design, many genes are represented by several different probes. The (normalized) signals of these features, all referring to the same gene, are often quite different (log2 values can vary between 2 and 16, so essentially from "almost undetectable" to "completely saturated").
If a gene set analysis or an over-representation analysis is performed, there should be one value per gene.
How should I select which signal to use for the gene? I am not comfortable taking the average of all the features, because they are often so different. Taking only the highest signal also seems wrong.
Any ideas?
The attached file shows a table with example data (from an Agilent microarray) with 5 different probes addressing the gene "PRDM". The last 4 columns show the log2 signal intensities for 4 different samples. The values range from 3 to 10, so there is a more than 100-fold difference in signal intensity between the probes.
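
To make the scale of the problem concrete, here is a minimal Python sketch. The probe names and log2 values are made up for illustration (they are not the values from the attached file); they only span the same range of 3 to 10. It shows the fold difference implied by that range and how far apart the two candidate summaries mentioned above (mean and maximum per gene) end up.

```python
from statistics import mean

# Hypothetical log2 intensities for one sample and one gene.
# These are illustrative numbers, NOT the values from the attached file.
probes = {
    "probe_1": 3.0,
    "probe_2": 5.0,
    "probe_3": 6.5,
    "probe_4": 8.0,
    "probe_5": 10.0,
}

values = list(probes.values())

# A difference of d on the log2 scale corresponds to a 2**d fold difference.
log2_range = max(values) - min(values)
fold_difference = 2 ** log2_range

# The two candidate per-gene summaries: average of all probes vs. highest probe.
gene_mean = mean(values)
gene_max = max(values)

print(f"log2 range: {log2_range}, fold difference: {fold_difference}")
print(f"mean of probes: {gene_mean}, max probe: {gene_max}")
```

With numbers like these, the mean and the maximum differ by several log2 units, so the choice of summary could easily change where the gene ends up in a ranked list.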