I want to obtain Entrez IDs for Affymetrix probe sets (hgu133a) so that I can map them onto genome-scale metabolic models (GSMMs). GSMMs generally use Entrez gene IDs, so integrating gene expression profiles with these reconstructions requires converting probe set IDs from the respective Affymetrix platform to Entrez IDs. The problem with this conversion in Bioconductor is that a single probe set can map to multiple identifiers. A simple example:
> library(hgu133a.db)
> select(hgu133a.db, c("200080_s_at"), c("SYMBOL","ENTREZID", "GENENAME"))
'select()' returned 1:many mapping between keys and columns
      PROBEID  SYMBOL ENTREZID                            GENENAME
1 200080_s_at   H3F3A     3020               H3 histone, family 3A
2 200080_s_at   H3F3B     3021       H3 histone, family 3B (H3.3B)
3 200080_s_at H3F3AP4   440926 H3 histone, family 3A, pseudogene 4
My question is: which Entrez ID should I choose here, 3020, 3021, or 440926?
Note that the resulting Entrez IDs will be used to evaluate GPR (gene-protein-reaction) rules. Because all three Entrez IDs come from the same probe set, they would all be assigned the same expression level.
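To make that consequence concrete, here is a minimal base-R sketch. It rebuilds the three mapping rows by hand, so it does not need hgu133a.db, and it uses a made-up expression value of 8.5 for the probe set. The names `mapping`, `expr` and `n_genes` are only for illustration. It is not a suggestion of which ID to pick, just a picture of what happens if every mapping is kept.

```r
# Mapping rows copied by hand from the select() output above.
mapping <- data.frame(
  PROBEID  = rep("200080_s_at", 3),
  SYMBOL   = c("H3F3A", "H3F3B", "H3F3AP4"),
  ENTREZID = c("3020", "3021", "440926"),
  stringsAsFactors = FALSE
)

# Hypothetical expression value for the probe set (invented for illustration).
expr <- c("200080_s_at" = 8.5)

# Keeping all mappings: each Entrez ID inherits the value of its probe set.
mapping$EXPRESSION <- unname(expr[mapping$PROBEID])

# Number of Entrez IDs per probe set; values above 1 flag 1:many mappings.
n_genes <- table(mapping$PROBEID)

print(mapping)
print(n_genes)
```

With this setup, any GPR rule involving 3020, 3021 or 440926 would see the same value for all three, which is why I am unsure whether the choice between them matters.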
Thanks in advance for sharing your thoughts.