CMAP offers a wealth of data, but I'm finding the sheer volume of transcriptional signature data returned for a single drug (e.g., /sig amitriptyline) very hard to work with.
The CLUE website uses CMAP to generate a matrix of UP/DOWN gene expression values, loaded and extrapolated from the L1000 dataset.
In the example above, amitriptyline returns 14 cell lines, three different doses and two exposure times (6 hr and 24 hr).
I concentrated on the 10 uM dose and 24 hr exposure and downloaded the matrix (gct file). Using R, I separated the UP genes from the DOWN genes, keeping only those with values > +1 (UP) or < -1 (DOWN).
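For reference, the filtering step looks roughly like the sketch below. It is a minimal version on a toy matrix rather than my actual gct file: the cell line columns, the Entrez IDs used as row names, and all the values are placeholders made up for illustration.

```r
# Toy stand-in for the matrix read from the gct file:
# rows = Entrez gene IDs, columns = cell lines (all values are made up).
expr <- matrix(
  c( 1.8, -0.2,  0.4,
    -1.5, -1.2,  0.1,
     0.3,  2.1, -1.7,
    -0.4,  0.9,  1.3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("1017", "2064", "3845", "7157"),
                  c("A375", "MCF7", "PC3"))
)

threshold <- 1  # keep values > +1 (UP) or < -1 (DOWN)
cell_lines <- setNames(colnames(expr), colnames(expr))

# For each cell line, collect the gene IDs that pass each cutoff.
up_genes <- lapply(cell_lines, function(cl) {
  rownames(expr)[expr[, cl] > threshold]
})
down_genes <- lapply(cell_lines, function(cl) {
  rownames(expr)[expr[, cl] < -threshold]
})
```

This leaves two named lists, one character vector of gene IDs per cell line, for UP and DOWN respectively.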
I then performed an intersect operation on the Entrez gene IDs across the cell lines. It turns out that not a single gene is shared across all of the cell lines! Any idea how I can rectify this issue? We are expecting the downregulation of certain genes seen in pain models.
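The intersect step is essentially the following. Again this is only a sketch: the per-cell-line gene lists are hypothetical and much shorter than the real ones. The tally at the end is not part of my original analysis; it is just one way to inspect how many cell lines each gene appears in when the strict intersection comes back empty.

```r
# Hypothetical DOWN gene lists per cell line (Entrez IDs as strings; made up).
down_genes <- list(
  A375 = c("2064", "3845", "7157"),
  MCF7 = c("2064", "7157"),
  PC3  = c("3845", "1017")
)

# Strict intersection: genes present in every cell line's list.
shared <- Reduce(intersect, down_genes)

# Tally how many cell lines each gene appears in, most frequent first.
# unique() guards against a gene being listed twice within one cell line.
counts <- sort(table(unlist(lapply(down_genes, unique))), decreasing = TRUE)

print(shared)
print(counts)
```

With 14 cell lines, a single list missing a gene is enough to remove it from `shared`, which is the behaviour I am running into.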
I am struggling to find any publications where the number of cell lines was factored into a computational analysis. Note: I understand I should match my cell line to one of the cell lines in the LINCS L1000 assay, but I do not have a matching cell line.