Hi everybody,
I have a set of genes that came out as differentially expressed in an RNA-seq experiment I conducted. The set itself looks reasonable, and I would like to look into the regulatory mechanisms that might underlie it. I am familiar with the algorithms used for motif detection and discovery; there is just one question I have not found an intuitive answer to: how many bases upstream should be included in such an analysis? I've seen some papers using 1 kb, others only 500 bp, and others quite different values. All of these values seem rather arbitrary to me, so here's my question: are there any objective reasons behind the specific upstream ranges included in motif-detection analyses? For example, is there a paper I have not found yet that shows conclusively that transcription factor binding motifs are found (and used!) within 1 kb upstream of the transcription start site in 90% of cases? That's something I could accept as a good reason. Any hints are welcome.
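To be concrete about what I mean by "N bp upstream", here is a minimal sketch of how I would define the window. It assumes 0-based, half-open coordinates and that the TSS position and strand are already known for each gene; the function name `upstream_window` and its parameters are just illustrative, not taken from any particular tool.

```python
def upstream_window(tss, strand, length, chrom_size=None):
    """Return the 0-based, half-open (start, end) interval covering
    `length` bp upstream of a transcription start site.

    tss        -- 0-based position of the TSS base (assumed known)
    strand     -- "+" or "-"
    length     -- number of upstream bases to include (e.g. 500 or 1000)
    chrom_size -- optional chromosome length used to clip the window
    """
    if strand == "+":
        # Upstream on the forward strand means smaller coordinates.
        start, end = tss - length, tss
    elif strand == "-":
        # Upstream on the reverse strand means larger coordinates.
        start, end = tss + 1, tss + 1 + length
    else:
        raise ValueError("strand must be '+' or '-'")

    # Clip to the chromosome boundaries so the interval stays valid.
    start = max(start, 0)
    if chrom_size is not None:
        end = min(end, chrom_size)
    return start, end


# Example: compare the two window sizes mentioned above for one gene.
for n in (500, 1000):
    print(n, upstream_window(5000, "+", n), upstream_window(5000, "-", n))
```

The question is then simply which value of `length` is justified, and why.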
Best,
Lukas