I'm doing some preliminary analysis on a list of genes of interest, using a data set that contains RNA-seq data from patients and the corresponding clinical information, including survival. For a given gene, I split the patients into two groups, "high" expression (>= median expression of the gene) and "low" expression (< median expression), plotted Kaplan-Meier curves, and ran a log-rank test to see whether survival differed significantly between the groups. I was surprised that none of the results were significant, even though these genes have been shown to divide patients into two groups with significantly different survival in other data sets.
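To make the procedure concrete, here is a minimal sketch of the median split followed by a two-group log-rank test, written in plain Python so it does not depend on any survival library. The function names (`split_by_median`, `logrank_test`) and the ten toy patients are made up for illustration; they are not from my actual data set, and a real analysis would normally use an established survival package instead.

```python
import math
from statistics import median


def split_by_median(expression):
    """Label each patient 'high' (>= median) or 'low' (< median)."""
    cutoff = median(expression)
    return ["high" if x >= cutoff else "low" for x in expression]


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    groups : 'high' or 'low' per patient
    Returns (chi-square statistic with 1 df, p-value).
    """
    observed = expected = variance = 0.0
    # Only time points with at least one observed event contribute.
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [(g, tt, e) for g, tt, e in zip(groups, times, events) if tt >= t]
        n = len(at_risk)
        n_high = sum(1 for g, _, _ in at_risk if g == "high")
        d = sum(1 for _, tt, e in at_risk if tt == t and e)
        d_high = sum(1 for g, tt, e in at_risk if tt == t and e and g == "high")
        observed += d_high
        expected += d * n_high / n
        if n > 1:  # hypergeometric variance is undefined for a single subject
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance if variance > 0 else 0.0
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data: ten patients, all deaths observed (no censoring).
expression = [9.1, 8.4, 7.7, 7.2, 6.8, 3.1, 2.9, 2.2, 1.5, 0.8]
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10

groups = split_by_median(expression)
chi2, p_value = logrank_test(times, events, groups)
print(groups)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

In this toy example the high-expression patients all die before the low-expression patients, so the test comes out significant; that is the kind of separation I expected to see for my genes but did not.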

I checked my data and realized that the expression values for these genes were not normally distributed. I had thought they were already normalized, but I guess they weren't. My question is: would normalizing the data change the outcome of this analysis? I'm dividing the patients at the median, so the split depends on each patient's rank rather than on the actual expression values. The patients in the upper 50% should therefore be the same whether or not I normalize, and likewise for the lower 50%. Right? Or am I missing something?
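The rank argument can be checked directly. Below is a small sketch, using made-up expression values for one gene across eight hypothetical patients, that compares the median split before and after two kinds of "normalization". It assumes the first kind is a strictly increasing transform applied to the gene's values (here `log2(x + 1)`), and the second is a per-patient scaling with hypothetical size factors, as in library-size normalization. Neither is necessarily what was or wasn't done to my data set.

```python
import math
from statistics import median


def median_groups(values):
    """Label each value 'high' (>= median) or 'low' (< median)."""
    cutoff = median(values)
    return ["high" if v >= cutoff else "low" for v in values]


# Hypothetical raw expression of one gene in eight patients.
raw = [12.0, 250.0, 3.0, 48.0, 1900.0, 0.0, 75.0, 31.0]

# Case 1: a strictly increasing transform of the values (log2 with a pseudocount).
# It preserves the ordering of patients, so the median split is unchanged.
logged = [math.log2(v + 1) for v in raw]
same_under_log = median_groups(raw) == median_groups(logged)

# Case 2: per-patient scaling (hypothetical size factors, one per patient).
# Each patient is divided by a different number, so the ordering can change.
size_factors = [0.2, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
scaled = [v / s for v, s in zip(raw, size_factors)]
same_under_scaling = median_groups(raw) == median_groups(scaled)

print("split unchanged by log transform:     ", same_under_log)
print("split unchanged by per-patient scaling:", same_under_scaling)
```

So the reasoning holds for any transform that keeps the patients in the same order for a given gene. It would not necessarily hold for a normalization that rescales each patient's sample differently, since that can move patients across the median.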
