Is it necessary to normalize gene expression data when comparing survival between patients with high and low expression of a gene, split at the median?

Rolando Garcia-Milian Popular answer

Dear Lon,

1. Please make sure you are working with normalized data across samples before making any comparison.

2. Try comparing the samples at or below the 1st quartile (low-expressing) vs the samples at or above the 3rd quartile (high-expressing) instead of splitting at the median.

3. If you are going to use a parametric test to compare your groups, applying some transformation (e.g. logarithm) in order to make the distribution normal is also recommended.

4. Adding a small constant (e.g. 10E-04) to all expression values is also helpful: it avoids dividing by 0 when you compute ratios to capture an increase or decrease in gene expression.
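
Points 2 and 4 can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names `split_by_quartiles` and `log2_ratio` are made up for this example, the expression values are invented, and the quartiles are computed with the default method of `statistics.quantiles`, which may differ slightly from what your statistics package uses.

```python
import math
import statistics


def split_by_quartiles(expression):
    """Label each sample 'low' (<= Q1), 'high' (>= Q3) or 'middle'."""
    q1, _median, q3 = statistics.quantiles(expression, n=4)
    labels = []
    for value in expression:
        if value <= q1:
            labels.append("low")
        elif value >= q3:
            labels.append("high")
        else:
            labels.append("middle")
    return labels


def log2_ratio(a, b, pseudocount=10e-4):
    """log2 fold change of a over b; the pseudocount avoids division by 0."""
    return math.log2((a + pseudocount) / (b + pseudocount))


# Hypothetical normalized expression values for eight patients.
values = [0.0, 5.0, 12.0, 30.0, 2.0, 80.0, 45.0, 7.0]
print(split_by_quartiles(values))
print(log2_ratio(8.0, 0.0))  # finite even though the denominator is 0
```

Only the 'low' and 'high' groups would then go into the survival comparison; the 'middle' samples are left out, which costs sample size.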

Regards,

Rolando

Albino Bacolla

In my experience, plotting log2(normalized_rsem + 1) and splitting the samples at the mean value (high = above mean; low = below mean) is sufficient to capture significant differences. Although it may be useful to try different cutoffs, statistical power decreases as the groups get smaller. It may therefore be instructive to plot the confidence intervals to visualize the extent of overlap between the survival curves.
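
A rough sketch of that workflow in plain Python, under some assumptions of mine: the helper names (`log2p1`, `split_by_mean`, `kaplan_meier`) are invented for this example, the curve is a Kaplan-Meier estimate, and the confidence band is the plain Greenwood interval clipped to [0, 1]. Dedicated survival packages typically offer better-behaved intervals, so treat this as a way to see the idea rather than a reference implementation.

```python
import math


def log2p1(values):
    """log2(x + 1) transform, as in log2(normalized_rsem + 1)."""
    return [math.log2(v + 1.0) for v in values]


def split_by_mean(values):
    """Label samples 'high' (above the mean) or 'low' (at or below it)."""
    mean = sum(values) / len(values)
    return ["high" if v > mean else "low" for v in values]


def kaplan_meier(times, events, z=1.96):
    """Return (time, survival, lower, upper) at each observed event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    greenwood = 0.0  # running sum for Greenwood's variance formula
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            if at_risk > deaths:
                greenwood += deaths / (at_risk * (at_risk - deaths))
            se = survival * math.sqrt(greenwood)
            curve.append((t, survival,
                          max(0.0, survival - z * se),
                          min(1.0, survival + z * se)))
        at_risk -= leaving
    return curve
```

You would call `kaplan_meier` once per group ('high' and 'low') and plot both bands; wide, overlapping bands are a visual hint that the groups are too small for the cutoff you chose.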

Marko Lucijanic

You could also try running a ROC curve analysis, using survival status as the classification variable and gene expression as the numerical variable, to obtain a candidate cut-off value of gene expression that you can then test in survival analysis. This is not always the optimal approach, but it might help in finding the best cut-off.

Additionally, if you did your molecular analyses on both diseased and control patients, you can run a ROC curve analysis using disease status as the classification variable to find the "optimal" cut-off value of gene expression for discriminating diseased from control subjects. You can then test whether this cut-off value discriminates survival among diseased patients (a kind of normal vs high expression among the diseased). I'm not sure whether your data have to follow a normal distribution for ROC curve analysis to be reliable. However, this is only an auxiliary first step: your cut-offs need to be tested on survival data as a next step, and you can try values around the given cut-off to see whether that affects prognosis. Data separated at the median or at quartiles do not need to follow a normal distribution.
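
To make the ROC idea concrete, here is a minimal sketch in plain Python. Several things here are my assumptions and not part of the suggestion above: the cut-off is chosen by maximizing Youden's J (sensitivity + specificity - 1), which is just one common criterion; higher expression is assumed to go with the positive class; and the function names `youden_cutoff` and `auc` are invented for this example.

```python
def youden_cutoff(expression, status):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    status[i] is 1 for the positive class (e.g. died, or diseased) and 0
    otherwise. A sample is called positive when expression >= cutoff.
    Both classes must be present.
    """
    positives = sum(status)
    negatives = len(status) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(expression)):
        tp = sum(1 for x, s in zip(expression, status) if x >= cutoff and s == 1)
        fp = sum(1 for x, s in zip(expression, status) if x >= cutoff and s == 0)
        j = tp / positives - fp / negatives  # sensitivity - (1 - specificity)
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def auc(expression, status):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive has the higher value; ties count 0.5."""
    pos = [x for x, s in zip(expression, status) if s == 1]
    neg = [x for x, s in zip(expression, status) if s == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A cut-off found this way has been tuned on the same data, so it should be treated as a candidate to be tested on survival data, as described above, and not as a validated threshold.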

If you don't obtain a significant result but see that the survival curves diverge, this might be due to too few events in your data set. Your study is then probably underpowered to detect statistical significance for this particular analysis.

I hope this helps. Kind regards, Marko

Ajit kumar Roy

Hi Lon-

Normalization is a precondition for the analysis of gene expression data. Therefore, one must normalize the data. With best wishes,

ajit

Thanks for recommending my answer, @Syed Ismyl Mahmood Rizvi and @Yehya A. Salih.

Riccardo Aiese Cigliano

Dear Lon

RNA-seq data do not follow a normal distribution, and even after normalization you are very unlikely to obtain one. Read counts are typically modeled with a negative binomial distribution, which is why most programs used for statistical analysis of RNA-seq data do not rely on models that assume normality. My suggestion would therefore be to apply non-parametric tests to check for differences in survival.
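
As one possible example of such a test, here is a minimal sketch of the two-group log-rank test in plain Python. The choice of the log-rank test is my assumption (no specific test is named above), and the function name `logrank_test` is invented for this example. It takes lists of follow-up times and event indicators (1 = event observed, 0 = censored) for each group and returns the chi-square statistic with its p-value on 1 degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value)."""
    observed_a = 0.0
    expected_a = 0.0
    variance = 0.0
    # Distinct times at which at least one event was observed in either group.
    event_times = sorted({t for t, e in zip(times_a + times_b,
                                            events_a + events_b) if e == 1})
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value
```

The test uses only the ordering of event times and the group labels, so it makes no assumption about how the expression values themselves are distributed. The expression data enter only through the high/low split.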

Kind regards

Riccardo

Bioinformatics data analyst at www.sequentiabiotech.com