Hello everyone, I am a beginner in the world of science. I want to know: what is a log? What is log transformation? And why do we do it in gene expression analysis?
The main reason is the distribution of the data and the symmetry of the ratios.
Untransformed data are usually quite substantially skewed, whereas after log transformation they often (depending on the data) approximate a normal distribution. Ratios, on the other hand, are not symmetrical around 1 (and ratios are the predominant type of value you observe in comparative experiments), but they become symmetrical around 0 when log transformed.
Keeping those things in mind I would highly recommend this video by Rafael Irizarry:
https://www.youtube.com/watch?v=3huF0DwxCtU
The first few minutes cover the question you asked.
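If you want to see the skew point for yourself, here is a minimal sketch in Python. It uses simulated log-normal values as a stand-in for expression measurements (an assumption for illustration, not real data), and the `skewness` helper is my own, hypothetical name. The choice of base 2 is arbitrary; any base gives the same skewness.

```python
import math
import random


def skewness(values):
    """Sample skewness: the mean of the cubed z-scores (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n


random.seed(0)  # fixed seed so the run is reproducible

# Simulated stand-in for expression values (assumed log-normal, not real data)
raw = [random.lognormvariate(5.0, 1.0) for _ in range(10000)]

# The same values after log transformation
logged = [math.log2(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)

print(f"skewness, linear scale: {raw_skew:.2f}")
print(f"skewness, log scale:    {log_skew:.2f}")
```

On the linear scale the skewness should come out clearly positive (a long right tail), while on the log scale it should sit close to zero.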
It's much easier to have a clear view on a log scale! For instance, a gene that is expressed a hundredfold more than the control will be at +2 because log10(100/1) = 2, and a gene that is expressed a hundredfold less will be at -2 because log10(1/100) = -2. So on the log scale the ratios are symmetrical around zero, and furthermore it lets you explore a broad range of values clearly on the same plot: it's easier to spot what is going on between -2 and +2 than between 10^-2 and 10^2 on a single scale...
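The arithmetic above can be checked with a few lines of Python. This is just a sketch; the labels and variable names are made up for illustration.

```python
import math

# Expression ratios (treated / control) for three hypothetical genes
fold_changes = {
    "100x up": 100 / 1,
    "unchanged": 1 / 1,
    "100x down": 1 / 100,
}

# Base-10 log of each ratio: up- and down-regulation mirror each other around 0
log_ratios = {label: math.log10(ratio) for label, ratio in fold_changes.items()}

for label, value in log_ratios.items():
    print(f"{label:>10}: ratio = {fold_changes[label]:g}, log10 = {value:+.1f}")
```

The up- and down-regulated genes end up at the same distance from zero, on opposite sides, which is the symmetry described above.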
Please see below from "Seven tips for bio-statistical analysis of gene expression data" [ https://www.biogazelle.com/seven-tips-bio-statistical-analysis-gene-expression-data ]:
Always log transform your gene expression data
Gene expression levels are heavily skewed in linear scale: half of the data-point (the lower expressed genes) are between 0 and 1 (with 1 meaning no change), and the other half (the higher expressed genes) between 1 and positive infinity. Consider the case where the normalized expression levels are 0.1 (A), 1 (B) and 10 (C) for 3 samples (A-C) under study. Intuitively, we understand that sample A has a ten-fold lower expression compared to sample B, and that C has a ten-fold higher expression compared to B. However, in linear scale A and B are much closer (similar) to each other than B and C (0.9 units versus 9 units). A parametric statistical test will therefore be biased and not appreciate that A and C are equally different from B. Upon log transformation (I use base 10 here, but any base will do), the distance between A and B, and between B and C becomes equal (1 log10 unit, as the log10 values of A, B, and C are -1, 0 and 1). Log transformation makes your data more symmetrical and therefore, a parametric statistical test will provide you with a more accurate and relevant answer.
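The A/B/C example from the quoted tip can be reproduced in Python as a quick sanity check. This is only a sketch of the arithmetic; the variable names are my own.

```python
import math

# Normalized expression levels from the quoted example
expression = {"A": 0.1, "B": 1.0, "C": 10.0}

# Distances on the linear scale
linear_ab = abs(expression["B"] - expression["A"])
linear_bc = abs(expression["C"] - expression["B"])

# Distances after log10 transformation
log_expr = {sample: math.log10(value) for sample, value in expression.items()}
log_ab = abs(log_expr["B"] - log_expr["A"])
log_bc = abs(log_expr["C"] - log_expr["B"])

print(f"linear: |A-B| = {linear_ab:.1f}, |B-C| = {linear_bc:.1f}")
print(f"log10:  |A-B| = {log_ab:.1f}, |B-C| = {log_bc:.1f}")
```

On the linear scale B looks ten times closer to A than to C, while on the log scale both tenfold differences are one unit apart.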