Though widely used, normalization of datasets is a controversial topic. I am interested in knowing your views on this. What is the deciding factor, and what is the best method in your opinion?
I support it because I think it is a very reasonable assumption that the profiles of similar cells should be similar. You can see systematic differences between the individual profiles of raw signal intensities of similar samples, and it is reasonable to assume that these are some technical artefact. Appropriate normalization will make the data adhere more closely to this assumption.
You will always have some trade-off between removing noise and artifacts at the cost of introducing bias. The question is whether it pays off. In my experience it does, since after normalization many more genes and pathways have been discovered to play a role in some disease or process, and these findings have already been evaluated in further studies.
I think that the simplest method to achieve the goal (a better, clearer, more reliable idea of the global changes in the expression network) is good enough, and quantile normalization usually performs well (again, given that the samples are, or should be, comparable*).
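To make the idea concrete, here is a minimal sketch of quantile normalization on a genes x samples matrix, assuming a simple rank-based implementation in Python/NumPy (the function name and toy data are mine; production implementations, e.g. in Bioconductor packages, handle ties more carefully):

```python
import numpy as np

def quantile_normalize(X):
    """Force every sample (column) of a genes x samples matrix
    onto the same intensity distribution."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)   # rank of each value within its column
    reference = np.sort(X, axis=0).mean(axis=1)         # reference distribution: mean of the sorted columns
    return reference[ranks]                             # replace each value by the reference value of its rank

# toy example: three samples with different overall intensity levels
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
print(quantile_normalize(X))   # all columns now share the same set of values
```

After this step the between-sample distribution differences mentioned above are gone by construction; what remains are differences in rank order, which is what the downstream analysis is after.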
One inevitably gets into trouble if very different samples have to be analyzed together. I would hope that group-wise normalization will be OK in these cases.
---
*For the case that the biological sources are comparable: the expression profiles should be expected to be quite similar. The experimental perturbation should affect some genes, not the majority of genes. If different cell types are compared, one quickly ends up comparing apples and oranges anyway.
I am not familiar with most of the content you have pinned down, but I know what microarrays are, since I have worked in systems biology, and I am familiar with normalization of data, e.g. for artificial neural networks; however, I cannot point you to a nice reference on the topic. Please accept my comments as an attempt to help.
Just to be clear, by normalization I understand the mathematical manoeuvre of "shrinking" data into a smaller space, in general [a,b] -> [0,1], where a and b >> 1. The opposite is widely used in genetic algorithms to improve the search, so that the differences between solutions become noticeable.
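As a concrete illustration of that kind of "shrinking", here is a minimal min-max rescaling sketch in Python/NumPy (the function name and toy values are mine, not anything specific to microarray pipelines):

```python
import numpy as np

def min_max_scale(x):
    """Linearly map values from their observed range [a, b] onto [0, 1]."""
    a, b = x.min(), x.max()
    return (x - a) / (b - a)

raw = np.array([120.0, 4500.0, 980.0, 23000.0])   # hypothetical raw intensities with a, b >> 1
print(min_max_scale(raw))                          # same relative ordering, now inside [0, 1]
```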
By microarray data we are talking about the "heat table": a table of genes versus experiments, where each cell represents the expression state of a gene under a particular experimental condition, such as the presence of glucose in the environment.
So normalization can be applied to compress any data into the same space, so that we can be sure that techniques such as regression and artificial neural networks will work properly. Further, just as units are useful in physics for making comparisons, we can easily compare results, such as scores created to compare different situations, e.g. ResearchGate Scores.
The drawback of this technique, as I see it, is computational. Mathematically there are infinitely many numbers between 0 and 1, but not for computers, which is where most, if not all, of the data is analyzed.
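A small, generic illustration of that finiteness (plain floating-point behaviour, nothing specific to microarray software):

```python
import numpy as np

# Between 0 and 1 a double can only take finitely many values;
# the gap between adjacent representable numbers near 1.0 is about 2.2e-16.
print(np.spacing(1.0))     # 2.220446049250313e-16
print(0.1 + 0.2 == 0.3)    # False: rounding error from the finite representation
```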
Thus, I would infer that the technique is useful for homogenization but can be problematic due to computational issues. As always in science, there is "no free lunch": you should consider the trade-offs.
What you referred to is rarely done with microarray data. Often, the (log-)signals shown in heatmaps are standardized: they represent (gene-wise) z-scores. But this is still not a "shrinkage" to fit the values into [0,1].
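For clarity, a minimal sketch of that gene-wise standardization, assuming a genes x samples matrix of log-signals (the function name is mine):

```python
import numpy as np

def gene_wise_zscore(log_signal):
    """Standardize each row (gene) of a genes x samples matrix of log-signals
    to mean 0 and standard deviation 1 across the samples."""
    mu = log_signal.mean(axis=1, keepdims=True)
    sigma = log_signal.std(axis=1, ddof=1, keepdims=True)
    return (log_signal - mu) / sigma
```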
With "normalization" in the context of microarray data is usually meant the correction of intensity artifacts and the adaption or equalization of intensity distributions between different samples.