Which Preprocessing and Normalization Methods Were Used for TCGA Microarray Data?

07 July 2019

I am trying to use microarray expression data from the TCGA legacy archive (for comparison purposes only). I have noticed that four different microarray expression platforms exist: (1) Agilent G4502A_07_1, (2) Agilent G4502A_07_2, (3) Agilent G4502A_07_3 and (4) HT_HG-U133A. Are the first three platforms actually the same?

The expression data for all platforms seem to be normalized, but I couldn't find any documentation of the preprocessing and normalization methods used for these datasets.

Some studies that use these datasets describe the normalization methods:

Jing Han and Raj K. Puri (2018) state that the Agilent datasets are presented as the log2 ratio of GBM/HuRNA or normal brain/HuRNA. But I am not sure whether any normalization was applied before the log2 ratio was calculated.

Another study, by Yan Guo et al. (2013), states that the microarray datasets were normalized using Robust Multi-array Average (RMA) and that the Agilent expression values were gene-centered.
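To make the two descriptions above concrete, here is a minimal sketch of what "log2 ratio against a reference" and "gene-centered" usually mean in practice. It is not the TCGA pipeline: the function names `log2_ratio` and `gene_center` are hypothetical, the intensities are made-up toy numbers, and centering on the per-gene mean is an assumption (some pipelines center on the median instead).

```python
import numpy as np

def log2_ratio(sample, reference):
    """Per-probe log2(sample / reference) for two-channel intensities."""
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.log2(sample / reference)

def gene_center(matrix):
    """Subtract each gene's (row's) mean across samples.

    Assumption: mean-centering; a median could be used instead.
    """
    matrix = np.asarray(matrix, dtype=float)
    return matrix - matrix.mean(axis=1, keepdims=True)

# Toy intensities, 3 genes (rows) x 4 samples (columns); values are invented.
tumor = np.array([[200.0, 400.0, 800.0, 1600.0],
                  [100.0, 100.0, 100.0, 100.0],
                  [50.0, 25.0, 50.0, 25.0]])
reference = np.full_like(tumor, 100.0)  # stand-in for the reference channel

ratios = log2_ratio(tumor, reference)   # first row becomes 1, 2, 3, 4
centered = gene_center(ratios)          # every row now averages to zero
```

After centering, each gene's values describe only its variation across samples, so the absolute expression level relative to the reference is no longer recoverable from the centered matrix alone.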

When I looked into the data, the HT_HG-U133A expression values appeared to be log-transformed, and the expression values for each gene in the Agilent datasets are indeed centered at zero. But it is still unclear to me how the datasets were preprocessed. Can anyone give me a detailed explanation? Thank you so much for your help.
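The two observations above (log-scale values, per-gene centering at zero) can be checked programmatically. The sketch below is a rough heuristic, not a definitive test: the helper names, the threshold of 25, and the tolerance are all assumptions, and the small matrices are invented stand-ins for a genes-by-samples expression table.

```python
import numpy as np

def looks_log_transformed(matrix, max_plausible_log2=25.0):
    """Heuristic: log2-scale values stay small, raw intensities reach the thousands.

    The threshold is an assumption, not a documented cutoff.
    """
    values = np.asarray(matrix, dtype=float)
    return bool(np.nanmax(values) <= max_plausible_log2)

def is_gene_centered(matrix, tol=1e-6, use_median=False):
    """True if every gene (row) has a mean (or median) within tol of zero."""
    values = np.asarray(matrix, dtype=float)
    if use_median:
        center = np.nanmedian(values, axis=1)
    else:
        center = np.nanmean(values, axis=1)
    return bool(np.all(np.abs(center) <= tol))

# Invented toy matrices, genes in rows and samples in columns.
affy_like = np.array([[7.2, 8.1, 6.9],
                      [10.5, 11.0, 10.8]])
agilent_like = np.array([[-0.5, 0.0, 0.5],
                         [1.2, -1.2, 0.0]])
raw_like = np.array([[150.0, 3200.0, 80.0]])

print(looks_log_transformed(affy_like), is_gene_centered(affy_like))
print(looks_log_transformed(agilent_like), is_gene_centered(agilent_like))
print(looks_log_transformed(raw_like))
```

On real data the tolerance would likely need to be looser, because centering may have been done on a larger sample set than the subset being inspected.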

A. Saleembhasha

The Agilent G4502A_07_3 data are already normalized, so there is no need to normalize them again. You can use the data directly for further analysis.
