I have thousands of samples from `TCGA`, retrieved using `TCGAbiolinks`, and I want to remove the `batch effect` from the datasets. I have read that the batch can be determined from the sample ID itself.

How do we identify the batch info from the sample ID?

My IDs look as follows.

```
TCGA-LL-A73Y-01A-11R-A33A-13
TCGA-AO-A03U-01B-21R-A10I-13
TCGA-E9-A1NH-01A-11R-A14C-13
TCGA-BH-A1EY-01A-11R-A13P-13
TCGA-AO-A1KS-01A-11R-A13P-13
TCGA-B6-A0I6-01A-11R-A035-13
TCGA-E9-A229-01A-31R-A156-13
TCGA-D8-A27H-01A-11R-A16E-13
TCGA-A2-A0EM-01A-11R-A035-13
TCGA-E2-A1II-01A-11R-A143-13
TCGA-BH-A0H3-01A-11R-A12O-13
TCGA-E2-A1IL-01A-11R-A14C-13
TCGA-BH-A0GY-01A-11R-A057-13
TCGA-BH-A0DG-01A-21R-A12O-13
```

I have looked at the following link for information on the `sample ID` fields, but it does not specifically mention batches.

https://github.com/kevinblighe/TCGAbarcode/blob/master/README.md

Is the batch a combination of `PlateId`, `ShipDate`, and `Tissue Source Site`, or can I use `plates` or `tss` alone as the batch?
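To make the question concrete, here is a minimal Python sketch of how the IDs could be split into their parts. It assumes the standard seven-field TCGA aliquot barcode layout (project, tissue source site, participant, sample type plus vial, portion plus analyte, plate, center). The function name `parse_barcode` and the dictionary keys are my own labels, not part of `TCGAbiolinks`. The ship date does not appear in the barcode string, so it would have to come from separate metadata. Whether plate or TSS is the right batch variable is exactly what I am unsure about; the code only tabulates the candidates.

```python
from collections import Counter


def parse_barcode(barcode):
    """Split a full TCGA aliquot barcode into named fields.

    Assumes the layout TCGA-TSS-Participant-SampleVial-PortionAnalyte-Plate-Center.
    The key names are illustrative labels, not an official schema.
    """
    parts = barcode.strip().split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dash-separated fields, got {len(parts)}: {barcode!r}")
    project, tss, participant, sample_vial, portion_analyte, plate, center = parts
    return {
        "project": project,
        "tss": tss,                        # tissue source site code
        "participant": participant,
        "sample_type": sample_vial[:2],    # e.g. "01"
        "vial": sample_vial[2:],           # e.g. "A"
        "portion": portion_analyte[:2],    # e.g. "11"
        "analyte": portion_analyte[2:],    # e.g. "R" in the IDs above
        "plate": plate,
        "center": center,
    }


# A few of the IDs from the question.
barcodes = [
    "TCGA-LL-A73Y-01A-11R-A33A-13",
    "TCGA-AO-A03U-01B-21R-A10I-13",
    "TCGA-E9-A1NH-01A-11R-A14C-13",
    "TCGA-BH-A1EY-01A-11R-A13P-13",
    "TCGA-AO-A1KS-01A-11R-A13P-13",
]

parsed = [parse_barcode(b) for b in barcodes]

# Count samples per candidate batch variable.
plate_counts = Counter(p["plate"] for p in parsed)
tss_counts = Counter(p["tss"] for p in parsed)

print("samples per plate:", dict(plate_counts))
print("samples per TSS:  ", dict(tss_counts))
```

With a table like this, it is at least possible to see how many samples fall on each plate or site before deciding which field, or which combination of fields, to pass to a batch-correction method.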
