I want to check whether the decidua tissue samples in GEO dataset GSE60438 are mixed with blood. To do this, I combined GSE60438 with GSE73685, which covers multiple tissues, including decidua, maternal blood, and umbilical cord blood. From both datasets I selected only the healthy decidua and blood samples. Each expression matrix was log-transformed and quantile-normalized, and its row names were mapped to Entrez identifiers; I then merged the matrices by row name. Finally, I removed the batch effect with the ComBat function in R, using a model matrix based on tissue and the datasets' accession numbers as the batch:

library(sva)  # provides ComBat

# secondaryaccession is the column holding each sample's dataset accession number: GSE60438 or GSE73685

batch = as.factor(pdata$secondaryaccession)

# Biological.Specimen is the column with tissue types in the phenodata data frame pdata

mod = model.matrix(~as.factor(Biological.Specimen), data=pdata)

# mrgd is a data frame: the expression matrices of the two datasets merged by row name

exprs = ComBat(dat=as.matrix(mrgd), batch=batch, mod=mod, par.prior=TRUE, prior.plots=FALSE)
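
For context, the merge that produced mrgd before the ComBat call could look roughly like the sketch below. It uses only base R and two toy matrices; the names expr_a and expr_b, their sample IDs, and the Entrez-like row names are placeholders for illustration, not the real data.

```r
# Toy stand-ins for the two normalized expression matrices (genes x samples).
# Row names play the role of Entrez identifiers; all values are simulated.
set.seed(1)
expr_a <- matrix(rnorm(12), nrow = 4,
                 dimnames = list(c("1", "2", "3", "4"), c("A1", "A2", "A3")))
expr_b <- matrix(rnorm(8), nrow = 4,
                 dimnames = list(c("2", "3", "4", "5"), c("B1", "B2")))

# Merge by row name; only genes present in both matrices are kept.
mrgd <- merge(expr_a, expr_b, by = "row.names")

# merge() moves the shared row names into a Row.names column; restore them.
rownames(mrgd) <- mrgd$Row.names
mrgd$Row.names <- NULL

dim(mrgd)
```

Only genes shared by both datasets survive this kind of merge, so it is worth checking how many rows remain before running ComBat.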

Principal component analysis showed some of the samples leaning toward the blood samples. How can I check more rigorously whether those samples are really mixed with blood?
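
A minimal sketch of the kind of PCA described above, using base R's prcomp. The matrix here is simulated so that the example runs on its own; the name corrected, the tissue labels, and the artificial shift in the "blood" samples are assumptions for illustration. With the real data, the batch-corrected matrix returned by ComBat would take the place of corrected.

```r
# Simulated stand-in for a batch-corrected matrix (genes x samples).
set.seed(42)
n_genes <- 200
tissue <- factor(rep(c("decidua", "blood"), each = 6))
corrected <- matrix(rnorm(n_genes * length(tissue)), nrow = n_genes,
                    dimnames = list(paste0("gene", seq_len(n_genes)),
                                    paste0("s", seq_along(tissue))))

# Artificially shift the first 50 genes in the blood samples so the
# two tissues differ (purely for demonstration).
corrected[1:50, tissue == "blood"] <- corrected[1:50, tissue == "blood"] + 3

# prcomp expects samples in rows, so transpose the genes x samples matrix.
pca <- prcomp(t(corrected), center = TRUE, scale. = FALSE)

# Fraction of total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scores on the first two components, labeled by tissue.
scores <- data.frame(pca$x[, 1:2], tissue = tissue)

plot(scores$PC1, scores$PC2, col = as.integer(scores$tissue), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
legend("topright", legend = levels(scores$tissue),
       col = seq_along(levels(scores$tissue)), pch = 19)
```

Samples that sit between the two tissue clusters on such a plot are the ones in question here.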
