Hey,
I am confused about how to integrate single-cell ATAC-seq data across samples. I have read many discussions of this issue and summarized them into two solutions:
SOLUTION 1. (data QC ignored here) find the union feature set across the samples -> generate a count matrix for each sample against that union set -> merge them into one large count matrix -> normalization / scaling / cell clustering / cluster annotation ...
SOLUTION 2. generate a count matrix for each sample -> normalization / scaling / cell clustering / cluster annotation for each sample separately -> find the features common to all samples -> generate a count matrix against the selected common features for each sample -> merge the data using an integration pipeline, e.g. Signac/Harmony, and perform cell clustering, cluster annotation and the downstream analysis on the result (which usually gives a new assay containing only the common features).
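To make the difference between the "union" and "common" feature sets concrete, here is a minimal sketch in plain Python. The peak coordinates, sample names and helper functions (`union_peaks`, `overlaps`, `common_peaks`) are all hypothetical and made up for illustration; this is not how Signac or any other package implements it, and treating touching peaks as one feature is my own assumption.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates

# Hypothetical toy peak calls for two samples.
samples: Dict[str, List[Peak]] = {
    "sampleA": [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)],
    "sampleB": [("chr1", 150, 250), ("chr2", 300, 400)],
}


def union_peaks(samples: Dict[str, List[Peak]]) -> List[Peak]:
    """Merge overlapping (or touching) peaks from all samples into one set."""
    all_peaks = sorted(p for peaks in samples.values() for p in peaks)
    merged: List[Peak] = []
    for chrom, start, end in all_peaks:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous merged peak instead of starting a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def common_peaks(samples: Dict[str, List[Peak]], union: List[Peak]) -> List[Peak]:
    """Keep only union peaks supported by at least one peak in every sample."""
    return [
        u for u in union
        if all(any(overlaps(u, p) for p in peaks) for peaks in samples.values())
    ]


union = union_peaks(samples)           # feature set used in SOLUTION 1
common = common_peaks(samples, union)  # feature set used in SOLUTION 2
print(len(union), "union peaks;", len(common), "common peaks")
```

In this toy case most union peaks are sample-specific, and those are exactly the features that drop out of the integrated assay in SOLUTION 2.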
My questions:
Whichever one I choose, I will end up with cell clusters. My next step is to retrieve the differential features for each cell type/cluster, which will be key to investigating the underlying biological functions.
Q1. I know that batch effects exist between samples. For SOLUTION 1, is normalizing and scaling a single large count matrix enough to make differential enrichment analysis between samples valid?
Q2. If SOLUTION 1 is not reasonable, SOLUTION 2 produces a new assay that contains only the selected common features, in which the batch effect should be well corrected and the cells may cluster better. However, how should I perform differential analysis for the non-common features in each cluster? (In other words, does the batch correction in the integrated assay from SOLUTION 2 carry over to differential feature detection across all features in the raw assays at the sample level?)
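To make Q2 concrete, here is a minimal sketch of the setup I have in mind: cluster labels obtained from the integrated assay are carried back to the raw counts through the cell barcodes, and each peak is then summarized per sample and cluster. All barcodes, labels, counts and the `accessible_fraction` helper are hypothetical toy values and names, and cells missing from a peak's counts are assumed to be zero. Whether comparing such raw-count summaries across samples is sound is exactly what I am asking.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical cluster labels from the integrated assay, keyed by cell barcode.
cluster_of: Dict[str, str] = {
    "A_1": "c0", "A_2": "c0", "A_3": "c1",
    "B_1": "c0", "B_2": "c1", "B_3": "c1",
}
# Assumption: the sample name is the barcode prefix before the underscore.
sample_of: Dict[str, str] = {cell: cell.split("_")[0] for cell in cluster_of}

# Hypothetical raw counts: peak -> {cell barcode: count}.
raw_counts: Dict[str, Dict[str, int]] = {
    "chr1:100-250": {"A_1": 2, "A_2": 1, "A_3": 0, "B_1": 0, "B_2": 0, "B_3": 1},
    "chr1:500-600": {"A_1": 0, "A_2": 0, "A_3": 3},  # peak called in sample A only
}


def accessible_fraction(
    raw_counts: Dict[str, Dict[str, int]],
    cluster_of: Dict[str, str],
    sample_of: Dict[str, str],
) -> Dict[Tuple[str, str, str], float]:
    """Fraction of cells with >= 1 count, per (peak, sample, cluster)."""
    n_cells: Dict[Tuple[str, str], int] = defaultdict(int)
    for cell, cluster in cluster_of.items():
        n_cells[(sample_of[cell], cluster)] += 1

    result: Dict[Tuple[str, str, str], float] = {}
    for peak, counts in raw_counts.items():
        hits: Dict[Tuple[str, str], int] = defaultdict(int)
        for cell, count in counts.items():
            if count > 0:
                hits[(sample_of[cell], cluster_of[cell])] += 1
        # Cells absent from this peak's counts are treated as zero (assumption).
        for group, total in n_cells.items():
            result[(peak, group[0], group[1])] = hits[group] / total
    return result


fractions = accessible_fraction(raw_counts, cluster_of, sample_of)
for key in sorted(fractions):
    print(key, fractions[key])
```

Note that for the sample-specific peak, the zeros in sample B come from the peak never being quantified there rather than from measured inaccessibility, which is part of why I am unsure how to handle the non-common features.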
Thanks and best regards!