Hi everyone, I have a query regarding cell type annotation for single-cell characterisation. Which is better: automated annotation methods (based on identified clusters) or annotation based on known marker genes (available in databases)?
I think there are two major approaches. 1. Marker genes: the idea is that a gene with a high average fold change and an appropriate adjusted p-value, when one cluster is compared with all other clusters, uniquely represents a cell type. Markers can be “canonical” (surface proteins detectable by flow cytometry) or “literature-based” (genes known to distinguish cell types and validated in the literature). This is a developing area, with new publications proposing new frameworks to characterize cells and assign them a defined type (see the article “What is a cell type and how to define it?”).
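To make the marker-gene idea concrete, here is a minimal, stdlib-only Python sketch of one-vs-rest marker detection for a single cluster: a two-sided Wilcoxon rank-sum test (normal approximation with tie correction, no continuity correction), a log2 fold change of mean expression with a pseudocount, and Benjamini-Hochberg adjustment across genes. Seurat's FindMarkers also uses a Wilcoxon test by default, but it computes fold changes differently and handles many more details, so treat this as an illustration of the logic, not a reimplementation. The function names, the pseudocount of 1 and the toy genes and values are assumptions chosen for the example.

```python
import math
from collections import Counter
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation,
    tie-corrected, no continuity correction)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # all values identical: no evidence of a difference
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def find_markers(expr, clusters, target, pseudocount=1.0):
    """One-vs-rest markers for cluster `target`.
    expr: gene -> list of normalized values, one per cell (hypothetical layout).
    Returns (gene, log2FC, adjusted p) sorted by adjusted p, then by -log2FC."""
    in_idx = [i for i, c in enumerate(clusters) if c == target]
    out_idx = [i for i, c in enumerate(clusters) if c != target]
    genes, lfcs, pvals = [], [], []
    for gene, values in expr.items():
        x = [values[i] for i in in_idx]
        y = [values[i] for i in out_idx]
        lfcs.append(math.log2((mean(x) + pseudocount) / (mean(y) + pseudocount)))
        pvals.append(rank_sum_p(x, y))
        genes.append(gene)
    rows = list(zip(genes, lfcs, bh_adjust(pvals)))
    rows.sort(key=lambda r: (r[2], -r[1]))
    return rows


# Toy data (hypothetical): 6 cells, clusters A and B
expr = {
    "CD3E": [5.0, 6.0, 7.0, 0.0, 0.0, 1.0],
    "MS4A1": [0.0, 1.0, 0.0, 6.0, 5.0, 7.0],
    "ACTB": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
}
clusters = ["A", "A", "A", "B", "B", "B"]
for gene, lfc, padj in find_markers(expr, clusters, "A"):
    print(f"{gene}\tlog2FC={lfc:.2f}\tp_adj={padj:.3f}")
```

With only three cells per cluster, even perfect separation gives modest p-values; the ranking by adjusted p-value and then by fold change is what puts a cluster-specific gene such as CD3E first here, while a flat gene such as ACTB ends up last.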
The second approach is classification-based. 2. Using large collections of cells (e.g. https://www.humancellatlas.org, or even cell line experiments with cell type-specific gene expression), we train a model to predict class assignments for new gene expression data. Such trained models draw on varying numbers of datasets and features (hundreds) and capture more complex patterns than logFC and adjusted p-value alone. Resources such as celldex (https://bioconductor.org/packages/release/data/experiment/html/celldex.html) provide reference expression datasets for this kind of annotation.
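As a rough illustration of the classification idea, the sketch below "trains" the simplest possible model, a nearest-centroid classifier: it averages the reference profiles of each labelled cell type and assigns a new cell to the centroid with the highest Spearman correlation, also reporting the margin over the runner-up so low-confidence calls can be flagged. This is a deliberately simple stand-in, not the algorithm of any particular package; real tools trained on atlas-scale references use far more cells and genes and more sophisticated models. The function names and toy expression values are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(p * q for p, q in zip(da, db)) / denom if denom > 0 else 0.0


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def train_centroids(ref_cells, ref_labels):
    """'Training' here just averages the reference cells of each label
    over the genes shared by all reference cells."""
    genes = sorted(set.intersection(*(set(cell) for cell in ref_cells)))
    grouped = {}
    for cell, label in zip(ref_cells, ref_labels):
        grouped.setdefault(label, []).append(cell)
    return {
        label: {g: mean([cell[g] for cell in cells]) for g in genes}
        for label, cells in grouped.items()
    }


def predict(cell, centroids):
    """Return (best label, its correlation, margin over the runner-up)."""
    scores = {}
    for label, centroid in centroids.items():
        genes = sorted(set(cell) & set(centroid))
        scores[label] = spearman([cell[g] for g in genes], [centroid[g] for g in genes])
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_score = ranked[0]
    margin = best_score - ranked[1][1] if len(ranked) > 1 else float("inf")
    return best_label, best_score, margin


# Toy reference (hypothetical values): gene -> expression per cell
ref_cells = [
    {"CD3E": 8.0, "CD8A": 6.0, "MS4A1": 0.0, "CD79A": 1.0, "LYZ": 0.5},
    {"CD3E": 7.0, "CD8A": 7.0, "MS4A1": 1.0, "CD79A": 0.0, "LYZ": 0.0},
    {"CD3E": 0.0, "CD8A": 0.5, "MS4A1": 8.0, "CD79A": 7.0, "LYZ": 1.0},
    {"CD3E": 1.0, "CD8A": 0.0, "MS4A1": 7.0, "CD79A": 8.0, "LYZ": 0.5},
    {"CD3E": 0.5, "CD8A": 0.0, "MS4A1": 0.0, "CD79A": 1.0, "LYZ": 9.0},
]
ref_labels = ["T cell", "T cell", "B cell", "B cell", "Monocyte"]
centroids = train_centroids(ref_cells, ref_labels)
query = {"CD3E": 6.0, "CD8A": 5.0, "MS4A1": 0.2, "CD79A": 0.1, "LYZ": 0.3}
print(predict(query, centroids))
```

Rank-based correlation is used here because it is insensitive to overall scale differences between the reference and the query data; a small margin is a hint that the cell should be checked manually.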
Both approaches have limitations in practice. Many clusters can be assigned to several cell types based on “good” logFC and p-values, so the user might choose the top-ranked values or pick a cell type that is unique to that cluster and not assigned to other clusters. Since tools like Seurat can assign a cell type to each individual cell, you can also calculate the proportion of each cell type within a cluster and use that to label the cluster.
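If every cell already carries a predicted label (for example from a reference-based method), the per-cluster proportions take only a few lines to compute. This Python sketch assumes two parallel lists (cluster IDs and per-cell labels); the 0.6 majority threshold and the "ambiguous" fallback are arbitrary choices for illustration, not a standard.

```python
from collections import Counter, defaultdict


def cluster_composition(clusters, cell_labels):
    """Fraction of each per-cell label within each cluster, most frequent first."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, cell_labels):
        counts[cluster][label] += 1
    composition = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        composition[cluster] = {label: n / total for label, n in counter.most_common()}
    return composition


def label_clusters(composition, min_fraction=0.6):
    """Give each cluster its most frequent label, or 'ambiguous' when that
    label covers less than min_fraction of the cluster's cells."""
    labels = {}
    for cluster, fractions in composition.items():
        top_label, top_fraction = next(iter(fractions.items()))
        labels[cluster] = top_label if top_fraction >= min_fraction else "ambiguous"
    return labels


# Toy example (hypothetical labels for 9 cells in 2 clusters)
clusters = ["0", "0", "0", "0", "0", "1", "1", "1", "1"]
cell_labels = ["T cell", "T cell", "NK cell", "T cell", "T cell",
               "B cell", "Monocyte", "B cell", "Monocyte"]
composition = cluster_composition(clusters, cell_labels)
print(composition)
print(label_clusters(composition))
```

A cluster split evenly between two labels, like cluster "1" here, is exactly the kind of case that needs a closer look at its marker genes.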
Automated annotation can also fail to assign a suitable cell type to a given set of cells. Since there are many known cell types and new variations are often discovered, combining these approaches with additional manual examination of marker genes, the literature and expression patterns is typically required.
Hope that helps! We’re adding tutorials on this topic to our OmicsLogic portal: https://learn.OmicsLogic.com
For cell type annotation in scRNA-seq analysis, you can first identify cluster markers with the Seurat package (e.g. using FindAllMarkers) and then annotate the clusters based on known marker genes.
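The second step, matching each cluster's top markers against known marker genes, can be sketched as a simple overlap count. This assumes you have already exported the top genes per cluster (for example from Seurat) into a dictionary; the tiny marker table below is a hypothetical stand-in for a curated database, and requiring at least two overlapping genes is an arbitrary choice. Clusters below the threshold are left "unassigned" for manual review rather than forced into a type.

```python
def annotate_by_markers(cluster_markers, marker_db, min_overlap=2):
    """cluster_markers: cluster -> list of top marker genes for that cluster.
    marker_db: cell type -> collection of known marker genes (hypothetical table).
    Returns cluster -> (best cell type or 'unassigned', sorted overlapping genes)."""
    annotations = {}
    for cluster, genes in cluster_markers.items():
        found = set(genes)
        best_type, best_hits, best_score = "unassigned", set(), (0, 0.0)
        for cell_type, known in marker_db.items():
            known = set(known)
            hits = found & known
            # Prefer more overlapping genes; break ties by the fraction of the
            # known marker set that was recovered
            score = (len(hits), len(hits) / len(known) if known else 0.0)
            if score > best_score:
                best_type, best_hits, best_score = cell_type, hits, score
        if len(best_hits) < min_overlap:
            best_type = "unassigned"
        annotations[cluster] = (best_type, sorted(best_hits))
    return annotations


# Toy inputs (hypothetical): top markers per cluster and a small marker table
cluster_markers = {
    "0": ["CD3E", "IL7R", "CD3D", "LTB"],
    "1": ["MS4A1", "CD79A", "CD74"],
    "2": ["HBB", "HBA1"],
}
marker_db = {
    "T cell": {"CD3E", "CD3D", "CD2"},
    "B cell": {"MS4A1", "CD79A", "CD19"},
    "Monocyte": {"LYZ", "CD14"},
}
for cluster, (cell_type, hits) in annotate_by_markers(cluster_markers, marker_db).items():
    print(cluster, cell_type, hits)
```

A plain overlap count ignores how many genes each list contains; with real databases a statistical enrichment test is a more principled alternative, but the overall flow stays the same.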