Large-scale Data Integration with scMerge2
Summary
Single-cell studies that span multiple samples, conditions, and cohorts give researchers a richer view of diverse cellular states. Integrating these large cohorts can reveal biological detail under varying conditions that no single dataset captures on its own. scMerge2 is a scalable algorithm for integrating atlas-scale, multi-sample, multi-condition single-cell studies, handling millions of cells from different single-cell technologies. Its performance is demonstrated on a large COVID-19 data collection of over five million cells from more than a thousand individuals, where it reveals cell-type expression signatures that help characterize disease progression. scMerge2 also reduces dataset variability in CyTOF, imaging mass cytometry, and CITE-seq experiments, which shows its versatility across single-cell profiling technologies.
Research Criteria
The aim of this work is to develop a tool that addresses the shortcomings of existing tools by integrating large cohort datasets more accurately and efficiently. It also aims to reduce dataset variability in CyTOF, imaging mass cytometry, and CITE-seq experiments.
Result—scMerge2 Integrates Single-Cell Multi-Sample, Multi-Condition Data Effectively
As a refinement of its predecessor, scMerge2 is designed to handle large, multi-condition single-cell datasets. It accounts for variation both within and between datasets, which improves its ability to detect genuine biological signals. Compared with earlier data integration methods, scMerge2 introduces three key innovations: hierarchical integration to capture local and global variation, pseudo-bulk construction to reduce the computational burden, and pseudo-replication within each condition to accommodate multiple conditions. The output is a single adjusted expression matrix that is ready for downstream analyses.
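To make the pseudo-bulk idea concrete, the following is a minimal sketch in plain Python. It is not the scMerge2 implementation (scMerge2 is distributed as an R package). It only illustrates the general technique of collapsing many cells into a few aggregated profiles, assuming each cell already carries a sample label and a cluster label. The function name `pseudo_bulk` and the toy data are hypothetical.

```python
from collections import defaultdict


def pseudo_bulk(counts, samples, clusters):
    """Sum per-cell gene counts within each (sample, cluster) group.

    counts   -- list of per-cell count vectors (one list of ints per cell)
    samples  -- sample label for each cell
    clusters -- cluster or cell-type label for each cell
    Returns a dict mapping (sample, cluster) to a summed count vector.
    """
    if not (len(counts) == len(samples) == len(clusters)):
        raise ValueError("counts, samples and clusters must have equal length")
    n_genes = len(counts[0]) if counts else 0
    bulks = defaultdict(lambda: [0] * n_genes)
    for cell, sample, cluster in zip(counts, samples, clusters):
        profile = bulks[(sample, cluster)]
        for g, value in enumerate(cell):
            profile[g] += value
    return dict(bulks)


# Toy data (hypothetical): 5 cells, 3 genes, 2 samples, 2 clusters.
counts = [[1, 0, 2], [3, 1, 0], [0, 4, 1], [2, 2, 2], [5, 0, 0]]
samples = ["s1", "s1", "s1", "s2", "s2"]
clusters = ["T", "T", "B", "T", "T"]

bulks = pseudo_bulk(counts, samples, clusters)
for key in sorted(bulks):
    print(key, bulks[key])
```

In this toy case five cells collapse into three pseudo-bulk profiles. Downstream steps that operate on such aggregated profiles instead of individual cells work on far fewer columns, which is the intuition behind the reduced computational burden described above.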
Fig. 1 Experimental design. (Lin et al., 2023)
Result—In Detecting Differential Expression, scMerge2 Outperforms Existing Integration Methods
scMerge2 removes multi-level unwanted variation across multiple scRNA-seq datasets and outperforms existing methods in detecting differential expression. Applied to subsets of two COVID-19 studies, it shows that a hierarchical approach to integration recovers cellular signals more effectively. Regardless of the batch integration strategy, scMerge2 outperforms the compared methods in both removing batch effects and preserving biological signals. It is also computationally efficient, with markedly lower memory usage than Seurat. When identifying differentially expressed genes, scMerge2 achieves a lower false discovery rate and a higher true positive rate. Its performance remains stable across a range of tuning parameters, which underscores its robustness and utility for scRNA-seq data integration.
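For readers less familiar with the two metrics mentioned above, here is a short sketch of how a false discovery rate and a true positive rate can be computed when a ground-truth set of differentially expressed genes is available, for example in a simulation. The function name `fdr_and_tpr` and the gene names are hypothetical, and this is not the evaluation code used in the study.

```python
def fdr_and_tpr(called, truth):
    """Return (false discovery rate, true positive rate).

    called -- genes reported as differentially expressed by a method
    truth  -- genes that are truly differentially expressed
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # correctly reported genes
    fp = len(called - truth)   # reported but not truly DE
    fn = len(truth - called)   # truly DE but missed
    fdr = fp / (tp + fp) if called else 0.0
    tpr = tp / (tp + fn) if truth else 0.0
    return fdr, tpr


# Hypothetical example: 4 genes called, 3 genes truly DE, 2 in common.
called = {"geneA", "geneB", "geneC", "geneD"}
truth = {"geneA", "geneB", "geneE"}

fdr, tpr = fdr_and_tpr(called, truth)
print(f"FDR = {fdr:.2f}, TPR = {tpr:.2f}")
```

A method that integrates data well should push the first number down and the second number up at the same time, which is the pattern reported for scMerge2.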
Fig. 2 CD4+ T cell characterization and dynamics in HGSOC. (Lin et al., 2023)
Creative Biolabs' Service
Single Cell RNA Sequencing Service
At Creative Biolabs, we recognize that cell populations are rarely homogeneous or synchronized. We apply our expertise in single-cell RNA sequencing to resolve transcriptomic differences in such heterogeneous samples. Our workflows cover sample pre-processing, library construction, and data interpretation, improving the flexibility, turnaround time, and data accuracy of your project.
Creative Biolabs provides top-tier single-cell RNA sequencing services to researchers and innovators worldwide. Harnessing state-of-the-art technology and a wealth of experience, we deliver precise and sensitive scRNA-seq data, facilitating the exploration of genetic diversity and complexity at the cellular level and the identification of distinct cell subpopulations and their gene expression patterns. From customized protocols to full-service solutions, we cater to the unique needs and budgetary considerations of our clients, positioning ourselves as an ideal partner in advancing scientific research and discovery.
For more information, please contact us.
Reference
- Lin, Yingxin et al. "Atlas-scale single-cell multi-sample multi-condition data integration using scMerge2." Nature Communications, vol. 14, no. 1, 4272, 17 Jul. 2023, doi:10.1038/s41467-023-39923-2. Distributed under Open Access license CC BY 4.0, without modification.