Single-cell sequencing circumvents the averaging artifacts associated with traditional bulk population data and has seen rapid technological development over the past few years. It offers new opportunities to study genomic, transcriptomic, and epigenomic heterogeneity at the cellular level without cell-type confounding, but it also requires novel analytical approaches. One major challenge in such genomic studies is the lack of rigorous methods for integrating bulk-tissue and single-cell sequencing data and for aligning multi-modal single-cell omics data. The research program of my lab centers on developing statistical and computational methods and bioinformatics tools to better utilize and analyze different types of next-generation sequencing data, with a special focus on detecting structural variants, deciphering genomic and transcriptomic heterogeneity, and assessing cellular heterogeneity by single-cell omics approaches. Our long-term vision is to introduce problems arising from new biomedical data to the statistics community and to provide data-driven statistical methods and open-source tools to biomedical researchers for better data analysis and experimental design. Specifically, in the next five years, our proposed program of research will focus on the following interconnected objectives: (i) bulk omics deconvolution aided by single-cell sequencing, followed by association testing with clinical variables; (ii) joint modeling of bulk genomic sequencing and single-cell transcriptomic sequencing data to simultaneously infer DNA and RNA variation at the single-cell level; and (iii) multi-modal alignment of single-cell omics data. During this period, we will continue to collaborate with experimental labs, applying the methods we develop to interrogate cellular heterogeneity in both biological and clinical settings.
We will release our methods as freely available, open-source R packages with extensive tutorials and workflows that are accessible and useful to the biomedical research community.
The overall goal of this research is to develop novel, high-impact statistical and computational tools that address key analytical challenges in the joint modeling of bulk-tissue and single-cell sequencing data. These methods will enable biomedical scientists to simultaneously characterize multi-level genomic variation and to accurately disentangle within- and between-subject cellular heterogeneity. The successful completion of the project will greatly facilitate the translation of basic research findings into clinical studies of human disease and have substantial implications for diagnosis and prognosis.