Identifying the causal genes and cell types underlying disease etiologies is essential for designing targeted diagnostic and treatment strategies. Genome-wide association studies (GWAS), DNA-sequencing, and RNA-sequencing studies have identified potentially causal genes in multiple human diseases. While these methods provide disease-associated "gene lists", they suffer from major shortcomings because they lack cell-type information. First, each tissue is composed of multiple cell types with diverse contributions to disease phenotypes, so studies using bulk-tissue data alone leave the causal cell populations ambiguous. Second, causal gene signals from rare cell types may be masked in bulk tissues. Finally, understanding which genes are perturbed in which cell types is required for designing downstream functional studies. To identify the gene-cell pairs driving human disease, systematic approaches that integrate patient-cohort data with cell-type-specific data are urgently needed. Our research program aims to identify causal genes and cell types driving human diseases using multi-omics approaches. Our central hypothesis is that dysregulated genes mapped to specific cell types drive disease etiologies. Previously, we developed algorithms that integrate large-scale data on common and rare genomic variants, epigenomes, transcriptomes, and proteomes to identify causal genes in tissues that affect specific cell types, providing strong biological and technical foundations for the project. Further, the proposed approaches are empowered by rapidly expanding cell-specific epigenomic and transcriptomic data from sorted cell populations or single-cell profiling. Over the next five years, we will develop algorithms that integrate genomic findings from patient cohorts with cell-specific transcriptomic data, addressing two major questions: (1) Which gene-cell type pairs contribute to disease etiologies?
(2) How is the expression of disease-associated genes regulated at the single-cell level? The proposed project will strongly impact the field by discovering gene-cell pairs associated with a wide range of diseases for downstream investigation. It will also yield new methods that integrate purified-cell and single-cell transcriptome data to extend findings from large-scale patient genomic cohorts. In the long term, the identified gene-cell pairs may be translated into diagnostic markers or treatment targets for human disease.
Our ability to precisely target human disease relies on knowing which genes are dysregulated in which cells, yet current approaches based on bulk tissue often lack cell-type information. We will develop innovative methods to identify disease-associated gene-cell pairs by integrating large-scale molecular data from patient and cell cohorts. The proposed approaches can be applied across human diseases to find new diagnostic markers and treatment targets.