The vast majority of genomic data are generated from heterogeneous tissues, whereas many genomic measurements (e.g., gene expression and methylation) are tissue- and cell-type specific. Notably, cell-type-specific analysis can yield important insights into the underlying biological mechanisms. Furthermore, analysis that ignores cell-type-specific effects often results in substantial power loss and false-positive discoveries. Thus, there is a pressing need to develop methods that facilitate cell-type-specific analysis of existing and future bulk datasets. Existing efforts to address tissue heterogeneity focus on inferring cell counts from bulk RNA and methylation data; however, these approaches do not detect cell-type-specific associations but rather are used to avoid false discoveries. By contrast, this proposal will focus on a novel set of statistical tools for inferring the cell-type-specific expression and methylation signal in each gene and each individual. The approach studied in this project will include the development of methods for imputing methylation from single-nucleus RNA-seq. These methods will make it possible to generate reference data for methylation using publicly available single-nucleus RNA data. In addition, this project will generate single-nucleus RNA-seq and methylation data for sorted cells from Mexican and Finnish blood and adipose samples, resulting in the largest dataset that includes both types of data, particularly on Latinos and on adipose tissue. These reference data will be used as training data for the developed methods. Finally, the methods developed will be used to search for cell-type-specific associations with obesity, nonalcoholic fatty liver disease, type 2 diabetes, and dyslipidemias, as well as to perform cell-type-specific eQTL and mQTL analyses in large Mexican and Finnish populations.
To achieve this goal, bulk methylation data will be generated for Mexican and Finnish adipose samples for which genotypes, bulk RNA-seq, and refined phenotypes are already available. Importantly, the Latino data will be one of the largest non-European datasets with expression, methylation, and genotype information. These data will be made available to the research community. Thus, accomplishing this project will advance the understanding of population-specific genetic and epigenetic components of highly common cardiometabolic disorders with high morbidity and mortality worldwide. Mexicans have the highest susceptibility to these cardiometabolic disorders, and this study will provide much-needed new genomic data in this admixed minority population to combat cardiometabolic disease in diverse populations.
This project aims to develop novel statistical tools for detecting genes whose expression or methylation levels are correlated with disease in specific cell types. We will use these methods to search for cell-type-specific associations with obesity, nonalcoholic fatty liver disease, type 2 diabetes, and dyslipidemias in adipose and blood tissues from large Mexican and Finnish populations. This project will thus advance our understanding of population-specific genetic and epigenetic components of highly common cardiometabolic disorders with high morbidity and mortality worldwide.