A key challenge in human genomics is to dissect the impact of sequence variation on gene regulation. Characterizing the functional consequences of regulatory variation would lead to a greater understanding of evolutionary constraint on gene expression, improved interpretation of whole-genome sequencing, and a more complete picture of the genetic architecture of complex disease. However, interpreting non-coding genetic variation remains difficult, particularly because gene regulation is complex and highly specific to cell type and environmental context. Understanding how disease-associated genetic variation impacts human tissues requires that we identify the mechanisms behind cell-type-specific regulatory variation. To address these goals, we propose to create a resource in which we can directly assay a range of regulatory phenotypes in multiple cell types. We propose to establish a panel of induced pluripotent stem cells (iPSCs) from 70 individuals and to collect genomic data from iPSCs and differentiated cells. This empirical effort will be complemented by a novel statistical methodology that integrates data across different assays, cell types, time points, and individuals to robustly identify regulatory networks and genetic variants that affect gene regulation in each context. Together, these contributions will provide a basis for ongoing study of gene regulation and sequence variation in multiple disease-relevant cell types, using a renewable model system and novel methods for robust analysis of these data.
In Aim 1, we propose to develop the resource of 70 iPSC lines, in which we will collect extensive molecular regulatory phenotypes at multiple time points throughout differentiation to three cell types (cardiomyocytes, neurons, and hepatocytes). By measuring gene expression, chromatin accessibility, and methylation in each sample throughout differentiation, we will provide a detailed picture of the cascade of regulatory influences active in each cell type and during development.
In Aim 2, we propose to develop a novel statistical framework for inferring universal and cell-type-specific regulatory factors from multi-dimensional data spanning cell types, individuals, phenotypes, and time points. We specify a machine learning method based on Bayesian hierarchical transfer learning that provides dramatically increased power to detect shared effects while explicitly identifying context-specific regulatory changes. This approach will be applied to infer regulatory networks, identify key regulatory sequence elements, and map QTLs in each phenotype and cell type.
In Aim 3, we will use the empirical data and novel methods to infer regulatory relationships and mechanisms underlying genetic variants associated with gene expression in primary tissue and with disease. We will do this by carefully integrating external association studies. All samples, cell lines, data, computational tools, and analytical results will be made freely available to the community. We expect our project will greatly advance the understanding of gene regulation, the consequences of genetic variation in diverse cell types, and the genetic basis of disease.

Public Health Relevance

Understanding the impact of individual genetic variation on diverse tissues and cell types in the human body is essential to dissecting the genetics of complex human diseases. Here, we gather data from multiple cell types derived from induced pluripotent stem cells and develop a novel statistical framework for integrative analysis of data across individuals, cell types, and time points. Our work provides a transformative approach for understanding gene regulation and the effects of genetic variation on individual health.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Gaillard, Shawn R
Johns Hopkins University
Biostatistics & Other Math Sci
Biomed Engr/Col Engr/Engr Sta
United States
Pavlovic, Bryan J; Blake, Lauren E; Roux, Julien et al. (2018) A Comparative Assessment of Human and Chimpanzee iPSC-derived Cardiomyocytes with Primary Heart Tissues. Sci Rep 8:15312
Li, Xin; Kim, Yungil; Tsang, Emily K et al. (2017) The impact of rare variation on gene expression across tissues. Nature 550:239-243
Knowles, David A; Davis, Joe R; Edgington, Hilary et al. (2017) Allele-specific expression reveals interactions between genetic variation and environment. Nat Methods 14:699-702