With hundreds of sequenced genomes available for many species, the challenge now lies in building predictive models for the genotype-to-phenotype map. Millions of polymorphic bases make each of us morphologically, intellectually, and psychologically unique. Associating whole-genome polymorphisms with a myriad of phenotypes in genome-wide association studies (GWAS) has been in fashion. Its reliance on purely statistical associations requires screening many thousands of individuals to pinpoint alleles that typically explain appreciable, though modest, fractions of natural variation. The next step, and the long-term goal of this project, is to move from association to causation, where a model of well-understood molecular pathways is modified, individually for each genotype, to reflect the functional effects of its unique set of polymorphisms. We develop the concepts and models necessary to advance this goal using Drosophila, where the molecular tools are precise and quantitative predictions are verifiable. We will develop several levels of predictive models. First, we will predict the functional consequences of SNPs on gene expression from sequence alone, based on knowledge of transcription factor (TF) binding sites and predictive models of how sequence affects DNA shape. These models will be validated with cis-eQTL approaches and directed measurements of expression and TF binding. Second, the composite effects of coding and regulatory polymorphisms will be incorporated into a network-level structural equation model (SEM). We will fit the model with two types of expression data gathered in multiple genotypes, and predict and experimentally verify the functional consequences of unmeasured polymorphisms. Third, the model will be extended to incorporate putative epistatic interactions, estimated using approximate Bayesian computation. This will generalize and 'quantitate' the SEM, and evaluate the sensitivity of downstream phenotypes to molecular perturbations at different tiers.
We will validate these predictions using population genetic data. While conceptually simple, developing this framework requires close collaboration between computational and molecular biologists to build refined molecular biological knowledge and tools. A developmental process, early embryo segmentation in Drosophila melanogaster, appears ripe for attack. The network is well characterized, and a wealth of functional data is available on the individual components, including DNA binding preferences and cellular-resolution expression patterns of critical TFs. The requisite experimental techniques can be scaled to process many sequenced fly genotypes. Abundant genetic variation in expression, timing, and morphology during embryo development is well documented. Building the first mechanistic model of the embryo genotype-to-phenotype map is our focus, but the work is also expected to have a strong impact on the medical field. Success in developing these integrated approaches will enable optimal choice of targets for therapeutic interventions to restore network function in disease. The concepts and tools we establish will serve as a template for analysis of complex networks relevant to human health.
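As a rough illustration of the first modeling level (predicting the effect of a SNP on TF binding from sequence alone), the sketch below scores a reference and an alternate allele against a position weight matrix and reports the change in the best log-odds score. It is a minimal sketch, not the project's method: the motif values, the uniform background, and the names `MOTIF`, `log_odds`, `best_score`, and `snp_effect` are all hypothetical, and DNA shape features are not modeled.

```python
import math

# Hypothetical 4-bp TF motif (illustrative probabilities, not from any real TF).
# Each entry is one motif position mapping base -> probability.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumption: uniform base composition


def log_odds(site):
    """Sum of per-position log2(p_motif / p_background) for one motif-length site."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(MOTIF, site))


def best_score(seq):
    """Best log-odds score over all motif-length windows (forward strand only)."""
    width = len(MOTIF)
    return max(log_odds(seq[i:i + width]) for i in range(len(seq) - width + 1))


def snp_effect(ref_seq, pos, alt_base):
    """Change in best motif score when the base at 0-based index pos becomes alt_base."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return best_score(alt_seq) - best_score(ref_seq)


if __name__ == "__main__":
    # A G-to-T change inside the consensus site AGCT lowers the predicted score.
    print(snp_effect("TTAGCTTT", 3, "T"))
```

A negative value suggests the alternate allele weakens the best predicted site; in practice such sequence-based scores would still need validation against measured expression and TF binding, as the abstract describes.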

Public Health Relevance

To build predictive genotype-to-phenotype maps, genetic epidemiologists must move from association to causation, where a model of well-understood molecular pathways is modified, individually for each genotype, to reflect the functional effects of its unique set of polymorphisms. There is much scope for refining this approach using the Drosophila model, where the molecular tools are precise and quantitative predictions are verifiable. This project will develop a variety of predictive models to accomplish this aim: annotating functional regulatory polymorphisms, developing linear network analysis, and annotating molecular networks in the context of population variation with approximate Bayesian computation. Cellular and whole-embryo data scales will be merged, and joint models will be built to combine multiple scales of modeling.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project--Cooperative Agreements (U01)
Study Section: Special Emphasis Panel (ZEB1)
Program Officer: Lyster, Peter
University of Southern California
Schools of Arts and Sciences
Los Angeles
United States
Li, Richard Y; Di Felice, Rosa; Rohs, Remo et al. (2018) Quantum annealing versus classical machine learning applied to a simplified computational biology problem. npj Quantum Inf 4:
Vincent, Ben J; Staller, Max V; Lopez-Rivera, Francheska et al. (2018) Hunchback is counter-repressed to regulate even-skipped stripe 2 expression in Drosophila embryos. PLoS Genet 14:e1007644
Wang, Xiaofei; Zhou, Tianyin; Wunderlich, Zeba et al. (2018) Analysis of Genetic Variation Indicates DNA Shape Involvement in Purifying Selection. Mol Biol Evol 35:1958-1967
Signor, Sarah A; Nuzhdin, Sergey V (2018) The Evolution of Gene Expression in cis and trans. Trends Genet 34:532-544
Chiu, Tsu-Pei; Rao, Satyanarayan; Mann, Richard S et al. (2017) Genome-wide prediction of minor-groove electrostatic potential enables biophysical modeling of protein-DNA binding. Nucleic Acids Res 45:12565-12576
Yang, Lin; Orenstein, Yaron; Jolma, Arttu et al. (2017) Transcription factor family-specific DNA shape readout revealed by quantitative specificity models. Mol Syst Biol 13:910
Li, Jinsen; Sagendorf, Jared M; Chiu, Tsu-Pei et al. (2017) Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding. Nucleic Acids Res 45:12877-12887
Scholes, Clarissa; DePace, Angela H; Sánchez, Álvaro (2017) Combinatorial Gene Regulation through Kinetic Control of the Transcription Cycle. Cell Syst 4:97-108.e9
Sagendorf, Jared M; Berman, Helen M; Rohs, Remo (2017) DNAproDB: an interactive tool for structural analysis of DNA-protein complexes. Nucleic Acids Res 45:W89-W97
Tangprasertchai, Narin S; Di Felice, Rosa; Zhang, Xiaojun et al. (2017) CRISPR-Cas9 Mediated DNA Unwinding Detected Using Site-Directed Spin Labeling. ACS Chem Biol 12:1489-1493

Showing the most recent 10 out of 43 publications