With hundreds of sequenced genomes available for many species, the challenge now lies in building predictive models for the genotype-to-phenotype map. Millions of polymorphic bases make each of us morphologically, intellectually, and psychologically unique. Genome-wide association studies (GWAS), which associate whole-genome polymorphisms with a myriad of phenotypes, have been in fashion. Their reliance on purely statistical associations requires screening many thousands of individuals to pinpoint alleles that typically explain appreciable, though modest, fractions of natural variation. The next step - the long-term goal of this project - is to move from association to causation, where a model of well-understood molecular pathways is modified, individually for each genotype, to reflect the functional effects of its unique set of polymorphisms. We develop the concepts and models necessary to advance this goal using Drosophila, where the molecular tools are precise and quantitative predictions are verifiable. We will develop several levels of predictive models. First, we will predict the functional consequences of SNPs on gene expression from sequence alone, based on knowledge of transcription factor (TF) binding sites and predictive models of how sequence affects DNA shape. These models will be validated with cis-eQTL approaches and directed measurements of expression and TF binding. Second, the composite effects of coding and regulatory polymorphisms will be incorporated into a network-level structural equation model (SEM). We will fit the model with two types of expression data gathered in multiple genotypes, and predict and experimentally verify the functional consequences of unmeasured polymorphisms. Third, the model will be extended to incorporate putative epistatic interactions, estimated using approximate Bayesian computation. This will generalize and 'quantitate' the SEM, and evaluate the sensitivity of downstream phenotypes to molecular perturbations at different tiers.
We will validate these predictions using population genetic data. While conceptually simple, developing this framework requires close collaboration between computational and molecular biologists, who together build refined molecular biological knowledge and tools. A developmental process - early embryo segmentation in Drosophila melanogaster - appears ripe for attack. The network is well characterized and a wealth of functional data is available on the individual components, including DNA binding preferences and cellular-resolution expression patterns of critical TFs. The requisite experimental techniques can be scaled to process many sequenced fly genotypes. Abundant genetic variation in expression, timing, and morphology during embryo development is well documented. Building the first mechanistic model of the embryo genotype-to-phenotype map is our focus, but this work should also have a strong impact on the medical field. Success in developing these integrated approaches will enable optimal choice of targets for therapeutic interventions to restore network function in disease. The concepts and tools we establish will serve as a template for analysis of complex networks relevant to human health.
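
The abstract names approximate Bayesian computation (ABC) as the estimation method but does not describe how it is applied. As a rough illustration of the general idea only, the following is a minimal rejection-ABC sketch in Python. Everything in it is an assumption made for illustration, not the project's method or software: the toy simulator `simulate_expression`, the flat prior on a single effect size, the sample mean as summary statistic, and the tolerance value.

```python
import random
import statistics


def simulate_expression(effect, n, rng):
    """Hypothetical simulator: expression level = effect + Gaussian noise (sd 1)."""
    return [rng.gauss(effect, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, tolerance, rng):
    """Rejection ABC: keep prior draws whose simulated summary is close to the observed one."""
    obs_mean = statistics.fmean(observed)  # summary statistic of the observed data
    accepted = []
    for _ in range(n_draws):
        effect = rng.uniform(-5.0, 5.0)  # assumed flat prior on the effect size
        simulated = simulate_expression(effect, len(observed), rng)
        # Accept the draw if the simulated summary falls within the tolerance.
        if abs(statistics.fmean(simulated) - obs_mean) <= tolerance:
            accepted.append(effect)
    return accepted


rng = random.Random(0)
observed = simulate_expression(2.0, 50, rng)  # synthetic "observed" data, true effect 2.0
posterior = abc_rejection(observed, n_draws=20000, tolerance=0.1, rng=rng)
posterior_mean = statistics.fmean(posterior)
print(len(posterior), round(posterior_mean, 2))
```

The accepted draws approximate the posterior over the effect size without ever evaluating a likelihood, which is the property that makes ABC attractive when the model can be simulated but not written in closed form. A smaller tolerance gives a tighter approximation at the cost of fewer accepted draws.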

Public Health Relevance

To build predictive genotype-to-phenotype maps, genetic epidemiologists must move from association to causation, where a model of well-understood molecular pathways is modified, individually for each genotype, to reflect the functional effects of its unique set of polymorphisms. There is much scope for refining this approach using the Drosophila model, where the molecular tools are precise and quantitative predictions are verifiable. This project will develop a variety of predictive models to accomplish this aim: annotating functional regulatory polymorphisms, developing linear network analysis, and annotating molecular networks in the context of population variation with approximate Bayesian computation. Cellular and whole-embryo data scales will be merged, and joint models will be built to combine multiple scales of modeling.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project--Cooperative Agreements (U01)
Study Section
Special Emphasis Panel (ZEB1)
Program Officer
Lyster, Peter
University of Southern California
Schools of Arts and Sciences
Los Angeles
United States
Estrada, Javier; Wong, Felix; DePace, Angela et al. (2016) Information Integration and Energy Expenditure in Gene Regulation. Cell 166:234-44
Dror, Iris; Rohs, Remo; Mandel-Gutfreund, Yael (2016) How motif environment influences transcription factor search dynamics: Finding a needle in a haystack. Bioessays 38:605-12
Mathelier, Anthony; Xin, Beibei; Chiu, Tsu-Pei et al. (2016) DNA Shape Features Improve Transcription Factor Binding Site Predictions In Vivo. Cell Syst 3:278-286.e4
Chiu, Tsu-Pei; Comoglio, Federico; Zhou, Tianyin et al. (2016) DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding. Bioinformatics 32:1211-3
Kuzu, Guray; Kaye, Emily G; Chery, Jessica et al. (2016) Expansion of GA Dinucleotide Repeats Increases the Density of CLAMP Binding Sites on the X-Chromosome to Promote Drosophila Dosage Compensation. PLoS Genet 12:e1006120
Salomon, Matthew P; Li, Wai Lok Sibon; Edlund, Christopher K et al. (2016) GWASeq: targeted re-sequencing follow up to GWAS. BMC Genomics 17:176
Estrada, Javier; Ruiz-Herrero, Teresa; Scholes, Clarissa et al. (2016) SiteOut: An Online Tool to Design Binding Site-Free DNA Sequences. PLoS One 11:e0151740
Stram, Alexander H; Marjoram, Paul; Chen, Gary K (2015) al3c: high-performance software for parameter inference using Approximate Bayesian Computation. Bioinformatics 31:3549-51
Deng, Zengqin; Wang, Qing; Liu, Zhao et al. (2015) Mechanistic insights into metal ion activation and operator recognition by the ferric uptake regulator. Nat Commun 6:7642
Wunderlich, Zeba; Bragdon, Meghan D J; Vincent, Ben J et al. (2015) Krüppel Expression Levels Are Maintained through Compensatory Evolution of Shadow Enhancers. Cell Rep 12:1740-7

Showing the most recent 10 out of 30 publications