A major challenge common to understanding phenotypic diversity, modeling selection in evolution, and developing precision medicine is enhancing our currently limited ability to predict disease and phenotypic outcomes based on genome sequence and environmental exposures. A comprehensive understanding of genetic variation and its role in conditioning phenotypes requires systematic, perturbation-based testing of genetic variants across the genome in multiple environments and in an isogenic background. Previous systematic genome perturbation efforts have focused primarily on engineering loss-of-function, but naturally occurring variants have the most relevance to understanding medically relevant phenotypes like human traits and disease. Such variants have been studied via genome-wide association studies (GWAS) and quantitative trait locus (QTL) analysis, but these approaches are limited to the haplotypes that appear in the study population, and the actual causative variants have been identified in only a few cases. Advances in genome editing technologies have made engineering specific genetic variants feasible at a large scale. This proposal aims to systematically engineer and functionally profile, in three genetically distinct strains, a genome-wide 'variation collection' covering all natural single-nucleotide variants (SNVs) in the Saccharomyces cerevisiae species as well as SNVs associated with human diseases. The collection will be constructed by a high-throughput CRISPR approach, leveraging an in-house sequence parsing technology (Recombinase Directed Indexing, or REDI) that will allow rapid, inexpensive isolation of sequence-verified variant strains among the millions that will be generated. Because some variants only exert their effects in certain environments, this strain collection will be profiled in hundreds of conditions, including exposure to various stresses and drugs.
DNA barcodes integrated into the genome of each strain will enable pooled, competitive growth, and allow the comprehensive identification of variants in a genome that modulate fitness in a given condition in a single experiment. Finally, to dissect the genetic architecture of pathways underlying diseases and identify key interactions, strains carrying combinations of SNVs will be analyzed. The strain collection will be made available to the community for further phenotypic investigations. In addition to the gene x environment (GxE) dataset that will likely be the largest produced to date, the technological, analytical, and visualization pipelines will be publicly shared and integrated into community resources. This work will constitute an unprecedented investigation of the consequences of genetic variation and their dependence upon environment, while providing valuable resources for the scientific community. It will lay technological and conceptual groundwork for systematic perturbation-based studies of genetic variation in human cells that will inform the prediction of disease risk and the design of therapeutic strategies based on genome sequence.
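
As an illustration of how pooled, barcode-based competitive growth can yield per-variant fitness estimates, the following is a minimal sketch and not the analysis pipeline of this project. It assumes barcode read counts are available for the pool before and after growth in one condition, and it scores each barcode by the log2 change in its relative frequency, centered on the pool median. The function name, the barcode labels, the pseudocount, and the median centering are all hypothetical choices made for this example.

```python
import math
from statistics import median


def barcode_log2_fold_changes(counts_start, counts_end, pseudocount=0.5):
    """Score each barcoded strain by its change in relative abundance.

    counts_start / counts_end: dicts mapping barcode -> read count before
    and after pooled growth in one condition (hypothetical input format).
    Returns barcode -> log2 fold change in frequency, centered on the
    median of the pool so that a typical strain scores about 0.
    """
    barcodes = sorted(counts_start)

    # A pseudocount keeps barcodes that drop out of the pool from
    # producing log2(0).
    total_start = sum(counts_start[b] + pseudocount for b in barcodes)
    total_end = sum(counts_end.get(b, 0) + pseudocount for b in barcodes)

    raw = {}
    for b in barcodes:
        freq_start = (counts_start[b] + pseudocount) / total_start
        freq_end = (counts_end.get(b, 0) + pseudocount) / total_end
        raw[b] = math.log2(freq_end / freq_start)

    # Center on the median: this assumes most variants are close to
    # neutral in the tested condition.
    center = median(raw.values())
    return {b: value - center for b, value in raw.items()}


# Toy example with made-up counts: bc_B is depleted about fourfold.
example_start = {"bc_A": 1000, "bc_B": 1000, "bc_C": 1000}
example_end = {"bc_A": 1000, "bc_B": 250, "bc_C": 1000}
example_scores = barcode_log2_fold_changes(example_start, example_end)
```

In this toy input, the depleted barcode receives a negative score relative to the other two, which is the kind of signal that would flag a variant as reducing fitness in the tested condition. A real analysis would also need replicates, multiple time points, and a statistical model of count noise.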

Public Health Relevance

A central challenge in precision medicine is predicting disease risk and designing therapeutic strategies based on genome sequence. Understanding the impact of individual genetic variations requires testing them systematically in different environments (e.g., drugs, tissues, cell types), which is now possible at a large scale via genome editing technology. Here we propose to systematically engineer all known single genetic variants in a species and measure each of their effects in hundreds of environments; this will yield an unparalleled resource for researchers to decipher how genetic variations combine with environmental influences to cause disease and alter susceptibility to different treatments.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Genetic Variation and Evolution Study Section (GVE)
Program Officer
Janes, Daniel E
Stanford University
Schools of Medicine
United States
Roy, Kevin R; Smith, Justin D; Vonesch, Sibylle C et al. (2018) Multiplexed precision genome editing with trackable genomic barcodes in yeast. Nat Biotechnol 36:512-520