Genome-wide association studies have been incredibly successful at identifying novel genes and pathways associated with a wide array of complex diseases. However, despite the formation of large consortia to perform meta-analyses across cohorts, only a small fraction of the expected heritability of most common, complex diseases has been explained. The human genetics community is now adopting large-scale sequencing approaches (e.g., exome and whole genome) to identify rare variants that potentially have larger phenotypic effects. In response, statistical geneticists have created a litany of tests geared toward associating rare variants with disease. We hypothesize that the most parsimonious explanation for an inverse relationship between the frequency of causal alleles and their effect size is that many diseases are caused by an influx of newly arising deleterious mutations that are continually removed from the population by natural selection. We therefore propose to develop simulation software that will integrate what we know about how allele frequencies change over time from the theory-rich field of population genetics into the data-rich field of human genetics. Our resulting software will be used to develop strategies for sequencing global cohorts with high discovery power, and to aid in the evaluation of existing and future statistical tests. To achieve broad impact, we will create a graphical user interface (GUI) that produces effective figures, and apply our tool to compare and contrast a wide variety of existing statistical tests. We will then revamp our population genetic simulator to become the first population genetic simulator based on the heterogeneous computing architecture of both CPUs and graphics processing units (GPUs). Through intensive parallelization, our software will achieve disruptive efficiency. Using this approach, we will develop a platform for simulation-based inference that can accommodate complex evolutionary models.
We will apply this approach to analyze forthcoming whole genome sequencing data from humans and Drosophila. Finally, we aim to return cutting-edge research to the classroom by developing simulation-based teaching tools. Our teaching tool will be in the form of a GUI that enables hands-on learning of complex concepts.

Public Health Relevance

The next phase of genome-wide association studies (GWAS) will require whole genome resequencing. Making sense of this onslaught of data using the numerous tools that are currently being developed requires accurate simulation tools. We propose to continue development and maintenance of our population genetic simulator so that it becomes a driving force for designing high-powered sequencing-based association studies and inferring complex evolutionary models, and to bring research back to the classroom in the form of teaching tools.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Project #
Application #
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Brooks, Lisa
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
University of California San Francisco
Schools of Pharmacy
San Francisco
United States
Zip Code
Pemberton, Trevor J; Szpiech, Zachary A (2018) Relationship between Deleterious Variation, Genomic Autozygosity, and Disease Risk: Insights from The 1000 Genomes Project. Am J Hum Genet 102:658-675
Mangul, Serghei; Yang, Harry Taegyun; Strauli, Nicolas et al. (2018) ROP: dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues. Genome Biol 19:36
Szpiech, Zachary A; Blant, Alexandra; Pemberton, Trevor J (2017) GARLIC: Genomic Autozygosity Regions Likelihood-based Inference and Classification. Bioinformatics 33:2059-2062
Blant, Alexandra; Kwong, Michelle; Szpiech, Zachary A et al. (2017) Weighted likelihood inference of genomic autozygosity patterns in dense genotype data. BMC Genomics 18:928
Szpiech, Zachary A; Strauli, Nicolas B; White, Katharine A et al. (2017) Prominent features of the amino acid mutation landscape in cancer. PLoS One 12:e0183273
Johnston, Henry Richard; Hu, Yi-Juan; Gao, Jingjing et al. (2017) Identifying tagging SNPs for African specific genetic variation from the African Diaspora Genome. Sci Rep 7:46398
White, Katharine A; Ruiz, Diego Garrido; Szpiech, Zachary A et al. (2017) Cancer-associated arginine-to-histidine mutations confer a gain in pH sensing to mutant proteins. Sci Signal 10:
Uricchio, Lawrence H; Zaitlen, Noah A; Ye, Chun Jimmie et al. (2016) Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res 26:863-73
Kessler, Michael D; Yerges-Armstrong, Laura; Taub, Margaret A et al. (2016) Challenges and disparities in the application of personalized genomic medicine to populations with African ancestry. Nat Commun 7:12521
Mathias, Rasika Ann; Taub, Margaret A; Gignoux, Christopher R et al. (2016) A continuum of admixture in the Western Hemisphere revealed by the African Diaspora genome. Nat Commun 7:12522