Genome-wide association studies have been highly successful at identifying novel genes and pathways associated with a wide array of complex diseases. However, despite the formation of large consortia to perform meta-analyses across cohorts, only a small fraction of the expected heritability of most common, complex diseases has been explained. The human genetics community is now adopting large-scale sequencing approaches (e.g., exome and whole genome) to identify rare variants that potentially have larger phenotypic effects. In response, statistical geneticists have created a litany of tests geared toward associating rare variants with disease. We hypothesize that the most parsimonious explanation for an inverse relationship between the frequency of causal alleles and their effect size is that many diseases are caused by an influx of newly arising deleterious mutations that are continually removed from the population by natural selection. We therefore propose to develop simulation software that will integrate what we know about how allele frequencies change over time from the theory-rich field of population genetics into the data-rich field of human genetics. Our resulting software will be used to develop strategies for sequencing global cohorts with high discovery power, and to aid in the evaluation of existing and future statistical tests. To achieve broad impact, we will create a graphical user interface (GUI) that produces effective figures, and apply our tool to compare and contrast a wide variety of existing statistical tests. We will then revamp our population genetic simulator to be the first built on the heterogeneous computing architecture of both CPUs and graphics processing units (GPUs). Through intensive parallelization, our software will achieve disruptive efficiency. Using this approach, we will develop a platform for simulation-based inference that can accommodate complex evolutionary models.
We will apply this approach to analyze forthcoming whole genome sequencing data from humans and Drosophila. Finally, we aim to return cutting-edge research to the classroom by developing simulation-based teaching tools. Our teaching tool will be in the form of a GUI that enables hands-on learning of complex concepts.
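As a minimal illustration of the reasoning behind the hypothesis above, the following Python sketch simulates a single new mutation under a basic Wright-Fisher model with selection and drift. It is not the simulator described in this proposal. The function names, the haploid-style (genic) fitness model, the population size, and the selection coefficients are all assumptions chosen for illustration. The sketch shows that alleles with larger deleterious effects are removed faster, so they spend less time segregating and tend to remain rare.

```python
import random


def apply_selection(freq, s):
    """Expected allele frequency after selection against a deleterious allele.

    Assumes a simple haploid-style (genic) model with 0 <= s < 1: carriers
    have relative fitness 1 - s and non-carriers have fitness 1.
    """
    w_mut = 1.0 - s
    mean_fitness = freq * w_mut + (1.0 - freq)
    return freq * w_mut / mean_fitness


def next_generation(freq, n_copies, s, rng):
    """One Wright-Fisher generation: selection, then binomial sampling (drift)."""
    p = apply_selection(freq, s)
    # Draw n_copies gene copies, each carrying the allele with probability p.
    count = sum(1 for _ in range(n_copies) if rng.random() < p)
    return count / n_copies


def sojourn_time(n_copies, s, rng):
    """Generations a single new mutation segregates before loss or fixation."""
    freq = 1.0 / n_copies  # a new mutation starts as one copy
    generations = 0
    while 0.0 < freq < 1.0:
        freq = next_generation(freq, n_copies, s, rng)
        generations += 1
    return generations


def mean_sojourn_time(n_copies, s, replicates, seed):
    """Average sojourn time over independent replicate simulations."""
    rng = random.Random(seed)
    total = sum(sojourn_time(n_copies, s, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    # Hypothetical parameters: 200 gene copies (100 diploids), 1000 replicates.
    for s in (0.0, 0.01, 0.1, 0.5):
        t = mean_sojourn_time(n_copies=200, s=s, replicates=1000, seed=1)
        print(f"s = {s:<4}  mean generations segregating = {t:.2f}")
```

Running the script prints the mean number of generations a new mutation persists for each selection coefficient. A realistic simulator would also need demography, recurrent mutation, recombination, and diploid dominance, none of which this sketch attempts.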
The next phase of genome-wide association studies (GWAS) will require whole genome resequencing. Making sense of this onslaught of data with the numerous tools currently being developed requires accurate simulation tools. We propose to continue development and maintenance of our population genetic simulator so that it becomes a driving force for designing high-powered sequencing-based association studies, inferring complex evolutionary models, and bringing research back to the classroom in the form of teaching tools.
Szpiech, Zachary A; Blant, Alexandra; Pemberton, Trevor J (2017) GARLIC: Genomic Autozygosity Regions Likelihood-based Inference and Classification. Bioinformatics 33:2059-2062
Blant, Alexandra; Kwong, Michelle; Szpiech, Zachary A et al. (2017) Weighted likelihood inference of genomic autozygosity patterns in dense genotype data. BMC Genomics 18:928
Johnston, Henry Richard; Hu, Yi-Juan; Gao, Jingjing et al. (2017) Identifying tagging SNPs for African specific genetic variation from the African Diaspora Genome. Sci Rep 7:46398
Szpiech, Zachary A; Strauli, Nicolas B; White, Katharine A et al. (2017) Prominent features of the amino acid mutation landscape in cancer. PLoS One 12:e0183273
Mathias, Rasika Ann; Taub, Margaret A; Gignoux, Christopher R et al. (2016) A continuum of admixture in the Western Hemisphere revealed by the African Diaspora genome. Nat Commun 7:12522
Uricchio, Lawrence H; Zaitlen, Noah A; Ye, Chun Jimmie et al. (2016) Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res 26:863-73
Kessler, Michael D; Yerges-Armstrong, Laura; Taub, Margaret A et al. (2016) Challenges and disparities in the application of personalized genomic medicine to populations with African ancestry. Nat Commun 7:12521
Strauli, Nicolas B; Hernandez, Ryan D (2016) Statistical inference of a convergent antibody repertoire response to influenza vaccine. Genome Med 8:60
1000 Genomes Project Consortium; Auton, Adam; Brooks, Lisa D et al. (2015) A global reference for human genetic variation. Nature 526:68-74
Davis, Zoe H; Verschueren, Erik; Jang, Gwendolyn M et al. (2015) Global mapping of herpesvirus-host protein complexes reveals a transcription strategy for late genes. Mol Cell 57:349-60