The technological and computational breakthroughs in the decade since the sequencing of the human genome have provided an unprecedented opportunity to understand the etiology of complex human diseases. Notably, the diminishing cost of next-generation sequencing means that it is now possible for researchers to obtain complete genome sequence information on thousands of diseased individuals. However, major statistical questions remain about the optimal design and analysis of studies that use next-generation sequencing data to study the contribution of rare variation to common diseases. At the foundation of many such questions is the lack of power of single-marker rare variant tests of association, which has motivated the development of many potentially more powerful gene-based tests that aggregate evidence from several individual variants into a single test statistic. The proposed gene-based tests vary in how they combine and weight variants, leading to poorly understood differences in performance under different genetic models. Much of the current focus is on developing an all-around "best" rare variant test, typically through assessment on simulated data. Regardless of which test (or, more likely, tests) emerges as optimal, several challenges will remain in applying these methods to real, imperfect sequence data and then inferring the underlying genetic architecture from a statistically significant test result. Thus, rather than focus exclusively on novel test development, our research will center on gaining a deeper understanding of the behavior of gene-based rare variant tests, the realistic application of these tests, and the development of methods to decompose significant test statistics to gain information that can guide future studies. We will pay specific attention to the interplay of various underlying disease models, test statistics, and study designs. 
This work will provide a critical step towards successfully identifying rare risk variants in future sequencing experiments and translating the results into public health practice. To achieve these goals, we propose the following specific aims. We will (1) develop a geometric representation to better understand the behavior of gene-based rare variant tests, (2) evaluate gene-based rare variant tests in the presence of imperfect data, and (3) develop post-hoc analyses to identify causal variants and inform replication study design. We will conduct the research using a combination of analytic, computational, and simulation approaches. Additionally, the work we will perform addresses the three main goals of NIH's R15 program: (a) to conduct meritorious research that will (b) strengthen the research environment of the liberal arts college where the research will be conducted, while (c) exposing undergraduate students to statistical genetics research. With this last goal in mind, the fourth aim of our proposal is to provide research experiences to undergraduate students while conducting aims 1, 2, and 3.
The number of genetic association studies seeking to identify genetic variants that predispose to human diseases continues to grow. Furthermore, the environment for conducting these studies is rapidly changing due to declining sequencing and genotyping costs, new statistical technologies (e.g. imputation) and increasing understanding of the human genome. The proposed research will provide design and analysis strategies for genetic association studies in order to accelerate the pace of research towards the goal of a complete understanding of the genetic architecture of common human diseases.
|Valcarcel, Alessandra; Grinde, Kelsey; Cook, Kaitlyn et al. (2016) A multistep approach to single nucleotide polymorphism-set analysis: an evaluation of power and type I error of gene-based tests of association after pathway-based association tests. BMC Proc 10:349-355|
|König, Inke R; Auerbach, Jonathan; Gola, Damian et al. (2016) Machine learning and data mining in complex genomic data--a review on the lessons learned in Genetic Analysis Workshop 19. BMC Genet 17 Suppl 2:1|
|Beck, Andrew; Luedtke, Alexander; Liu, Keli et al. (2016) A powerful method for including genotype uncertainty in tests of Hardy-Weinberg equilibrium. Pac Symp Biocomput 22:368-379|
|Greco, Brian; Hainline, Allison; Arbet, Jaron et al. (2016) A general approach for combining diverse rare variant association tests provides improved robustness across a wider range of genetic architectures. Eur J Hum Genet 24:767-73|
|Kamp, Thomas; Adams, Micah; Disselkoen, Craig et al. (2016) Improved performance of gene set analysis on genome-wide transcriptomics data when using gene activity state estimates. Pac Symp Biocomput 22:449-460|
|Green, Alden; Cook, Kaitlyn; Grinde, Kelsey et al. (2016) A general method for combining different family-based rare-variant tests of association to improve power and robustness of a wide range of genetic architectures. BMC Proc 10:165-170|
|Held, Elizabeth; Cape, Joshua; Tintle, Nathan (2016) Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data. BMC Proc 10:141-145|
|Tintle, N L; Pottala, J V; Lacey, S et al. (2015) A genome-wide association study of saturated, mono- and polyunsaturated red blood cell fatty acids in the Framingham Heart Offspring Study. Prostaglandins Leukot Essent Fatty Acids 94:65-72|
|Blue, Elizabeth M; Sun, Lei; Tintle, Nathan L et al. (2014) Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond. Genet Epidemiol 38 Suppl 1:S21-8|
|Rogers, Ally; Beck, Andrew; Tintle, Nathan L (2014) Evaluating the concordance between sequencing, imputation and microarray genotype calls in the GAW18 data. BMC Proc 8:S22|