Over the past 10 years, human genetics studies have made an unprecedented number of discoveries relating genome variation to common diseases. While we have unquestionably learned a great deal from the discoveries made to date, we must also acknowledge that, with respect to identifying and characterizing the genome variation affecting risk of common human diseases - those accounting for the overwhelming majority of public health care expenditures - these discoveries have not generated nearly the knowledge of disease etiology that we would have expected given their sheer number. These observations, coupled with the dramatic reduction in the cost of sequencing, are driving the case for moving to whole genome sequencing studies for common disease. We have focused our application on three critical challenges for methods development and analysis of whole genome sequence data. While there has been substantial progress in methods development for analysis of exome sequencing, with gene-based tests that are relatively robust to the direction of effects of rare coding variants as well as to the proportion of the gene's rare variants contributing to phenotype, it is clear that methods integrating the analysis of common and rare variation as well as functional genomics data will be essential for appropriate analysis of whole genome sequence data.
Thus, we propose in Specific Aim 1 to develop novel methods, software and analysis pipelines for integrated analysis of common and rare variants in whole genome sequence data from 50,000 to 100,000 individuals for any given common disease; in Specific Aim 2 to develop and apply novel approaches for prioritizing results from these large-scale sequencing studies using BioVU, the 200,000-member biobank at Vanderbilt that is associated with 30 years of high-quality electronic health records; and in Specific Aim 3 to develop queryable results databases and a comprehensive web portal to serve the results of our studies on the sequence data to both the internal sequencing community and the broader scientific community.
Large-scale whole genome sequencing studies are being carried out to understand the genetic architecture of human complex diseases, yet it remains challenging to decipher the genetic mechanisms of disease etiology. In this application we aim to develop a suite of statistical and computational methods to identify genes and variants associated with complex disease, and to use BioVU, a DNA biobank linked to electronic health records, to validate these findings and investigate the genetic architecture of multiple complex diseases.