The upcoming NHGRI Centers for Common Disease Genomics (CCDG) and Centers for Mendelian Genomics (CMG) plan to generate whole genome sequencing (WGS) data on over 200,000 individuals. WGS will provide comprehensive genetic data spanning coding and non-coding variation, presenting an unprecedented opportunity for discovery in the genetic analysis of human diseases. However, the lack of analytic tools that fully realize the potential of these data has become a bottleneck in translating the rich information in massive WGS datasets into meaningful insights about human diseases. There is a pressing need for powerful and robust analytic methods for WGS that can accelerate genetic discoveries. To meet this need, we have assembled an interdisciplinary team of computational biologists, geneticists, and statisticians. Building on our extensive track record in sequencing studies, statistical genetics, functional analysis, and computational biology, we will power the next round of genetic discoveries by (1) building a massive WGS control sample and developing methods for incorporating these controls in studies of complex and Mendelian diseases; (2) creating more powerful statistical methods for rare variant analysis through the incorporation of functional and regulatory information and advanced statistical tools; and (3) establishing methods to analyze multiple phenotypes, both to boost power for association and to understand how different phenotypes relate genetically. These methods will enhance our ability to identify novel associations across a wide range of genetic architectures, from Mendelian diseases driven by a strong-acting allele to complex polygenic traits. Novel associations promise to lay the foundation for new insight into the biological mechanisms driving disease and to serve as the bedrock for precision prevention and precision medicine strategies.
We will collaborate with the investigators of the Genome Sequencing Program and will share the resulting data resources, tools, and methods with the community through user-friendly open-source software and educational modules.

Public Health Relevance

Statistical and computational methods, as well as shared data and functional annotation resources, play a pivotal role in the genetic analysis of human diseases using whole genome sequencing (WGS) data. They will enable researchers to extract knowledge from massive WGS data and from complex, diverse phenotype data in a timely and effective manner, to gain insights into disease etiology, risk, and prognosis, and to lay the foundation for new strategies that reduce disease burden and improve disease prevention and patient care.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section: Special Emphasis Panel (ZHG1-HGR-L (J1))
Program Officer: Felsenfeld, Adam
Harvard University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Gazal, Steven; Loh, Po-Ru; Finucane, Hilary K et al. (2018) Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat Genet 50:1600-1607
Slowikowski, Kamil; Wei, Kevin; Brenner, Michael B et al. (2018) Functional genomics of stromal cells in chronic inflammatory diseases. Curr Opin Rheumatol 30:65-71
Liu, Zhonghua; Lin, Xihong (2018) Multiple phenotype association tests using summary statistics in genome-wide association studies. Biometrics 74:165-175
Verbanck, Marie; Chen, Chia-Yen; Neale, Benjamin et al. (2018) Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet 50:693-698
Sun, Ryan; Carroll, Raymond J; Christiani, David C et al. (2018) Testing for gene-environment interaction under exposure misspecification. Biometrics 74:653-662
Li, Heng; Bloom, Jonathan M; Farjoun, Yossi et al. (2018) A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat Methods 15:595-597
Cassa, Christopher A; Weghorn, Donate; Balick, Daniel J et al. (2017) Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat Genet 49:806-810
Chun, Sung; Casparino, Alexandra; Patsopoulos, Nikolaos A et al. (2017) Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat Genet 49:600-605
Ritz, Beate R; Chatterjee, Nilanjan; Garcia-Closas, Montserrat et al. (2017) Lessons Learned From Past Gene-Environment Interaction Successes. Am J Epidemiol 186:778-786
Sofer, Tamar; Schifano, Elizabeth D; Christiani, David C et al. (2017) Weighted pseudolikelihood for SNP set analysis with multiple secondary outcomes in case-control genetic association studies. Biometrics 73:1210-1220
