The upcoming NHGRI Centers for Common Disease Genomics (CCDG) and Centers for Mendelian Genomics (CMG) plan to generate whole genome sequencing (WGS) data on over 200,000 individuals. WGS will provide comprehensive genetic data across coding and non-coding variation, presenting an unprecedented opportunity for discovery in the genetic analysis of human diseases. However, the lack of powerful analytic tools that fully realize the potential of these data has emerged as a bottleneck in translating the rich information contained in these massive WGS data sets into meaningful insights about human diseases. There is a pressing need to develop powerful and robust analytic methods for WGS that can accelerate genetic discoveries. To meet this need, we have assembled an interdisciplinary team of computational biologists, geneticists, and statisticians. Building on our extensive track record in sequencing studies, statistical genetics, functional analysis, and computational biology, we will power the next round of genetic discoveries by (1) building a massive WGS control sample and developing methods for incorporating these controls in studies of complex and Mendelian diseases; (2) creating more powerful statistical methods for rare variant analysis through the incorporation of functional and regulatory information and advanced statistical tools; and (3) establishing methods to analyze multiple phenotypes, both to boost power for association and to understand how different phenotypes relate genetically. These methods will enhance our ability to identify novel associations across a wide range of genetic architectures, from Mendelian diseases driven by a strong-acting allele to complex polygenic traits. Novel associations promise to lay the foundation for new insight into the biological mechanisms driving disease and to serve as the bedrock for precision prevention and medicine strategies.
We will collaborate with the investigators of the Genome Sequencing Program and will share the resulting data resources, tools, and methods with the community through user-friendly open-source software and educational modules.

Public Health Relevance

Statistical and computational methods, as well as shared data and functional annotation resources, play a pivotal role in the genetic analysis of human diseases using whole genome sequencing (WGS) data. They will enable researchers to extract knowledge from massive WGS data and complex, diverse phenotype data in a timely and effective manner; to gain insights into disease etiology, risk, and prognosis; and to lay the foundation for new strategies to reduce disease burden and improve disease prevention and patient care.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section
Special Emphasis Panel (ZHG1-HGR-L (J1))
Program Officer
Felsenfeld, Adam
Harvard University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Sohail, Mashaal; Vakhrusheva, Olga A; Sul, Jae Hoon et al. (2017) Negative selection in humans and fruit flies involves synergistic epistasis. Science 356:539-542
Chun, Sung; Casparino, Alexandra; Patsopoulos, Nikolaos A et al. (2017) Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat Genet 49:600-605
Liu, Zhonghua; Lin, Xihong (2017) Multiple phenotype association tests using summary statistics in genome-wide association studies. Biometrics :
Barfield, Richard; Shen, Jincheng; Just, Allan C et al. (2017) Testing for the indirect effect under the null for genome-wide mediation analyses. Genet Epidemiol 41:824-833
Fonseka, Chamith Y; Rao, Deepak A; Raychaudhuri, Soumya (2017) Leveraging blood and tissue CD4+ T cell heterogeneity at the single cell level to identify mechanisms of disease in rheumatoid arthritis. Curr Opin Immunol 49:27-36
Cassa, Christopher A; Weghorn, Donate; Balick, Daniel J et al. (2017) Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat Genet 49:806-810