In this proposal, we address the enormous challenges that common complex diseases pose for genomic analysis and the opportunities that surmounting them offers for advancing healthcare. The common genetic disorders proposed for study here are believed to have extreme locus heterogeneity, so large numbers of samples must be analyzed to comprehensively identify the genomic variants underlying them. We propose that a combination of deep population studies and joint analysis of SNPs, indels, and structural variants in both coding and noncoding regions will provide the next level of understanding of common genetic disorders. Whole genome sequencing (WGS) will be critical to this next-generation approach to the genomics of complex disease. WGS will need to be accompanied by the technical ability to generate and handle very large data sets, a particular focus and strength of NYGC, and by new statistical tools and algorithms, which will be developed by the strong core group committed to this proposal.
An overarching goal of this proposal, one that capitalizes on the power of WGS, is to identify disease-associated variants at the individual nucleotide level. In many cases pathogenic mutations fall in noncoding regions of the genome, which can only be fruitfully explored with WGS. A major effort will go into building new computational strategies to functionally annotate noncoding transcribed sequences, and into building new datasets to enable such strategies, opening new frontiers in the understanding of disease-related regulatory variants. We will explore a wide spectrum of human variation using the WGS platform, including rare variants of modest to large effect, de novo variants of large effect, and common variants of small effect. We will combine available RNA and epigenomic datasets to predict the modes of action of risk alleles and to identify protective alleles.
These results, combined with the integration of environmental and clinical data, will enhance our understanding of genetic risk for common disease and lay the groundwork for the use of personal genomics in disease prevention and treatment, including the delineation of pathways for drug development. Many of the population cohorts proposed for study are from New York, which harbors one of the most diverse populations in the world. Analyzing diverse populations is a critical component of comprehensive common disease analysis, as the effect sizes of individual alleles are believed to vary across populations due to gene-gene interactions. Using the genetic admixture present in different populations from New York and throughout the United States, we will conduct the first systematic study of these interaction effects across many phenotypes.
These aims will be accomplished through widespread collaborations with genomicists, physicians, and patients, organized through a focused team at NYGC. They will be enriched by collaboration with, and support from, independent foundations.

Public Health Relevance

The diseases that NYGC proposes to study, including autism and Alzheimer's disease, all carry a large public health burden, often one that differs by ethnicity. By studying large, ethnically diverse cohorts, using family-based cohorts when possible, NYGC will uncover disease-associated alleles that can be used for prevention, screening and treatment. Further, the data sets created will serve as a resource to the community and can be mined for other disease associations and ethnicity-specific allele frequencies.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project with Complex Structure Cooperative Agreement (UM1)
Study Section: Special Emphasis Panel (ZHG1)
Program Officer: Felsenfeld, Adam
New York Genome Center
New York
United States
Mohammadi, Pejman; Castel, Stephane E; Brown, Andrew A et al. (2017) Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change. Genome Res 27:1872-1884
Kim-Hellmuth, Sarah; Bechheim, Matthias; Pütz, Benno et al. (2017) Genetic regulatory effects modified by immune activation contribute to autoimmune disease associations. Nat Commun 8:266
Stoeckius, Marlon; Hafemeister, Christoph; Stephenson, William et al. (2017) Simultaneous epitope and transcriptome measurement in single cells. Nat Methods 14:865-868
Hwang, Hun-Way; Saito, Yuhki; Park, Christopher Y et al. (2017) cTag-PAPERCLIP Reveals Alternative Polyadenylation Promotes Cell-Type Specific Protein Diversity and Shifts Araf Isoforms with Microglia Activation. Neuron 95:1334-1349.e5
Willems, Thomas; Zielinski, Dina; Yuan, Jie et al. (2017) Genome-wide profiling of heritable and de novo STR variations. Nat Methods 14:590-592
Huang, Yi-Fei; Gulko, Brad; Siepel, Adam (2017) Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet 49:618-624
Turner, Tychele N; Coe, Bradley P; Dickel, Diane E et al. (2017) Genomic Patterns of De Novo Mutation in Simplex Autism. Cell 171:710-722.e12
Turner, Tychele N; Hormozdiari, Fereydoun; Duyzend, Michael H et al. (2016) Genome Sequencing of Autism-Affected Families Reveals Disruption of Putative Noncoding Regulatory DNA. Am J Hum Genet 98:58-74
Kim-Hellmuth, Sarah; Lappalainen, Tuuli (2016) Concerted Genetic Function in Blood Traits. Cell 167:1167-1169