One of the major problems in human genetics is understanding the genetic causes underlying complex phenotypes, including neuropsychiatric traits such as autism spectrum disorders and schizophrenia. Despite tremendous work over the past few decades, the underlying biological mechanisms remain poorly understood in most cases. Recent advances in high-throughput, massively parallel genomic technologies have revolutionized the field of human genetics and promise to lead to important scientific advances. Despite this progress in data generation, analyzing and interpreting these data remains very challenging. The main focus of this proposal is the development of powerful statistical methods for integrating whole-genome sequencing data with rich functional genomics data, with the goal of improving the discovery of genes involved in autism spectrum disorders. We propose to integrate data from many different sources: epigenetic data from projects such as ENCODE, Roadmap, and PsychENCODE; eQTL data from the GTEx, PsychENCODE, and CommonMind consortia; and data from large-scale databases of genetic variation such as ExAC and gnomAD. We will use these data to predict the functional effects of genetic variants in non-coding regions in a tissue- and cell-type-specific manner, and to generate functional maps across a large number of tissues and cell types in the human body, which we can then use to identify novel associations with autism in whole-genome sequencing studies. The proposed functional predictions and functional maps will be made broadly available through the popular ANNOVAR database. We further propose to use these functional predictions in the analysis of almost 20,000 whole genomes from three large whole-genome sequencing studies of autism.
We believe that the proposed research is very timely and has the potential to substantially improve the analysis of non-coding genetic variation, and hence to provide new insights into the biological mechanisms underlying the risk of autism and, more broadly, of other neuropsychiatric diseases.

Public Health Relevance

Autism spectrum disorders are common diseases with a major impact on public health. Although coding variation has been extensively studied for its role in the risk of autism, the analysis of non-coding variation poses tremendous challenges. The proposed statistical methods, and their application to nearly 20,000 whole genomes from three large autism whole-genome sequencing studies, will improve our understanding of the biological mechanisms involved in autism, with important implications for disease treatment strategies.

National Institutes of Health (NIH)
National Institute of Mental Health (NIMH)
Research Project (R01)
Study Section
Behavioral Genetics and Epidemiology Study Section (BGES)
Program Officer
Arguello, Alexander
Columbia University (N.Y.)
Biostatistics & Other Math Sci
Schools of Public Health
New York
United States
Polubriaginof, Fernanda C G; Vanguri, Rami; Quinnies, Kayla et al. (2018) Disease Heritability Inferred from Familial Relationships Reported in Medical Records. Cell 173:1692-1704.e11
Backenroth, Daniel; He, Zihuai; Kiryluk, Krzysztof et al. (2018) FUN-LDA: A Latent Dirichlet Allocation Model for Predicting Tissue-Specific Functional Effects of Noncoding Variation: Methods and Applications. Am J Hum Genet 102:920-942
Liu, Yuwen; Liang, Yanyu; Cicek, A Ercument et al. (2018) A Statistical Framework for Mapping Risk Genes from De Novo Mutations in Whole-Genome-Sequencing Studies. Am J Hum Genet 102:1031-1047
Sanna-Cherchi, Simone; Khan, Kamal; Westland, Rik et al. (2017) Exome-wide Association Study Identifies GREB1L Mutations in Congenital Kidney Malformations. Am J Hum Genet 101:789-802
He, Zihuai; Xu, Bin; Lee, Seunggeun et al. (2017) Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data. Am J Hum Genet 101:340-352
Kiryluk, Krzysztof; Li, Yifu; Moldoveanu, Zina et al. (2017) GWAS for serum galactose-deficient IgA1 implicates critical genes of the O-glycosylation pathway. PLoS Genet 13:e1006609
Song, Xiaoyu; Li, Gen; Zhou, Zhenwei et al. (2017) QRank: a novel quantile regression tool for eQTL discovery. Bioinformatics 33:2123-2130
He, Zihuai; Lee, Seunggeun; Zhang, Min et al. (2017) Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA). Genet Epidemiol 41:801-810
Lim, Elaine T; Uddin, Mohammed; De Rubeis, Silvia et al. (2017) Rates, distribution and implications of postzygotic mosaic mutations in autism spectrum disorder. Nat Neurosci 20:1217-1224
Song, Xiaoyu; Ionita-Laza, Iuliana; Liu, Mengling et al. (2016) A General and Robust Framework for Secondary Traits Analysis. Genetics 202:1329-1343

Showing the most recent 10 out of 34 publications