In efforts to understand complex disease through genome-wide association studies (GWAS), the X chromosome (X) has typically been disregarded or incorrectly analyzed because of analytical complications stemming from its unique mode of inheritance and population genetic patterns. This trend has carried over into sequence-based association studies, genome-wide studies of regulatory elements, and studies of gene expression. Beyond comprising 5% of the human genome, X likely contributes to the sex-specific prevalence, symptoms, or progression observed in most complex diseases. These include many leading causes of death and disability, such as neurological and psychiatric disorders, cardiovascular diseases, autoimmune diseases, and cancer. This project will support the applicant's long-term goal of advancing the search for X-linked complex disease genes while elucidating how evolutionary history and natural selection uniquely shaped human genetic variation on X. The objectives of this application are the development of methods and software for analyzing X in GWAS and sequence-based association studies, and their application to discovering many X-linked risk loci underlying complex diseases. The rationale for performing this work is that it will reveal the role of X in the etiology of several diseases and advance the exploration of sexual dimorphism in disease.
This will be achieved by pursuing the following specific aims: 1) Develop new X-specific statistical and computational methods for X-wide association studies (XWAS), expression quantitative trait loci (eQTL) studies of X, and sex-specific, X-tailored analysis of DNase-seq experiments; 2) Facilitate accurate genotype calling and processing of X in sequence data, and develop X-optimized tests for rare variant association studies and identity-by-descent mapping; 3) Discover, replicate, and interpret X-linked associations, based on analysis and meta-analysis of data from hundreds of studies, with a focus on common psychiatric disorders, quantitative risk factors of coronary artery disease, and eQTL; 4) Develop open source, freely available software that implements all methods from Aim 1 and Aim 2, together with existing methods. The proposed research is innovative in that it will develop new approaches and methodologies to accurately analyze X, pioneering the inclusion of X in association studies and related fields. Its contribution will be novel statistical and computational methods tailored specifically for X, and insight into the role of X in several complex diseases and traits. The contribution will be further increased by the availability of software that facilitates analysis by others of the thousands of studies where X remains essentially unexplored. Overall, the proposed research is significant, and relevant to public health, because it will help reveal the role of X in human complex disease etiology, and help advance sex-specific disease diagnosis and treatment.

Public Health Relevance

The proposed research is relevant to public health because it will address a major deficit in methods for analyzing the X chromosome in studies of human complex diseases. Only after overcoming this barrier can substantial progress be made toward understanding the mechanisms underlying sexual dimorphism in disease susceptibility and pathogenesis, which are widespread among leading causes of death and disability. This project is relevant to the missions of NIH and, specifically, NHGRI, NIMH, and NHLBI because it will advance human health through discovery of novel X-linked disease risk loci underlying psychiatric disorders and risk factors of coronary artery disease, and through development of methods and software that facilitate analysis of the X chromosome by the broader medical genetics community.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Li, Rongling
Cornell University
Biostatistics & Other Math Sci
Earth Sciences/Resources
United States
Lussier, Alexandre A; Keinan, Alon (2018) Crowdsourced genealogies and genomes. Science 360:153-154
Gardner, Eugene J; Lam, Vincent K; Harris, Daniel N et al. (2017) The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology. Genome Res 27:1916-1929
Zheng-Bradley, Xiangqun; Streeter, Ian; Fairley, Susan et al. (2017) Alignment of 1000 Genomes Project reads to reference assembly GRCh38. Gigascience 6:1-8
Ye, Kaixiong; Gao, Feng; Wang, David et al. (2017) Dietary adaptation of FADS genes in Europe varied across time and geography. Nat Ecol Evol 1:167
D'Amico, Fabio; Skarmoutsou, Evangelia; Lo, Lauren J et al. (2017) Association between rs2294020 in X-linked CCDC22 and susceptibility to autoimmune diseases with focus on systemic lupus erythematosus. Immunol Lett 181:58-62
Gao, Feng; Keinan, Alon (2016) Explosive genetic evidence for explosive human population growth. Curr Opin Genet Dev 41:130-139
Pinto, Yishay; Gabay, Orshay; Arbiza, Leonardo et al. (2016) Clustered mutations in hominid genome evolution are consistent with APOBEC3G enzymatic activity. Genome Res 26:579-87
Billing-Ross, Paul; Germain, Arnaud; Ye, Kaixiong et al. (2016) Mitochondrial DNA variants correlate with symptoms in myalgic encephalomyelitis/chronic fatigue syndrome. J Transl Med 14:19
Waldman, Yedael Y; Biddanda, Arjun; Dubrovsky, Maya et al. (2016) The genetic history of Cochin Jews from India. Hum Genet 135:1127-43
Gao, Feng; Keinan, Alon (2016) Inference of Super-exponential Human Population Growth via Efficient Computation of the Site Frequency Spectrum for Generalized Models. Genetics 202:235-45
