We propose to develop an array of novel statistical methods for association analysis of functional phenotypes arising from high-throughput sequencing assays in genetics and genomics, specifically data from RNA-seq, ChIP-seq, and DNase-seq assays. Our proposed approach is to treat the number of reads mapping to each base along the genome as a highly multivariate, but also highly structured, phenotype. Using methods from signal processing (wavelets), we will develop methods to identify regions of the genome where these phenotypes differ significantly between samples, or groups of samples (e.g. cell types, treatment groups, or genotype classes). In contrast to approaches based on sliding windows, the methods will be capable of identifying differences that occur at multiple different scales. The statistical methods we develop will facilitate both small-scale comparisons (e.g. identifying differences in binding, or histone modifications, between two samples or conditions), and larger-scale analyses, such as genetic association analyses that aim to identify genetic variants associated with these phenotypes (expression QTLs, binding QTLs, dsQTLs). As an important special case, our methods will tackle the commonly encountered problem of identifying differentially expressed genes, including variations in splicing or alternative transcripts, from RNA-seq data. These methods will build on and substantially extend methods for association analyses developed during the current funding cycle of this R01. The result of our research will be a suite of statistical tools that will greatly facilitate the analysis of the wide range of genetic and genomic studies that involve functional phenotypes. We will produce and distribute user-friendly software implementing these methods.
We will use our methods to analyze existing data generated by our collaborators, and publicly available data from the NIH-funded GTEx project, both to compare our methods with existing analysis methods and to identify regulatory genetic variants responsible for phenotypic variation. The overall objective is for the work to provide software and statistical tools for the genetics and genomics research community, facilitating biological discoveries and insights, and, ultimately, understanding of the genetic basis of common disease.
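
The multi-scale idea described above can be illustrated with a minimal sketch. This is not the proposed method or its software: it assumes a plain Haar wavelet, read-count profiles whose length is a power of two, and a simple difference in group-mean coefficients in place of any formal test. The function names `haar_dwt` and `mean_coefficient_difference` are hypothetical and were chosen only for this example.

```python
import numpy as np


def haar_dwt(signal):
    """Haar wavelet decomposition of a 1-D profile (illustrative only).

    Returns (details, scaling): `details` is a list of arrays of detail
    coefficients, finest scale first, and `scaling` is the single
    remaining scaling coefficient. Assumes a power-of-two length.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two >= 2")
    details = []
    while x.size > 1:
        even, odd = x[0::2], x[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # local contrasts
        x = (even + odd) / np.sqrt(2.0)              # local sums
    return details, x[0]


def mean_coefficient_difference(group_a, group_b):
    """Difference (B minus A) in group-mean detail coefficients per scale."""
    coeffs_a = [haar_dwt(s)[0] for s in group_a]
    coeffs_b = [haar_dwt(s)[0] for s in group_b]
    n_scales = len(coeffs_a[0])
    return [
        np.mean([c[k] for c in coeffs_b], axis=0)
        - np.mean([c[k] for c in coeffs_a], axis=0)
        for k in range(n_scales)
    ]


# Toy data (made up): three samples per group, eight bases each.
# Group B carries extra reads across the whole left half of the region.
group_a = np.full((3, 8), 2.0)
group_b = np.tile([6.0, 6.0, 6.0, 6.0, 2.0, 2.0, 2.0, 2.0], (3, 1))

diffs = mean_coefficient_difference(group_a, group_b)
for scale, d in enumerate(diffs):
    print(f"scale {scale} (finest = 0): {np.round(d, 3)}")
```

In this toy example the groups differ only in the coarsest-scale coefficient, which contrasts the left and right halves of the region; the finer-scale coefficients are identical. A fixed narrow sliding window would see the same broad shift only as many small, separate differences, which is the contrast with window-based approaches drawn in the abstract.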

Public Health Relevance

This project will generate statistical tools for the genetics and genomics research community, and apply them to identify functional genetic variants that affect human phenotypes. These tools will help facilitate biological discoveries and insights, and, ultimately, understanding of the genetic basis of common disease.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Ramos, Erin
University of Chicago
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Raj, Anil; Stephens, Matthew; Pritchard, Jonathan K (2014) fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics 197:573-89
Zhou, Xiang; Stephens, Matthew (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat Methods 11:407-9
Stephens, Matthew (2013) A unified framework for association analysis with multiple related phenotypes. PLoS One 8:e65245
Zhou, Xiang; Carbonetto, Peter; Stephens, Matthew (2013) Polygenic modeling with bayesian sparse linear mixed models. PLoS Genet 9:e1003264
Mangravite, Lara M; Engelhardt, Barbara E; Medina, Marisa W et al. (2013) A statin-dependent QTL for GATM expression is associated with statin-induced myopathy. Nature 502:377-80
Carbonetto, Peter; Stephens, Matthew (2013) Integrated enrichment analysis of variants and pathways in genome-wide association studies indicates central role for IL-2 signaling genes in type 1 diabetes, and cytokine signaling genes in Crohn's disease. PLoS Genet 9:e1003770
Fledel-Alon, Adi; Leffler, Ellen Miranda; Guan, Yongtao et al. (2011) Variation in human recombination rates and its genetic determinants. PLoS One 6:e20321
Maranville, Joseph C; Luca, Francesca; Richards, Allison L et al. (2011) Interactions between glucocorticoid treatment and cis-regulatory polymorphisms contribute to cellular response phenotypes. PLoS Genet 7:e1002162
Barreiro, Luis B; Marioni, John C; Blekhman, Ran et al. (2010) Functional comparison of innate immune signaling pathways in primates. PLoS Genet 6:e1001249
Engelhardt, Barbara E; Stephens, Matthew (2010) Analysis of population structure: a unifying framework and novel methods based on sparse factor analysis. PLoS Genet 6:

Showing the most recent 10 out of 30 publications