Genome Analysis: Data Accuracy, Haplotyping and Mapping

Stephens, Matthew

Abstract

We propose to develop an array of novel statistical methods for association analysis of functional phenotypes arising from high-throughput sequencing assays in genetics and genomics, specifically data from RNA-seq, ChIP-seq, and DNase-seq assays. Our proposed approach is to treat the number of reads mapping to each base along the genome as a highly multivariate, but also highly structured, phenotype. Using methods from signal processing (wavelets), we will develop methods to identify regions of the genome where these phenotypes differ significantly between samples, or groups of samples (e.g. cell types, treatment groups, or genotype classes). In contrast to approaches based on sliding windows, the methods will be capable of identifying differences that occur at multiple scales. The statistical methods we develop will facilitate both small-scale comparisons (e.g. identifying differences in binding, or histone modifications, between two samples or conditions), and larger-scale analyses, such as genetic association analyses that aim to identify genetic variants associated with these phenotypes (expression QTLs, binding QTLs, dsQTLs). As an important special case, our methods will tackle the commonly encountered problem of identifying differentially expressed genes, including variations in splicing or alternative transcripts, from RNA-seq data. These methods will build on and substantially extend methods for association analyses developed during the current funding cycle of this R01. The result of our research will be a suite of statistical tools that will greatly facilitate the analysis of the wide range of genetic and genomic studies that involve functional phenotypes. We will produce and distribute user-friendly software implementing these methods.
We will use our methods to analyze existing data generated by our collaborators, and publicly available data from the NIH-funded GTEx project, both to compare them with existing analysis methods and to identify regulatory genetic variants responsible for phenotypic variation. The overall objective is for the work to provide software and statistical tools for the genetics and genomics research community, facilitating biological discoveries and insights, and, ultimately, understanding of the genetic basis of common disease.
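
The multi-scale idea in the abstract can be illustrated with a minimal sketch. It assumes an unnormalized Haar wavelet transform applied to per-base read counts over a region whose length is a power of two, and it only compares group means; it is not the project's actual method, and it performs no significance testing. The function names `haar_coefficients` and `scale_differences` and the toy counts are hypothetical, chosen for illustration.

```python
def haar_coefficients(values):
    """Unnormalized Haar transform of a sequence whose length is a power of two.

    Returns (details, total), where details maps a window width (2, 4, 8, ...)
    to the list of "left half minus right half" sums for each window of that
    width, and total is the sum over the whole region.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    sums = [float(v) for v in values]
    details = {}
    width = 1
    while len(sums) > 1:
        left, right = sums[0::2], sums[1::2]
        width *= 2
        # Contrast between the two halves of each window at this scale.
        details[width] = [l - r for l, r in zip(left, right)]
        # Window totals feed the next, coarser scale.
        sums = [l + r for l, r in zip(left, right)]
    return details, sums[0]


def scale_differences(group_a, group_b):
    """Haar coefficients of the difference between two groups' mean profiles.

    Each group is a list of samples; each sample is a list of per-base read
    counts over the same region. Illustrative only: no noise model or test.
    """
    mean_a = [sum(col) / len(group_a) for col in zip(*group_a)]
    mean_b = [sum(col) / len(group_b) for col in zip(*group_b)]
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    return haar_coefficients(diff)


# Toy data (hypothetical): group B has extra reads across the right half.
group_a = [[2, 2, 2, 2, 2, 2, 2, 2], [4, 4, 4, 4, 4, 4, 4, 4]]
group_b = [[2, 2, 2, 2, 6, 6, 6, 6], [4, 4, 4, 4, 8, 8, 8, 8]]

details, total = scale_differences(group_a, group_b)
for width in sorted(details):
    print(width, details[width])
```

In this toy example the only nonzero contrast appears at the coarsest scale (width 8), because the groups differ uniformly across one half of the region. A fixed sliding window narrower than that would see only part of the shift, which is the motivation the abstract gives for working at multiple scales.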

Public Health Relevance

This project will generate statistical tools for the genetics and genomics research community, and apply them to identify functional genetic variants that affect human phenotypes. These tools will help facilitate biological discoveries and insights, and, ultimately, understanding of the genetic basis of common disease.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 4R01HG002585-12
Application #: 9102125
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Ramos, Erin

Project Start: 2002-09-20
Project End: 2017-06-30
Budget Start: 2016-07-01
Budget End: 2017-06-30
Support Year: 12
Fiscal Year: 2016

Institution

Name: University of Chicago
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 005421136

City: Chicago
State: IL
Country: United States
Zip Code: 60637

Publications

Zhu, Xiang; Stephens, Matthew (2018) Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nat Commun 9:4361

Gerard, David; Stephens, Matthew (2018) Empirical Bayes shrinkage and false discovery rate estimation, allowing for unwanted variation. Biostatistics :

Al-Asadi, Hussein; Dey, Kushal K; Novembre, John et al. (2018) Inference and visualization of DNA damage patterns using a Grade of Membership Model. Bioinformatics :

Zhu, Xiang; Stephens, Matthew (2017) Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Ann Appl Stat 11:1561-1592

Dey, Kushal K; Hsiao, Chiaowen Joyce; Stephens, Matthew (2017) Visualizing the structure of RNA-seq expression data using grade of membership models. PLoS Genet 13:e1006599

Stephens, Matthew (2017) False discovery rates: a new deal. Biostatistics 18:275-294

Petkova, Desislava; Novembre, John; Stephens, Matthew (2016) Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet 48:94-100

Lu, Mengyin; Stephens, Matthew (2016) Variance adaptive shrinkage (vash): flexible empirical Bayes estimation of variances. Bioinformatics 32:3428-3434

Raj, Anil; Wang, Sidney H; Shim, Heejung et al. (2016) Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. Elife 5:

Shim, Heejung; Chasman, Daniel I; Smith, Joshua D et al. (2015) A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS One 10:e0120758

Showing the most recent 10 out of 46 publications
