Genome analysis: statistical methods and applications

Stephens, Matthew

Abstract

In recent years, new data and technologies have transformed our understanding of transcriptional processes and how they are influenced by genetic variation. The GTEx project has measured both genetic variation and transcriptional variation in 50 tissues across hundreds of individuals, and has identified hundreds of thousands of genetic variants that are associated with gene expression (eQTLs). Technological innovations have also made it possible to interrogate transcription, genome-wide, in single cells. The Human Cell Atlas (HCA) project is currently using such technologies to profile millions of cells, with the ambitious goal of providing a comprehensive atlas of the diverse cell types that make up the human body. However, current analytic tools are limited in their ability to fully exploit the richness of these data. Current analysis tools for identifying eQTLs across 50 tissues perform well for identifying associations (both tissue-specific effects and those that are broadly shared across tissues) but are not yet designed for fine-mapping the underlying functional variants that explain these association signals. Likewise, methods for summarizing and characterizing transcriptional heterogeneity among single cells are not capable of capturing the complex layered character of this heterogeneity: for example, cells might cluster into different groups depending on which genes or transcriptional processes are considered. Here we propose to develop novel statistical methods to address these issues. We will develop dimension reduction techniques for single-cell analysis, aimed at capturing the complex patterns of heterogeneity that existing methods ignore. We will develop statistical tools for reliably assessing the genes and processes that show transcriptional differences among groups of cells. We will also develop and apply methods to fine-map the functional variants underlying many of the eQTLs in the GTEx project data, fully exploiting the information in the many tissues profiled, and disseminate the results online in a convenient form. The overall goal of the project is to build and apply methods and software that help fully exploit the rich information in projects like GTEx and HCA, and to make them available to the broad community of biological and medical scientists who can benefit from the results.

Public Health Relevance

This project will generate and apply statistical tools for analysing large-scale studies that aim to understand the transcriptional variability of cell types and the effects of genetic variation on transcriptomes, both of which are fundamental issues in biology. Understanding how cells change during disease, and identifying genetic variants that affect transcription, can help clarify the biology of disease and may eventually lead to new treatment strategies.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG002585-15
Application #: 9977226
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Ramos, Erin

Project Start: 2002-09-20
Project End: 2022-06-30
Budget Start: 2020-07-01
Budget End: 2021-06-30
Support Year: 15
Fiscal Year: 2020

Institution

Name: University of Chicago
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 005421136

City: Chicago
State: IL
Country: United States
Zip Code: 60637

Publications

Zhu, Xiang; Stephens, Matthew (2018) Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nat Commun 9:4361

Gerard, David; Stephens, Matthew (2018) Empirical Bayes shrinkage and false discovery rate estimation, allowing for unwanted variation. Biostatistics :

Al-Asadi, Hussein; Dey, Kushal K; Novembre, John et al. (2018) Inference and visualization of DNA damage patterns using a Grade of Membership Model. Bioinformatics :

Zhu, Xiang; Stephens, Matthew (2017) Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Ann Appl Stat 11:1561-1592

Dey, Kushal K; Hsiao, Chiaowen Joyce; Stephens, Matthew (2017) Visualizing the structure of RNA-seq expression data using grade of membership models. PLoS Genet 13:e1006599

Stephens, Matthew (2017) False discovery rates: a new deal. Biostatistics 18:275-294

Lu, Mengyin; Stephens, Matthew (2016) Variance adaptive shrinkage (vash): flexible empirical Bayes estimation of variances. Bioinformatics 32:3428-3434

Raj, Anil; Wang, Sidney H; Shim, Heejung et al. (2016) Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. Elife 5:

Petkova, Desislava; Novembre, John; Stephens, Matthew (2016) Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet 48:94-100

Shim, Heejung; Chasman, Daniel I; Smith, Joshua D et al. (2015) A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS One 10:e0120758

Showing the most recent 10 out of 46 publications
