Removing statistical bottle-necks in data analysis for the ENCODE Consortium

Bickel, Peter

Abstract

The data generated by the ENCODE Consortium constitutes an unprecedented opportunity to make biomedical inferences about the function and structure of the human genome. With nearly 1000 genome-wide assays, the depth of information now publicly available about each base-pair is staggering, and in the next round of consortium work this depth is likely to increase geometrically. In this proposal, we describe statistical challenges that, if met, will substantially enhance the capacity of the Analysis Working Group (AWG) to make functional biological inferences from ENCODE data. We will tackle these challenges, serving as a statistical Research and Development component with built-in experimental validation capabilities, and will provide the AWG with useful software implementations of the statistical tools we develop, as well as with iterative refinements of these tools grounded in experimental validations of our imputed networks. In particular, we will: 1) develop methods of dimension reduction that will aid the AWG in data visualization, summarization, and prediction studies (e.g. the prediction of transcription from chromatin data); 2) develop new quantitative network models of complex biological systems assayed by ENCODE; and 3) conduct targeted biological validation assays designed to interrogate important low-dimensional structures in our models and to feed back to improve both model structure and performance. Our approaches to dimension reduction will aid biologists in interpreting and formulating hypotheses from high-dimensional genomics data, our network models will facilitate the construction of interpretable predictive algorithms that lead directly to testable and quantifiable hypotheses, and our validation assays will ensure that inferences derived from our tools provide meaningful biological insights.
As we did as part of the ENCODE and modENCODE data analysis centers, we will work closely with the AWG to ensure that our software implementations are immediately and maximally useful to the consortium, and that the overall course of our work on network inference and dimension reduction is focused around biological questions central to the interests of the consortium.

Public Health Relevance

The ENCODE Consortium has generated, and will continue to generate, thousands of data sets that each provide information about the biochemical activity of every base in the human genome. The scope of this data is now so vast that no one researcher can hope to develop a coherent understanding of more than a very small fraction of the total information. Our aim is to provide computational tools that identify important structures and correlations in the data that can be understood and interpreted by researchers; that is, to enable scientists to understand and derive insight from genome-scale biology.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 3U01HG007031-03S1
Application #: 9037906
Study Section: Special Emphasis Panel (ZHG1)
Program Officer: Gilchrist, Daniel A

Project Start: 2012-09-17
Project End: 2017-06-30
Budget Start: 2015-07-01
Budget End: 2017-06-30
Support Year: 3
Fiscal Year: 2015

Institution

Name: University of California Berkeley
Department: Miscellaneous
Type: Organized Research Units
DUNS #: 124726725

City: Berkeley
State: CA
Country: United States
Zip Code: 94704

Related projects


NIH 2015 U01 HG	Removing statistical bottle-necks in data analysis for the ENCODE Consortium Bickel, Peter J. / University of California Berkeley
NIH 2014 U01 HG	Removing statistical bottle-necks in data analysis for the ENCODE Consortium Bickel, Peter J. / University of California Berkeley	$414,030
NIH 2013 U01 HG	Removing statistical bottle-necks in data analysis for the ENCODE Consortium Bickel, Peter J. / University of California Berkeley	$403,326
NIH 2012 U01 HG	Removing statistical bottle-necks in data analysis for the ENCODE Consortium Bickel, Peter J. / University of California Berkeley	$425,000

Publications

Basu, Sumanta; Kumbier, Karl; Brown, James B et al. (2018) Iterative random forests to discover predictive and stable high-order interactions. Proc Natl Acad Sci U S A 115:1943-1948

Wang, Y X Rachel; Liu, Ke; Theusch, Elizabeth et al. (2018) Generalized correlation measure using count statistics for gene expression data with ordered samples. Bioinformatics 34:617-624

Shi, Funan; Huang, Haiyan (2017) Identifying Cell Subpopulations and Their Genetic Drivers from Single-Cell RNA-Seq Data Using a Biclustering Approach. J Comput Biol 24:663-674

Wu, Siqi; Joseph, Antony; Hammonds, Ann S et al. (2016) Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks. Proc Natl Acad Sci U S A 113:4290-5

Stoiber, Marcus H; Olson, Sara; May, Gemma E et al. (2015) Extensive cross-regulation of post-transcriptional regulatory networks in Drosophila. Genome Res 25:1692-702

Li, Jingyi Jessica; Huang, Haiyan; Bickel, Peter J et al. (2014) Comparison of D. melanogaster and C. elegans developmental stages, tissues, and cells by modENCODE RNA-seq data. Genome Res 24:1086-101

Wang, Y X Rachel; Huang, Haiyan (2014) Review on statistical methods for gene network reconstruction using expression data. J Theor Biol 362:53-61

Boley, Nathan; Stoiber, Marcus H; Booth, Benjamin W et al. (2014) Genome-guided transcript assembly by integrative analysis of RNA sequence data. Nat Biotechnol 32:341-6

Alam, Tanvir; Medvedeva, Yulia A; Jia, Hui et al. (2014) Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes. PLoS One 9:e109443

Kim, Kyungpil; Bolotin, Eugene; Theusch, Elizabeth et al. (2014) Prediction of LDL cholesterol response to statin using transcriptomic and genetic variation. Genome Biol 15:460

Showing the most recent 10 out of 15 publications
