Next-generation sequencing has revealed the molecular landscape of cells in unprecedented detail. However, for the large-scale data produced by assays based on these technologies, informativeness is not only a function of wet-lab technology, but is critically also a function of the analytical pipelines that interpret the data. Our group has developed four statistical tools designed to maximize the informativeness of these assays: 1) the Genome Structural Correction (GSC), a nonparametric model of genomic annotations used to assess the significance of relationships between features; 2) the Irreproducible Discovery Rate (IDR), an analogue of the FDR that leverages information from biological replicates; 3) Statmap, a comprehensive analysis pipeline for ChIP-seq and CAGE data that propagates statistical confidence from base-calling to peak-calling; and 4) Sparse Linear Isoform Discovery and abundance Estimation (SLIDE), an integrative statistical framework for the analysis of RNA-seq, cDNA, and other RNA data aimed at obtaining and quantifying de novo transcript models. These tools are designed to identify and characterize functional elements in genomes; they make minimal assumptions about the data they analyze and therefore draw reliable conclusions and measures of statistical confidence. During the K99 phase, we will expand and integrate our tools to extend the reach of statistical confidence throughout data interpretation. During the R00 phase, my research will progress toward the inference and assessment of biological networks. Just as ortholog identification has become an essential step in developing animal models of human disease, multi-species network analysis promises to become a key step in interpreting the relationship between genome variation and phenotype. Many mutations, even gene deletions, do not reveal an obvious phenotype. This is due to network robustness, which often differs between closely related species.
To understand these phenomena, we aim to: 1) develop standard statistical tools for network inference, and 2) develop "meta-models" of networks that will permit general measures of network orthology. These two aims are tightly linked: to model biological networks, we will first need to characterize their semantics. Currently, some models lack consistent definitions of edges and weights, resulting in untestable representations of genomic data. We will develop testable, quantitative models of biological processes, establishing a uniform semantics that leverages the rich theory of complex systems. Each of the tools above will play a key role, especially Statmap and the GSC, which will be needed to propagate statistical confidence into network analysis. These advances will have a transformative effect on our ability to map animal models of disease onto human biology. Nearly nine out of ten new drugs fail in human trials due to issues (e.g., toxicity) not present in animal models. Understanding the orthology not just of individual genes, but of entire biochemical networks, will be essential to infer and correct for differences between models of disease and human biology. Solving this problem will be a major step forward in the march from base pairs to bedside.
This proposal outlines training and mentoring plans that emphasize modern nonparametric statistical theory, developmental biology, and hands-on wet-lab techniques. The goal is to produce an independent investigator who functions as a nexus of communication between data producers and data analysts; who is able to recognize and solve otherwise orphan problems: important biological questions that require advances in statistical theory to be well answered. The statistical tools that the candidate will generate during the award will lead to testable, quantitative models of biological processes, with the ultimate goal of establishing a uniform semantics for biological network analysis that leverages the increasingly rich theory of complex systems.