Many problems in biomedical science are addressed with a combination of experimental and computational techniques. Data integration is a tremendously important and challenging biostatistical problem with many context-dependent issues. This is especially true in the analysis of high-throughput genomic studies, where a recurring challenge is to integrate gene-level experimental data with functional information that represents existing knowledge of the biological properties of genes and their products. A number of procedures are available for this type of data integration, but they are critically limited by grossly oversimplified assumptions about the functional record. Our long-term objective is to develop powerful statistical tools that reflect biological realities and that allow the user to efficiently identify salient functional signals. To this end, the proposed project develops and evaluates a computational-statistical method for revealing the essential functional content of experimental data. The springboard for this work is a novel generative model for gene-level data, the role model, which is written in terms of functional activity. At present, inferences from the role model are limited by their reliance on Markov chain Monte Carlo sampling and on approximate marginal posterior rankings, and by restrictions on the format of the input data. A central contribution of the proposed project is to adapt advanced techniques from probabilistic graphical modeling and to demonstrate that they can feasibly extend and improve inferences from the basic role model. The first specific aim is to express all components of the basic model in appropriate graphical terms and to experiment computationally with functional restrictions that will enhance the method.
The second aim is to advance the underlying generative model in two ways. First, a novel gene-centered reparameterization will express it with less complex graphs. Second, limitations of the data format will be removed. If successful, the project will enable biomedical scientists to express the functional content of their experimental results more effectively, and it will advance the information science of genomic data integration.
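
The abstract describes the role model only in outline. To make "a generative model written in terms of functional activity" and its "reliance on Markov chain Monte Carlo sampling" concrete, the following is a minimal sketch. It assumes a structure of the kind used in model-based gene-set analysis, and it is not the project's implementation. Under that assumption, each functional category is active independently with prior probability `pi`, and a gene is "on" if it belongs to at least one active category. Each gene's binary experimental call is then a noisy readout of that state, with assumed known rates `alpha` (false positive) and `gamma` (true positive). All function names, parameters and the toy data are hypothetical.

```python
import math
import random

# Hypothetical toy "role model"-style sketch (assumptions, not the project's code):
# - each functional category is active independently with prior probability pi;
# - a gene is "on" if it belongs to at least one active category;
# - each gene's binary call is a noisy readout of its on/off state.


def gene_activity(active_sets, membership):
    """Return the set of genes that are 'on' given the active categories."""
    on = set()
    for name in active_sets:
        on |= membership[name]
    return on


def log_posterior(active_sets, membership, genes, called, alpha, gamma, pi):
    """Unnormalized log posterior of one category-activity configuration.

    alpha: P(gene called | gene off), an assumed known false-positive rate.
    gamma: P(gene called | gene on), an assumed known true-positive rate.
    """
    on = gene_activity(active_sets, membership)
    k = len(active_sets)
    lp = k * math.log(pi) + (len(membership) - k) * math.log(1.0 - pi)
    for g in genes:
        p = gamma if g in on else alpha
        lp += math.log(p) if g in called else math.log(1.0 - p)
    return lp


def metropolis(membership, genes, called, alpha, gamma, pi, n_iter, seed=0):
    """Estimate marginal posterior activity probabilities by Metropolis sampling.

    Each step toggles one uniformly chosen category, which is a symmetric proposal.
    """
    rng = random.Random(seed)
    names = sorted(membership)
    state = frozenset()
    cur = log_posterior(state, membership, genes, called, alpha, gamma, pi)
    counts = dict.fromkeys(names, 0)
    for _ in range(n_iter):
        proposal = state ^ {rng.choice(names)}
        new = log_posterior(proposal, membership, genes, called, alpha, gamma, pi)
        # Accept with probability min(1, exp(new - cur)).
        if rng.random() < math.exp(min(0.0, new - cur)):
            state, cur = proposal, new
        for name in state:
            counts[name] += 1
    return {name: counts[name] / n_iter for name in names}


# Hypothetical toy data: category B overlaps A on gene g3.
membership = {"A": {"g1", "g2", "g3"}, "B": {"g3", "g4"}, "C": {"g5", "g6"}}
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
called = {"g1", "g2", "g3"}
probs = metropolis(membership, genes, called, alpha=0.05, gamma=0.9, pi=0.2, n_iter=20000)
print(probs)  # category A should carry most of the posterior mass
```

The posterior ranges over all 2^K on/off configurations of K categories. With realistic, heavily overlapping category collections it cannot be enumerated, which is why a sampler like this one is used. Single-category toggles are the simplest symmetric proposal, but they can mix slowly when categories overlap. The marginal frequencies returned here are the kind of approximate marginal posterior summaries on which rankings can be based.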

Public Health Relevance

Biomedical investigators who measure genomic data cannot fully interpret those data without considering how they relate to the wealth of existing biological knowledge about the genes being measured. Because of intrinsic sources of variation and incomplete information, the task of combining experimental and functional data falls in the realm of biostatistical analysis. The proposed project will assess the feasibility of a promising function-centered approach to this problem, one that will allow genomic scientists to better understand the functional content of their experimental data.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21HG006568-02
Application #: 8426087
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Bonazzi, Vivien
Project Start: 2012-02-15
Project End: 2014-01-31
Budget Start: 2013-02-01
Budget End: 2014-01-31
Support Year: 2
Fiscal Year: 2013
Total Cost: $184,085
Indirect Cost: $59,085

Organization Name: University of Wisconsin Madison
Department: Biostatistics & Other Math Sci
Organization Type: Schools of Arts and Sciences
DUNS #: 161202122
City: Madison
State: WI
Country: United States
Zip Code: 53715

Publications

Henderson, Nicholas C; Newton, Michael A (2016) Making the cut: improved ranking and selection for large-scale inference. J R Stat Soc Series B Stat Methodol 78:781-804
Newton, Michael A; Wang, Zhishi (2015) Multiset Statistics for Gene Set Analysis. Annu Rev Stat Appl 2:95-111
Barger, Jamie L; Anderson, Rozalyn M; Newton, Michael A et al. (2015) A conserved transcriptional signature of delayed aging and reduced disease vulnerability is partially mediated by SIRT3. PLoS One 10:e0120738
Hose, James; Yong, Chris Mun; Sardi, Maria et al. (2015) Dosage compensation can buffer copy-number variation in wild yeast. Elife 4:
Pei, Qinglin; Zuleger, Cindy L; Macklin, Michael D et al. (2014) A conditional predictive p-value to compare a multinomial with an overdispersed multinomial in the analysis of T-cell populations. Biostatistics 15:129-39
Hao, Linhui; Lindenbach, Brett; Wang, Xiaofeng et al. (2014) Genome-wide analysis of host factors in nodavirus RNA replication. PLoS One 9:e95799
Paul Olson, Terrah J; Hadac, Jamie N; Sievers, Chelsie K et al. (2014) Dynamic tumor growth patterns in a novel murine model of colorectal cancer. Cancer Prev Res (Phila) 7:105-13
Hao, Linhui; He, Qiuling; Wang, Zhishi et al. (2013) Limited agreement of independent RNAi screens for virus-required host genes owes more to false-negative than false-positive factors. PLoS Comput Biol 9:e1003235