Many problems in biomedical science are addressed with a combination of experimental and computational techniques. Data integration is a tremendously important and challenging biostatistical problem with many context-dependent issues. This is especially true in the analysis of high-throughput genomic studies, where a recurring challenge is the integration of gene-level experimental data with functional information representing existing knowledge of biological properties of genes and their products. A number of procedures are available for this type of data integration, but they are critically limited by grossly oversimplified assumptions about the functional record. Our long-term objective is to develop powerful statistical tools that reflect biological realities and that allow the user to efficiently identify salient functional signals. To this end, the proposed project develops and evaluates a computational-statistical method for revealing the essential functional content of experimental data. A springboard for the work is a novel generative model for gene-level data written in terms of functional activity. At present, the inferences derived from this role model are limited by their reliance on Markov chain Monte Carlo sampling, approximate marginal posterior rankings, and data format. A central contribution of the proposed project is to adapt advanced techniques from probabilistic graphical modeling and to demonstrate their feasibility in extending and improving inferences from the basic role model. The first specific aim is to express all components of the basic model in the appropriate graphical terms, and to experiment computationally with functional restrictions that will enhance the method. The second aim is to advance the underlying generative model, first by expressing it with less complex graphs through a novel gene-centered reparameterization, and second by removing limitations of the data format. If successful, the project will enable biomedical scientists to more effectively express the functional content of their experimental results, and it will further the information science of genomic data integration.

Public Health Relevance

Biomedical investigators who measure genomic data cannot fully interpret that data without considering how it relates to the wealth of existing biological knowledge on the same genes being measured. Owing to intrinsic sources of variation and incomplete information, the task of combining experimental and functional data falls in the realm of biostatistical analysis. The proposed project will assess the feasibility of a promising function-centered approach to this problem that will allow genomic scientists to better understand the functional content of their experimental data.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Exploratory/Developmental Grants (R21)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Bonazzi, Vivien
University of Wisconsin Madison
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Hao, Linhui; Lindenbach, Brett; Wang, Xiaofeng et al. (2014) Genome-wide analysis of host factors in nodavirus RNA replication. PLoS One 9:e95799
Pei, Qinglin; Zuleger, Cindy L; Macklin, Michael D et al. (2014) A conditional predictive p-value to compare a multinomial with an overdispersed multinomial in the analysis of T-cell populations. Biostatistics 15:129-39
Paul Olson, Terrah J; Hadac, Jamie N; Sievers, Chelsie K et al. (2014) Dynamic tumor growth patterns in a novel murine model of colorectal cancer. Cancer Prev Res (Phila) 7:105-13
Hao, Linhui; He, Qiuling; Wang, Zhishi et al. (2013) Limited agreement of independent RNAi screens for virus-required host genes owes more to false-negative than false-positive factors. PLoS Comput Biol 9:e1003235