Many problems in biomedical science are addressed with a combination of experimental and computational techniques. Data integration is a tremendously important and challenging biostatistical problem with many context-dependent issues. This is especially true in the analysis of high-throughput genomic studies, where a recurring challenge is the integration of gene-level experimental data with functional information representing existing knowledge of biological properties of genes and their products. A number of procedures are available for this type of data integration, but they are critically limited by grossly oversimplified assumptions about the functional record. Our long-term objective is to develop powerful statistical tools that reflect biological realities and that allow the user to efficiently identify salient functional signals. To this end, the proposed project develops and evaluates a computational-statistical method for revealing the essential functional content of experimental data. A springboard for the work is a novel generative model for gene-level data written in terms of functional activity. At present, the inferences derived from this role model are limited by their reliance on Markov chain Monte Carlo sampling, approximate marginal posterior rankings, and data format. A central contribution of the proposed project is to adapt advanced techniques from probabilistic graphical modeling and to demonstrate their feasibility in extending and improving inferences from the basic role model. The first specific aim is to express all components of the basic model in the appropriate graphical terms, and to experiment computationally with functional restrictions that will enhance the method.
The second aim is to advance the underlying generative model, first by expressing it with less complex graphs through a novel gene-centered reparameterization, and second by removing limitations of the data format. If successful, the project will enable biomedical scientists to more effectively express the functional content of their experimental results, and it will further the information science of genomic data integration.
Biomedical investigators who measure genomic data cannot fully interpret that data without considering how it relates to the wealth of existing biological knowledge on the same genes being measured. Owing to intrinsic sources of variation and incomplete information, the task of combining experimental and functional data falls in the realm of biostatistical analysis. The proposed project will assess the feasibility of a promising function-centered approach to this problem that will allow genomic scientists to better understand the functional content of their experimental data.