Princeton University is awarded a grant from the Faculty Early Career Development program (CAREER) to develop an integrated computational and experimental approach for modeling biological pathways and networks. This technology will consist of three integrated components: a computational component for generalizable, efficient, and accurate integration of diverse genomic data; an analytical component for network/pathway modeling based on the integrated data; and an experimental component for validation and feedback. The integrated analysis of diverse genomic data and experimental verification will allow iterative refinement of computational methods and lead to highly accurate network-level pathway models that can serve as a scaffold for mechanistic models of complex biological processes. The key contribution of this work is the tight integration of computational modeling and experimental testing into a combined approach that uses iterative refinement of predictions to improve both the models and the algorithms. If successful, this integrated approach will yield more accurate and complete models of biological processes and pathways than purely computational methods can, while being substantially faster than studying the same processes by experimentation alone. The interdisciplinary nature of this proposal will further the impact of advanced computer science on biology and will precipitate further interactions between the two fields, both through research and through interdisciplinary education. In concert with this research program, two graduate courses in bioinformatics will be developed. The PI will also continue to participate in development and teaching of a cross-disciplinary genomics curriculum for undergraduates in collaboration with biology, physics, and chemistry faculty at the Lewis-Sigler Institute for Integrative Genomics.
Both undergraduate and graduate curricular materials developed at Princeton will be made available via the Internet. In addition, a systems biology symposium at Princeton University will be organized to catalyze collaboration among computational and experimental researchers and to introduce more students to systems biology.
1. Integrated approaches for characterization of gene function and regulation. We developed novel technologies for prediction of protein function from diverse high-throughput data. By integrating computational methods with experiments in an iterative framework, we discovered and experimentally validated 99 yeast proteins previously not known to be involved in mitochondrial biogenesis and inheritance. Over half of these have a human ortholog, including disease-related genes. In metazoans, we demonstrated that tissue-specific gene expression can be accurately predicted from whole-animal data. We experimentally verified novel patterns of tissue-specific expression for several genes and showed that our method is more accurate than most high-throughput experimental studies of tissue-specific expression in C. elegans, when evaluated by single-gene GFP, in situ, or antibody measurements.
2. Identification of biological networks and pathways from diverse functional data. We developed approaches for Bayesian integration of functional genomics data and introduced the notion of context-specificity to functional genomic data integration. We applied these methods to model organisms and humans, producing public interactive functional network prediction systems. Our regularized Bayesian integration system HEFalMp provides maps of functional activity and interactions in over 200 areas of human cellular biology, summarizing over 30,000 genome-scale experiments from biologically informative perspectives: prediction of protein function and functional modules, cross-talk among biological processes, and novel associations of genes and pathways with genetic disorders. Our extensions of these approaches include tissue-specific network prediction and the study of network dynamics in development (in C. elegans and A. thaliana). To enable pathway-level analysis on the genome scale, we developed methodology for simultaneous inference of physical, genetic, regulatory, and functional pathway components, and used these predictions to characterize systems-level properties of pathways.
3. Computationally directed experimentation. In addition to routinely coupling computation and experiments, we developed a directed strategy to systematically plan experiments in poorly studied species (e.g., humans or newly sequenced genomes) based on information in closely related model organisms. We demonstrated the efficacy of this approach by designing, performing (with collaborators), and evaluating S. bayanus microarray experiments using the S. cerevisiae data repository. This planning process reduced the labor of microarray experiments tenfold while achieving similar functional coverage. The resulting compendium allowed accurate genome-wide functional characterization of S. bayanus and enabled systematic functional evolution comparisons.
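As an illustration of the Bayesian data integration described under item 2, the following is a minimal sketch of how evidence from several datasets could be combined into a single probability that two genes are functionally related. It assumes a naive Bayes model with binary evidence per dataset and omits regularization and context-specificity entirely, so it should not be read as the HEFalMp implementation. The function name, parameters, and all probabilities are hypothetical and chosen only for illustration.

```python
from math import exp, log


def posterior_functional_link(prior, likelihoods, observations):
    """Naive Bayes posterior that a gene pair is functionally related.

    Hypothetical illustration only; not the method used in the project.

    prior        -- assumed P(related) before seeing any data (0 < prior < 1)
    likelihoods  -- one (P(evidence | related), P(evidence | unrelated))
                    tuple per dataset, each value strictly between 0 and 1
    observations -- one value per dataset: True (evidence seen),
                    False (evidence absent), or None (pair not measured)
    """
    # Work in log-odds so that each dataset contributes an additive term.
    log_odds = log(prior / (1.0 - prior))
    for (p_related, p_unrelated), seen in zip(likelihoods, observations):
        if seen is None:
            continue  # a missing measurement contributes no evidence
        if seen:
            log_odds += log(p_related / p_unrelated)
        else:
            log_odds += log((1.0 - p_related) / (1.0 - p_unrelated))
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + exp(-log_odds))


# Made-up numbers: two datasets that each favor a relationship 4:1,
# and a third dataset in which the gene pair was not measured.
example = posterior_functional_link(
    prior=0.5,
    likelihoods=[(0.8, 0.2), (0.8, 0.2), (0.6, 0.4)],
    observations=[True, True, None],
)
```

Under the naive Bayes assumption, datasets are treated as conditionally independent given the relationship status, which is why their log-likelihood ratios simply add. Real genomic datasets are often correlated, and handling that correlation is one motivation for regularized variants of this scheme.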