Although the sequencing of the human genome promised to transform basic and translational research, the catalogue of human genes that it delivered remains far from the type of predictive model that would allow us to rapidly develop hypotheses about the response of biological systems to perturbations arising from various sources. One of the emerging principles in biology is that it is generally not individual genes but rather biological pathways and networks that drive an organism's response and the development of its particular phenotype. We know that there are many biologically significant networks, ranging from metabolic networks to signal transduction networks to transcriptional regulatory networks, among others. In order to fully understand organisms and the manner in which they play out their genetic programs, we must develop tools and approaches that can help us comprehend not only the structure of the networks that exist, but also the rules that govern their behavior and the interactions between elements in each biological system. It was hoped that technologies arising from the Human Genome Project, such as DNA microarrays, would provide data that would allow such network models to be developed. While these technologies have delivered vast quantities of data on a wide range of biological systems and disease models, those datasets, generally analyzed in isolation, have not yet led to the types of predictive models that are needed. What most analytical methods have ignored is the best resource we have for developing predictive models: the collection of existing prior knowledge captured in the published biomedical literature. Here we propose to develop a new method to extract prior knowledge from that published literature, as well as from other sources, and to use that information to develop preliminary network "seeds" that can jump-start the process of building predictive network models.
Using a Bayesian Network framework, we propose to create phenomenological models that allow us to make predictions and then to verify those predictions through direct laboratory experiments. The end goals are not only to create a framework for building such predictive models, but also to develop software and tools that allow others in the community to create models for their own systems of interest.
The Human Genome Project and the technologies developed through it have given us vast quantities of data on biological systems, but we have not yet been able to take maximum advantage of those data because we lack critical tools and understanding. The published biomedical literature in PubMed represents the collective knowledge we have of biological systems, but no methods exist to combine it systematically with genomic data so that it can be used effectively in analyzing those data. Here we propose to develop new approaches to data analysis that combine what is known in the literature, extracted using advanced text mining techniques, with genomic data to create predictive models that can be used to address a wide range of questions in basic and translational research directed toward improving human health and the treatment of disease.