Although the sequencing of the human genome promised to transform basic and translational research, the catalogue of human genes that it delivered remains far from the type of predictive model that would allow us to rapidly develop hypotheses about how biological systems respond to perturbations arising from various sources. One of the emerging principles in biology is that it is generally not individual genes but rather biological pathways and networks that drive an organism's response and the development of its particular phenotype. There are many biologically significant networks, including metabolic, signal transduction, and transcriptional regulatory networks, among others. To fully understand organisms and the manner in which they play out their genetic programs, we must develop tools and approaches that help us comprehend not only the structure of these networks, but also the rules that govern their behavior and the interactions between elements in each biological system. It was hoped that technologies arising from the Human Genome Project, such as DNA microarrays, would provide data that would allow such network models to be developed. While these technologies have delivered vast quantities of data on a wide range of biological systems and disease models, those datasets, generally analyzed in isolation, have not yet led to predictive models of this type. What most analytical methods have ignored is the best resource we have for developing predictive models: the existing prior knowledge captured in the published biomedical literature. Here we propose to develop a new method to extract prior knowledge from that published literature, as well as from other sources, and to use that information to develop preliminary network "seeds" that can jump-start the process of building predictive network models.
Using a Bayesian network framework, we propose to create phenomenological models that allow us to make predictions and then to verify those predictions through direct laboratory experiments. The end goals are not only to establish a framework for building such predictive models, but also to develop software and tools that allow others in the community to create models for their own systems of interest.

Public Health Relevance

The Human Genome Project and the technologies developed through it have given us vast quantities of data on biological systems, but we have not yet been able to take full advantage of these data because we lack critical tools and understanding. The published biomedical literature in PubMed represents our collective knowledge of biological systems, but no methods exist to combine it systematically with genomic data and to use it effectively in analyzing those data. Here we propose to develop new approaches to data analysis that combine what is known in the literature, extracted using advanced text mining techniques, with genomic data to create predictive models that can be used to address a wide range of questions in basic and translational research directed toward improving human health and the treatment of disease.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Special Emphasis Panel (ZLM1-AP-E (M3))
Program Officer: Ye, Jane
Dana-Farber Cancer Institute
United States
Quackenbush, John (2014) Learning to share. Sci Am 311:S22
Olsen, Catharina; Fleming, Kathleen; Prendergast, Niall et al. (2014) Inference and validation of predictive gene networks from biomedical literature and gene expression data. Genomics 103:329-36
Adamia, Sophia; Bar-Natan, Michal; Haibe-Kains, Benjamin et al. (2014) NOTCH2 and FLT3 gene mis-splicings are common events in patients with acute myeloid leukemia (AML): new potential targets in AML. Blood 123:2816-25
Adamia, Sophia; Haibe-Kains, Benjamin; Pilarski, Patrick M et al. (2014) A genome-wide aberrant RNA splicing in patients with acute myeloid leukemia identifies novel potential disease markers and therapeutic targets. Clin Cancer Res 20:1135-45
Sharron Lin, Xuanhui; Hu, Lan; Sandy, Kirley et al. (2014) Differentiating progressive from nonprogressive T1 bladder cancer by gene expression profiling: applying RNA-sequencing analysis on archived specimens. Urol Oncol 32:327-36
Schroder, Markus S; Gusenleitner, Daniel; Quackenbush, John et al. (2013) RamiGO: an R/Bioconductor package providing an AmiGO visualize interface. Bioinformatics 29:666-8
Papillon-Cavanagh, Simon; De Jay, Nicolas; Hachem, Nehme et al. (2013) Comparison and validation of genomic predictors for anticancer drug sensitivity. J Am Med Inform Assoc 20:597-602
Correll, Mick; Johnson, Christopher K; Ferrari, Giovanni et al. (2013) Mutational analysis clopidogrel resistance and platelet function in patients scheduled for coronary artery bypass grafting. Genomics 101:313-7
Wang, Zhigang C; Birkbak, Nicolai Juul; Culhane, Aedín C et al. (2012) Profiles of genomic instability in high-grade serous ovarian cancer predict treatment outcome. Clin Cancer Res 18:5806-15
Desmedt, Christine; Majjaj, Samira; Kheddoumi, Naima et al. (2012) Characterization and clinical evaluation of CD10+ stroma cells in the breast cancer microenvironment. Clin Cancer Res 18:1004-14

Showing the most recent 10 out of 17 publications