Metabolic pathway databases provide a biological framework for revealing relationships among an organism's genes. This context can be exploited to improve the accuracy of genome annotation, to discover new therapeutic targets, or to engineer metabolic pathways in bacteria so that a historically expensive drug can be produced cheaply and quickly. However, knowledge of metabolism in poorly characterized species is limited and depends on computational pathway prediction. Our ultimate goal is to develop methods for predicting novel metabolic pathways in any organism, coupled with robust assessment of the validity of each predicted pathway. We hypothesize that integrating evidence from multiple levels of an organism's metabolic network, from the fit of a pathway within the network to evolutionary relationships between pathways, will allow us to assess pathway validity and to predict novel metabolic pathways. We have successfully applied machine learning methods to the problem of identifying missing enzymes in metabolic pathways, and we believe similar methods will prove fruitful here. Our preliminary studies have identified several properties of predicted metabolic pathways that differ between true positive pathway predictions (i.e., pathways known to occur in an organism) and false positive pathway predictions. We will expand on these features and develop methods to address the following specific aims:
1) Identify features that are informative in distinguishing correct from incorrect pathway predictions in computationally generated pathway/genome databases, based on predictions for highly curated organisms (e.g., Escherichia coli and Arabidopsis thaliana).
2) Develop methods for computing the probability that a pathway is correctly predicted. Informative features identified in Specific Aim 1 will be integrated into a classifier that computes the probability that a predicted pathway is correct, given the associated evidence.
3) Extend the PathoLogic program (the Pathway Tools algorithm used to infer the metabolic network of an organism) to predict alternate, previously unknown pathways in an organism. We will search the MetaCyc reaction space (comprising almost 6000 reactions) for novel subpathways, explicitly constraining the search at each step with organism-specific evidence (e.g., homology and experimental evidence).
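
As an illustration of the kind of classifier described in Specific Aim 2, the sketch below computes the probability that a predicted pathway is correct from a few binary evidence features. It is a minimal example under stated assumptions and not the project's actual method: the naive Bayes model, the three feature names, and the toy training data are all hypothetical and chosen only to keep the example small.

```python
import math

# Hypothetical binary features for one predicted pathway (assumptions, not from the project):
#   0: every reaction in the pathway has an assigned enzyme
#   1: the pathway has at least one enzyme unique to it
#   2: the organism lies within the pathway's expected taxonomic range
N_FEATURES = 3

def train_naive_bayes(examples, labels, n_features):
    """Estimate class priors and P(feature = 1 | class) with Laplace smoothing."""
    counts = {True: [0] * n_features, False: [0] * n_features}
    totals = {True: 0, False: 0}
    for x, y in zip(examples, labels):
        totals[y] += 1
        for j, value in enumerate(x):
            counts[y][j] += value
    n = len(labels)
    model = {}
    for c in (True, False):
        prior = (totals[c] + 1) / (n + 2)
        likelihoods = [(counts[c][j] + 1) / (totals[c] + 2) for j in range(n_features)]
        model[c] = (prior, likelihoods)
    return model

def prob_correct(model, x):
    """Return P(pathway is correct | features x) under the naive Bayes model."""
    log_scores = {}
    for c, (prior, likelihoods) in model.items():
        score = math.log(prior)
        for p, value in zip(likelihoods, x):
            score += math.log(p if value else 1.0 - p)
        log_scores[c] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return math.exp(log_scores[True] - top) / total

# Made-up toy training set: True = pathway known to occur, False = false positive prediction.
TOY_X = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
TOY_Y = [True, True, True, False, False, False]
MODEL = train_naive_bayes(TOY_X, TOY_Y, N_FEATURES)

if __name__ == "__main__":
    print(prob_correct(MODEL, (1, 1, 1)))
    print(prob_correct(MODEL, (0, 0, 0)))
```

In practice the training labels would come from curated pathway/genome databases such as those for the highly curated organisms named in Aim 1, and the feature set and model family would be chosen by evaluation rather than fixed in advance as they are here.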

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM009651-03
Application #: 7685518
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2007-09-15
Project End: 2011-09-14
Budget Start: 2009-09-15
Budget End: 2011-09-14
Support Year: 3
Fiscal Year: 2009
Total Cost: $175,647
Indirect Cost:

Name: SRI International
Department:
Type:
DUNS #: 009232752
City: Menlo Park
State: CA
Country: United States
Zip Code: 94025

Karp, Peter D; Latendresse, Mario; Caspi, Ron (2011) The Pathway Tools pathway prediction algorithm. Stand Genomic Sci 5:424-9
Ferrer, Luciana; Shearer, Alexander G; Karp, Peter D (2011) Discovering novel subsystems using comparative genomics. Bioinformatics 27:2478-85
Dale, Joseph M; Popescu, Liviu; Karp, Peter D (2010) Machine learning methods for metabolic pathway prediction. BMC Bioinformatics 11:15
Karp, Peter D; Paley, Suzanne M; Krummenacker, Markus et al. (2010) Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology. Brief Bioinform 11:40-79