Even state-of-the-art homology methods cannot annotate metabolic genes that have no, or only remote, sequence identity to known enzymes. This presents a significant obstacle to network reconstruction, as about 30%-40% (>1500) of known metabolic activities remain orphan, i.e., no protein is known to catalyze these activities in any organism. The scale of the orphan activities problem makes it arguably the single biggest challenge of modern biochemistry. We propose to develop, experimentally validate, and make available to the scientific community an efficient computational approach to fill the remaining gaps in metabolic networks. The main idea of the proposed method is to use the genes assigned to the network neighbors of the remaining gaps as constraints when assigning genes to orphan activities. We demonstrate that this approach significantly outperforms simpler or existing methods. Our cross-validated results in model organisms demonstrate that the proposed method can predict the correct genes in more than 50% of the cases, without any sequence homology information. The calculations indicate that the prediction accuracy will also remain high in less studied organisms. Using the developed method, we have already identified and validated a gene responsible for an E. coli metabolic activity that had remained orphan for more than 25 years. The proposal has four specific aims:
1) We will calculate the appropriate context-based descriptors of protein function for the majority of sequenced organisms. Many new functional descriptors will be developed and used for the predictions.
2) We will investigate the ability of various machine learning approaches and fitness functions to integrate context-based descriptors. Based on the developed methodology, we will make predictions for all orphan activities in sequenced organisms.
3) The predictions will be available through a searchable and constantly updated Web server. We will also develop a method to detect functional misannotations and apply it to all public metabolic databases.
4) In collaboration with the laboratories of Dr. Uwe Sauer (ETH Zurich) and Dr. George Church (Harvard), we will experimentally test at least 50 of the predicted genes without close sequence homologs in E. coli, B. subtilis, and S. cerevisiae.
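The main idea stated in the abstract, using genes already assigned to the network neighbors of a gap as constraints, can be illustrated with a minimal sketch. It is not the proposal's actual method: it assumes a single context-based descriptor (presence/absence phylogenetic profiles across genomes) and a simple scoring rule (mean Pearson correlation with the neighbor genes). The gene names, the profiles, and the function names `pearson` and `rank_candidates` are all hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return cov / sqrt(var_a * var_b)

def rank_candidates(neighbor_genes, candidates, profiles):
    """Rank candidate genes for an orphan activity by their mean
    profile correlation with genes assigned to neighboring reactions."""
    scores = {}
    for cand in candidates:
        vals = [pearson(profiles[cand], profiles[g]) for g in neighbor_genes]
        scores[cand] = sum(vals) / len(vals)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: 1 = gene present in a genome, 0 = absent.
profiles = {
    "neighborA": [1, 1, 0, 1, 0, 0],
    "neighborB": [1, 1, 0, 1, 0, 1],
    "cand1":     [1, 1, 0, 1, 0, 0],  # co-occurs with the neighbors
    "cand2":     [0, 0, 1, 0, 1, 1],  # anti-correlated with the neighbors
}

ranked = rank_candidates(["neighborA", "neighborB"], ["cand1", "cand2"], profiles)
```

In this toy setup `cand1` ranks first because its profile tracks the neighbor genes across genomes. Integrating several descriptors through a learned fitness function, as the specific aims describe, would replace the single averaged correlation used here.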

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM079759-03
Application #
7653790
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Anderson, James J
Project Start
2007-09-14
Project End
2011-06-30
Budget Start
2009-07-01
Budget End
2011-06-30
Support Year
3
Fiscal Year
2009
Total Cost
$397,297
Indirect Cost
Name
Columbia University (N.Y.)
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
621889815
City
New York
State
NY
Country
United States
Zip Code
10032
Plata, Germán; Henry, Christopher S; Vitkup, Dennis (2015) Long-term phenotypic evolution of bacteria. Nature 517:369-72
Plata, Germán; Vitkup, Dennis (2014) Genetic robustness and functional evolution of gene duplicates. Nucleic Acids Res 42:2405-14
Hu, Jie; Locasale, Jason W; Bielas, Jason H et al. (2013) Heterogeneity of tumor-induced gene expression changes in the human metabolic network. Nat Biotechnol 31:522-9
Gilman, Sarah R; Chang, Jonathan; Xu, Bin et al. (2012) Diverse types of genetic variation converge on functional gene networks involved in schizophrenia. Nat Neurosci 15:1723-8
Plata, Germán; Fuhrer, Tobias; Hsiao, Tzu-Lin et al. (2012) Global probabilistic annotation of metabolic networks enables enzyme discovery. Nat Chem Biol 8:848-54
Gilman, Sarah R; Iossifov, Ivan; Levy, Dan et al. (2011) Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses. Neuron 70:898-907
Plata, Germán; Gottesman, Max E; Vitkup, Dennis (2010) The rate of the molecular clock and the cost of gratuitous protein synthesis. Genome Biol 11:R98
de Hoon, Michiel J L; Eichenberger, Patrick; Vitkup, Dennis (2010) Hierarchical evolution of the bacterial sporulation network. Curr Biol 20:R735-45
Hsiao, Tzu-Lin; Revelles, Olga; Chen, Lifeng et al. (2010) Automatic policing of biochemical annotations using genomic correlations. Nat Chem Biol 6:34-40
Chastanet, Arnaud; Vitkup, Dennis; Yuan, Guo-Cheng et al. (2010) Broadly heterogeneous activation of the master regulator for sporulation in Bacillus subtilis. Proc Natl Acad Sci U S A 107:8486-91
