The real-time chemistry of life occurs at the interfaces between proteins and the small chemicals present in the cellular environment. For the first time in history, sufficient information on the human proteome and its interacting chemogenome has accumulated in the form of the 3D protein structural database and the NIH PubChem chemical library. In addition, 'complementarity algorithms', which accurately assess the energetic complementarity, or fit, between any chemical and any protein surface, have recently become available, as has the computing power to deploy these tools in a high-throughput manner. Finally, systems biology methods have evolved to the point that the complete network of interactions of the real-time chemistry of life may be visualized and explored once complementarity algorithms have cross-scored all the chemicals against all the proteins. We propose here to build a chemical biology network to unify these data, along with an intuitive web interface for easy use by non-quantitative biomedical investigators. The network and accompanying interface, once deployed, represent a critical infrastructure that could substantially accelerate collaborative, multi- and interdisciplinary basic, translational, and/or clinical research, specifically by enabling new avenues of drug development for personalized medicine and traditional drug development efforts. Personalized medicine seeks to identify the genetic profiles of patients (frequently highly complex probabilistic expression profiles from microarray data) that correlate with optimal therapy responses. Essentially, the proposed chemical biology network may be queried with one of these profiles, and a list of biologically matched drug-like chemical compounds would be retrieved instantly.
Such a tool would have a profound impact on the rapidity of clinical trial completion and drug approval, as history has shown that clinical trials of drugs matched to biomarkers, such as Herceptin matched to HER2/neu, proceed more rapidly and succeed more often at a much reduced cost. In addition, the proposed chemical biology network may be queried by lead compound, and chemically diverse compounds with similar biological activity may be retrieved instantly. This, too, may accelerate drug approval, as advancing a diverse portfolio of leads in parallel for a specific therapy is more likely to succeed rapidly than advancing a single chemical class. The challenge behind the grand opportunity targeted by this proposal is articulated by the FDA's Critical Path Initiative, a forward-looking challenge that faces the NIH and the FDA jointly and for which there are currently few solutions on the horizon. This specific opportunity benefits from the readiness of all the required elements, so that, despite the enormous potential impact, the milestones are highly achievable within the ARRA timeframe of two years. Once built, this network has the combined advantage of low-overhead maintenance in future years and multiple highly applicable funding opportunities for expansion at the local, commercial and governmental levels. The infrastructure, once built, is thus highly sustainable and extensible for continued utility in the U.S. biomedical research enterprise.

Public Health Relevance

As the RCSB Protein Data Bank of 3D structures nears 60,000 entries (with the human genome estimated to contain approximately 20,000 genes) and the NIH's PubChem database nears 20 million chemical compounds, the possibility exists that the majority of the 3D structures of life, as well as an informative sample of the diversity of chemical space, are readily available to us in 2009. The grand opportunity thus presents itself to cross-connect these two elemental bioscientific databases into a single chemical biology network of direct links between genes, protein targets, and potential chemical therapeutics. Using high-throughput computational methods, we will build this network of relationships as a transformative tool for personalized medicine and drug discovery.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
High Impact Research and Research Infrastructure Programs (RC2)
Study Section
Special Emphasis Panel (ZRG1-HDM-E (99))
Program Officer
Ye, Jane
New York University
Schools of Medicine
New York
United States
Chen, Yu-Chen; Totrov, Max; Abagyan, Ruben (2014) Docking to multiple pockets or ligand fields for screening, activity prediction and scaffold hopping. Future Med Chem 6:1741-55
Gabrielsen, Mari; Kurczab, Rafał; Siwek, Agata et al. (2014) Identification of novel serotonin transporter compounds by virtual screening. J Chem Inf Model 54:933-43
Acharya, Chayan; Kufareva, Irina; Ilatovskiy, Andrey V et al. (2014) PeptiSite: a structural database of peptide binding sites in 4D. Biochem Biophys Res Commun 445:717-23
Gabrielsen, Mari; Wołosewicz, Karol; Zawadzka, Anna et al. (2013) Synthesis, antidepressant evaluation and docking studies of long-chain alkylnitroquipazines as serotonin transporter inhibitors. Chem Biol Drug Des 81:695-706
Clark, Neil R; Dannenfelser, Ruth; Tan, Christopher M et al. (2012) Sets2Networks: network inference from repeated observations of sets. BMC Syst Biol 6:89
Chen, Edward Y; Xu, Huilei; Gordonov, Simon et al. (2012) Expression2Kinases: mRNA profiling linked to multiple upstream regulatory layers. Bioinformatics 28:105-11
Kufareva, Irina; Ilatovskiy, Andrey V; Abagyan, Ruben (2012) Pocketome: an encyclopedia of small-molecule binding sites in 4D. Nucleic Acids Res 40:D535-40
Dannenfelser, Ruth; Clark, Neil R; Ma'ayan, Avi (2012) Genes2FANs: connecting genes through functional association networks. BMC Bioinformatics 13:156
Kou, Yan; Betancur, Catalina; Xu, Huilei et al. (2012) Network- and attribute-based classifiers can prioritize genes and pathways for autism spectrum disorders and intellectual disability. Am J Med Genet C Semin Med Genet 160C:130-42
He, John Cijiang; Chuang, Peter Y; Ma'ayan, Avi et al. (2012) Systems biology of kidney diseases. Kidney Int 81:22-39

Showing the most recent 10 out of 16 publications