High-throughput experiments are now frequently part of clinical trials, and many institutions establish prospective biospecimen collections from routine patient populations in order to run proteomics and metabolomics experiments. Because these samples come from routine clinical populations of patients with co-morbidities who are on drug therapies, the data analysis must account for the drugs in a systematic manner. Increasingly, multiple high-throughput technologies, such as proteomics and metabolomics, are used together, which requires comprehensive pathway networks that make changes in the metabolome traceable to the proteome and, in turn, to the gene expression profile. In this study we propose to link drug therapies found in the clinical data with the biological pathway data used for the integrated analysis of high-throughput experimental results. Specifically, we will integrate formal knowledge of biological pathways with drug knowledge found in the emergent national standard for the comprehensive knowledge representation of medicinal products, including the Veterans Administration's National Drug File Reference Terminology (NDF-RT), the National Drug Codes, and the RxNorm clinical drug vocabulary. The hub of these terminologies is the Structured Product Labeling (SPL) standard for drug knowledge representation. SPL is part of the HL7 version 3 standards and is based on the HL7 Reference Information Model (RIM). All U.S. pharmaceutical manufacturers today submit SPL data to the Food and Drug Administration (FDA). Pharmacodynamic knowledge is represented in SPL using the NDF-RT mechanism of action (MoA) classes. In this project we will expand these MoA classes with their ontological definitions and thus link them to the biological pathways described in various pathway network resources, including KEGG, Reactome, and the NCI/Nature Pathway Interaction Database (PID), all of which use different formats and models.
Our methodology for integration consists of (1) transforming each original pathway resource into a common data schema and (2) reconciling overlapping content in a purpose-driven manner by (3) connecting the MoA classes. Because no single pathway data schema exists and because SPL is already the hub of the national federated drug terminology, we propose to integrate the pathway data into the drug knowledge base itself. The resulting integrated data will be evaluated against the frequency of drugs encountered in clinical care and research. This work will yield revisions of existing ontologies and, only where necessary, new ontologies to describe drug-pathway interactions. The project will demonstrate how the HL7/ISO Reference Information Model can represent biological entities and processes from a realist perspective, and thus set an important precedent for cross-domain data integration between the basic sciences and clinical medicine, which is essential for the translational research agenda.

Public Health Relevance

The project will combine the national drug catalog with models of the biochemical regulation and metabolism of body functions. This will allow researchers to better understand the effect that drugs have on the results of laboratory tests that measure a large number of proteins and chemicals in the body at the same time.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-BST-G (50))
Program Officer
Lyster, Peter
Indiana University-Purdue University at Indianapolis
Schools of Arts and Sciences
United States