The goal of cancer pharmacoepidemiology is to identify adverse and/or long-term effects of chemotherapeutic agents and to determine the impact of drugs on cancer risk, prevention, and response to treatment. Pharmacoepidemiology studies strongly influence the definition of optimal treatments and accelerate translational research. It is therefore imperative that these studies be conducted efficiently and leverage real-world patient data such as electronic health records (EHR). Massive clinical data from EHRs are being used for research on disease-gene associations, comparative effectiveness, and clinical outcomes. There is, however, a paucity of pharmacoepidemiological studies using comprehensive EHR data because of the inherent challenges in data abstraction, handling, and analysis. The hurdles include heterogeneity of reports, detailed clinical information embedded in narrative text, differing EHR platforms across sites, and missing data, to name a few. In this study, we propose to integrate and extend preexisting tools to build an informatics infrastructure for EHR data extraction, interpretation, management, and analysis to advance cancer pharmacoepidemiology research. We will leverage existing natural language processing (NLP) tools, standardized ontologies, and clinical data management systems to extract and manipulate EHR data for cancer pharmacoepidemiological research. To achieve our goal, we propose four specific aims.
In aim 1, we intend to develop a high-performance, user-centric information extraction framework with advanced features such as active learning (to reduce annotation cost), domain adaptation (to transfer data across multiple sites) and user-friendly interfaces (for non-technical end users).
In aim 2, we plan to improve data harmonization across differing platforms, develop components for seamless data export as well as expand methodologies to address impediments inherent to EHR-based data (such as the missing data problem).
In aim 3, we will conduct demonstration projects in cancer pharmacoepidemiology, including pharmacovigilance and pharmacogenomics of chemotherapeutic agents, to evaluate, refine and validate the broad uses of our tools.
Finally, in aim 4, we propose to disseminate the methods and tools developed in this project to the cancer research and pharmacoepidemiology communities.

Public Health Relevance

In this project, we propose to integrate and extend previously developed tools to build an informatics infrastructure for electronic health record (EHR) data extraction, interpretation, management, and analysis, to advance cancer pharmacoepidemiology research. Such methods can efficiently integrate and standardize cancer pharmacoepidemiology-specific information from EHRs across different sites, thus advancing research in this field.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Resource-Related Research Projects--Cooperative Agreements (U24)
Project #
5U24CA194215-04
Application #
9774751
Study Section
Special Emphasis Panel (ZCA1)
Program Officer
Friedman, Steve
Project Start
2016-09-01
Project End
2021-08-31
Budget Start
2019-09-01
Budget End
2020-08-31
Support Year
4
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Texas Health Science Center Houston
Department
Type
Sch Allied Health Professions
DUNS #
800771594
City
Houston
State
TX
Country
United States
Zip Code
77030
Lee, Hee-Jin; Zhang, Yaoyun; Jiang, Min et al. (2018) Identifying direct temporal relations between time and events from clinical notes. BMC Med Inform Decis Mak 18:49
Malty, Andrew M; Jain, Sandeep K; Yang, Peter C et al. (2018) Computerized Approach to Creating a Systematic Ontology of Hematology/Oncology Regimens. JCO Clin Cancer Inform 2018:
Wang, Liwei; Rastegar-Mojarad, Majid; Ji, Zhiliang et al. (2018) Detecting Pharmacovigilance Signals Combining Electronic Medical Records With Spontaneous Reports: A Case Study of Conventional Disease-Modifying Antirheumatic Drugs for Rheumatoid Arthritis. Front Pharmacol 9:875
Amith, Muhammad; He, Zhe; Bian, Jiang et al. (2018) Assessing the practice of biomedical ontology evaluation: Gaps and opportunities. J Biomed Inform 80:1-13
Soysal, Ergin; Wang, Jingqi; Jiang, Min et al. (2017) CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines. J Am Med Inform Assoc :
Wu, Yonghui; Jiang, Min; Xu, Jun et al. (2017) Clinical Named Entity Recognition Using Deep Learning Models. AMIA Annu Symp Proc 2017:1812-1819
Lee, Hee-Jin; Zhang, Yaoyun; Roberts, Kirk et al. (2017) Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation. AMIA Annu Symp Proc 2017:1070-1079
Lee, Hee-Jin; Wu, Yonghui; Zhang, Yaoyun et al. (2017) A hybrid approach to automatic de-identification of psychiatric notes. J Biomed Inform 75S:S19-S27
Gregg, Justin R; Lang, Maximilian; Wang, Lucy L et al. (2017) Automating the Determination of Prostate Cancer Risk Strata From Electronic Medical Records. JCO Clin Cancer Inform 1:
Huang, Jing; Duan, Rui; Hubbard, Rebecca A et al. (2017) PIE: A prior knowledge guided integrated likelihood estimation method for bias reduction in association studies using electronic health records data. J Am Med Inform Assoc :