MACE2K - Molecular And Clinical Extraction: A Natural Language Processing Tool for Personalized Medicine

Madhavan, Subha

Abstract

The velocity, variety, volume and veracity of data from relevant information sources make it extremely challenging for oncologists to collect and review pertinent data that can support routine personalized treatment for their patients. There is an urgent need to develop data wrangling approaches, including Natural Language Processing and information retrieval methods, to extract and curate publications and clinical trials related to personalized therapy. Once curated, the structured data can be used by biomedical researchers to generate novel scientific hypotheses, design new studies, obtain a better understanding of biological mechanisms of disease, perform meta-analyses, and create clinical decision support systems. There is also an urgent need to develop improved search interfaces specific to the field of personalized therapy, including ways for end users to display, rank, and save results. While several database and web-based keyword search engine algorithms exist, there is a lack of tools that meet the unique challenges of personalized medicine. Finally, there is an urgent need to develop software that allows subject matter experts to verify and validate information extracted and ranked through computational methods, improving the gold standard corpus that can be used for biomedical research into personalized therapies. To address these issues, we will build an innovative software stack (MACE2K) to adapt and extend widely tested BioCreative natural language processing (NLP) tools to automatically retrieve and pre-process targeted therapy information from ClinicalTrials.gov, PubMed abstracts, open access articles, and conference proceedings. We will build an entity extraction cartridge to accurately parse gene mutations, translocations, gene expression, protein expression, and protein phosphorylation.
A marker disambiguation cartridge will be built to assess trial inclusion or exclusion criteria and to determine marker-related primary endpoints. We will include a ranking cartridge that uses the disambiguated information on markers, drugs and trials to provide a rigorous scoring of trials and studies according to their relevance for personalized medicine. A novel gamification cartridge will be built to allow subject matter experts to verify and validate the information corpus. Our research leverages the National Cancer Institute's investments in several programs (many of which we are involved in), including the NCI Drug Dictionary, the National Cancer Informatics Program (NCIP), the I-SPY trials, and the Center for Cancer Systems Biology (CCSB), to efficiently accomplish our aims.
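
The abstract does not describe how the extraction and ranking cartridges are implemented. As a rough illustration of the general idea only, the sketch below finds protein-level point mutations and gene names in free text with a regular expression and a small lexicon, then orders documents by a toy relevance score. The pattern, the gene lexicon, the scoring weights, and all function names are assumptions made for this example and are not taken from MACE2K.

```python
import re
from dataclasses import dataclass

# Assumed pattern for protein-level point mutations such as "V600E" or "p.L858R".
# It ignores translocations, expression changes, and many nomenclature variants.
AMINO = "ACDEFGHIKLMNPQRSTVWY"
POINT_MUTATION = re.compile(rf"\b(?:p\.)?[{AMINO}]\d{{1,4}}[{AMINO}]\b")

# Tiny illustrative gene lexicon (assumption); a real system would use a curated dictionary.
GENE_LEXICON = {"BRAF", "EGFR", "KRAS", "ALK"}
GENE_TOKEN = re.compile(r"\b[A-Z][A-Z0-9]{1,5}\b")


@dataclass
class Mention:
    kind: str   # "gene" or "mutation"
    text: str   # matched surface string
    start: int  # character offset in the input


def extract_mentions(text):
    """Return gene and mutation mentions in order of appearance."""
    mentions = [Mention("mutation", m.group(0), m.start())
                for m in POINT_MUTATION.finditer(text)]
    mentions += [Mention("gene", m.group(0), m.start())
                 for m in GENE_TOKEN.finditer(text)
                 if m.group(0) in GENE_LEXICON]
    return sorted(mentions, key=lambda mention: mention.start)


def relevance_score(text):
    """Toy score: one point per gene mention, two per mutation mention."""
    weights = {"gene": 1, "mutation": 2}
    return sum(weights[m.kind] for m in extract_mentions(text))


def rank_documents(docs):
    """Sort documents from highest to lowest toy relevance score."""
    return sorted(docs, key=relevance_score, reverse=True)


if __name__ == "__main__":
    docs = [
        "Patients received standard chemotherapy.",
        "EGFR T790M and BRAF V600E mutations predict response.",
        "KRAS status was recorded at baseline.",
    ]
    for doc in rank_documents(docs):
        print(relevance_score(doc), doc)
```

A pattern-and-lexicon approach like this is easy to inspect but will miss many variant descriptions and produce false positives (for example, any letter-digits-letter token that happens to use amino acid codes), which is one reason expert verification of the extracted corpus, as proposed in the abstract, matters.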

Public Health Relevance

This project will develop new computational methods and software to retrieve targeted molecular and drug therapy information from multiple sources of big data, including ClinicalTrials.gov, PubMed abstracts, open access articles, and conference proceedings. Biomedical researchers can use the software to generate new hypotheses for research on personalized cancer treatment decisions, drawing on the enormous volumes of public data already in existence. A novel gamification component will be built to allow subject matter experts to verify and validate the information corpus, enhancing the accuracy of the software.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 1U01HG008390-01
Application #: 8874546
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Sofia, Heidi J

Project Start: 2015-09-22
Project End: 2018-05-31
Budget Start: 2015-09-22
Budget End: 2016-05-31
Support Year: 1
Fiscal Year: 2015

Institution

Name: Georgetown University
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 049515844

City: Washington
State: DC
Country: United States
Zip Code: 20057

Related projects


NIH 2017 U01 HG	MACE2K - Molecular And Clinical Extraction: A Natural Language Processing Tool for Personalized Medicine	Madhavan, Subha / Georgetown University
NIH 2016 U01 HG	MACE2K - Molecular And Clinical Extraction: A Natural Language Processing Tool for Personalized Medicine	Madhavan, Subha / Georgetown University
NIH 2016 U01 HG	MACE2K - Molecular And Clinical Extraction: A Natural Language Processing Tool for Personalized Medicine	Madhavan, Subha / Georgetown University	$150,865
NIH 2015 U01 HG	MACE2K - Molecular And Clinical Extraction: A Natural Language Processing Tool for Personalized Medicine	Madhavan, Subha / Georgetown University

Publications

Madhavan, Subha; Ritter, Deborah; Micheel, Christine et al. (2018) ClinGen Cancer Somatic Working Group - standardizing and democratizing access to cancer molecular diagnostic data to drive translational research. Pac Symp Biocomput 23:247-258

Mahmood, A S M Ashique; Rao, Shruti; McGarvey, Peter et al. (2017) eGARD: Extracting associations between genomic anomalies and drug responses from text. PLoS One 12:e0189663

Rao, Shruti; Beckman, Robert A; Riazi, Shahla et al. (2017) Quantification and expert evaluation of evidence for chemopredictive biomarkers to personalize cancer treatment. Oncotarget 8:37923-37934

Wang, Qinghua; Ross, Karen E; Huang, Hongzhan et al. (2017) Analysis of Protein Phosphorylation and Its Functional Impact on Protein-Protein Interactions via Text Mining of the Scientific Literature. Methods Mol Biol 1558:213-232

Ritter, Deborah I; Roychowdhury, Sameek; Roy, Angshumoy et al. (2016) Somatic cancer variant curation and harmonization through consensus minimum variant level data. Genome Med 8:117

Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder et al. (2015) A case study for cloud based high throughput analysis of NGS data using the globus genomics system. Comput Struct Biotechnol J 13:64-74

Madhavan, Subha; Gauba, Robinder; Song, Lei et al. (2013) Platform for Personalized Oncology: Integrative analyses reveal novel molecular signatures associated with colorectal cancer relapse. AMIA Jt Summits Transl Sci Proc 2013:118

Madhavan, Subha; Gusev, Yuriy; Natarajan, Thanemozhi G et al. (2013) Genome-wide multi-omics profiling of colorectal cancer identifies immune determinants strongly associated with relapse. Front Genet 4:236

Gusev, Yuriy; Riggins, Rebecca B; Bhuvaneshwar, Krithika et al. (2013) In silico discovery of mitosis regulation networks associated with early distant metastases in estrogen receptor positive breast cancers. Cancer Inform 12:31-51

Madhavan, Subha; Gusev, Yuriy; Harris, Michael et al. (2011) G-DOC: a systems medicine platform for personalized oncology. Neoplasia 13:771-83
