Capturing and linking genomic and clinical information

Friedman, Carol

Abstract

The long-term aim of this project is to use natural language processing (NLP) to build a high-throughput tool for facilitating cancer research by automatically extracting and organizing clinical and genetic information from the Electronic Medical Record (EMR) and from journal articles. Our research applies advanced NLP techniques to: 1) enable the mining of phenotypic and genotypic data in the EMR; 2) automatically amass knowledge concerned with cancer and biomolecular relationships from journals; 3) develop a Web-enabled visualization tool for researchers that will present diverse views of the knowledge; and 4) develop an infrastructure that will link to the clinical data warehouse at New York Presbyterian Hospital (NYPH) and to GeneWays, a related project that allows researchers to visualize pathways. More specifically, MedLEE (the NLP system we developed that extracts and encodes clinical and environmental information from the EMR) will be extended to extract genetic information contained in the EMR; subsequently, twelve years of patient reports will be processed and the extracted data added to the warehouse. In addition, a new system, PhenoGenes, will be developed based on MedLEE and GeneWays (which contains another NLP system we developed that extracts and codifies biomolecular relations from journal articles). PhenoGenes will capture biomolecular interactions directly associated with the treatment, diagnosis, and prognosis of cancer. It will also generate an XML knowledge base that will integrate and organize the captured information, and a Web-enabled tool that will allow users to browse and view the knowledge clustered according to different orientations (e.g., gene, disease, tissue, or interaction). The knowledge base will be linked to the GeneWays system, so that relevant pathways can be visualized. MedLEE is used operationally at NYPH, and both NLP systems have been demonstrated to be highly effective.
The current project builds upon our experience and success with these systems. The availability of related, compatible clinical and biomolecular NLP systems provides an exceptional opportunity to pave the way for the capture, integration, and organization of phenotypic and genotypic data and knowledge that will be used to radically improve patient care.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM007659-04
Application #: 7110256
Study Section: Special Emphasis Panel (ZLM1-MMR-C (O1))
Program Officer: Ye, Jane

Project Start: 2003-08-01
Project End: 2008-07-31
Budget Start: 2006-08-01
Budget End: 2008-07-31
Support Year: 4
Fiscal Year: 2006
Total Cost: $478,733
Indirect Cost

Institution

Name: Columbia University (N.Y.)
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 621889815

City: New York
State: NY
Country: United States
Zip Code: 10032

Related projects


NIH 2006 R01 LM	Capturing and linking genomic and clinical information Friedman, Carol / Columbia University (N.Y.)	$478,733
NIH 2005 R01 LM	Capturing and linking genomic and clinical information Friedman, Carol / Columbia University (N.Y.)	$478,937
NIH 2004 R01 LM	Capturing and linking genomic and clinical information Friedman, Carol / Columbia University (N.Y.)	$468,590
NIH 2003 R01 LM	Capturing and linking genomic and clinical information Friedman, Carol / Columbia University (N.Y.)	$464,049

Publications

Van Vleck, Tielman T; Elhadad, Noémie (2010) Corpus-Based Problem Selection for EHR Note Summarization. AMIA Annu Symp Proc 2010:817-21

Borlawsky, Tara B; Li, Jianrong; Shagina, Lyudmila et al. (2010) Evaluation of an Ontology-anchored Natural Language-based Approach for Asserting Multi-scale Biomolecular Networks for Systems Medicine. AMIA Jt Summits Transl Sci Proc 2010:6-10

Morrison, Frances P; Li, Li; Lai, Albert M et al. (2009) Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes? J Am Med Inform Assoc 16:37-9

Hripcsak, George; Soulakis, Nicholas D; Li, Li et al. (2009) Syndromic surveillance using ambulatory electronic health records. J Am Med Inform Assoc 16:354-61

Wang, Xiaoyan; Hripcsak, George; Markatou, Marianthi et al. (2009) Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study. J Am Med Inform Assoc 16:328-37

Wang, Xiaoyan; Hripcsak, George; Friedman, Carol (2009) Characterizing environmental and phenotypic associations using information theory and electronic health records. BMC Bioinformatics 10 Suppl 9:S13

Morrison, Frances P; Sengupta, Soumitra; Hripcsak, George (2009) Using a pipeline to improve de-identification performance. AMIA Annu Symp Proc 2009:447-51

Xu, Hua; Stetson, Peter D; Friedman, Carol (2009) Methods for building sense inventories of abbreviations in clinical notes. J Am Med Inform Assoc 16:103-8

Sam, Lee T; Mendonça, Eneida A; Li, Jianrong et al. (2009) PhenoGO: an integrated resource for the multiscale mining of clinical and biological data. BMC Bioinformatics 10 Suppl 2:S8

Fan, Jung-Wei; Friedman, Carol (2009) Generating quality word sense disambiguation test sets based on MeSH indexing. AMIA Annu Symp Proc 2009:183-7

Showing the most recent 10 out of 53 publications
