Automated Knowledge Extraction for Biomedical Literature

Pustejovsky, James

Abstract

It is becoming increasingly difficult for biologists to keep pace with the information being published within their own fields, let alone biology as a whole. The ability to rapidly access specific and current biomedical information, and to quickly gain an overview of current knowledge in a given field, is becoming harder to achieve even as it becomes more important. Traditional methods of keeping up with advances are therefore becoming inadequate. Here we propose to continue developing our Medstract project, which applies recent advances in the computational analysis of text to organize and structure the biological literature. The Medstract project will reduce the time required for biomedical researchers to find information of interest and should facilitate the development of new research insights. This project is the result of a unique collaboration between a computational linguistics lab at Brandeis University and a molecular biology lab at Tufts University School of Medicine. Previously, we developed an extensive set of tools for analyzing and processing biomedical text. We have used these tools to build databases of biomedical acronyms, inhibitors, regulators, and interactors from Medline abstracts and have made these available on the web. These resources are currently used by hundreds of investigators every day. In addition, we have generated and made available gold-standard markup files for several biological terms and relations, for use as testing standards by other groups developing knowledge extraction engines for the biomedical domain. Here we propose to extend and enhance our current Medstract databases as well as to generate new databases using the tools that we have developed. New databases will include protein modifications, domains and motifs, and tissue and cellular localization information. In addition, we will use the bio-relation databases as the foundation for constructing a system allowing point-to-point regulatory pathway identification. We will enhance the robustness of these databases by utilizing algorithms that we have developed for rerendering the semantic ontologies for the biomedical lexicon. Furthermore, by applying coreference resolution algorithms to the text, we will improve the precision and recall of knowledge extraction for populating the databases.
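
As a rough illustration of what point-to-point pathway identification over a relation database could look like, the sketch below runs a breadth-first search over (regulator, relation, target) triples. It is not the Medstract implementation: the function name find_pathway, the triple layout, and the placeholder entities A through E are all assumptions made for this example.

```python
from collections import deque

# Hypothetical (regulator, relation, target) triples, invented for illustration.
# They are placeholders, not entries from any Medstract database.
RELATIONS = [
    ("A", "activates", "B"),
    ("B", "inhibits", "C"),
    ("A", "binds", "D"),
    ("D", "activates", "C"),
    ("C", "activates", "E"),
]


def find_pathway(relations, source, target):
    """Return the shortest chain of triples leading from source to target, or None."""
    # Index the triples by their regulator so outgoing relations are easy to look up.
    graph = {}
    for regulator, relation, regulated in relations:
        graph.setdefault(regulator, []).append((relation, regulated))

    # Breadth-first search: the first time the target is reached, the path is shortest.
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None  # no chain of extracted relations connects the two entities


if __name__ == "__main__":
    for step in find_pathway(RELATIONS, "A", "E"):
        print(" ".join(step))
```

Breadth-first search is used here only because it returns a shortest chain; a real system would also have to weigh relation types, extraction confidence, and contradictory evidence.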

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM006649-05
Application #: 6896406
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane

Project Start: 2004-06-01
Project End: 2007-05-31
Budget Start: 2005-06-01
Budget End: 2006-05-31
Support Year: 5
Fiscal Year: 2005
Total Cost: $403,171
Indirect Cost

Institution

Name: Brandeis University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 616845814

City: Waltham
State: MA
Country: United States
Zip Code: 02454

Related projects


NIH 2006 R01 LM	Automated Knowledge Extraction for Biomedical Literature	Pustejovsky, James / Brandeis University	$398,762
NIH 2005 R01 LM	Automated Knowledge Extraction for Biomedical Literature	Pustejovsky, James / Brandeis University	$403,171
NIH 2004 R01 LM	Automated Knowledge Extraction for Biomedical Literature	Pustejovsky, James / Brandeis University	$411,436
NIH 2003 R01 LM	Automated Knowledge Extraction for Biomedical Literature	Pustejovsky, James / Brandeis University	$191,306
NIH 2001 R01 LM	Automated Knowledge Extraction for Biomedical Literature	Pustejovsky, James / Brandeis University	$306,158
NIH 2000 R01 LM	Automated Knowledge Extraction for Biomedical Literature	Pustejovsky, James / Brandeis University	$297,119
NIH 1999 R01 LM	Automated Knowledge Extraction for Biomedical Literature	Pustejovsky, James / Brandeis University

Publications

Pustejovsky, J; Castano, J; Zhang, J et al. (2002) Robust relational parsing over biomedical literature: extracting inhibit relations. Pac Symp Biocomput :362-73

Pustejovsky, J; Castano, J; Cochran, B et al. (2001) Automatic extraction of acronym-meaning pairs from MEDLINE databases. Medinfo 10:371-5
