Automated Literature Mining for Validation of High-Throughput Function Prediction

Hunter, Lawrence

Abstract

The function of millions of proteins remains unknown, and automated protein function prediction systems have a poor performance record. We will test hypotheses about protein functional sites by validating high-throughput predictions from computational biology techniques with a novel automated system that mines the literature for targeted information relevant to those predictions. Our work will enable large-scale, validated annotation of protein function and, in turn, facilitate drug discovery for the treatment of disease. High-throughput experiments and bioinformatics techniques are creating an exploding volume of data with which we hope to transcribe the genetic blueprints of life. Targeted experiments are required to validate biomedical discoveries from these sources. Fortunately, the information needed to confirm or refute a prediction is often already available in an existing publication, and biologists can use this supporting evidence for validation. However, the sheer volume of predictions from high-throughput methods exceeds researchers' capacity to perform even the necessary literature searches. Closing this gap requires automated literature mining methods that perform comparably to a human expert; indeed, developing such methods is a grand challenge of modern biology. We will mine the full-text literature to validate computational predictions of functional sites in proteins. The innovations in our approach include: (1) using computational predictions as the context for a literature search; (2) extracting information about protein functional sites from full-text journal publications; (3) high-throughput text mining; and (4) using primary information in protein databases to evaluate the methods. Understanding protein function is a critical bottleneck in the progress of biomedical research.

It is time to truly integrate the biological literature into the protein function prediction problem. By doing so, we will enable a critical advance in high-throughput protein function prediction.

Public Health Relevance

The goals of this research are to test hypotheses about protein functional sites by validating high-throughput predictions derived from computational biology techniques. Our approach is to develop a novel system that automatically mines the literature for targeted information relevant to those predictions. We will produce reliable protein functional site predictions that can in turn be exploited for in silico high-throughput drug design.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM010120-02
Application #: 7843633
Study Section: Special Emphasis Panel (ZLM1-AP-E (M3))
Program Officer: Ye, Jane

Project Start: 2009-07-01
Project End: 2012-06-30
Budget Start: 2010-07-01
Budget End: 2012-06-30
Support Year: 2
Fiscal Year: 2010
Total Cost: $711,389
Indirect Cost

Institution

Name: University of Colorado Denver
Department: Pharmacology
Type: Schools of Medicine
DUNS #: 041096314

City: Aurora
State: CO
Country: United States
Zip Code: 80045

Related projects

NIH 2010 R01 LM: Automated Literature Mining for Validation of High-Throughput Function Prediction. Hunter, Lawrence E. / University of Colorado Denver. $711,389
NIH 2010 R01 LM: Automated Literature Mining for Validation of High-Throughput Function Prediction. Verspoor, Karin Maria / University of Colorado Denver. $80,113
NIH 2009 R01 LM: Automated Literature Mining for Validation of High-Throughput Function Prediction. Verspoor, Karin Maria / University of Colorado Denver. $721,448

Publications

Verspoor, Karin; Mackinlay, Andrew; Cohn, Judith D et al. (2013) Detection of protein catalytic sites in the biomedical literature. Pac Symp Biocomput:433-44

Verspoor, Karin M; Cohn, Judith D; Ravikumar, Komandur E et al. (2012) Text mining improves prediction of protein functional sites. PLoS One 7:e32171

Wall, Michael E; Raghavan, Sindhu; Cohn, Judith D et al. (2011) Genome majority vote improves gene predictions. PLoS Comput Biol 7:e1002284

Lu, Zhiyong; Kao, Hung-Yu; Wei, Chih-Hsuan et al. (2011) The gene normalization task in BioCreative III. BMC Bioinformatics 12 Suppl 8:S2

Verspoor, Karin; Roeder, Christophe; Johnson, Helen L et al. (2010) Exploring species-based strategies for gene normalization. IEEE/ACM Trans Comput Biol Bioinform 7:462-71

Cohen, K Bretonnel; Johnson, Helen L; Verspoor, Karin et al. (2010) The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinformatics 11:492
