The function of millions of proteins remains unknown, and automated protein function prediction systems have a poor record of performance. We will test hypotheses about protein functional sites by validating high-throughput predictions derived from computational biology techniques through a novel automated system that will mine the literature for targeted information relevant to those predictions. The impact of our work will be to enable large-scale, validated annotation of protein function and, in turn, to facilitate progress in drug discovery for the treatment of diseases.
High-throughput experiments and bioinformatics techniques are creating an exploding volume of data with which we hope to transcribe the genetic blueprints of life. Targeted experiments are required to validate biomedical discoveries from these sources. Fortunately, the information needed to confirm or refute a prediction is often already available in an existing publication, and the biologist can take advantage of this supporting evidence for validation. However, the sheer volume of predictions from high-throughput methods exceeds the capacity of researchers to perform even the necessary literature searches. This gap in capacity must be addressed using automated literature mining methods that perform comparably to a human expert; indeed, development of such methods is a grand challenge of modern biology.
We will mine the full-text literature to validate computational predictions of functional sites in proteins. The innovations in our approach include: (1) using computational predictions as the context for a literature search; (2) information extraction of protein functional sites from full-text journal publications; (3) high-throughput text mining; and (4) using primary information in protein databases to evaluate the methods.
Understanding of protein function is a critical bottleneck in the progress of biomedical research.
It is time to truly integrate the biological literature into the protein function prediction problem. By doing so, we will enable a critical advance in high-throughput protein function prediction.
The goals of this research are to test hypotheses about protein functional sites by validating high-throughput predictions derived from computational biology techniques. Our approach is to develop a revolutionary system that will automatically mine the literature for targeted information relevant to those predictions. We will produce reliable protein functional site predictions that can in turn be exploited for in silico high-throughput drug design.
Verspoor, Karin; Mackinlay, Andrew; Cohn, Judith D et al. (2013) Detection of protein catalytic sites in the biomedical literature. Pac Symp Biocomput :433-44
Verspoor, Karin M; Cohn, Judith D; Ravikumar, Komandur E et al. (2012) Text mining improves prediction of protein functional sites. PLoS One 7:e32171
Wall, Michael E; Raghavan, Sindhu; Cohn, Judith D et al. (2011) Genome majority vote improves gene predictions. PLoS Comput Biol 7:e1002284
Lu, Zhiyong; Kao, Hung-Yu; Wei, Chih-Hsuan et al. (2011) The gene normalization task in BioCreative III. BMC Bioinformatics 12 Suppl 8:S2
Verspoor, Karin; Roeder, Christophe; Johnson, Helen L et al. (2010) Exploring species-based strategies for gene normalization. IEEE/ACM Trans Comput Biol Bioinform 7:462-71
Cohen, K Bretonnel; Johnson, Helen L; Verspoor, Karin et al. (2010) The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinformatics 11:492