Linking Text Mining and Data Mining for Biomedical Knowledge Discovery

Wu, Cathy

Abstract

Systems integration is becoming the driving force for 21st-century biology. Researchers are systematically tackling gene functions and complex regulatory processes by studying organisms at different levels of organization, from genomes, transcriptomes, and proteomes to metabolomes and interactomes. Fully realizing the value of such high-throughput data requires advanced bioinformatics for integration, mining, comparative analysis, and functional interpretation. Furthermore, with an ever-increasing volume of biomedical literature now available electronically, there is both a pressing need and a great opportunity to fully utilize text mining tools for knowledge extraction. However, despite recent advances, text mining tools are not broadly used by biologists. This gap is partly due to the lack of close interaction between the text mining and biological user communities. The goal of this application is to develop a digital research infrastructure that links text mining with data mining in the systems biology context for biomedical knowledge discovery, with a special focus on the utility and usability of the system for real-world scientific applications. Building upon the bioinformatics framework we have already developed, as well as our close interactions with the biomedical research community, the specific aims are to: (i) integrate existing text mining tools to identify and extract protein and network information from scientific literature, (ii) connect text mining and data mining with omics data integration and a web interface to capture and visualize network knowledge, and (iii) conduct user studies, develop scientific use cases, provide training and outreach, and disseminate the system to the broad biomedical user community.
The digital information resource proposed herein will serve as an enabling environment for biomedical researchers to decipher knowledge from a plethora of information available in the literature and public databases, gaining a better understanding of biological and disease processes as a key to the basic understanding of human health and disease.

Public Health Relevance

The proposed digital information resource will serve as an enabling environment for biomedical researchers to gain a better understanding of biological and disease processes in the systems biology context, thereby facilitating target discovery and disease diagnosis, and translating bench knowledge into bedside benefits.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Resources Project Grant (NLM) (G08)
Project #: 1G08LM010720-01
Application #: 7886453
Study Section: Special Emphasis Panel (ZLM1-AP-G (J2))
Program Officer: Sim, Hua-Chuan

Project Start: 2010-08-19
Project End: 2013-08-18
Budget Start: 2010-08-19
Budget End: 2011-08-18
Support Year: 1
Fiscal Year: 2010
Total Cost: $150,000
Indirect Cost

Institution

Name: University of Delaware
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 059007500

City: Newark
State: DE
Country: United States
Zip Code: 19716

Related projects


NIH 2012 G08 LM	Linking Text Mining and Data Mining for Biomedical Knowledge Discovery	Wu, Cathy H. / University of Delaware	$143,942
NIH 2011 G08 LM	Linking Text Mining and Data Mining for Biomedical Knowledge Discovery	Wu, Cathy H. / University of Delaware	$144,000
NIH 2010 G08 LM	Linking Text Mining and Data Mining for Biomedical Knowledge Discovery	Wu, Cathy H. / University of Delaware	$150,000

Publications

Boutet, Emmanuel; Lieberherr, Damien; Tognolli, Michael et al. (2016) UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods Mol Biol 1374:23-54

Pundir, Sangya; Martin, Maria J; O'Donovan, Claire et al. (2016) UniProt Tools. Curr Protoc Bioinformatics 53:1.29.1-15

UniProt Consortium (2015) UniProt: a hub for protein information. Nucleic Acids Res 43:D204-12

Pundir, Sangya; Magrane, Michele; Martin, Maria J et al. (2015) Searching and Navigating UniProt Databases. Curr Protoc Bioinformatics 50:1.27.1-10

Torii, Manabu; Arighi, Cecilia N; Li, Gang et al. (2015) RLIMS-P 2.0: A Generalizable Rule-Based Information Extraction System for Literature Mining of Protein Phosphorylation Information. IEEE/ACM Trans Comput Biol Bioinform 12:17-29

Holliday, Gemma L; Bairoch, Amos; Bagos, Pantelis G et al. (2015) Key challenges for the creation and maintenance of specialist protein resources. Proteins 83:1005-13

Peng, Yifan; Torii, Manabu; Wu, Cathy H et al. (2014) A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems. BMC Bioinformatics 15:285

Poux, Sylvain; Magrane, Michele; Arighi, Cecilia N et al. (2014) Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data. Database (Oxford) 2014:bau016

Famiglietti, Maria Livia; Estreicher, Anne; Gos, Arnaud et al. (2014) Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation. Hum Mutat 35:927-35

Torii, Manabu; Li, Gang; Li, Zhiwen et al. (2014) RLIMS-P: an online text-mining tool for literature-based extraction of protein phosphorylation information. Database (Oxford) 2014:

Showing the most recent 10 out of 24 publications
