Systems integration is becoming the driving force of 21st-century biology. Researchers are systematically tackling gene functions and complex regulatory processes by studying organisms at different levels of organization, from genomes, transcriptomes, and proteomes to metabolomes and interactomes. Fully realizing the value of such high-throughput data requires advanced bioinformatics for integration, mining, comparative analysis, and functional interpretation. Furthermore, with an ever-increasing volume of biomedical literature now available electronically, there is both a pressing need and a great opportunity to fully utilize text mining tools for knowledge extraction. However, despite recent advancements, text mining tools are not broadly used by biologists. This gap is partly due to the lack of close interaction between the text mining and biological user communities. The goal of this application is to develop a digital research infrastructure that links text mining with data mining in the systems biology context for biomedical knowledge discovery, with a special focus on the utility and usability of the system for real-world scientific applications. Building upon the bioinformatics framework we have already developed, as well as our close interactions with the biomedical research community, the specific aims are to: (i) integrate existing text mining tools to identify and extract protein and network information from scientific literature, (ii) connect text mining and data mining with omics data integration and a web interface to capture and visualize network knowledge, and (iii) conduct user studies, develop scientific use cases, provide training and outreach, and disseminate the system to the broad biomedical user community.
The digital information resource proposed herein will serve as an enabling environment for biomedical researchers to decipher knowledge from the plethora of information available in the literature and public databases, and thereby to gain a better understanding of biological and disease processes, which is key to the basic understanding of human health and disease.

Public Health Relevance

The proposed digital information resource will serve as an enabling environment for biomedical researchers to gain a better understanding of biological and disease processes in the systems biology context, thereby facilitating target discovery and disease diagnosis, and translating bench knowledge into bedside benefits.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Resources Project Grant (NLM) (G08)
Study Section: Special Emphasis Panel (ZLM1-AP-G (J2))
Program Officer: Sim, Hua-Chuan
University of Delaware
Biostatistics & Other Math Sci
Schools of Engineering
United States
Boutet, Emmanuel; Lieberherr, Damien; Tognolli, Michael et al. (2016) UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods Mol Biol 1374:23-54
Pundir, Sangya; Martin, Maria J; O'Donovan, Claire et al. (2016) UniProt Tools. Curr Protoc Bioinformatics 53:1.29.1-15
UniProt Consortium (2015) UniProt: a hub for protein information. Nucleic Acids Res 43:D204-12
Pundir, Sangya; Magrane, Michele; Martin, Maria J et al. (2015) Searching and Navigating UniProt Databases. Curr Protoc Bioinformatics 50:1.27.1-10
Torii, Manabu; Arighi, Cecilia N; Li, Gang et al. (2015) RLIMS-P 2.0: A Generalizable Rule-Based Information Extraction System for Literature Mining of Protein Phosphorylation Information. IEEE/ACM Trans Comput Biol Bioinform 12:17-29
Holliday, Gemma L; Bairoch, Amos; Bagos, Pantelis G et al. (2015) Key challenges for the creation and maintenance of specialist protein resources. Proteins 83:1005-13
Poux, Sylvain; Magrane, Michele; Arighi, Cecilia N et al. (2014) Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data. Database (Oxford) 2014:bau016
Peng, Yifan; Torii, Manabu; Wu, Cathy H et al. (2014) A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems. BMC Bioinformatics 15:285
Famiglietti, Maria Livia; Estreicher, Anne; Gos, Arnaud et al. (2014) Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation. Hum Mutat 35:927-35
Torii, Manabu; Li, Gang; Li, Zhiwen et al. (2014) RLIMS-P: an online text-mining tool for literature-based extraction of protein phosphorylation information. Database (Oxford) 2014:

Showing the most recent 10 out of 24 publications