Systems integration is becoming the driving force of 21st-century biology. Researchers are systematically tackling gene functions and complex regulatory processes by studying organisms at different levels of organization, from genomes, transcriptomes, and proteomes to metabolomes and interactomes. Fully realizing the value of such high-throughput data requires advanced bioinformatics for integration, mining, comparative analysis, and functional interpretation. Furthermore, with an ever-increasing volume of biomedical literature now available electronically, there is both a pressing need and a great opportunity to fully utilize text mining tools for knowledge extraction. However, despite recent advances, text mining tools are not broadly used by biologists. This gap is partly due to the lack of close interaction between the text mining and biological user communities. The goal of this application is to develop a digital research infrastructure that links text mining with data mining in the systems biology context for biomedical knowledge discovery, with a special focus on the utility and usability of the system for real-world scientific applications. Building upon the bioinformatics framework we have already developed, as well as our close interactions with the biomedical research community, the specific aims are to: (i) integrate existing text mining tools to identify and extract protein and network information from the scientific literature, (ii) connect text mining and data mining with omics data integration and a web interface to capture and visualize network knowledge, and (iii) conduct user studies, develop scientific use cases, provide training and outreach, and disseminate the system to the broad biomedical user community.
The digital information resource proposed herein will serve as an enabling environment for biomedical researchers to decipher knowledge from a plethora of information available in the literature and public databases, gaining a better understanding of biological and disease processes as a key to the basic understanding of human health and disease.

Public Health Relevance

The proposed digital information resource will serve as an enabling environment for biomedical researchers to gain a better understanding of biological and disease processes in the systems biology context, thereby facilitating target discovery and disease diagnosis, and translating bench knowledge into bedside benefits.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Resources Project Grant (NLM) (G08)
Study Section
Special Emphasis Panel (ZLM1-AP-G (J2))
Program Officer
Sim, Hua-Chuan
University of Delaware
Biostatistics & Other Math Sci
Schools of Engineering
United States
Torii, Manabu; Li, Gang; Li, Zhiwen et al. (2014) RLIMS-P: an online text-mining tool for literature-based extraction of protein phosphorylation information. Database (Oxford) 2014:
Peng, Yifan; Tudor, Catalina O; Torii, Manabu et al. (2014) iSimp in BioC standard format: enhancing the interoperability of a sentence simplification system. Database (Oxford) 2014:
Poux, Sylvain; Magrane, Michele; Arighi, Cecilia N et al. (2014) Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data. Database (Oxford) 2014:bau016
Peng, Yifan; Torii, Manabu; Wu, Cathy H et al. (2014) A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems. BMC Bioinformatics 15:285
Famiglietti, Maria Livia; Estreicher, Anne; Gos, Arnaud et al. (2014) Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation. Hum Mutat 35:927-35
UniProt Consortium (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res 42:D191-8
Arighi, Cecilia N; Carterette, Ben; Cohen, K Bretonnel et al. (2013) An overview of the BioCreative 2012 Workshop Track III: interactive text mining task. Database (Oxford) 2013:bas056
UniProt Consortium (2013) Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res 41:D43-7
Comeau, Donald C; Islamaj Doğan, Rezarta; Ciccarese, Paolo et al. (2013) BioC: a minimalist approach to interoperability for biomedical text processing. Database (Oxford) 2013:bat064
Pedruzzi, Ivo; Rivoire, Catherine; Auchincloss, Andrea H et al. (2013) HAMAP in 2013, new developments in the protein family classification and annotation system. Nucleic Acids Res 41:D584-9
