Studying the primary research literature is a universal, core activity for biomedical scientists. It underlies scientists' understanding of their subject and strengthens their ability to plan, execute, and interpret experiments. This proposal is concerned with the maintenance and continued development of software that supports scientists in this scholarly work. Our goal is to develop a knowledge engineering platform (called 'BioScholar') that permits a single graduate student or postdoctoral worker to design, build, curate, and maintain a Knowledge Base (KB) for the literature of interest to a specific laboratory. This continues a previous software development project that was funded by the National Library of Medicine (LM 07061). We will continue to maintain the software using modern software engineering tools and approaches, while making it fully interoperable with Protege/OWL, a widely used ontology engineering platform. We will also develop the system's existing capabilities to assist scientists with the management of bibliographic data (citation information and full-text PDF articles). We will further develop tools that allow researchers to annotate PDF files with highlights, simple comments, and structured data. We will then use this annotation framework to drive the process of constructing knowledge bases in Protege/OWL. We will then incorporate Information Extraction (IE) techniques from modern Natural Language Processing (NLP) to improve the efficiency of this curation process. The NLP methods we use are based on the Conditional Random Fields (CRF) model, which is considered state-of-the-art among NLP researchers.
Finally, the most research-oriented component of this proposal is the development of a new methodology for knowledge representation and reasoning in biomedicine based on experimental design, involving experimental controls, independent and dependent variables, statistical significance, and correlation between variables. This representation will be (a) understandable to experimental scientists, (b) lightweight, (c) versatile, and (d) capable of supporting inference between experiments. During the course of this project, we will build a KB for the world-leading neuroendocrinology laboratory of Prof. Alan Watts at the University of Southern California. Prof. Watts' work is concerned with the study of catecholaminergic control of the stress response, drawing on research from a large number of different fields (anatomy, physiology, molecular biology, etc.). After developing this KB, we will test its validity using subjective methods (questionnaires and interviews) and objective experiments ('mock exams' that assess students' performance on test questions based on comprehension of the primary literature). We will release all findings and tools to the biomedical community as research papers and open-source software.

Narrative

This project will help biomedical scientists manage, understand, and communicate the complex information they must learn from scientific papers in multiple biomedical disciplines. As a demonstration of this work, we will build a comprehensive summary of the research underlying brain circuits involved in stress. Stress and anxiety disorders are estimated to affect 19.1 million people in the USA, costing $42 billion per year in health costs (source: Anxiety Disorders Association of America).

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-BST-Q (01))
Program Officer: Lyster, Peter
University of Southern California
Biostatistics & Other Math Sci
Schools of Engineering
Los Angeles
United States
Burns, Gully A P C; Turner, Jessica A (2013) Modeling functional Magnetic Resonance Imaging (fMRI) experimental variables in the Ontology of Experimental Variables and Values (OoEVV). Neuroimage 82:662-70
Ramakrishnan, Cartic; Patnia, Abhishek; Hovy, Eduard et al. (2012) Layout-aware text extraction from full-text PDF of scientific articles. Source Code Biol Med 7:7
Sansone, Susanna-Assunta; Rocca-Serra, Philippe; Field, Dawn et al. (2012) Toward interoperable bioscience data. Nat Genet 44:121-6
Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin et al. (2012) Text mining for the biocuration workflow. Database (Oxford) 2012:bas020
Tallis, Marcelo; Thompson, Richard; Russ, Thomas A et al. (2011) Knowledge synthesis with maps of neural connectivity. Front Neuroinform 5:24
Helmer, Karl G; Ambite, Jose Luis; Ames, Joseph et al. (2011) Enabling collaborative research using the Biomedical Informatics Research Network (BIRN). J Am Med Inform Assoc 18:416-22
Russ, Thomas A; Ramakrishnan, Cartic; Hovy, Eduard H et al. (2011) Knowledge engineering tools for reasoning with scientific observations and interpretations: a neural connectivity use case. BMC Bioinformatics 12:351