Studying the primary research literature is a universal, core activity for biomedical scientists. It underlies scientists' understanding of their subject and strengthens their ability to plan, execute, and interpret experiments. This proposal is concerned with the maintenance and continued development of software that supports scientists in their scholarly work. Our goal is to develop a knowledge engineering platform (called 'BioScholar') that permits a single graduate student or postdoctoral worker to design, build, curate, and maintain a Knowledge Base (KB) for the literature of interest to a specific laboratory. This continues a previous software development project that was funded by the National Library of Medicine (LM 07061). We will continue to maintain the software using modern software engineering tools and approaches, whilst making it fully interoperable with a widely used ontology engineering platform (Protege/OWL). We will also develop the system's existing capabilities to assist scientists with the management of bibliographic data (citation information and full-text PDF articles). We will further develop tools that allow researchers to annotate PDF files with highlights, simple comments, and structured data. We will then use this annotation framework to drive the process of constructing knowledge bases using Protege/OWL. We will then incorporate Information Extraction (IE) techniques from modern Natural Language Processing (NLP) to improve the efficiency of this curation process. The NLP methods we use are based on the Conditional Random Fields (CRF) model, which is considered state-of-the-art amongst NLP researchers.
Finally, the most research-oriented component of this proposal is the development of a new methodology for knowledge representation and reasoning in biomedicine based on experimental design, involving experimental controls, independent and dependent variables, statistical significance, and correlation between variables. This representation will be (a) understandable to experimental scientists, (b) lightweight, (c) versatile, and (d) capable of supporting inference between experiments. During the course of this project, we will build a KB for the world-leading neuroendocrinology laboratory of Prof. Alan Watts at the University of Southern California. Prof. Watts' work is concerned with the study of catecholaminergic control of the stress response, drawing on research from a large number of different fields (anatomy, physiology, molecular biology, etc.). After developing this KB, we will test its validity using subjective methods (questionnaires and interviews) and objective experiments ('mock exams' to see whether students' performance improves on test questions based on comprehension of the primary literature). We will release all findings and tools to the biomedical community as research papers and open-source software.

Narrative
This project will help biomedical scientists manage, understand, and communicate the complex information they must learn from scientific papers in multiple biomedical disciplines. As a demonstration of this work, we will build a comprehensive summary of research underlying brain circuits involved in stress. Stress and anxiety disorders are estimated to affect 19.1 million people in the USA, costing $42 billion per year in health costs (source: Anxiety Disorders Association of America).

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM083871-01
Application #
7426246
Study Section
Special Emphasis Panel (ZRG1-BST-Q (01))
Program Officer
Lyster, Peter
Project Start
2008-04-01
Project End
2012-03-31
Budget Start
2008-04-01
Budget End
2009-03-31
Support Year
1
Fiscal Year
2008
Total Cost
$266,213
Indirect Cost
Name
University of Southern California
Department
Biostatistics & Other Math Sci
Type
Schools of Engineering
DUNS #
072933393
City
Los Angeles
State
CA
Country
United States
Zip Code
90089
Publications
Burns, Gully A P C; Turner, Jessica A (2013) Modeling functional Magnetic Resonance Imaging (fMRI) experimental variables in the Ontology of Experimental Variables and Values (OoEVV). Neuroimage 82:662-70
Ramakrishnan, Cartic; Patnia, Abhishek; Hovy, Eduard et al. (2012) Layout-aware text extraction from full-text PDF of scientific articles. Source Code Biol Med 7:7
Sansone, Susanna-Assunta; Rocca-Serra, Philippe; Field, Dawn et al. (2012) Toward interoperable bioscience data. Nat Genet 44:121-6
Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin et al. (2012) Text mining for the biocuration workflow. Database (Oxford) 2012:bas020
Russ, Thomas A; Ramakrishnan, Cartic; Hovy, Eduard H et al. (2011) Knowledge engineering tools for reasoning with scientific observations and interpretations: a neural connectivity use case. BMC Bioinformatics 12:351
Tallis, Marcelo; Thompson, Richard; Russ, Thomas A et al. (2011) Knowledge synthesis with maps of neural connectivity. Front Neuroinform 5:24
Helmer, Karl G; Ambite, Jose Luis; Ames, Joseph et al. (2011) Enabling collaborative research using the Biomedical Informatics Research Network (BIRN). J Am Med Inform Assoc 18:416-22