Burns, Gully A. Abstract In primary research articles, scientists make claims based on evidence from experiments, and report both the claims and the supporting evidence in the results section of papers. However, biomedical databases de- scribe the claims made by scientists in detail, but rarely provide descriptions of any supporting evidence that a consulting scientist could use to understand why the claims are being made. Currently, the process of curating evidence into databases is manual, time-consuming and expensive; thus, evidence is recorded in papers but not generally captured in database systems. For example, the European Bioinformatics Institute's INTACT database describes how different molecules biochemically interact with each other in detail. They characterize the under- lying experiment providing the evidence of that interaction with only two hierarchical variables: a code denoting the method used to detect the molecular interaction and another code denoting the method used to detect each molecule. In fact, INTACT describes 94 different types of interaction detection method that could be used in conjunction with other experimental methodological processes that can be used in a variety of different ways to reveal different details about the interaction. This crucial information is not being captured in databases. Although experimental evidence is complex, it conforms to certain principles of experimental design: experimentally study- ing a phenomenon typically involves measuring well-chosen dependent variables whilst altering the values of equally well-chosen independent variables. Exploiting these principles has permitted us to devise a preliminary, robust, general-purpose representation for experimental evidence. In this project, We will use this representation to describe the methods and data pertaining to evidence underpinning the interpretive assertions about molecular interactions described by INTACT. A key contribution of our project is that we will develop methods to extract this evidence from scienti?c papers automatically (A) by using image processing on a speci?c subtype of ?gure that is common in molecular biology papers and (B) by using natural language processing to read information from the text used by scientists to describe their results. We will develop these tools for the INTACT repository but package them so that they may then also be used for evidence pertaining to other areas of research in biomedicine.

Public Health Relevance

Burns, Gully A. Narrative Molecular biology databases contain crucial information for the study of human disease (especially cancer), but they omit details of scienti?c evidence. Our work will provide detailed accounts of experimental evidence supporting claims pertaining to the study of these diseases. This additional detail may provide scientists with more powerful ways of detecting anomalies and resolving contradictory ?ndings.

Agency
National Institute of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM012592-03
Application #
9772541
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2017-09-01
Project End
2021-08-31
Budget Start
2019-09-01
Budget End
2020-08-31
Support Year
3
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Southern California
Department
Biostatistics & Other Math Sci
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
072933393
City
Los Angeles
State
CA
Country
United States
Zip Code
90089