Biological assays are the foundation for developing chemical probes and drugs, but new Big Data approaches ? which have revolutionized other areas of biomedical science ? have not yet advanced this early step of biomedical research: analysis of assay data. The obstacle is that scientists specify their assays through text descriptions written in scientific English, which need to be translated into standardized annotations readable by computers. This lack of standardized and machine-readable assay descriptions is a major impediment to manage, find, aggregate, compare, re-use, and learn from the ever-growing corpus of assays (e.g., >1.2 million in PubChem). Thus, there is a critical need for better annotation and curation tools for drug discovery assays. However, the process to go from a simple text protocol to highly detailed machine-readable semantic annotations is not trivial. Multiple tools and technologies are required: ontologies or the structured controlled vocabularies; templates that map specific vocabularies to properties that are to be captured; and software tools to actually apply these ontologies to a given text. Currently, each of these exists in isolation; yet, a bottleneck in any one tool or technology, or a gap between the different pieces, disrupts the overall process, resulting in poor or no annotation of the datasets. Here we propose a project to combine and integrate these three technologies (which are also the core competencies of the three groups collaborating on this proposal). We will deliver a novel, comprehensive, user-friendly data annotation and curation system that is highly interconnected, encompassing the full cycle, and real-world practice, of required tasks and decisions, by all parties within the `bioassay annotation ecosystem' (researchers performing curation, dedicated curators, IT specialists, ontology owners, and librarians/repositories). The alliance between academic and commercial collaborators, who already work together, will greatly benefit the project and minimize execution risk.
Our specific aims are to: (1) Develop a bioassay-specific template editor and templates by adopting the Stanford (Center for Expanded Data Annotation and Retrieval, CEDAR) data model to the machine learning-based curation tool BioAssay Express, to exploit the broad functionality of its data structures, tools and interfaces; (2) Define and create an ontology update process and tool (`OntoloBridge') to support rapid feedback between curators/users and ontology experts and enable semi-automated incorporation of suggestions for updates to existing published ontologies; (3) Develop new tools to export annotated data into public repositories such as PubChem; and (4) Evaluate our solution across diverse audiences (pharma, academia, repositories). The system will improve bioassay curation efficiency, quality, and effectiveness, enabling scientists to generate standardized annotations for their experiments to make these data FAIR (Findable, Accessible, Interoperable, Reusable). We envision this suite of tools will encourage annotation earlier in the data lifecycle while still supporting annotation at later stages (e.g., submission to repositories or to journals).

Public Health Relevance

Biological assays are the foundation for developing drugs, but new Big Data approaches ? which have revolu- tionized other areas of biomedical science ? have not yet advanced this early step of biomedical research: analysis of assay data. The obstacle is that assays are written in scientific English, which need to be translated into standardized descriptions readable by computers. This lack of machine-readable annotations is a major impediment to manage, find, compare, re-use, and learn from the millions of assays. This project will develop a formal process and integrated tools to support the complete cycle of tasks and decisions required for bioassay annotation, enabling expedited (and more cost-effective) drug discovery.

Agency
National Institute of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project--Cooperative Agreements (U01)
Project #
5U01LM012630-04
Application #
9979969
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Florance, Valerie
Project Start
2017-08-01
Project End
2021-07-31
Budget Start
2020-08-01
Budget End
2021-07-31
Support Year
4
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Miami School of Medicine
Department
Pharmacology
Type
Schools of Medicine
DUNS #
052780918
City
Coral Gables
State
FL
Country
United States
Zip Code
33146