Increasingly large and diverse data sets are being generated by publicly funded screening centers using various high- and low-throughput screening technologies. Much of this data is accessible. The largest public repository of small-molecule screening results is PubChem, which currently covers over 1,500 assays for 370,000 compounds. The number of publicly available assays is expected to grow more than 10-fold during the next five years. The utility of this invaluable resource is currently limited because the knowledge contained in complex and diverse bioassay data sets is not formalized and therefore cannot be accessed for comprehensive computational analysis or integration with other data sources. This proposal addresses that limitation. For the past ten years, biologists have developed ontologies to facilitate the analysis and discussion of the massive amounts of information emerging from the various genome projects. An ontology is a controlled-vocabulary representation of the objects and concepts in a domain, together with their properties and relationships. Its purpose is to model and share domain-specific knowledge so that software agents can automatically extract and associate information.
The aim of this proposal is to develop a bioassay ontology and supporting software tools and to demonstrate their utility. The bioassay ontology will coherently describe diverse biological assays (such as those in PubChem), with a focus on complex cell-based assays and, in particular, high-content screening data. Software development will include modules to build ontology terms and curate data sets; tools to map the ontology onto screening experiments and other ontologies; and tools to standardize, reformat, and aggregate data sets in the context of the ontology. We will demonstrate the utility of our approach by creating a PubChem-derived database and making it available to the community via a search interface. The ontology and software tools will facilitate the analysis of bioassay screening data in various contexts, for example signaling or metabolic pathways and, indirectly, human disease. The tools will make it possible to extract data sets for modeling specific interactions between perturbing agents and biological targets (or pathways), or for modeling assay technology-dependent interference. The end-user software must be easy for biologists and chemical biologists to use so that they can apply the ontology to their own and external data sets. It will be modular and open source. We will develop collaborations to disseminate the bioassay ontology and software in the community and to facilitate their ongoing development.
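As a rough illustration of what aggregating data sets "in the context of the ontology" could mean, the following minimal Python sketch models an ontology as terms linked by is_a relationships and rolls assay records up to every ancestor term. The term names, record fields, and function names are hypothetical and chosen only for illustration; they are not taken from the proposed bioassay ontology or from PubChem.

```python
from collections import defaultdict

# Hypothetical mini-ontology: each term maps to its parent via an "is_a" relation.
# Term names are illustrative assumptions, not actual bioassay ontology terms.
IS_A = {
    "assay": None,
    "cell-based assay": "assay",
    "biochemical assay": "assay",
    "high-content screening assay": "cell-based assay",
}


def ancestors(term):
    """Return the term followed by all of its is_a ancestors, nearest first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = IS_A[term]
    return chain


# Hypothetical annotated records: (assay id, ontology term, compounds tested).
RECORDS = [
    ("A1", "high-content screening assay", 1200),
    ("A2", "biochemical assay", 800),
    ("A3", "cell-based assay", 500),
]


def aggregate_by_term(records):
    """Sum compound counts under every term an assay belongs to,
    either directly or by inheritance through is_a."""
    totals = defaultdict(int)
    for _assay_id, term, n_compounds in records:
        for t in ancestors(term):
            totals[t] += n_compounds
    return dict(totals)


totals = aggregate_by_term(RECORDS)
```

Because each record carries a formal term rather than free text, a query for "cell-based assay" in this sketch also picks up the high-content screening record without any extra curation step.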

Public Health Relevance

This project will develop a bioassay ontology to coherently describe the hundreds of different assays used to study how perturbing agents, such as drugs, alter cell function. Along with new software to search existing assay databases, this will enable scientists to more effectively identify and prioritize chemicals for further development into chemical probes or starting points for therapeutics.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
High Impact Research and Research Infrastructure Programs (RC2)
Study Section
Special Emphasis Panel (ZHG1-HGR-N (O1))
Program Officer
Ajay, Ajay
University of Miami School of Medicine
Schools of Medicine
Coral Gables
United States
Zander Balderud, Linda; Murray, David; Larsson, Niklas et al. (2015) Using the BioAssay Ontology for analyzing high-throughput screening data. J Biomol Screen 20:402-15
Lemmon, Vance P; Ferguson, Adam R; Popovich, Phillip G et al. (2014) Minimum information about a spinal cord injury experiment: a proposed reporting standard for spinal cord injury experiments. J Neurotrauma 31:1354-61
Lemmon, Vance P; Abeyruwan, Saminda; Visser, Ubbo et al. (2014) Facilitating transparency in spinal cord injury studies using data standards and ontologies. Neural Regen Res 9:6-7
Abeyruwan, Saminda; Vempati, Uma D; Küçük-McGinty, Hande et al. (2014) Evolving BioAssay Ontology (BAO): modularization, integration and applications. J Biomed Semantics 5:S5
Przydzial, Magdalena J; Bhhatarai, Barun; Koleti, Amar et al. (2013) GPCR ontology: development and application of a G protein-coupled receptor pharmacology knowledge framework. Bioinformatics 29:3211-9
Motti, Dario; Bixby, John L; Lemmon, Vance P (2012) MicroRNAs and neuronal development. Semin Fetal Neonatal Med 17:347-52
Inoue, Masashi; Ogihara, Mitsunori; Hanada, Ryoko et al. (2012) Gestural cue analysis in automated semantic miscommunication annotation. Multimed Tools Appl 61:7-20
Vempati, Uma D; Przydzial, Magdalena J; Chung, Caty et al. (2012) Formalization, annotation and analysis of diverse drug and probe screening assay datasets using the BioAssay Ontology (BAO). PLoS One 7:e49198
Visser, Ubbo; Abeyruwan, Saminda; Vempati, Uma et al. (2011) BioAssay Ontology (BAO): a semantic description of bioassays and high-throughput screening results. BMC Bioinformatics 12:257
Lemmon, Vance P; Jia, Yuanyuan; Shi, Yan et al. (2011) Challenges in small screening laboratories: implementing an on-demand laboratory information management system. Comb Chem High Throughput Screen 14:742-8
