Automated Molecular Identity Disambiguator (AutoMID)

Schürer, Stephan; Bunin, Barry

Abstract

Small molecules are one of the most important classes of therapeutics, alleviating suffering and in many cases preventing death for hundreds of millions of people worldwide. Small molecules also serve as invaluable tools to study biology, often with the goal of validating novel targets for the development of future therapeutic drugs. Reproducibility of experimental results and the interoperability and reusability of resulting datasets depend on accurate descriptions of associated research objects, and most critically on correct representations of the small molecules that are tested in biological assays. For example, it is not possible to develop predictive models of protein target–small molecule interactions if the chemical structure representations of the molecules are not correct. Many factors contribute to errors in reported chemical structures in small molecule screening and omics reference databases, scientific publications, and many other web-based resources and documents. Because of the complexity of representing small-molecule chemical structure graphs and the lack of thorough curation, errors are frequently introduced by non-experts, and error propagation across different digital research assets is a pervasive problem. To address this challenging problem via a scalable approach, we propose the Automated Molecular Identity Disambiguator (AutoMID). AutoMID will be usable in batch mode at scale via an API, for example to assist chemical structure standardization and registration by maintainers of digital research assets, and also in interactive (UI) mode for everyday researchers to quickly and easily validate or correct their small molecule representations. AutoMID will leverage extensive, highly standardized, linked databases of chemical structures and associated information, including names, synonyms, biological activity, physical properties, and their sources and provenance, and will apply expert rules and AI to enable reliable disambiguation of chemical structure identities at scale.

Public Health Relevance

Small molecules are one of the most important types of drugs. They also serve as invaluable tools to study biology. The complexity of representing chemical graphs and the lack of thorough curation lead to frequent small molecule structure errors, which propagate across digital research assets, impeding their interoperability and reusability. To address this challenging problem, we propose the Automated Molecular Identity Disambiguator (AutoMID). Built on expert knowledge and AI, AutoMID will enable researchers and maintainers of data repositories to reliably identify and resolve ambiguities in chemical structures at scale.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 1R01LM013391-01
Application #: 9987129
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Vanbiervliet, Alan

Project Start: 2020-05-01
Project End: 2024-02-29
Budget Start: 2020-05-01
Budget End: 2021-02-28
Support Year: 1
Fiscal Year: 2020

Institution

Name: University of Miami School of Medicine
Department: Pharmacology
Type: Schools of Medicine
DUNS #: 052780918

City: Coral Gables
State: FL
Country: United States
Zip Code: 33146

Related projects


NIH 2021 R01 LM	Automated Molecular Identity Disambiguator (AutoMID) Schürer, Stephan C.; Bunin, Barry A. / University of Miami School of Medicine
NIH 2020 R01 LM	Automated Molecular Identity Disambiguator (AutoMID) Schürer, Stephan C.; Bunin, Barry A. / University of Miami School of Medicine
