Advancing Protein Engineering Using Artificial Intelligence and the ProtaBank Mutation Database

Olafson, Barry

Abstract

Therapeutic antibodies, specialized enzymes for drug manufacturing, small molecule drug screening agents, and other proteins have been instrumental in advancing biotechnology and medicine. Protein therapeutics alone represent a rapidly growing $100+ billion market with broad applications in the treatment of cancer, inflammatory and metabolic diseases, and numerous other disorders. Most of the antibodies and other protein therapeutics developed in the last several years have been engineered, leading to improvements in important properties such as efficacy, binding affinity, expression, stability, and immunogenicity. However, improving protein properties through sequence modification remains a challenging task. Artificial intelligence (AI), which has been enormously successful in several fields (e.g., image recognition, self-driving cars, natural language processing), is now being applied to protein engineering and has the potential to transform this field as well. AI and machine learning (ML) can take advantage of large and diverse datasets to identify correlations, predict beneficial mutations, and explore novel protein sequences in ways that are not possible using other techniques. Other advantages include the ability to simultaneously optimize multiple protein properties and explore sequence space more efficiently. In Phases I and II of this project, we developed the ProtaBank database as a central repository to store, organize, and annotate protein mutation data spanning a broad range of properties. ProtaBank is the largest database, and the only one, actively collecting such a comprehensive set of sequence mutation data, and it is growing rapidly due to the wealth of data being generated with advanced automation and next-generation sequencing techniques. ProtaBank's depth and breadth make it an ideal data source to train ML models.
This proposal aims to create the ProtaBank AI Platform to enable the use of AI and ML tools to apply the data in ProtaBank to engineer proteins. The platform will provide fully customizable computational tools and will invoke protein-specific knowledge to properly prepare data for use with ML models. An interface to popular ML frameworks will be provided so that scientists can use these techniques to discover new predictive algorithms and enhance their ability to design proteins with the desired properties.
Specific aims include: (1) integrating peer-validated ML methods and proprietary technology for protein engineering into the ProtaBank AI Platform, (2) developing dynamic ML dataset creation tools, (3) expanding and improving the ProtaBank database by reaching out to scientists to contribute data, (4) enhancing our data deposition tools, and (5) integrating ProtaBank with the Protein Data Bank structure database and other databases.
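
As a rough illustration of what preparing mutation data for ML models can involve, the sketch below turns substitution records into a numeric feature matrix. It is not ProtaBank's schema or API: the record format (a list of substitution strings such as "K2A" paired with a measured value) and every function name here are hypothetical, chosen only for this example.

```python
import re

# The 20 standard amino acids, in a fixed order used for one-hot encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical substitution notation: wild-type residue, 1-indexed position,
# mutant residue (for example "K2A").
MUTATION_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_mutations(wild_type, mutations):
    """Return the variant sequence after applying each substitution."""
    seq = list(wild_type)
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation string: {mutation!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if pos < 1 or pos > len(seq):
            raise ValueError(f"position out of range in {mutation!r}")
        if seq[pos - 1] != old:
            raise ValueError(f"wild-type residue mismatch in {mutation!r}")
        seq[pos - 1] = new
    return "".join(seq)


def one_hot(sequence):
    """Encode a sequence as a flat list of 0/1 values, 20 per residue."""
    vector = []
    for residue in sequence:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def build_dataset(wild_type, records):
    """Convert (mutations, measured_value) records into features X and targets y."""
    X = [one_hot(apply_mutations(wild_type, muts)) for muts, _ in records]
    y = [value for _, value in records]
    return X, y
```

Calling `build_dataset("MKV", [(["K2A"], 1.5), ([], 0.0)])` would yield two 60-element feature vectors, one for the variant and one for the wild type, which could then be passed to an ML framework of choice. One-hot encoding is only one simple option; the abstract does not say which encodings the platform uses.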

Public Health Relevance

Protein engineering has enabled significant advances in health care by playing a key role in the development of antibodies and other protein therapeutics (e.g., for the treatment of cancer, inflammatory and metabolic diseases, and other disorders), highly selective enzymes for drug manufacturing, and novel proteins for use in diagnostics and the identification of new small molecule drugs. This project will enable the power of artificial intelligence (AI) to be applied to accelerate the engineering of proteins with new and improved properties. AI approaches can capitalize on the large amounts of protein mutation data being generated and stored in our recently developed ProtaBank protein mutation database to transform the way in which protein therapeutics and reagents are discovered and developed.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 5R44GM117961-05
Application #: 9994932
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Wu, Mary Ann

Project Start: 2016-06-01
Project End: 2021-02-28
Budget Start: 2020-09-01
Budget End: 2021-02-28
Support Year: 5
Fiscal Year: 2020
Total Cost
Indirect Cost

Institution

Name: Protabit, LLC
Department
Type
DUNS #: 883426434

City: Pasadena
State: CA
Country: United States
Zip Code: 91106

Related projects


NIH 2020 R44 GM	Advancing Protein Engineering Using Artificial Intelligence and the ProtaBank Mutation Database Olafson, Barry D. / Protabit, LLC
NIH 2019 R44 GM	Advancing Protein Engineering Using Artificial Intelligence and the ProtaBank Mutation Database Olafson, Barry D. / Protabit, LLC
NIH 2018 R44 GM	PEBank: A database for protein engineering data Olafson, Barry D. / Protabit, LLC
NIH 2017 R44 GM	PEBank: A database for protein engineering data Olafson, Barry D. / Protabit, LLC
NIH 2016 R44 GM	PEBank: A database for protein engineering data Olafson, Barry D. / Protabit, LLC	$219,294
