About 1,500 of the ~20,000 protein-coding genes of the human genome can bind drug-like molecules, yet only about 600 are currently targeted by FDA-approved drugs. Therefore, roughly 900 proteins are potential drug targets that are not yet being used in human medicine and, given our incomplete knowledge of the human genome, the actual number could be much higher. There is therefore a substantial unmet need to improve our understanding of this so-called genomic dark matter in order to develop novel classes of drugs for the treatment of disease. Comprehensive experimental investigation of these proteins across hundreds of thousands of compounds and thousands of diseases would be prohibitively expensive, but computational approaches could substantially narrow the list of candidates. In this project we will apply two computational approaches to the task of predicting the most promising novel drug targets. We will integrate DrugCentral and other drug-related resources with the disease and phenotype knowledge base of the Monarch Initiative into a semantically harmonized knowledge graph (KG). The resulting KG will have comprehensive coverage of diseases, genes, gene functions, phenotypic abnormalities, drugs, drug mechanisms, and drug targets. Machine learning (ML) identifies patterns in training data and applies them to predict entities and relations in new data. ML on KGs has become an active research area in computer science but remains difficult to use in real-world applications, owing to the lack of adequate software packages. We will therefore implement state-of-the-art deep learning algorithms for KGs, extending and adapting selected algorithms to the task of drug and drug target discovery.
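To illustrate what "predicting relations" in a KG means, the following is a minimal sketch on a toy graph. It assumes hypothetical node identifiers (not real DrugCentral or Monarch records) and uses a classical common-neighbors link-prediction heuristic as a stand-in; it is not the deep-learning method proposed in this project. Genes that are not yet drug targets are ranked by how many graph neighbors (diseases, phenotypes, drugs) they share with genes that already are.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, relation, object); identifiers are
# placeholders, not records from DrugCentral or the Monarch Initiative.
TRIPLES = [
    ("drug:A", "targets", "gene:K1"),
    ("drug:A", "targets", "gene:K2"),
    ("drug:B", "targets", "gene:K2"),
    ("gene:K1", "associated_with", "disease:D1"),
    ("gene:K2", "associated_with", "disease:D1"),
    ("gene:K3", "associated_with", "disease:D1"),
    ("gene:K3", "has_phenotype", "phenotype:P1"),
    ("gene:K1", "has_phenotype", "phenotype:P1"),
    ("gene:K4", "associated_with", "disease:D2"),
]


def build_adjacency(triples):
    """Build undirected neighbor sets; relation types are ignored in this sketch."""
    adj = defaultdict(set)
    for subj, _, obj in triples:
        adj[subj].add(obj)
        adj[obj].add(subj)
    return adj


def common_neighbor_score(adj, u, v):
    """Number of neighbors shared by nodes u and v."""
    return len(adj[u] & adj[v])


def rank_candidate_targets(triples):
    """Rank untargeted genes by summed common neighbors with known targets."""
    adj = build_adjacency(triples)
    targeted = {obj for _, rel, obj in triples if rel == "targets"}
    genes = {node for node in adj if node.startswith("gene:")}
    scores = {
        gene: sum(common_neighbor_score(adj, gene, t) for t in targeted)
        for gene in genes - targeted
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(rank_candidate_targets(TRIPLES))
```

Running the script prints `[('gene:K3', 3), ('gene:K4', 0)]`: gene K3 shares a disease and a phenotype with known targets, whereas K4 is disconnected from them. Learned KG embeddings replace this hand-crafted score with representations trained on the whole graph, which is the direction the project describes.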
We will develop an easy-to-use software library and demonstrate its use with notebooks designed to serve as starting points for future computational research by other scientists, since they will contain the analysis workflow along with documentation of each step. The human genome encodes more than 500 protein kinases, enzymes that add a phosphate group to specific amino acid residues and thereby transmit biological signals. There are currently 35 FDA-approved protein kinase modulators acting on 38 protein kinases, making the kinases one of the most important groups of druggable proteins encoded by our genome. We will perform a detailed computational study of this group and experimentally validate our top novel candidate using a patient-derived xenograft model system.

Public Health Relevance

The human genome codes for over 20,000 proteins, roughly 3,000 of which are thought to be "druggable", meaning that in principle they could be targeted by medications to treat disease. However, only about 600 of these proteins are currently targeted by an approved medication, and our knowledge about most of the remaining proteins is limited. Our goal is to develop bioinformatic software that will combine information about these proteins, diseases, clinical manifestations, medications, and other relevant data into a network of information (i.e., a graph), and to use sophisticated machine learning algorithms to predict novel drugs and drug targets. We will focus our computational strategy on protein kinases, which are a leading class of targets for cancer drugs. The human genome encodes an estimated 555 kinases, 37 of which have been successfully targeted by anti-cancer medications. Using machine learning to predict which of the remaining 500+ protein kinases are most likely to be amenable to medical treatment could accelerate research into novel treatments. We will perform a pilot study of the top predicted candidates using a mouse model of cancer.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project--Cooperative Agreements (U01)
Project #
1U01CA239108-01
Application #
9742135
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Zenklusen, Jean C
Project Start
2019-03-01
Project End
2021-02-28
Budget Start
2019-03-01
Budget End
2020-02-29
Support Year
1
Fiscal Year
2019
Name
Jackson Laboratory
DUNS #
042140483
City
Bar Harbor
State
ME
Country
United States
Zip Code
04609