The goal of this research proposal is to develop new in-silico approaches for accurate functional annotation of genetic and post-transcriptional variants. The rapid growth of Next-Generation Sequencing (NGS) and high-throughput -omics data has brought us one step closer to a mechanistic, molecular-level understanding of complex genetic diseases such as cancer, neurological disorders, and diabetes. In particular, these data have revealed that complex diseases commonly manifest changes at the genetic and post-transcriptional levels. Both types of changes often affect the structure and function of the corresponding genes and their products. Understanding the functional implications of genetic and post-transcriptional variation is an important task, as it can provide critical insights into the molecular mechanisms underlying disease. Here, we propose to leverage novel machine learning paradigms to design computational methods for predicting the effect of genetic and alternative splicing variants on macromolecular interactions. Macromolecular interactions underlie many cellular functions in a healthy organism. Disease-induced changes in genes, such as single nucleotide variations (SNVs) and alternative splicing variations (ASVs), have recently been reported to cause rewiring of the protein-protein interaction (PPI) network. Unfortunately, the experimental high-throughput techniques that characterize the large-scale effects of SNVs or ASVs on PPIs are expensive, time-consuming, and far from comprehensive. Current in-silico methods either have limited applicability or are less accurate than the experimental methods. To overcome these challenges, we will use two recent machine learning paradigms: learning under privileged information (LUPI) and semi-supervised learning.
If successful, the proposed methods are expected to provide a critical advance on the two main challenges facing current computational approaches: limited coverage and accuracy lower than that of experimental methods. The methods will be freely available to the community as stand-alone tools as well as web servers.

Public Health Relevance

The goal of this proposal is to build computational tools that discover links between disease-associated mutations and alternatively spliced protein isoforms, on the one hand, and the protein-protein interactions mediated by the disease proteins, on the other. The tools use advanced machine learning methods to find such links quickly and inexpensively. These tools will be useful in elucidating the molecular mechanisms implicated in complex genetic disorders.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Exploratory/Developmental Grants (R21)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Worcester Polytechnic Institute
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States