Modern computational techniques based on machine learning (ML) and, more recently, deep learning (DL) are playing a critical role in realizing the precision medicine initiative. However, there is a critical need to systematically combine these powerful data-driven techniques with prior molecular network knowledge to build more accurate predictive models while also satisfactorily explaining their predictions in terms of the mechanisms underlying complex traits and diseases. I propose to use domain-specific knowledge from biology and computing to tackle three outstanding problems: 1) how can we predict missing labels associated with millions of publicly available samples? 2) what molecular/cellular function can be attached to these samples? and 3) how can we translate the findings from human data to a model species and back?

Network-constrained Deep Learning for Metadata Imputation: Most multifactorial phenotypes are tissue dependent and manifest differently depending on age, sex, and ethnicity. However, a majority of publicly available genomic data lack these labels. I will develop a network-guided approach to predict missing metadata of samples based on their expression profiles by designing novel data-driven models where the model architecture and/or the structure of the input data are constrained by an underlying gene network.

Network-guided Functional Analysis of Genomic Data: High-throughput experiments often generate lists of genes of interest that are hard to interpret. Functional enrichment analysis (FEA) is a powerful tool that attaches functional meaning to an experimental set of genes by summarizing them into sets of pathways/processes. However, standard FEA is limited by incomplete knowledge of gene function, lack of context from the underlying gene network, and noise in expression data. I will address these limitations by developing a network-guided approach that jointly captures genes, their interactions, and their known biological pathways/processes in a common, low-dimensional space that facilitates deriving biological meaning by comparing the distance between the experimental gene set and the pathway/process of interest.

Joint Multi-Species Genomic Data Analysis and Knowledge Transfer: Finding the optimal model system to use in a follow-up study based on genetic signatures derived from human experiments is challenging because genetic networks can be quite different from species to species. I propose to use data-driven models to embed heterogeneous networks composed of human genes and model species genes into a common, low-dimensional space to better compare genetic signatures between two (or even multiple) species.

I will apply these methods to three specific tasks, but I emphasize that the results of this study will be transferable to any other biological problem where complex gene/protein interactions are a major component. I have surrounded myself with a great support team and developed a strong professional development plan. The freedom and support provided by the F32 fellowship will be instrumental in achieving my goal of becoming a professor with an independent research group.

Public Health Relevance

This proposal aims to develop novel computational approaches that systematically combine prior molecular network knowledge, powerful data-driven computational techniques, and large transcriptome data collections to answer three critical questions in biomedicine: 1) how can we predict missing labels associated with millions of publicly available samples? 2) what molecular/cellular function can be attached to these samples? and 3) how can we translate the findings from human data to a model species and back? The core goal of my fellowship is to achieve this by infusing prior knowledge into state-of-the-art data-driven statistical/machine learning methods so that we can overcome two major hurdles in studying complex, multifactorial traits and diseases: a) complex genetic interactions underlie multifactorial traits and diseases, and b) these traits and diseases often differ in how they manifest from patient to patient.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Postdoctoral Individual National Research Service Award (F32)
Project #
1F32GM134595-01
Application #
9835005
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Sakalian, Michael
Project Start
2019-09-01
Project End
2022-08-31
Budget Start
2019-09-01
Budget End
2020-08-31
Support Year
1
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Michigan State University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
193247145
City
East Lansing
State
MI
Country
United States
Zip Code
48824