Modern computational techniques based on machine learning (ML) and, more recently, deep learning (DL) are playing a critical role in realizing the precision medicine initiative. However, there is a critical need to systematically combine these powerful data-driven techniques with prior molecular network knowledge to build more accurate predictive models whose predictions can also be explained in terms of the mechanisms underlying complex traits and diseases. I propose to use domain-specific knowledge from biology and computing to tackle three outstanding problems: 1) how can we predict the missing labels associated with millions of publicly available samples? 2) what molecular/cellular functions can be attached to these samples? and 3) how can we translate findings from human data to a model species and back?
Network-constrained Deep Learning for Metadata Imputation: Most multifactorial phenotypes are tissue dependent and manifest differently depending on age, sex, and ethnicity. However, the majority of publicly available genomic data lack these labels. I will develop a network-guided approach to predict the missing metadata of samples from their expression profiles by designing novel data-driven models in which the model architecture and/or the structure of the input data is constrained by an underlying gene network.
Network-guided Functional Analysis of Genomic Data: High-throughput experiments often generate lists of genes of interest that are hard to interpret. Functional enrichment analysis (FEA) is a powerful tool that attaches functional meaning to an experimental set of genes by summarizing them into sets of pathways/processes. However, standard FEA is limited by incomplete knowledge of gene function, lack of context from the underlying gene network, and noise in expression data. I will address these limitations by developing a network-guided approach that jointly embeds genes, their interactions, and their known biological pathways/processes in a common, low-dimensional space, so that biological meaning can be derived from the distance between the experimental gene set and the pathway/process of interest.
Joint Multi-Species Genomic Data Analysis and Knowledge Transfer: Finding the optimal model system for a follow-up study based on genetic signatures derived from human experiments is challenging because genetic networks can differ substantially from species to species. I propose to use data-driven models to embed heterogeneous networks composed of human genes and model-species genes in a common, low-dimensional space to better compare genetic signatures between two (or more) species.
I will apply these methods to three specific tasks, but I emphasize that the results of this study will be transferable to any other biological problem in which complex gene/protein interactions are a major component. I have surrounded myself with a great support team and developed a strong professional development plan. The freedom and support provided by the F32 fellowship will be instrumental in achieving my goal of becoming a professor with an independent research group.
This proposal aims to develop novel computational approaches that systematically combine prior molecular network knowledge, powerful data-driven computational techniques, and large transcriptome data collections to answer three critical questions in biomedicine: 1) how can we predict the missing labels associated with millions of publicly available samples? 2) what molecular/cellular functions can be attached to these samples? and 3) how can we translate findings from human data to a model species and back? The core goal of my fellowship is to achieve this by infusing prior knowledge into state-of-the-art data-driven statistical/machine learning methods so that we can overcome two major hurdles in studying complex, multifactorial traits and diseases: a) complex genetic interactions underlie these traits and diseases, and b) these traits and diseases often manifest differently from patient to patient.