Transcription factors and microRNAs are essential regulatory molecules (RMs) that control messenger RNAs (mRNAs) and are known to be dysregulated in human diseases. Each RM may affect multiple pathways of the cell, which is both a blessing and a curse. If a therapy targets the proper RM, it can attack the disease on multiple fronts and increase efficacy. On the other hand, targeted therapy may cause serious adverse effects because of our limited knowledge of the downstream causal effects of RM manipulation. Although the local binding between single RMs and their targets has been studied computationally and experimentally, the magnitude of the functional consequences of such binding on the transcriptome is unclear. Here, I propose statistical machine learning techniques and causal inference methods to predict the observed variability of gene expression using only regulatory molecules and to estimate their downstream causal effect on the entire transcriptome. To achieve this goal, I start in Aim 1 by building a multi-response predictive model to predict the whole transcriptome using only RMs. This goal is challenging because the dimension of the response vector exceeds the number of samples; I will use techniques from high-dimensional statistics to address this issue.
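As a minimal sketch of the Aim 1 setting, the example below fits one multi-response model in which the number of responses (genes) exceeds the number of samples. Ridge regression is used here only as a simple stand-in for the high-dimensional techniques mentioned above; the simulated data, the dimensions, the penalty `lam`, and the helper names `solve` and `ridge_multi` are all assumptions made for illustration and are not taken from the proposal.

```python
import random

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.
    A is p x p and B is p x q, both given as lists of lists."""
    p = len(A)
    # Augment A with B so each row operation is applied to both at once.
    M = [list(A[i]) + list(B[i]) for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[p:] for row in M]

def ridge_multi(X, Y, lam):
    """Multi-response ridge: B = (X'X + lam * I)^{-1} X'Y, a p x q matrix."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    XtY = [[sum(X[i][a] * Y[i][k] for i in range(n)) for k in range(q)]
           for a in range(p)]
    return solve(XtX, XtY)

# Hypothetical simulated data: 30 samples, 3 regulatory molecules, 50 genes,
# so the response dimension (50) is larger than the sample size (30).
rng = random.Random(0)
n, p, q = 30, 3, 50
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
B_true = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(p)]
Y = [[sum(X[i][a] * B_true[a][k] for a in range(p)) + 0.1 * rng.gauss(0, 1)
      for k in range(q)] for i in range(n)]

B_hat = ridge_multi(X, Y, lam=0.1)
max_err = max(abs(B_hat[a][k] - B_true[a][k])
              for a in range(p) for k in range(q))
print("largest coefficient error:", max_err)
```

Because the penalty is shared across responses, all genes are fitted from a single small linear system in the predictors, which is why the fit stays well posed even though there are more responses than samples.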
In Aim 2, I will go beyond predictive modeling by estimating the causal effect of RMs on the transcriptome using invariant causal prediction. I will leverage the rapidly growing literature that connects causal inference to the invariance of prediction accuracy across heterogeneous data sources to infer the causal effect of RMs on mRNA. Having developed both predictive and causal models of RMs' contribution to gene regulation, in Aim 3, during the R00 phase, I will focus on the most recent advances in double/debiased machine learning, which allows the use of scalable machine learning methods for reliable estimation of the causal effect of RMs on transcription. My proposed research will bring the most advanced statistical machine learning and causal inference techniques to genomics research and help design more effective targeted therapies by providing insights into the global role of RMs in gene expression regulation. During the training phase of the award, with the support of my outstanding mentoring team and scientific advisory committee, I will gain expertise in molecular biology and genomics while perfecting my knowledge of causal inference and machine learning. The Ohio State University Comprehensive Cancer Center - James Hospital and the Mathematical Biosciences Institute will provide me with the ideal interdisciplinary environment to bridge data science and genomics and will help me achieve my career development goals and transition to a tenure-track faculty position.
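The double/debiased machine learning idea named for Aim 3 can be illustrated with a small partialling-out sketch with cross-fitting. Everything here is an assumption for illustration: the simulated data, a single observed confounder `w`, the true effect of 2.0, and simple linear fits standing in for the scalable machine learning methods that would serve as nuisance learners in practice. The function names `ols_fit` and `dml_partial_out` are hypothetical.

```python
import random

def ols_fit(x, y):
    """Fit y = a + b * x by least squares and return (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def dml_partial_out(w, d, y, n_folds=2):
    """Cross-fitted partialling-out estimate of the effect of d on y.
    Nuisance models for E[d | w] and E[y | w] are fitted on the other
    folds, and residuals are computed only on the held-out fold."""
    n = len(y)
    res_d, res_y = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        w_tr = [w[i] for i in train]
        a_d, b_d = ols_fit(w_tr, [d[i] for i in train])
        a_y, b_y = ols_fit(w_tr, [y[i] for i in train])
        for i in test:
            res_d[i] = d[i] - (a_d + b_d * w[i])
            res_y[i] = y[i] - (a_y + b_y * w[i])
    # Final stage: regress outcome residuals on treatment residuals.
    num = sum(rd * ry for rd, ry in zip(res_d, res_y))
    den = sum(rd * rd for rd in res_d)
    return num / den

# Hypothetical simulation: w confounds both the RM level d and the mRNA level y.
rng = random.Random(0)
n = 2000
w = [rng.gauss(0, 1) for _ in range(n)]
d = [0.8 * wi + rng.gauss(0, 1) for wi in w]
y = [2.0 * di + 1.5 * wi + rng.gauss(0, 1) for di, wi in zip(d, w)]

theta_hat = dml_partial_out(w, d, y)   # adjusts for the confounder
naive_slope = ols_fit(d, y)[1]         # ignores the confounder
print("cross-fitted estimate:", theta_hat)
print("naive regression slope:", naive_slope)
```

In this simulation the naive slope absorbs part of the confounder's influence, while the residual-on-residual estimate targets the effect of `d` alone; cross-fitting keeps each residual from being computed with a nuisance model trained on the same observation.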

Public Health Relevance

Targeting regulatory elements has the advantage of concurrently attacking multiple damaged pathways, with the possible pitfall of toxicity due to adverse unintended effects. Understanding the functional global impact of therapy on the entire transcriptome will guide us in designing drugs with higher efficacy and lower toxicity. I propose to leverage advanced causal inference and machine learning methods to elucidate the functional role of regulatory molecules across tissue types.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Career Transition Award (K99)
Project #: 1K99HG011367-01
Application #: 10040882
Study Section: National Human Genome Research Institute Initial Review Group (GNOM)
Program Officer: Pillai, Ajay
Project Start: 2020-08-20
Project End: 2022-07-31
Budget Start: 2020-08-20
Budget End: 2021-07-31
Support Year: 1
Fiscal Year: 2020
Total Cost:
Indirect Cost:
Name: Ohio State University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 832127323
City: Columbus
State: OH
Country: United States
Zip Code: 43210