Complex biological processes, including organ development, immune response and disease progression, are inherently dynamic. Learning their regulatory architecture requires understanding how components of a large system dynamically interact with each other and give rise to emergent behavior. Recent experimental advances have made ii possible to investigate these biological systems in a data-driven fashion al high temporal resolution, allowing identification of new genes and their regulatory interactions. Longitudinal omics data sets are becoming increasingly common in clinical practice as well. Information on these collections of interacting genes can be integrated to gain systems-level insights into the roles of biological pathways and processes, including progression of diseases. Consequently, developing interpretable methods for learning functional relationships among genes, proteins or metabolites from high-dimensional time series data has become a timely research problem. The nature of these time-course data sets presents exciting opportunities and interesting challenges from a statistical perspective. Typical time-course omics data sets are challenging because of their high-dimensionality and non-linear relationships among system components. To tackle these challenges, one needs sophisticated dimension-reduction techniques that are biologically meaningful, computationally efficient and allow uncertainty quantification. Methods that incorporate prior biological information (e.g., pathway membership, protein-protein interactions) into the data analysis are good candidates for analyzing such high-dimensional systems using small samples. Here, we will develop three core methods to address the above challenges - (Aim 1): an empirical Bayes framework for clustering high-dimensional omics time-course data using prior biological knowledge;
(Aim 2) : a quantile-based Granger causality framework for learning interactions among genes or metabolites from their lead-lag relationships;
and (Aim 3) : a decision tree ensemble framework for searching cascades of interactions among genes from their temporal expression profiles. Our interdisciplinary team of statisticians and scientists will analyze time-course ornics data from three research projects: (i) innate immune response systems in Drosophila, (ii) developmental process in mouse models, and (ii) longitudinal metabolite profiling of TB patients. These insights will be used to build and validate our methodology, which will be implemented in a publicly available software. This proposal is innovative in its incorporation of prior biological knowledge in the framework of novel dimension reduction techniques for interrogating high-dimensional time-course omics data. This research is significant in that it will impact basic sciences by elucidating data-driven, testable hypotheses on the regulatory architecture of biological processes, and clinical practice by monitoring disease progression and prognosis.

National Institute of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Project #
Application #
Study Section
Special Emphasis Panel (ZGM1)
Program Officer
Resat, Haluk
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Cornell University
United States
Zip Code