The goal of most high-throughput biological experiments is to generate specific hypotheses about how genes, proteins, and other entities involved in a biological process control cellular functions and other aspects of cellular phenotypes. Time series algorithms have proven to be a powerful computational strategy for understanding transcriptional regulation from gene expression data by using the ordering of events to help identify regulatory interactions. However, there are limits to the quality and types of interactions that can be inferred from bulk gene expression data, which are averaged over many different cells and do not reflect protein post-translational modifications. Single-cell RNA-seq measures gene expression in individual cells, capturing diverse cellular states and fine-grained progression through biological processes. Mass spectrometry-based phosphoproteomics data reveal the rapid and widespread changes in protein phosphorylation that occur over time during cellular signaling responses. These two types of data have great potential for discovering condition-specific transcriptional regulatory and signaling mechanisms. This project will design novel time series algorithms to improve the prediction of transcriptional and signaling network interactions. These methods will be developed as open source software so that the computational approaches can be adopted in a wide variety of biological systems. Time series data analysis will also be incorporated into local public outreach programs and new training workshops designed to teach computational thinking to biology students.
Biological processes are dynamic, and single-cell gene expression data can provide noisy snapshots of temporal behaviors. Existing computational techniques can estimate each cell's progression through a temporal process from single-cell data, producing cell-specific pseudo-times. These pseudo-time series data are imprecise and irregularly-spaced but contain far more time points than a traditional time series experiment. This project will use causality analysis techniques that require a large number of time points to infer causal gene-gene regulatory relationships. Kernel methods will adapt the causality algorithms to accommodate the heterogeneity of single-cell data and combinatorial nature of transcriptional regulation. To study cellular signaling, computational analysis of time series phosphorylation changes will determine not only which proteins are active in a signaling response but also the temporal intervals in which they are involved. The project will assess both non-stationary Gaussian processes and hidden Markov models for modeling temporal changes in protein phosphorylation. A novel signaling pathway reconstruction algorithm will then flexibly integrate many types of data-derived constraints (for example, early responders cannot be activated by late responders, protein-protein interaction network connectivity relationships, etc.) using statistical relational learning techniques for probabilistic reasoning. Additional information about the project will be available at www.biostat.wisc.edu/~gitter/.