In living cells, signaling pathways composed of protein-protein interactions carry information about extracellular conditions from the cell wall to the nucleus. The resulting changes in gene expression, and in the corresponding protein products, enable the cell to adapt and survive in diverse environments. Biologists have uncovered portions of certain signaling pathways, but the structure of complete signaling networks remains far from understood. Because direct in vivo measurement of these networks is intrinsically difficult, this research considers the problem of inferring the structure of a cellular signaling network from data generated by existing high-throughput experiments, which indicate which proteins are utilized in each signaling pathway. Cell signaling networks underlie the growth, development, and survival of living cells, so the results of this project may advance the state of knowledge in the critical areas of human disease, biosensor development, and biofuel manufacturing.
This project investigates a new technique for reconstructing cell signaling networks from data generated by existing high-throughput experiments. These data indicate which proteins are utilized in each signaling pathway but do not directly reveal the structure or order of the pathways. The cell signaling networks and the experimental data are modeled mathematically as a shuffled Markov process, which accounts for the fact that the order of proteins within each pathway is not observed. Under the shuffled Markov model, the network reconstruction problem reduces to the task of inferring the Markov transition matrix. Computationally efficient inference algorithms, based on expectation-maximization and importance sampling techniques, are developed for this task. Computational experiments using real and synthetic biological data, together with mathematical analysis, demonstrate the capabilities of the model and algorithms.
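
To make the idea of a shuffled Markov observation concrete, the following is a minimal sketch and not the project's actual implementation. It assumes proteins are encoded as integer states, that each pathway is an ordered path drawn from a first-order Markov chain, and that the experiment reports a uniformly random reordering of that path. The names `sample_path`, `shuffle_path`, `ordered_prob`, and `shuffled_likelihood` are hypothetical and chosen only for illustration.

```python
import itertools
import math
import random


def sample_path(init, trans, length, rng):
    """Draw an ordered path of `length` states from a first-order Markov chain.

    `init` is the initial-state distribution and `trans[i][j]` is the
    probability of moving from state i to state j (both are assumptions
    of this sketch, not quantities taken from the project).
    """
    states = range(len(init))
    path = rng.choices(states, weights=init)  # list holding the first state
    while len(path) < length:
        path.append(rng.choices(states, weights=trans[path[-1]])[0])
    return path


def shuffle_path(path, rng):
    """Return a uniformly random reordering, hiding the pathway order."""
    shuffled = list(path)
    rng.shuffle(shuffled)
    return shuffled


def ordered_prob(path, init, trans):
    """Probability of one specific ordered path under the chain."""
    p = init[path[0]]
    for a, b in zip(path, path[1:]):
        p *= trans[a][b]
    return p


def shuffled_likelihood(obs, init, trans):
    """Exact likelihood of a shuffled observation.

    Sums the ordered-path probability over every reordering of `obs` and
    divides by L!, the probability of any one uniform shuffle. The cost
    grows as L!, so this brute-force sum is only usable for short paths.
    """
    total = sum(ordered_prob(p, init, trans) for p in itertools.permutations(obs))
    return total / math.factorial(len(obs))
```

Because the exact sum above visits all L! orderings, it becomes impractical beyond short pathways. An approximate scheme, such as the importance sampling mentioned above, would replace the full sum with a weighted sample of orderings; this sketch does not attempt that step.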