The proposed research aims to develop new mathematical and statistical results needed to efficiently analyze biochemical network models using data from the new molecular technology of "deep" DNA sequencing. The project will focus on developing likelihood-based estimates of biochemical network parameters and structure from data consisting of longitudinal species counts. With respect to parameter estimation, we shall (i) derive conditions on the data process that guarantee identifiability and estimator consistency, and (ii) consider ways of approximating the likelihood of a partially observed biochemical network by other likelihoods (e.g., Gaussian) for which the inference problem is simpler. With respect to network structure discovery, we shall develop methods of analyzing algebraic varieties associated with the geometry of chemical reaction networks in order to find the stoichiometric structure most consistent with the given data. The theoretical results obtained will be used to develop a flexible framework for statistical analysis of deep sequencing data. The resulting algorithms will be implemented in software, and their performance will be tested on data from real DNA sequencing experiments.
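As a toy illustration of likelihood-based estimation from longitudinal species counts, and not the methodology proposed here, consider a single species whose molecules degrade independently at an unknown rate theta, with counts recorded at equally spaced times. Under this assumed pure-death model each molecule survives an interval of length dt with probability exp(-theta * dt), so consecutive counts are binomial and the maximum likelihood estimate has a closed form. The model and the function names `log_likelihood` and `death_rate_mle` are hypothetical choices made only for this sketch.

```python
import math


def log_likelihood(theta, counts, dt):
    """Log-likelihood of degradation rate theta > 0 for a pure-death process.

    Assumed model: x[i+1] | x[i] ~ Binomial(x[i], p) with p = exp(-theta * dt).
    """
    p = math.exp(-theta * dt)
    total = 0.0
    for before, after in zip(counts[:-1], counts[1:]):
        # log of the binomial coefficient C(before, after), via log-gamma
        log_binom = (math.lgamma(before + 1) - math.lgamma(after + 1)
                     - math.lgamma(before - after + 1))
        total += (log_binom + after * math.log(p)
                  + (before - after) * math.log(1.0 - p))
    return total


def death_rate_mle(counts, dt):
    """Closed-form maximum likelihood estimate of the degradation rate."""
    if len(counts) < 2:
        raise ValueError("need at least two observations")
    if any(b < a for b, a in zip(counts[:-1], counts[1:])):
        raise ValueError("counts must be non-increasing under a pure-death model")
    at_risk = sum(counts[:-1])    # molecules present at the start of each interval
    survivors = sum(counts[1:])   # molecules still present at the end of each interval
    if at_risk == 0 or survivors == 0:
        raise ValueError("estimate is undefined or infinite for these data")
    p_hat = survivors / at_risk   # MLE of the per-interval survival probability
    return -math.log(p_hat) / dt  # invariance of the MLE under reparametrization


# Example: counts that halve at every unit time step give an estimate of ln 2.
theta_hat = death_rate_mle([100, 50, 25], dt=1.0)
```

For realistic networks with several interacting and partially observed species, no such closed form is generally available, which is the situation the likelihood approximations in (ii) are meant to address.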
Deep (or next-gen) sequencing is a revolutionary, emerging tool of modern molecular biology that allows very precise, high-throughput measurement of DNA and RNA molecular counts in cellular systems. Next-gen technology will make it possible for biologists to formulate and test very specific hypotheses about the biochemical interactions of various molecular species, provided that the proper mathematical modeling and statistical analysis tools (and their software implementations) are broadly available. Owing to his scientific background and the interdisciplinary nature of his work, the proposer is in a unique position to develop such tools and test them on data of biological relevance, ensuring that the mathematical results of this research are broadly disseminated to the scientific community of experimental biologists. By transforming the methodology for data analysis in DNA sequencing, the proposed mathematical research will have broad influence on experimental high-throughput methodology in many areas of modern genetics, ecology, and population studies. The project will also further promote, both statewide and nationally, the fields of mathematics and statistics in the context of biological research, as well as the interdisciplinary training of young researchers.