Recent technological advances in massively parallel mutagenesis and deep DNA sequencing are enabling researchers to discover which genetic networks in complex cellular systems are essential and under what conditions. However, identifying such conditionally essential networks (CENs) has been challenging for computational and statistical reasons. The goal of this project is to elucidate and validate CENs in the protein homeostasis system by developing computationally efficient and statistically accurate methods for analyzing deep DNA sequencing data from massively parallel mutagenesis experiments. Dysregulation of the protein homeostasis system leads to imbalances in the proteome, which can cause neurodegenerative pathologies such as Alzheimer's, Huntington's, or Parkinson's disease. Developing a method for learning CENs in the protein homeostasis system will lead to a better fundamental understanding of this complex system and will inform combination therapeutics for neurodegenerative diseases. More broadly, a statistically rigorous tool for analyzing massively parallel mutagenesis experiments would allow researchers to discover CENs in other complex molecular systems. The team is well prepared to complete the specific aims of this project because of their preliminary nonparametric Bayesian model development, their preliminary experimental data from the protein homeostasis system, their experience developing statistical models for learning from genomic data, their track record of collaborative research, and the computational and experimental environment at their institution. To complete the overall objective, the team will accomplish the following specific aims: (1) develop and validate a nonparametric Bayesian model for identifying CENs from massively parallel mutagenesis deep sequencing experiments, and (2) identify and validate protein homeostasis CENs using transposon sequencing experiments.
This project will create new statistical methods, models, and software for analyzing DNA sequencing data from bulk and purified samples from massively parallel mutagenesis experiments to discover latent conditionally essential networks.
The research aims of this project will advance understanding of nonparametric Bayesian statistical analysis and protein homeostasis molecular biology. These aims connect directly to broader impacts that advance the full participation of women and minorities in STEM fields and improve the well-being of individuals in society. In partnership with Girls, Inc of Holyoke, MA, a workshop titled "My DNA, My Medicine" will be developed to encourage participation of middle and high school students in statistics, computer science, and genetics.