The long-term goal of this research is to reveal the key regulators that determine the normally ordered development of an animal from undifferentiated pluripotent cells to the specialized cells that carry out all of the functions in our body. The coordinated expression of the genome underlies these processes and is orchestrated by networks of interacting genes that we are only beginning to unveil. Cell circuitry is complex, but the discovery of the Yamanaka factors demonstrates that even fewer than a handful of transcription factors can exert profound changes on cell and tissue fates. Thus, the combinations of genes needed to unlock cell determinants seem tantalizingly parsimonious. Large-scale projects are underway to catalog the genomic, epigenomic, and functional genomic landscapes of many different cells in multiple organisms. As high-throughput techniques such as DNA and RNA sequencing mature, demand is increasing for integrative approaches that can infer from such data the rules underlying the intrinsic, adaptive, and programmed phenotypic changes that cells undergo. Our starting point will be to extend the pathway integrative framework developed over the past several years for the interpretation of cancer genomics datasets for The Cancer Genome Atlas project. Extensions to the input pathways used, and advances in the model to enrich the formal representation, will be developed so that a breadth of datasets in human and model organisms can be analyzed. The approach will culminate in combining machine-learning classification with probabilistic graphical models. The classifiers will identify predictive pathway features for cell state distinctions in a large database. Genetic manipulations among these features can then be proposed, in any combination, as formal interventions on the graphical model of the resulting classifiers, a major advantage of this work.
The pathway models will be applied to the prediction of factors that can confer differentiation and de-differentiation cues to human cortical neurons. Computationally predicted gene perturbations in this system will be tested in living cells. Identifying critical modulators of the cell fate decisions underlying the conversion of stem cells to neural progenitors to mature neural cell types will advance our understanding of neural development. These same regulators may also play an important role in glioma, a disease where the tumor cells appear to be in a neural progenitor-like state. Taken together, the proposed theoretical and applied informatics approaches will contribute powerful tools for interpreting and predicting both routine and aberrant cellular responses. Researchers will be able to query these complex networks using computer algorithms as high-fidelity surrogates. In the not-so-distant future, we hope to advance our understanding of normal differentiation and to reveal how the regulation of these programs breaks down in disease processes like cancer, informing diagnostic, prognostic, and therapeutic strategies.
This project aims to extend machine-learning and probabilistic graphical modeling approaches developed in the field of cancer genomics to the analysis of a broad range of human and model organism datasets. Novel methods for proposing genetic perturbations using a formal computational analysis will be developed and tested for their ability to suggest pluripotency and lineage-committing factors in a neural progenitor differentiation assay. The methods developed will contribute significant theoretical advances and reveal mechanisms common to stem cell and tumor biology, which may shed light on new treatment options for cancer.