Obtaining the genome sequence of an organism provides a blueprint, but not one that we yet know how to read. The instructions encoded by regulatory DNA remain particularly obscure. This CAREER project will advance the mapping of biological networks. Specific interactions between protein transcription factors and DNA regulatory elements will be predicted purely from genome sequence and inferred protein structure using all-atom, explicit-solvent simulations of transcription factors bound to DNA. Advances in molecular force fields and simulation algorithms, increases in computer speed, and homologous binding modes within transcription factor families make this a feasible goal. New algorithms will be developed for protein-DNA simulations and binding site predictions. Binding sites for members of selected gene families will be determined, with the vision of enabling a prediction for every family member within a genome. In a complementary effort, the human protein-protein interaction network will be analyzed at the domain level and the topological organization of protein subunits within protein complexes will be predicted. This project will have broader impact through dissemination of algorithms and data sets generated by the research plan, including public databases of protein-DNA and protein-protein interactions. Scientific outgrowths of this project include improved priors for Bayesian prediction of transcriptional regulatory networks, synergy with experimental methods for binding site analysis, anticipated advances in organism-specific research, and release of a large-scale human protein interaction network.
Undergraduate, graduate, and postdoctoral researchers will be mentored and trained; new course material, essential for the rapidly evolving areas of computational biology and bioinformatics, will be developed at the undergraduate and graduate levels; and outreach programs will provide science enrichment and mentoring to public high school students and professional development opportunities for their teachers. All results will be disseminated through appropriate channels, including peer-reviewed publications, conferences, workshops, freely available software and databases, and online course material.
Genome projects have revealed the sequence of nucleotides in our DNA and the proteins they encode, but they provide little information about how proteins function together to become living systems. The intellectual merit of this project lies in the development of computational methods that predict how protein components interact with each other to form information-carrying networks and dynamic molecular machines. We created 3D molecular models of transcription factor proteins and DNA to predict how transcription factors recognize specific DNA sequences to turn genes on and off. Proteins interact with each other to form networks, and many of the methods that have been used to analyze social networks between people are valuable for analyzing protein networks as well. One of the challenges of analyzing protein networks is that interactions have many different purposes: physical binding within a complex, post-translational modification, or enzymatic or metabolic transformation. Other interactions describe whether proteins function upstream of, downstream of, or in parallel with other pathways. We have developed methods that combine many different types of interaction data to provide a more complete picture of how a cell works. These methods have been used to understand how cells in the immune system engulf bacterial prey and how cells respond to DNA damage. Our current work extends network analysis into the dynamic domain. We are exploring how protein complexes are reorganized or reprogrammed in response to different cellular needs. These methods could reveal how other types of networks, including human social networks, change over time. This work is having broader impact through application to specific biological problems, such as improving crop yield by understanding plant signaling pathways, uncovering the mechanisms of genetic disorders, and revealing how pathogens exploit weaknesses in the gene and protein networks of a host.
A new course, Systems Bioengineering from Genes to Cells, has been developed to teach network biology to undergraduates. The lab has participated in science outreach to a public magnet school, and several high school students have been trainees.