This project seeks to discover how the biology of the cell shapes the design rules of multidomain proteins. Multidomain proteins are mosaics of sequence fragments that encode structural or functional modules, called domains. The modular nature of a multidomain protein is integral to its function because different constituent domains play different functional roles. For example, in signaling proteins, some domains are responsible for the recognition and others for the transmission of an environmental signal. These modular proteins allow cells to interact with their world, via cell-cell signaling, cellular adhesion, and cellular migration. In human health, multidomain families are fundamental to apoptosis, innate immunity, inflammation response, and tissue repair. The multidomain architectures that are observed in nature represent a tiny fraction of possible domain combinations. These domain combinations are the product of the mutational processes that give rise to new sequence mosaics and the selective forces that promote or discourage their retention. In a given species, mutation and selection are both dependent on genome organization, mechanisms of DNA replication, transcription and repair, and the interaction of the cell with its environment. Multidomain architectures vary substantially across species, as do genomic and cellular properties. This project exploits this comparative framework to investigate how the biology of the cell shapes the processes of multidomain evolution. This research has the potential to transform our understanding of protein evolution by identifying multidomain design rules that may provide a foundation for predictive models linking evolution and function, with concrete applications for human health and protein engineering. This project advances research infrastructure through the development and distribution of computational tools that may contribute to national scientific resources. 
This project also contributes to building a broadly inclusive scientific workforce through research experiences for women in Carnegie Mellon's undergraduate program in computational biology.

This project uses a three-pronged approach to investigate the universal and lineage-specific design rules of multidomain proteins. First, computational tools will be developed to infer evolution on the domain, gene, and species levels by modeling a multidomain family as a set of domains that co-evolve with the associated genes and species. Each entity is represented by an evolutionary tree, and the history of evolutionary events is inferred by topological comparison of the domain, gene, and species trees. Combining information from three levels of biological organization reveals when domain events occurred relative to events in gene, genome, and organismal evolution, which provides the information required to investigate how changes in domain architecture correlate with changes in genomic and cellular properties. Second, these methods will be applied to reconstruct multidomain evolution in vertebrate and proteobacterial genomes, revealing shared and lineage-specific evolutionary patterns. Third, comparison of these evolutionary patterns with differences in genome organization and cellular machinery in vertebrate and proteobacterial cells will support the inference of design rules for multidomain evolution across the tree of life. The resulting data and computational tools will be available at www.cs.cmu.edu/~durand/Lab/multidomain.html.
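The abstract does not say which algorithm performs the topological comparison of trees. As an illustration only, the sketch below uses the standard least-common-ancestor (LCA) reconciliation of a gene tree with a species tree, which labels each internal gene-tree node as a speciation or a duplication. It may differ from the project's actual methods, and the same idea would have to be repeated between domain trees and gene trees to cover all three levels. The tree encoding (nested tuples), the function names, and the toy gene and species names are all hypothetical.

```python
# Illustrative sketch, not the project's software: LCA reconciliation of a
# binary gene tree with a binary species tree. Trees are nested 2-tuples;
# leaves are strings. All names below are hypothetical.

def index_tree(tree):
    """Return parent and depth lookups for every node of the species tree."""
    parent, depth = {tree: None}, {tree: 0}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Least common ancestor of species-tree nodes a and b."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def reconcile(gene_tree, species_tree, species_of):
    """Map each internal gene-tree node to a species-tree node and label it.

    A node is called a duplication when it maps to the same species-tree
    node as one of its children, and a speciation otherwise.
    Returns (gene_node, species_node, label) triples in post-order.
    """
    parent, depth = index_tree(species_tree)
    events = []

    def visit(node):
        if isinstance(node, str):
            return species_of[node]  # leaf: look up the species it came from
        left, right = node
        m_left, m_right = visit(left), visit(right)
        m = lca(m_left, m_right, parent, depth)
        label = "duplication" if m in (m_left, m_right) else "speciation"
        events.append((node, m, label))
        return m

    visit(gene_tree)
    return events


# Toy example (made-up data): a gene duplicated before human and mouse split.
species_tree = (("human", "mouse"), "chicken")
gene_tree = ((("a_human", "a_mouse"), ("b_human", "b_mouse")), "a_chicken")
species_of = {
    "a_human": "human", "b_human": "human",
    "a_mouse": "mouse", "b_mouse": "mouse",
    "a_chicken": "chicken",
}
events = reconcile(gene_tree, species_tree, species_of)
labels = [label for _, _, label in events]
```

In this toy case the node joining the two human/mouse gene pairs maps to the same species-tree node as both of its children, so it is labeled a duplication, while the remaining internal nodes are labeled speciations. Placing events on the species tree in this way is what allows an event to be dated relative to organismal evolution.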

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 1838344
Program Officer: Peter McCartney
Project Start:
Project End:
Budget Start: 2018-08-01
Budget End: 2021-07-31
Support Year:
Fiscal Year: 2018
Total Cost: $299,853
Indirect Cost:

Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213