Project Summary: The now well-known vision and challenge in post-genomics biology is to make the entire process of research scalable to large networks using high-throughput techniques and large-scale computation. Computational biology and bioinformatics have focused attention on the need for sophisticated methods for handling large databases and tools for modeling and simulating complex networks. Not as widely recognized is that the scalability of the more subtle processes of drawing meaningful and reliable scientific, medical, and biological inferences from the wealth of data and computation is equally important and requires the development of fundamentally new theory and software.

The research objective of this project is to develop the theoretical foundation and information technology infrastructure necessary to accelerate progress in systems biology, with concrete demonstrations on a variety of biological experiments. This ambitious goal requires augmenting bioinformatics and current modeling and simulation approaches with greater understanding of the organizational principles underlying network complexity, including connections with molecular details, and exploiting this understanding to advance mainstream experimental biology. Building on recent breakthroughs in theory and scalable algorithms for systematic robustness analysis and model (in)validation of nonlinear network models with uncertain rate constants, the project maps out a research path that will (1) develop the necessary rigorous and practical mathematical theory; (2) embody it in a software environment that supports the complex iterative processes involved in going from raw data to modeling, analysis, and inference, with tight feedback to experimentation and modeling throughout; and (3) apply the theory and software to specific experimental studies in biology as a way of grounding the entire endeavor.

The intellectual merit combines immediate practical impact and conceptual depth. Automating and computationally augmenting scientific and mathematical inference from noisy and incomplete data for uncertain models has long been an elusive goal. In the context of complex biological systems, achieving it is for the first time both a necessity and a realistic possibility. To do this, data and modeling assertions and questions must be described in a common framework that is biologically natural, yet can be stored, manipulated, shared, and ultimately turned over to powerful algorithms for resolution. Our objective is to create tools that make it possible to systematically answer questions such as: Is a proposed model consistent with experimental data? If so, is it robust to additional perturbations that are plausible but untested? Are different models at multiple scales of resolution consistent with one another? What is the most promising experiment to refute or confirm a model? Traditionally, such network-level questions, which arise naturally in biology, have been considered computationally intractable because the underlying systems are typically stochastic, nonlinear, nonequilibrium, uncertain, multiscale, and hybrid (mixing continuous and discrete mathematics); this has limited approaches to heuristic and brute-force methods or to extreme simplification. Recently this situation has changed profoundly, based on new methods developed by the research team and their collaborators. A crucial insight is that evolution favors high robustness to uncertain environments and components, yet allows severe fragility to novel perturbations, and this "robust yet fragile" feature must be exploited explicitly in scalable algorithmic approaches.

The broader impact lies in the synergistic links this work forges with similar challenges that exist throughout science and technology, such as the Internet, aerospace systems design, materials science, multiscale physics, stochastic multiscale chemistry, and disturbance ecology. The theoretical foundations build broadly on robust control theory, dynamical systems, numerical analysis, operator theory, real algebraic geometry, computational complexity theory, duality and optimization, and semidefinite programming. The results will be made accessible to the broadest possible audience, both through representative and challenging problems from experimental biology and through connections with other examples of complex systems. The preliminary progress already made by this team is striking and has been applied to understanding, for example, the robustness of complex control systems, the performance of Internet protocols, and bacterial chemotaxis and stress response. The work is producing new mathematics and algorithms, is beginning to appear in the highest-impact journals, and is concretely demonstrating that this research can help experimental biologists. Diversity and breadth appear at every level. In the research group of the lead PI (Doyle), 6 of 11 graduate students and 2 of 4 postdoctoral scholars are women, and the group spans a broad range of racial and ethnic backgrounds. The other five co-PIs come from a broad spectrum of disciplines and from diverse but elite academic institutions; three of them are women. All PIs have strong and very concrete commitments to integrative, multidisciplinary research, diversity, educational innovation, and outreach at every level, including K-12. The team members are frequent featured speakers at integrative conferences and in interdisciplinary colloquia at premier universities, and speakers and organizers of workshops and short courses in systems biology.
This program directly involves leading mainstream biology research and also has broad contact with mainstream biology through additional collaborations, creating conduits for broad dissemination of the research results in biology. The team's algorithms and software infrastructure are becoming de facto standard tools empowering research in multiple disciplines, and they form a solid foundation upon which this program builds.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 0326635
Program Officer: Mitra Basu
Budget Start: 2003-12-15
Budget End: 2008-11-30
Fiscal Year: 2003
Total Cost: $1,300,000
Name: California Institute of Technology
City: Pasadena
State: CA
Country: United States
Zip Code: 91125