The biochemical reactions within cells follow the principle of mass action (the rate of a reaction is proportional to the product of the concentrations of its reactants), so their behavior can be represented by a polynomial dynamical system. The steady states of a network of reactions are therefore solutions to a set of polynomial equations and form an algebraic variety. Despite the powerful mathematical tools that have been developed to analyze algebraic varieties, this feature has never previously been exploited in studying biological systems. The research team recently showed that for a particular class of biological networks, arising in multisite protein phosphorylation, the steady-state variety has remarkable geometric properties, which lead to new biological predictions as well as a method for drastically reducing the complexity of calculating steady states. In this project, these geometric ideas will be used to analyze a much broader class of cellular networks, especially the key modules that are repeatedly used in biological processes, including cascades, scaffolds and feedback loops. While the analysis will initially be at steady state, the team will also examine whether the unexpected benefits of polynomial dynamics can be extended to more complex dynamical behaviors, such as limit cycles. If successful, these studies will provide new methods for overcoming molecular complexity as well as a new geometrical language in which to formulate the principles of cellular regulation.
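As a minimal illustration of how mass action turns a reaction network into polynomial equations, consider the hypothetical binding reaction A + B ⇌ C (not one of the networks studied in the project), with assumed rate constants `k_on` and `k_off`. Each rate is a product of concentrations, so the right-hand sides of the ODEs are polynomials. Adding the conservation laws A + C = A_tot and B + C = B_tot reduces the steady-state condition to a single quadratic in C. The sketch below solves that quadratic and keeps the root that lies in the physically meaningful range.

```python
import math

def rhs(a, b, c, k_on, k_off):
    """Mass-action right-hand sides (dA/dt, dB/dt, dC/dt) for A + B <-> C.

    Each rate is proportional to the product of reactant concentrations, so
    every component is a polynomial in the concentrations (a, b, c).
    """
    v_bind = k_on * a * b      # A + B -> C
    v_unbind = k_off * c       # C -> A + B
    return (v_unbind - v_bind, v_unbind - v_bind, v_bind - v_unbind)

def steady_state(k_on, k_off, a_tot, b_tot):
    """Feasible steady state (A, B, C), assuming k_on > 0 and k_off > 0.

    Substituting A = a_tot - C and B = b_tot - C into dC/dt = 0 gives
    k_on*C**2 - (k_on*(a_tot + b_tot) + k_off)*C + k_on*a_tot*b_tot = 0.
    """
    qa = k_on
    qb = -(k_on * (a_tot + b_tot) + k_off)
    qc = k_on * a_tot * b_tot
    disc = qb * qb - 4 * qa * qc       # >= 0 for non-negative parameters
    # The smaller root lies in [0, min(a_tot, b_tot)]; the larger one exceeds
    # max(a_tot, b_tot) and would make A or B negative, so it is discarded.
    c = (-qb - math.sqrt(disc)) / (2 * qa)
    return a_tot - c, b_tot - c, c

A_ss, B_ss, C_ss = steady_state(1.0, 1.0, 2.0, 3.0)
print(A_ss, B_ss, C_ss, rhs(A_ss, B_ss, C_ss, 1.0, 1.0))
```

For these illustrative values the feasible steady state is C = 3 − √3 ≈ 1.268, with A ≈ 0.732 and B ≈ 1.732. Even this tiny network has two algebraic solutions, only one of which is biologically meaningful. Larger networks produce correspondingly larger polynomial systems.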

The molecular networks found within biological cells exhibit extraordinary complexity. Not only are there many components (there are around 23,000 genes in the human genome), but the proteins encoded by these genes may themselves be modified in multiple different ways by other proteins in a dynamic fashion. Although excellent mathematical tools have been developed for dealing with genes, which are static entities, there are few such tools for dealing with the dynamics of proteins and their modifications. It is these dynamical processes that do the work of cellular physiology and that become destabilized in disease. For instance, a characteristic feature of Alzheimer's disease is uncontrolled modification of a protein called tau, while similar imbalances in protein modification have been found in many cancers. Unfortunately, the number of different protein states that can arise in such processes is enormous, making it hard even to understand what the processes are capable of doing, let alone to devise the right experiments to unravel them. At present, much of this complexity has to be glossed over. In this project, the investigators will develop new mathematical tools for representing and analyzing such cellular processes, enabling them to overcome some of the complexity rather than ignore it. In this way, the team hopes to characterize how cellular networks exploit molecular complexity to implement the physiological processes of life. Studies of this kind will help lay the foundation for a better molecular understanding of health and disease.

Project Report

The Human Genome Project revealed the fundamental genetic repertoire of humans. One of the great surprises from this effort was that humans have about 20,000 protein-coding genes, almost the same number as the lowly, soil-dwelling roundworm Caenorhabditis elegans. The much greater complexity of humans compared with worms lies not in the number of genes but in the variety of different proteins and in their states of modification. For example, the human protein p53, which plays an essential role in maintaining the integrity of DNA and which is frequently misregulated in cancer, has millions of potential states of modification. When such complexity for a single protein is considered across the network of all proteins, the amount of molecular complexity in human cells is seen to be astronomical. One of the biggest challenges we face in exploiting the Human Genome Project is to rise above this molecular complexity and to uncover the biological principles hidden behind it. This will be essential if we are to understand the molecular basis of complex diseases like cancer, diabetes and Alzheimer's, which afflict an increasing proportion of the population, and thereby develop novel preventive and therapeutic strategies against them. This National Science Foundation project has introduced into biology two new mathematical methods for getting on top of molecular complexity. The first is a systematic method for undertaking "time-scale separation", an idea borrowed from physics and engineering. In this approach, some part of a complex system is assumed to operate sufficiently fast that it has effectively reached a steady state, and the steady-state equations are then used to eliminate that part of the system from consideration. This can greatly simplify the overall description of the system. The problem is that the process of elimination can be very complicated and has previously been undertaken by ad hoc methods.
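A textbook instance of time-scale separation is the Michaelis-Menten reduction of E + S ⇌ ES → E + P. It is used here only as an illustration, not as the project's own method. When enzyme is scarce, the complex ES equilibrates quickly. Its steady-state equation can therefore be solved for ES and used to eliminate it, leaving a single rate law for the substrate. The sketch below integrates both the full mass-action model and the reduced model with a hand-written fourth-order Runge-Kutta loop. All rate constants and concentrations are assumed values chosen for illustration.

```python
# Illustrative (assumed) parameters: enzyme is scarce relative to substrate.
K_ON, K_OFF, K_CAT = 10.0, 1.0, 1.0
E_TOT, S0 = 0.01, 1.0
K_M = (K_OFF + K_CAT) / K_ON   # Michaelis constant

def rk4(f, y0, t_end, dt):
    """Integrate dy/dt = f(y) with the classical fourth-order Runge-Kutta method."""
    y = list(y0)
    for _ in range(round(t_end / dt)):
        k1 = f(y)
        k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
        k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
        k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
        y = [yi + dt / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y

def full_model(y):
    """Mass-action E + S <-> ES -> E + P; state is (free S, ES, P)."""
    s, c, _ = y
    e = E_TOT - c                      # free enzyme from conservation
    bind = K_ON * e * s
    return [-bind + K_OFF * c,
            bind - (K_OFF + K_CAT) * c,
            K_CAT * c]

def reduced_model(y):
    """ES eliminated via dES/dt = 0, i.e. ES = E_TOT * S / (K_M + S)."""
    s, _ = y
    v = K_CAT * E_TOT * s / (K_M + s)
    return [-v, v]

T, DT = 100.0, 0.01
s_full, c_full, p_full = rk4(full_model, [S0, 0.0, 0.0], T, DT)
s_red, p_red = rk4(reduced_model, [S0, 0.0], T, DT)
print(f"product at t={T}: full {p_full:.4f}, reduced {p_red:.4f}")
```

With these assumed values, the product formed in the two models agrees to within a couple of percent of the initial substrate. The simulated complex also stays close to its quasi-steady-state value. Making `E_TOT` comparable to `S0` weakens the separation of time scales, and the agreement typically degrades.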
We have introduced a graph-based "linear framework" in which such elimination can be carried out systematically. We have shown that this new framework has wide applications in biology. For instance, it allows us to analyse how genes behave when they are being regulated by epigenomic mechanisms, of the kind that are emerging from follow-up projects to the Human Genome Project, such as ENCODE, the NIH Roadmap Epigenomics Project and the International Human Epigenome Consortium. The Human Genome Project showed how the structure of genes can be modelled by sequences (in the four bases A, T, C and G); our methodology shows how the function of genes can be modelled by graphs. The second method allows us to focus on a few components within a complex system and to determine how these components interact with each other, while distilling out the influence of all the other components. The relationship between the few components is described by what we call an "invariant". Mathematics offers powerful tools for calculating such invariants, and we have shown that invariants based on just a couple of components can encode the essential biological properties of highly complex systems. For instance, we have used invariants to show that molecular switches, no matter how complicated their internal workings, suffer from a fundamental trade-off between being an efficient switch and being robust to changes in the switching environment. The invariants also reveal how this trade-off can be circumvented, and we have found that this is what happens in one of the key molecular switches that regulates how the liver metabolises glucose. Our methods offer new capabilities for biology. They also open the door to new interactions between mathematics and biology and raise questions for which new kinds of mathematical tools will be needed. An important aspect of this project was the involvement of undergraduate students, including those from under-represented groups.
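A standard result for graph-based linear systems, and the one this sketch assumes as its basis, is the Matrix-Tree theorem. Suppose the fast subsystem is a strongly connected directed graph whose edge labels are first-order rate constants. Then its steady-state value at vertex i is proportional to the sum, over all spanning trees rooted at i, of the product of edge labels. The sketch below enumerates rooted spanning trees by brute force for a small, hypothetical three-vertex graph. It then checks that the resulting vector is a steady state of the graph's linear dynamics. The helper names and the example graph are illustrative, not taken from the project.

```python
from itertools import product
from math import prod

def _reaches_root(v, root, nxt, n):
    # Follow the chosen outgoing edges; in a tree, root is reached in < n steps.
    for _ in range(n):
        if v == root:
            return True
        v = nxt[v]
    return v == root

def tree_sums(n, edges):
    """rho[i] = sum over spanning trees rooted at i of the product of edge labels.

    edges maps (source, target) -> positive label. A spanning tree rooted at i
    picks exactly one outgoing edge for every other vertex such that every
    vertex leads to i.
    """
    out = {v: [(t, lab) for (s, t), lab in edges.items() if s == v] for v in range(n)}
    rho = []
    for root in range(n):
        others = [v for v in range(n) if v != root]
        total = 0.0
        for choice in product(*(out[v] for v in others)):
            nxt = {v: t for v, (t, _) in zip(others, choice)}
            if all(_reaches_root(v, root, nxt, n) for v in others):
                total += prod(lab for _, lab in choice)
        rho.append(total)
    return rho

def laplacian_rhs(x, n, edges):
    """Linear dynamics: each edge s -> t carries flux label * x[s] from s to t."""
    dx = [0.0] * n
    for (s, t), lab in edges.items():
        flux = lab * x[s]
        dx[s] -= flux
        dx[t] += flux
    return dx

# Hypothetical three-vertex graph with illustrative labels.
EDGES = {(0, 1): 2.0, (1, 0): 1.0, (1, 2): 3.0, (2, 1): 0.5, (2, 0): 1.5}
rho = tree_sums(3, EDGES)
steady = [r / sum(rho) for r in rho]
print(rho, laplacian_rhs(rho, 3, EDGES))
```

For this example graph, the tree sums are ρ = (6.5, 4.0, 6.0). Dividing by their total gives the normalized steady-state fractions. Brute-force enumeration grows exponentially with graph size, so this approach only suits small illustrative graphs.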
We organised a program of summer internships and were able to recruit a talented and diverse group of students from the quantitative sciences (mathematics, physics, computer science, engineering) to undertake research in biology. The program has been a great success, and several important breakthroughs were made by our students. A number of the students have become co-authors, and in some cases first authors, on papers from the project. This has encouraged nearly all our students towards biology for their subsequent graduate work. We believe that training young people in this way offers the best hope for addressing the great scientific challenges that lie ahead in understanding how biology and human disease emerge from molecular complexity.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0856285
Program Officer: Mary Ann Horn
Budget Start: 2009-09-01
Budget End: 2013-08-31
Fiscal Year: 2008
Total Cost: $1,356,266
Name: Harvard University
City: Cambridge
State: MA
Country: United States
Zip Code: 02138