Living entities of all sizes, from single cells to multi-cellular organisms, process energy, such as light or chemical energy, for use in growth, development and repair. They also process information, adapting their behavior to changing circumstances in their environment and to changing internal states. For example, a cell must adjust its production of specific proteins in response to changes in its environment, including signals from nearby cells, as well as new instructions from its own DNA. These responses amount to information processing by "biological computers," which suggests that we can advance our understanding of complex biological systems using the tools of information technology and computer science, as long as the distinctions between human-designed computers and naturally occurring biological systems are respected. This research addresses three central points of difference. In direct contrast to current engineered systems, which are designed to minimize noise, are deployed with circuitry fixed at time of fabrication, and rely on a small number of identical subunits (for example, transistors on a two-dimensional chip), biological systems appear to: (1) use noise to their advantage; (2) dynamically adjust their processing methods; and (3) exploit an unusually diverse set of underlying mechanisms. Aided by computer simulations, the investigators will generalize classical results in the theory of computation, such as the Shannon-Lupanov bounds for circuit size, to account for the constraints and differences known to obtain in biology. In addition, they will bring into a mathematical framework the great variety of actual processing mechanisms, such as the parallel epigenetic regulation of gene expression, that are being continuously uncovered in large-scale laboratory investigations. Such work will aid in applying the vast investment made in the theory of computation over the last sixty years to the study of biological systems.

Broader Impacts: The project is expected to produce results useful to a wide scientific and engineering audience. It is anticipated that the new algorithms to be developed for the "reverse engineering" of biological systems will be applicable in other domains; therefore, a portion of the project resources is devoted to making techniques developed for the extraction of logical structures available for use by the broader community. Project resources will also be devoted to the training and mentoring of undergraduate and graduate students at the Santa Fe Institute in New Mexico, which provides a unique, research-focused, interdisciplinary educational experience. Students, recruited through the Institute's NSF REU (Research Experiences for Undergraduates) program and through the Investigators' network of collaborators at graduate programs, will conduct research, and publish their results in peer-reviewed literature, under the guidance of the Investigators. Finally, the Investigators will carry out outreach activities to public school teachers who teach in under-served rural areas in New Mexico and reach large numbers of students from minority groups under-represented in STEM fields. The outreach will be conducted in collaboration with Irene Lee, head of the NSF GUTS (Growing Up Thinking Scientifically) program. At the GUTS Summer Teacher Institutes in Socorro, New Mexico, the PI will introduce middle and high school teachers to the concepts of information processing in biological systems and work with them to develop ways to incorporate this content into lesson plans, brainstorming activities and games for use in science classrooms. The Investigators will also, with guidance from Lee, provide mentorship and coaching to secondary students involved in the New Mexico Supercomputing Challenge Program.

Project Report

This report reflects the culmination of three years of research, at the Santa Fe Institute in New Mexico and at Indiana University, involving over a dozen scientists drawn from a range of universities in the United States and overseas. Working in collaboration, and aided by high-performance computing, we made new discoveries at the intersection of biology and computer science. Biological systems survive and flourish in large part because they behave intelligently. Rather than passively being given resources, they actively monitor their environment and process information. We're most used to seeing this happen on large scales (animals seeking food), but it also happens at the smallest scales, even with the DNA in our cells. This kind of decision-making can happen rapidly (over the course of a few seconds) or extremely slowly (over the course of generations, as cells adapt to their environments and "invent" new ways of acting). The computers we build for ourselves also process information. We've known for a long time that intelligent behavior and use of information is critical to life, but we have not yet understood the connection between this kind of decision-making and the ways we have programmed computers to learn and make decisions. The more we understand this connection, the more we can leverage our knowledge from engineering to advance biology. The research funded by this grant has made important advances in connecting the worlds of biology and of computer science. We have learned new ways to apply our understanding of how computers, and computer programs, operate to how living organisms operate. We have learned new things about how information is processed and channeled in the networks inside the human body, including the ways in which biological systems might bottleneck information in subsections of a network, or, conversely, try to share it out throughout the system as a whole.
One of the key concepts we have relied on, and extended, is called "coarse graining." Coarse graining refers to how a decision-maker -- whether that be an individual or even a cell within an individual -- simplifies a description of its environment. By throwing out irrelevant facts in the right way -- coarse graining -- an organism can make better decisions, more quickly and more efficiently. Engineered systems do this all the time, but our work provided new ways to understand how biological systems might do the same thing. We created a new picture of how coarse graining can help us (as scientists) understand a system, and how it can help a biological system such as a cell or organism communicate with other members of its kind. Another key concept is "uncertainty." In the absence of information, an organism will have only statistical knowledge of what could happen. As we discovered, a good way to track information is to track how uncertainty is reduced. In the past, such studies have been primarily theoretical, but in a series of papers we demonstrated how to measure flows of uncertainty in the real world. We made new advances in the statistical inference of biological systems. One of our major results in this area concerns the structure of RNA, a crucial molecule related to the DNA that makes up our genetic code. By use of a set of tools -- known as "maximum entropy methods" -- we were able to discover patterns imprinted in the underlying sequences that make up RNA and that relate to the ways in which it arranges and folds itself up. Our work has had numerous spin-offs, including new algorithms for image processing, new computer code to study information flows in biological systems, and new insights into the social world. We conducted educational activities for talented youth at the middle and high school level, and developed new resources, including video lectures and online courses, to make our technical advances accessible at zero cost.
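As a rough illustration of the two concepts described above, coarse graining and information as a reduction in uncertainty, the following minimal sketch may help. It is not taken from the project's code. The four-state toy environment, the "relevant" variable, and the names `entropy`, `mutual_information`, `keep`, and `discard` are all assumptions made up for this example. It measures, in bits, how much a coarse-grained observation reduces uncertainty about the fact an organism cares about.

```python
from collections import Counter
from math import log2


def entropy(probs):
    """Shannon entropy, in bits, of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def mutual_information(pairs):
    """I(A;B) in bits, given a list of equally likely (a, b) outcomes.

    Uses the identity I(A;B) = H(A) + H(B) - H(A,B), i.e. the amount by
    which knowing B reduces uncertainty about A.
    """
    n = len(pairs)
    h_a = entropy(c / n for c in Counter(a for a, _ in pairs).values())
    h_b = entropy(c / n for c in Counter(b for _, b in pairs).values())
    h_ab = entropy(c / n for c in Counter(pairs).values())
    return h_a + h_b - h_ab


# Hypothetical toy environment: four equally likely microstates.
microstates = [0, 1, 2, 3]


def relevant(x):
    """The fact the organism cares about (assumed for illustration)."""
    return x >= 2


def keep(x):
    """Coarse graining that merges states {0,1} and {2,3}."""
    return x // 2


def discard(x):
    """Coarse graining that merges states {0,2} and {1,3}."""
    return x % 2


info_keep = mutual_information([(relevant(x), keep(x)) for x in microstates])
info_discard = mutual_information([(relevant(x), discard(x)) for x in microstates])

print(f"bits about the relevant fact, 'keep' coarse graining:    {info_keep}")
print(f"bits about the relevant fact, 'discard' coarse graining: {info_discard}")
```

Both coarse grainings compress the four microstates into two symbols, but only the first throws away detail that is irrelevant to the decision. In this toy setting it removes all uncertainty about the relevant fact, while the second removes none.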
We put effort into communicating with journalists and the public, and over a dozen articles on results and work made possible by this grant have appeared in the local, national, and international press, including the New York Times, Wired Magazine, Newsweek, the Chronicle of Higher Education, Ha'aretz, Discover Magazine, Science News, Forbes, the Santa Fe New Mexican, the Indianapolis Star, Quanta Magazine, Nautilus Magazine, and National Public Radio. Project leaders made special investments in the training of junior scientists, including both undergraduates and graduate students. This enabled us to pursue multiple, independent lines of research in an extremely cost-effective fashion. Because students could be compensated in part through investments in training and career development, our use of young researchers had the side benefit of further building our nation's pool of talent. Students mentored in this project have gone on to graduate education, postdoctoral research, and high-tech industries that use the algorithms and skills they developed during their time on this project.

Agency: National Science Foundation (NSF)
Institute: Emerging Frontiers (EF)
Type: Standard Grant (Standard)
Application #: 1440458
Program Officer: Saran Twombly
Project Start:
Project End:
Budget Start: 2014-03-01
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2014
Total Cost: $55,663
Indirect Cost:
Name: Indiana University
Department:
Type:
DUNS #:
City: Bloomington
State: IN
Country: United States
Zip Code: 47401