A living system is distinguished from most of its non-living counterparts by the way it stores and transmits information. This biological information is the key to biological function, and it lies at the heart of the conceptual basis of what we call systems biology. Much of the conceptual structure of systems biology can be built around the fundamental ideas concerning the storage, transmission, and use of biological information. Biological information resides, of course, in digital sequences in molecules like DNA and RNA, but also in 3-dimensional structures, chemical modifications, the chemical activities of both small molecules and enzymes, and in other components and properties of biological systems at many levels. The information depends critically on how each unit interacts with, and is related to, other components of the system. Biological information is therefore inherently context-dependent, which raises significant issues concerning its quantitative measure and representation. An important and immediate issue for the effective theoretical treatment of biological systems, then, is: how can context-dependent information be usefully represented and measured? This question matters for understanding the storage and flow of information both in the functioning of biological systems and in evolution.
This work involves both new ideas and the integration of new ideas. It represents new mathematical methods as well as a novel integration of approaches that are focused on the very real and practical problems of biological data analysis. The PI has developed a new, mathematically well-defined conceptual approach, exploring the relationships between graph properties and set complexity and considering new approaches to network analysis.
New interaction distance measures are considered, together with a new way of dealing with especially large data sets. Of particular interest is the maximal information coefficient, for which a general approach may be possible: certainly for a small number of variables, and possibly in the general case. The ideas will be tested on a number of diverse biological data sets, especially around gene expression, and other variants. Current methods often fail in the face of truly complex dependencies in large data sets, and powerful new methods would be of high value.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1340619
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2013-07-01
Budget End: 2015-06-30
Support Year:
Fiscal Year: 2013
Total Cost: $299,740
Indirect Cost:

Name: Pacific Northwest Research Institute
Department:
Type:
DUNS #:
City: Seattle
State: WA
Country: United States
Zip Code: 98122