All branches of modern science demand data, and data acquisition technologies in biomedical research have increased the volume of data by at least three orders of magnitude over the past three decades. Much of this data is complex, with three defining properties: high volume (e.g., genome data, on the order of a gigabyte per person), multimodality (e.g., Time x Genes x Control), and heterogeneity (e.g., different time scales and data feeds). How this data is analyzed now presents a major challenge in scientific discovery. Although powerful statistical learning algorithms can yield accurate predictions for pattern recognition problems, their increased power has come at the expense of considerably greater complexity: a large number of parameters must be adjusted in a high-dimensional space. Yet mathematically sound models that are computationally stable and statistically meaningful still demand more data than most biology laboratories and clinics can provide.

This two-day workshop at RPI fits into the initiative between NSF and NIH to investigate developing science and technology for merging computational, physics-based models with biology and medicine. There will be a panel at the end of each day on a particular topic, e.g., how to integrate scales, how to deal with uncertainty and missing data, and how to use data semantics in multi-scale modeling. The panel discussions will be organized into a report that will be made publicly available. The seminars and panel discussions will also be recorded and made available to the public.

Project Report

We organized a two-and-a-half-day, data-centric workshop that examined multiscale approaches to modeling, control, computation, simulation, knowledge extraction, and visualization of complex systems, with a focus on knowledge extraction and complexity management. The workshop considered solutions to address the large volume, heterogeneity, multidimensionality, and multimodality of data acquired primarily from biomedical domains. The structure/function relationship paradigm is central to understanding how biological systems function. The idea is deceptively simple: understanding the structural organization of biological systems, from massive ecological systems to the shape of a single protein, reveals the function of the system. The concept is powerful enough to have inspired a 200+ year-long effort to describe the components of our biological universe in ever finer detail, beginning with the Linnean taxonomic system of cataloging organisms based on their structural similarities, and culminating with microscale descriptions such as the complete genomes of several organisms, including humans. The reductionist approach to biological research has thus reigned supreme for generations, and as a result we now understand how the linear arrangement of nucleotides encodes the linear arrangement of amino acids, how proteins interact to form functional groups such as signal transduction and metabolic pathways, and so on. But at each level of biological organization, we reach a wall: having reduced the complex biological universe to a myriad of minute parts, we encounter new forms of complexity, namely data overload and the "curse of dimensionality." Simply put, we’ve taken our biological machines apart but can’t put them back together again: our ability to accumulate reductionist data has outstripped our ability to understand it.
Thus, we encounter a gap in the structure/function relationship: having accumulated an extraordinary amount of detailed information about biological structures, we can’t assemble it in a way that explains the correspondingly complex biological functions these structures perform. This gap is especially evident at the level of tissues, where most diseases and injuries are manifest. Heart disease and cancer remain the top two causes of death in the United States. One fundamental characteristic of both diseases is tissue failure: namely, errors in the structural organization and function of cells in the affected tissues. Likewise, it is estimated that one in six US residents requires medical treatment for an injury each year, yet the process of wound healing is so complex that it is difficult to accurately predict how quickly most serious wounds will heal. Existing models of wound healing rely on clinically relevant, but somewhat superficial, measures of tissue state such as reduction in wound area, linear advancement of the wound edge, pain, and ease of use. In fact, despite a multitude of genetic screens, biochemical assays, and imaging techniques, the "gold standard" for diagnosis and evaluation remains the expert opinion of highly trained pathologists who scan tissue samples on histopathology slides. In other words, the human eye is currently the most accurate tool we have available for identifying telltale alterations in the structure and function of diseased and damaged tissues. And it is clear that human judgment is not infallible: thousands of diseases are misdiagnosed every year, costing hundreds of millions of dollars in wasted or ineffective medical treatment. To improve diagnosis and treatment of diseases and wounds, we need a better understanding of how the tremendous number of cellular and subcellular parts is organized into functional tissues.
One strategy for achieving this is to employ robust methods for describing complex systems, adapted from mathematics and engineering disciplines far outside traditional biomedical fields. Viewed from this perspective, tissue organization and function can be treated as a design optimization problem: e.g., what arrangement of cellular constituents achieves the best tissue performance? This workshop considered methods to address this data-driven modeling problem.

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Standard Grant (Standard)
Program Officer
Sylvia J. Spengler
Rensselaer Polytechnic Institute
United States