The focus of the proposed research is to develop a computational framework that allows for the systematic exploration of the growing human "physiome" for clues to predict and prevent major diseases. This research is motivated by the observation that despite recent progress in medicine, the disease burden of many important clinical conditions remains unacceptably high because of a failure to promptly match patients to treatments that are appropriate to their current condition or their individual risk. This situation holds across many different areas of medicine: in cardiology, over 300,000 deaths take place each year due to fatal arrhythmias; in psychiatry, 34% of patients with bipolar disorder have an interval longer than 10 years before they first receive a diagnosis; in the setting of intensive care, there are over 650,000 cases of unanticipated sepsis each year. These are only a few examples in which therapies that could significantly improve patient outcomes and reduce healthcare costs exist but are often not applied in a timely fashion, because adequate information-based tools to guide their use are lacking.

There is a critical need in this setting for novel biomarkers to guide decision-making. Particularly striking is the lack of good biomarkers that exploit recent advances in acquiring large physiological datasets continuously from patients over long periods. The existing practice of discovering disease markers in such data is highly dependent on human input and subjective abilities; inappropriate for large datasets and subtle markers; and unable to generalize for different systems and diseases. The goal of this research is to bridge this gap, through computational methods for the structured discovery of novel, highly-discriminative risk markers from terabytes and even potentially petabytes of physiological data.

Inspired by the translational impact of computational biology in extracting valuable insights from large volumes of genomic and proteomic data, this research endeavors to lay the foundations of a complementary body of research centered on a vision of "computational physiomics". The PI proposes a computational framework where large volumes of waveform data are first abstracted into a uniform string representation, and the resulting physiological text is then studied for characteristics associated with risk. The abstraction of physiological signals into text creates the opportunity to study these signals in a fundamentally different manner from earlier efforts (and thereby to discover new insights). As part of this work, the PI will address the challenges associated with transforming the many different kinds of physiological time-series signals (e.g., quasi-periodic, aperiodic, non-uniformly sampled, multi-channel) into symbolic sequences, registering these symbols across patients, formulating problems relating to risk stratification in the context of textual data, and developing algorithms to solve these problems efficiently and accurately. In addition, through extensive collaborations with clinical colleagues, he will rigorously evaluate the clinical utility of the research on real-world datasets drawn from different high-impact clinical applications.
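As a rough illustration of what abstracting a waveform into a string can look like, the following minimal Python sketch z-normalizes a signal, averages it over fixed windows, and maps each window mean to a letter. It is not the PI's method: the equal-probability breakpoints are borrowed from the well-known SAX technique, and the names `symbolize`, `ngram_counts`, `ALPHABET`, and `BREAKPOINTS`, as well as the window size, are assumptions made for this example only.

```python
from bisect import bisect_right
from collections import Counter
from statistics import mean, pstdev

# Hypothetical four-letter alphabet; the breakpoints split a standard
# normal distribution into four equal-probability bins (SAX-style).
ALPHABET = "abcd"
BREAKPOINTS = [-0.6745, 0.0, 0.6745]


def symbolize(signal, window=4):
    """Turn a numeric signal into a string, one letter per window."""
    mu = mean(signal)
    sigma = pstdev(signal) or 1.0  # avoid dividing by zero on flat signals
    z = [(x - mu) / sigma for x in signal]
    # Non-overlapping windows; a trailing partial window is dropped.
    chunks = [z[i:i + window] for i in range(0, len(z) - window + 1, window)]
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, mean(c))] for c in chunks)


def ngram_counts(text, n=2):
    """Count the length-n substrings of a symbolic sequence."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    toy_signal = [0, 0, 0, 0, 10, 10, 10, 10]  # illustrative values, not patient data
    text = symbolize(toy_signal)
    print(text)
    print(ngram_counts(text))
```

Once signals are in this form, substring counts such as those from `ngram_counts` could in principle be compared between patient groups, which is one simple way to pose a risk-related question on textual data. Real physiological signals (multi-channel, non-uniformly sampled, and so on) would need considerably more care than this sketch takes.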

Early investigation of the ideas that form the basis of the proposed work has shown great promise for cardiovascular applications. Clinical studies in two separate cohorts with nearly 6,000 patients show that the computational framework enables the discovery of risk markers from ECG signals that identify patients at an 8- to 9-fold increased risk of death within three months of a heart attack, and moreover, that this information is independent of other generally accepted risk variables (e.g., demographics, comorbidities, imaging results, biomarkers, and other ECG variables). The research should enable continued progress in the case of cardiovascular disease, as well as similar progress for other focus applications (e.g., psychiatry, critical care, neurology, and obstetrics) through clinical collaborations. In addition to impact in these specific cases, the research also lays the foundation of a broader body of research (i.e., the vision of "computational physiomics") that can transform medical data analysis by including large volumes of continuous physiological signals in a rubric that today can only handle discrete data. This research represents a central piece in this context, connecting advances in continuous patient monitoring to advances in classification methods. While the research is motivated by clinical applications, the computational questions addressed by the work and the techniques developed will also advance existing work on time-series prediction and on sequential data mining and machine learning more generally. The challenge of extracting insights from large volumes of time-series data is increasingly important across many different disciplines. The research promises to significantly advance both the broad goal of time-series analysis and individual sub-problems in the areas of motif discovery, long-term signal comparison, anomaly detection, characterizing complexity, and identifying structure in apparently noisy signals.

This research will help to establish a strong inter-disciplinary program in computer science and medicine at the University of Michigan. This program will be inherently translational, and have a significant education component that provides graduate and undergraduate students with coursework and research opportunities exposing them to real-world medical problems, complex and large clinical datasets, computational methods, and the design of experiments. The PI will use the methods and materials related to this proposal both in the development of new courses (such as the biomedical machine learning course that the PI has introduced at Michigan), and to enrich existing courses in algorithms and data structures (which the PI teaches) and artificial intelligence and machine learning (which are taught by other faculty in the CSE department). Replacing some of the traditional applications covered in these courses by applications of clear importance to human health should engender a sense of excitement among students as they understand first-hand the role computation can have in improving the human condition. Developing, implementing, and evaluating our computational framework will also generate several undergraduate research positions that will allow students to experience and learn about multi-disciplinary research. The PI is also committed to making the datasets that form the basis of the proposed research available to the broader research and educational communities for use in a de-identified manner. This will enable researchers who do not have established clinical partners to enter the research area and educators to construct laboratory activities, and facilitate the uniform assessment of risk stratification algorithms on a common set of signals.

Project Report

This project supported educational and research activities in the area of computational medicine. The research contributions of this project focused on the exploration of a framework analogous to computational biology for application to EHR data with the goal of improving patient care. Some of the topics studied as part of this work included producing symbolic abstractions of physiological data and developing problem formulations on these symbolic data relevant to clinical applications. The results from clinical studies on cardiovascular data (reported in the literature) show that this research holds the potential to substantially improve the management of patients with acute coronary syndrome and, in doing so, provide an example of how computation can positively impact health. The educational contributions of this project included the training of graduate and undergraduate students in problems at the intersection of computer science and medicine, and the development of coursework opportunities and conference venues empowering medical informatics. The project also provided the basis for outreach to students, with the goal of inspiring greater participation in science and engineering by offering new opportunities at the boundaries of engineering and by giving a sense of how careers in science and engineering can improve the human condition.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1054419
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2011-02-01
Budget End: 2013-05-31
Support Year:
Fiscal Year: 2010
Total Cost: $87,538
Indirect Cost:
Name: Regents of the University of Michigan - Ann Arbor
Department:
Type:
DUNS #:
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109