A 2000 report by the Institute of Medicine indicated that 100,000 patients die in the U.S. each year due to suboptimal treatment decisions made by healthcare professionals who work under time pressure while processing overwhelming amounts of data, particularly in operating rooms and intensive care units. Corrective measures and new regulations have not yet led to improvements. In developed countries, about 6% of the economy is spent on upkeep of infrastructure, and about 50% of the original acquisition cost is spent on maintenance of equipment. Managers of equipment fleets, such as fleets of aircraft, are too often unable to maintain the required levels of availability because of maintenance and logistics crises that are unexpected yet identifiable in data: only about two-thirds of U.S. military aircraft can be flown at once. Expediting solutions often causes hundreds of millions in avoidable expenses. These two examples, like many other societally, economically, and scientifically important domains of human activity, involve large amounts of multi-stream data, which may carry information helpful in mitigating some of these adversities. Existing research efforts produce algorithms that extract useful information from individual sources of data. Substantial new benefits, however, could be realized by exploiting relationships between streams of data. This research program will comprehensively and pragmatically explore that opportunity. It will benefit healthcare practitioners, equipment managers, and users in other domains where multiple streams of corroborative evidence are available, as well as students, trainees, and the scientific community at large.
This research project will develop and extensively evaluate new algorithms to identify informative correlations between multiple, diverse streams of large, multivariate, numeric and symbolic, potentially sparse data. These algorithms will identify subsets of features and records that follow distinct cross-stream relationship patterns, enable descriptive and predictive analytics in static and temporal settings, and allow robust forecasts. The proposed work will build on prior efforts in detecting complex anomalous patterns in single streams of multidimensional data and extend them to cross-stream analysis of multiple data sources. Expected results will allow, for example, detection of patterns of change in the relationships among vital signs measured at the bedside of intensive care patients, their records of treatment or medication, and their outcomes. These patterns may indicate a non-standard response of a patient to a treatment or signal the emergence of a health crisis. More broadly, this effort will substantially expand the capabilities of current cross-stream analytics techniques such as Canonical Correlation Analysis. It will develop a new variant of the Gaussian Process framework to model inter-stream dynamics, produce new information-theoretic models of correlations between streams of symbolic variables (with an extension to datasets that mix numeric and symbolic features), and provide a framework for identifying multimodal structures of cross-stream relationships, including disjunctive and conjunctive-disjunctive patterns. The project will take on highly challenging intellectual problems while aiming for significant societal benefits, with the primary demonstration areas in bedside informatics and equipment health management. Software implementations of the new algorithms and illustrative examples of their use will be shared with the general research community.
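As a point of reference for the information-theoretic modeling of correlations between symbolic streams, the sketch below computes the standard plug-in (empirical) mutual information between two aligned symbolic sequences. It is a textbook baseline, not the method this project proposes; the function name `mutual_information` and the example streams (medication codes and alarm states) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two aligned symbolic streams.

    Plug-in estimator: probabilities are replaced by observed frequencies.
    This is a generic baseline, assumed here for illustration only.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("streams must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)            # marginal counts of symbols in stream 1
    count_y = Counter(ys)            # marginal counts of symbols in stream 2
    count_xy = Counter(zip(xs, ys))  # joint counts of co-occurring symbols
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical example streams: a medication code and an alarm state per record.
medication = ["A", "A", "B", "B"]
alarm_linked = ["high", "high", "low", "low"]    # fully determined by medication
alarm_unlinked = ["high", "low", "high", "low"]  # independent of medication

linked_bits = mutual_information(medication, alarm_linked)
unlinked_bits = mutual_information(medication, alarm_unlinked)
```

A value near zero suggests the two streams carry little information about each other, while larger values point to a cross-stream relationship worth examining. On short or sparse streams the plug-in estimate is biased upward, so in practice it would need a correction or a significance test.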
The bulk of the proposed work will be performed by graduate students and medical fellows, who will immediately use the acquired knowledge in their careers. The PIs will include the results in outreach activities and in their training and course materials.