Building scalable distributed heterogeneous systems of the future with easy-to-program software is broadly acknowledged to be a grand challenge. It is widely recognized that a major disruption is currently under way in the design of computer systems as processors strive to extend, and go beyond, the end-game of Moore’s Law. This disruption is manifest in new forms of heterogeneous and distributed processors and memories at all scales (on-chip, on-die, on-node, on-rack, on-cluster, and on-data-center), making scalability a fundamental challenge at all levels. Healthcare analytics offers a unique opportunity to explore scalable system design for the 21st century because there has been a tectonic shift in the ability of medical institutions to capture and store medical data, and even to stream data in real time. This shift has already contributed to an ecosystem of Machine Learning (ML) models being trained for a variety of clinical tasks. A new distributed heterogeneous architecture is required to build systems that can develop and deploy ML models based on distributed healthcare data that must be accessed under privacy-preserving constraints. Further, the proposed architecture must be accompanied by a software framework that addresses the needs of domain-specific data scientists who develop and augment the ML models deployed in their hospitals.

This planning grant project is exploring the foundational principles necessary for building integrated scalable distributed systems of the future, in preparation for submitting a full proposal to the PPoSS program. It uses the domain of healthcare analytics to motivate and concretize the research agenda, but the principles developed in this research should be applicable to other application domains as well. The exploration focuses on demonstrating an integrated platform that spans multiple levels of distribution and heterogeneity of computation and storage, while also obeying important privacy constraints. While recent progress on the use of ML in healthcare applications has been encouraging, current approaches do not a) scale to the degrees of parallelism, heterogeneity, and distribution that will be required in future systems, or b) support the soft real-time responsiveness to streaming data that is needed in many clinical situations. The originality of this project lies in the integration of distribution, heterogeneity, and privacy considerations in a single unified software/hardware stack, which includes adaptive resource management that spans privacy-preserving federated continuous learning, automatic specialization of ML models at individual sites, and automatic selection of the ML models best suited for specific clinical tasks, maximizing accuracy subject to different latency and soft real-time constraints.

This project’s end-to-end approach to developing foundational scalability principles will impact multiple areas of computer science through publications, tutorials and courses, thereby benefiting other researchers working on scalability challenges in future distributed heterogeneous systems. The use of healthcare analytics as a driving application has the potential to result in significant benefits to society, by demonstrating how knowledge distilled from multiple sources of data can be embodied in recommendation systems that can run onsite to provide time-critical decision support to physicians. As a further impact, the project will contribute to the training of Highly Qualified Personnel (HQP) at the intersection of Systems for ML and ML for Healthcare, two emerging interdisciplinary communities that are currently growing independently of each other. Finally, this research will leverage existing activities at the PIs’ institutions that contribute to broadening participation of underrepresented groups in computing.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2020-10-01
Budget End: 2021-09-30
Fiscal Year: 2020
Total Cost: $50,000

Name: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820