In recent years, the availability of clinically relevant datasets has grown enormously. A good understanding of how to organize, process, and transform these data into actionable knowledge is crucial. This research aims to unlock the potential of these data through the exploration of new fundamental research directions and approaches in machine learning. Targeting patients identified as high-risk by computational data-driven models could reduce the burden of disease in a cost-effective manner. While machine learning opportunities in medicine continue to grow, there have been relatively few successes in translating them to practice. Clinicians still base the bulk of their daily decisions on relatively small amounts of patient-specific data. The technical contributions made here will enable the meaningful use of complex medical data. Beyond the long-term societal impact, this work will provide valuable student training through research projects related to the proposed objectives. Targeted outreach activities that focus on the societal impacts of computational research will attract a diverse set of graduate students to the field. In addition, this work will help lay the foundation for a new project-based course focusing on applications of machine learning in clinical care. As the field continues to grow, such courses will become critical for equipping the next generation of students with the required tools and insights. Finally, critical inter-departmental collaborations between medicine and computer science and engineering will grow as a result of this work, leading to the enrichment of both fields.
The primary research objective of this proposal is to increase the utility of machine learning (ML) in clinical care through the exploration of new fundamental research directions and approaches in ML. For data-driven predictive models to become widely and safely adopted in clinical care, the ML community must address several key research challenges: poor adaptability to complex, unexpected changes in patient populations and clinical protocols; insufficient intelligibility of accurate but uninterpretable models; and a lack of actionability, with accuracy often prioritized over actionability. The PI proposes the development of new transfer learning techniques for learning robust and adaptable models in a wide range of scenarios. Experiments and evaluations with large-scale clinical datasets will offer insight into how these data change over time, and a better understanding of when and how models should adapt. Clinical decision models and software are seldom incorporated into practice because either they are black boxes or their output, while accurate, offers no insight into how to act. One way to increase the intelligibility of models is to focus on building clinically meaningful features. Another is through sparsity. The PI will investigate feature engineering and selection methods for learning useful abstractions that automatically leverage expert knowledge and for learning models based on actionable features. The PI will explore structured regularization techniques to select modifiable features. To gain a better understanding of how different actions affect patient risk, the PI will address the limitations of causal inference in the context of high-dimensional observational datasets. This research will yield methods for producing clinically meaningful inputs and methods for jointly optimizing sparsity and actionability.
The proposed work will yield novel techniques for extracting and building adaptable, intelligible, and actionable models from patient data. An emphasis on adaptable solutions will ensure that such techniques can be safely adopted over the long term. The study of techniques for dealing with the inherent heterogeneity of the data (e.g., different patient populations across multiple sites) will not only increase the utility of the data but will also lead to more general advances in the field of transfer learning. A focus on intelligibility, a quality that is often overlooked by the machine learning community, promises to increase the utility of such models, since clinicians are more likely to adopt a model they can check and understand. Prioritizing actionable models will yield new strategies for causal analysis in high-dimensional observational settings. This, in turn, will enable the generation of new hypotheses regarding causal relationships in clinical medicine.