Across the sciences, researchers hope to use algorithms and machine learning to derive reliable insights from data, yet research findings often turn out to be false or hard to replicate. Indeed, assessing the validity of insights suggested by data is presently a difficult and error-prone task. Even in industry, where machine learning has fueled dramatic advances, more principled ways of benchmarking and improving the performance of machine learning systems would make a major difference. Poorly understood heuristics often work best in practice, leading to much guesswork and inconsistent results. This inscrutable behavior of machine learning also has repercussions for society at large, as more and more people grapple with the implications of algorithmic decisions in their daily lives. Fairness, interpretability, and transparency have become major talking points as algorithms increasingly aid or replace human judgment.
The PI aims to build guiding theory alongside scalable algorithms that make the practice of machine learning more reliable, transparent, and aligned with societal values. Focusing on algorithmic stability as a unifying technical framework, this proposal targets several foundational challenges: the design of a robust methodology to address the reliability crisis in data science, a working theory of why and when large artificial neural networks train and generalize well, and a universal framework, presently lacking, for reasoning about generalization in unsupervised learning. A particular emphasis is on application domains of high societal impact. The PI has long been invested in topics such as privacy, fairness, accountability, and transparency in machine learning, not only through academic publications but also through workshops, mentorship, teaching, and interdisciplinary engagements.