There exist many settings in which trying out a new decision can be costly, but logs of past decisions and outcomes can help inform new decisions. For example, health records track clinical decisions and outcomes; online courses may track teaching and engagement strategies alongside final performance; factories may track process choices and output quality. Information from past logs can keep us from repeating mistakes and improve outcomes. However, learning from such logged data is not easy: not all possible decisions may have been tried, and not all relevant information will have been recorded. For example, a health record may accurately list what lab tests a patient received but lack potentially relevant information about their home and work environment. These challenges make it hard for systems to reason about the effect of following a decision-making strategy different from current practice. Current approaches fall into two main types: statistical methods, which have strong theoretical foundations but require many assumptions, and methods based on human expertise, which can be powerful but also fallible. This work brings together the strengths of statistical and human-based approaches to validation to help identify promising decision-making strategies from logged data.
Specifically, the project focuses on integrating human and statistical inputs for two major tasks. The first is converting the raw inputs (histories of measurements) into human-understandable representations: statistical methods are used to propose representations that will be useful for defining or summarizing a policy, and human input is used to ensure that the representation is intuitive, or at least understandable. The second is estimating how outcomes would differ had a different decision been made. Here, statistical methods are used to form the initial estimate and to identify which data are most influential to that estimate, and human input is used to determine whether the estimate is reliable given that it relies particularly on those data. These two building blocks, which allow us to summarize the data and a policy as well as estimate outcomes, are then used both for evaluating a given policy and for proposing new policies.
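To make the second task concrete, the following is a minimal sketch of one standard way to estimate a new policy's outcomes from logged data, inverse propensity scoring (IPS), together with a simple leave-one-out measure of how much each logged record influences the estimate. The abstract does not specify the estimator or the influence measure; both choices here, along with all variable names and the toy data, are illustrative assumptions.

```python
import numpy as np

def ips_estimate(rewards, logged_probs, target_probs):
    """Illustrative IPS estimate of a new policy's value from logged data,
    plus each record's influence on the estimate (a leave-one-out measure)."""
    weights = target_probs / logged_probs     # importance weights
    contributions = weights * rewards         # per-record reweighted outcomes
    estimate = contributions.mean()
    # Influence of record i: how much the estimate changes if record i is dropped.
    n = len(rewards)
    total = contributions.sum()
    loo = (total - contributions) / (n - 1)   # leave-one-out estimates
    influence = estimate - loo
    return estimate, influence

# Toy logged data: outcomes, plus each logged decision's probability under
# the logging policy and under the new (target) policy.
rewards = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
logged_probs = np.array([0.5, 0.5, 0.2, 0.8, 0.5])
target_probs = np.array([0.9, 0.1, 0.8, 0.9, 0.1])

est, infl = ips_estimate(rewards, logged_probs, target_probs)
most_influential = int(np.argmax(np.abs(infl)))
```

In the spirit of the proposed human-statistical integration, a record flagged as highly influential (here, the one rarely taken under the logging policy but favored by the target policy) is exactly the kind of data point a human expert would be asked to vet before trusting the estimate.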
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.