Prescriptive analytics, and constrained optimization in particular, is central to decision making over a broad range of domains, including finance, transportation, manufacturing, and healthcare, and has applications to scientific research as well. Typically, decision makers have relied on application-specific solutions to model and solve these problems. Such solutions are often complex and do not generalize. Moreover, the usual workflow requires that data be extracted from a database and then reformatted and fed into a separate optimization package, after which the output must be reformatted and inserted back into the database; this process is slow, cumbersome, and error prone. Finally, modern data-intensive optimization problems are of unprecedented size. A domain-independent, declarative, and scalable approach is needed, supported and powered by the system where the data relevant to these problems typically resides: the database. Then modeling becomes less ad hoc, and the overall optimization process, from data preparation through solution and exploration of results, becomes much more efficient. Desirable data management functionality --- such as consistency, persistence, fault tolerance, access control, and data-integration capability --- become an integral part of the system "for free". This project will develop algorithms and systems to provide general-purpose in-database support for prescriptive analytics applications over the sort of large scale uncertain data that is commonly encountered in practice.

Specifically, the project will develop extensions to the SQL relational query language to allow specification of ``stochastic package queries'', a class of database queries that selects an optimal set ("package") of tuples that satisfy both per-tuple and global constraints. Such queries correspond to stochastic integer linear programs. Novel solution algorithms will focus on the scaling challenges caused both by uncertainty in the data and by large data volumes. The system will provide exact solutions when possible, and otherwise provide scalable Monte-Carlo-based solutions with rigorous approximation guarantees. The project will radically re-design the prior PackageBuilder system for deterministic package queries, incorporating techniques from probabilistic databases, to create a complete end-to-end system. The project will impact a broad set of domains with applications that boil down to modeling and solving constrained optimization problems over uncertain data, including finance, healthcare, and transportation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Start
Project End
Budget Start
2019-10-01
Budget End
2021-09-30
Support Year
Fiscal Year
2019
Total Cost
$299,317
Indirect Cost
Name
University of Massachusetts Amherst
Department
Type
DUNS #
City
Hadley
State
MA
Country
United States
Zip Code
01035