Modern techniques for data gathering (arising from medicine and bioinformatics, internet applications such as web search, physics and astronomy, and mobile data-gathering platforms) have yielded an explosion in the mass and diversity of data. Concurrently, statistics, decision theory, and machine learning have successfully laid a groundwork for answering questions about our world based on analysis of this data. As more information is collected, classical approaches to inference and learning become insufficient, because concerns beyond statistical accuracy arise: computational resources, privacy considerations, storage limitations, and network communication constraints. This prompts a basic question: how can multiple criteria be balanced while maintaining statistical performance?

To bring statistics and machine learning into closer contact with other desiderata, this research involves the development of procedures that trade between scarce resources in principled and optimal ways. Such trade-offs have been difficult to characterize, as current tools for providing fundamental limits (such as information theory in communication) do not connect disparate areas. Three concrete sub-areas serve as bases for this research. First, the investigators study the interplay of computing with learning, estimation, and optimization by connecting notions of computation (such as memory accesses or synchronization in distributed systems) to data analysis tasks. Second, the research investigates adaptive and robust procedures, and their associated statistical costs, which will become more important given increasingly long-tailed and messy data. Third, the investigators study privacy in estimation, using information-theoretic and decision-theoretic tools to characterize the tensions between statistical accuracy and sensitive data disclosures. Combined, these lay the groundwork for a theory on the use of data in the face of constraints, along with a functional and practical understanding of procedures that balance scarce resources against statistical accuracy.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
1553086
Program Officer
Phillip Regalia
Project Start
Project End
Budget Start
2016-02-15
Budget End
2021-01-31
Support Year
Fiscal Year
2015
Total Cost
$497,033
Indirect Cost
Name
Stanford University
Department
Type
DUNS #
City
Stanford
State
CA
Country
United States
Zip Code
94305