Continued US achievement at the forefront of science and technology requires significant investment in new information technology research to tackle the challenging problems posed by the vast data footprint created by the digital recording of human activity. This project develops novel models and methods for forecasting human activity in time and space using sparse, heterogeneous data. The goals are general and focus on prediction and on filling in missing data. An example of the type of data this project addresses is a year's worth of geotagged Twitter data from a major city, along with other informative geospatial information from that region. The project combines the expertise of senior scientists in both Mathematics and Anthropology and develops analytical tools for understanding a diverse array of cyber-geospatial-temporal datasets. While focused on basic research, the project has tremendous potential to impact national security. This three-year project trains postdocs, graduate students, and undergraduate researchers. The mentees will be trained in research and in presenting their work in written and spoken formats, with an emphasis on refereed journal publications and conference presentations. They will also be connected to future employers and given career advice throughout their training.

The project focuses on information technology at the interface between large-scale cultural, social, and behavioral processes and the situational conditions that lead to the expression of specific behaviors. This work extends a general conceptualization of text-based topic modeling to handle diverse collections of data types. The project develops methods to detect situational probabilistic effects through spatially-explicit topic modeling. One goal is to organize situational effects into different categories: (a) relatively stationary (e.g., the spatially discrete but temporally stable role that the physical airport plays in driving airport-related topics), (b) intermittent (e.g., discrete holidays), and (c) ephemeral (e.g., Foursquare). Another goal is temporal forecasting, while a third is filling in missing information from a latent space. The research approach focuses on algorithms flexible enough to extend to a variety of datasets. The work interweaves several models and algorithms for large data: self-exciting point process models for temporal information; soft topic modeling, such as nonnegative matrix factorization and latent Dirichlet allocation, for linear mixture models of data; hard clustering methods built around total variation minimization on graphs and graph Laplacians; and data fusion methods that combine these ideas, in which latent space information is studied for forecasting and filling in missing information.
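As a rough illustration of the spatially-explicit topic modeling described above, the sketch below bins hypothetical geotagged messages into spatial grid cells and applies nonnegative matrix factorization (one of the soft topic-modeling methods named in the abstract) using scikit-learn. The messages, grid size, and topic count are invented for illustration and are not taken from the project.

```python
# Minimal sketch (not the project's actual pipeline): spatially-explicit topic
# modeling via nonnegative matrix factorization. Geotagged messages are binned
# into spatial grid cells; each cell's aggregated text becomes one "document",
# and NMF factors the cell-by-term count matrix into soft topics whose weights
# vary over space. All data and parameters below are illustrative.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF

# Hypothetical geotagged messages: (longitude, latitude, text)
messages = [
    (-118.41, 33.94, "flight delayed at the airport gate"),
    (-118.40, 33.94, "landed early, waiting for baggage claim"),
    (-118.28, 34.02, "great concert downtown tonight"),
    (-118.27, 34.02, "traffic heavy after the downtown show"),
]

# Bin messages into coarse spatial cells (0.05-degree grid, chosen arbitrarily).
def cell_id(lon, lat, size=0.05):
    return (round(lon / size), round(lat / size))

docs = {}
for lon, lat, text in messages:
    docs.setdefault(cell_id(lon, lat), []).append(text)
cells = sorted(docs)
corpus = [" ".join(docs[c]) for c in cells]

# Cell-by-term count matrix, then a rank-2 nonnegative factorization:
# W gives each cell's topic mixture, H gives each topic's word weights.
X = CountVectorizer().fit_transform(corpus)
model = NMF(n_components=2, init="nndsvda", random_state=0)
W = model.fit_transform(X)
H = model.components_

for c, weights in zip(cells, W):
    print(c, np.round(weights, 2))  # topic intensities per spatial cell
```

Each row of W is a cell's mixture over latent topics; this kind of latent-space representation is what the abstract proposes to study for forecasting and for filling in missing information.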

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1737770
Program Officer: Leland Jameson
Project Start:
Project End:
Budget Start: 2017-09-15
Budget End: 2021-08-31
Support Year:
Fiscal Year: 2017
Total Cost: $603,944
Indirect Cost:
Name: University of California Los Angeles
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095