The growing awareness of the importance of data and data analysis, coupled with the unprecedented growth in the amount of data in recent years, has led to concerted efforts by researchers in the fields now collectively referred to as the data sciences to develop new models capable of handling big, complex datasets. The vast majority of the available data is unlabeled, which makes the modeling problem more challenging. This project will advance the field of modeling big, complex, unlabeled data. The focus will be on learning from network data as well as learning dependency structures from regular data. Some of the concrete problems investigated are: What can an epidemic spreading over a network tell us about the structure of the network and the origin of the epidemic? What can the structure of the network tell us about the latent features of the nodes, for example, their grouping, or community in the case of social networks? Are there more refined structures in real networks beyond simple grouping or community structure? Can we learn complex networks from regular data that tell us about the nature of the dependency among the underlying variables (for example, which variables are the causes of a given variable)? How well do these often complex models fit the real data? Progress on these questions has a direct impact on many scientific domains dealing with data. For example, genomics and computational biology, neuroscience, epidemiology, network security, social sciences, and marketing all benefit from advances in network analysis. Advances in dependency structure learning can improve causal inference procedures, with impact on all scientific fields. The work on network epidemics in this project has the potential to be transformative, with immediate applications to the public health domain.

This project advances the state of the art in inferring complex relations from data in an unsupervised fashion. As a result, network inference and graphical modeling will play prominent roles in our approach. We will consider four main tasks: 1) Developing goodness-of-fit tests for structured network models, in particular those used in community detection and clustering. Despite advances in network modeling, there are concerns that current models are not capturing the complexity of real networks. A first step toward realistic network modeling is developing tools for assessing how well the models fit. 2) Advancing the state of the art in modeling complex networks, presenting ideas on capturing self-similarity in real networks as well as hierarchical statistical models for multilayer networks. 3) Advancing inference based on network dynamics: Many networks are accompanied by dynamics governed by the network structure, e.g., the spread of rumors and diseases. We often observe the result of the dynamics (who gets infected over time) and would like to make inferences about the origin of the dynamics or the structure of the underlying network. We will address the challenges in dealing with these questions in real networks, where the presence of many cycles and incomplete information about the dynamics pose serious difficulties. 4) Advancing inference of high-dimensional dependency structures: Characterizing dependencies (correlation, causation, etc.) among a collection of random variables is a fundamental task of statistical analysis. The principal investigator will explore learning, from data, high-dimensional directed graphical models that are suitable for causal interpretation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1945667
Program Officer: Gabor Szekely
Budget Start: 2020-07-15
Budget End: 2025-06-30
Fiscal Year: 2019
Total Cost: $78,664
Name: University of California Los Angeles
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095