Big data poses big challenges. Perhaps the biggest challenge is to extract small but useful information from big noisy data. What approach should be used to do that, and for what data, so that this extraction is scalable, and yields not spurious artifacts but provably reliable predictive knowledge? Numerous data science applications are blocked on these questions. For example, the prediction and control of opinions, (fake) news, and (mis)information is a practical problem that is becoming of increasingly high and broad impact these days of pervasion of online social media into everyday human life. This particular problem is largely blocked on general impossibility of disentangling those who naturally bond with others like themselves from those influenced by peers in social networks, except in some specific settings. The specific settings of this project -- real networks with latent-space structure -- are exactly the settings in which these theoretical and practical difficulties can be resolved.

The project will make a series of contributions in two areas. First, it will resolve a long-standing problem of obtaining a class of random graph models satisfying four requirements of realism: sparsity, exchangeability, projectivity and unbiasedness/maximum-entropy. Within this class, a set of graph-structural properties will be determined such that unbiased random graphs that have these properties are proved to have latent-geometric structure, thus rigorously linking discrete combinatorial structure of random graphs to smooth geometry of latent manifolds. The framework that the project will develop to prove this, will be quite general and applicable to other types of big data. The properties responsible for latent geometricity of random graphs are expected to characterize many real networks, meaning that such networks will be guaranteed to have latent geometries. The second part of the project will focus on developing scalable algorithms and software, with optimal computational complexity scaling linearly with the data size, and with proved accuracy guarantees, to learn the latent structure of a real network if the network has it, and apply these algorithms to large real networks. The outcomes of this latent-geometric learning will make it possible to map dynamical processes in real networks, such as spreading phenomena in social networks, to latent dynamics, while the knowledge of latent statistical factors behind this dynamic can then be used to predict and control it in practice with known accuracy bounds.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1741355
Program Officer
Victor Roytburd
Project Start
Project End
Budget Start
2017-09-01
Budget End
2021-08-31
Support Year
Fiscal Year
2017
Total Cost
$900,000
Indirect Cost
Name
Northeastern University
Department
Type
DUNS #
City
Boston
State
MA
Country
United States
Zip Code
02115