The proposal focuses on Markov random fields and directed graphical models, classes of statistical models that are based on a marriage between graph theory and probability theory, and allow for flexible modeling of network-structured data. The core of the proposal consists of multiple research thrusts, all centered around the goal of developing practical algorithms and theory for statistical estimation with network-structured data. One research thrust concerns various issues associated with model selection in undirected graphical models, also known as Gibbs distributions or Markov random fields. Problems include determining the information-theoretic limitations of graphical model selection in high dimensions (where the number of vertices may be larger than the sample size), not only for i.i.d. data but also dependent data; developing methods for tracking sequences of networks that evolve over time; and developing methods for data with hidden variables. Another research thrust concerns various statistical problems associated with directed acyclic graphical structures (DAGs), including estimating equivalence classes of DAGs in the high-dimensional setting; estimating causal relationships via designed interventions; and efficient computational methods for DAG selection using the Lasso and related methods. Overall, the proposed research is inter-disciplinary in nature, drawing on techniques from mathematical statistics, convex optimization, information theory, concentration of measure, and graph theory.

Science and engineering abounds with different types of networks. Examples include social networks such as FaceBook and Twitter, networks of genes and proteins in molecular biology, network models for economic and market dynamics, neural networks in brain imaging, networks of disease transmission in epidemiology, and information networks in law enforcement. In the real-world, the structure of the underlying network is not known, but instead one observes samples of the network behavior (e.g., packet counts in a computer network; instances of infection at given time instances of an epidemic; emails or text messages sent among a group of people), and the goal is to infer the network structure. Methods for solving this network inference problem have a broad range of applications. Examples include inferring brain connectivity and disease etiology in neuroimaging studies, detecting terrorist cells in social networks, monitoring intrusions in computer networks, and understanding the basis of gene-protein interactions in systems biology.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
1107000
Program Officer
Gabor J. Szekely
Project Start
Project End
Budget Start
2011-07-01
Budget End
2014-06-30
Support Year
Fiscal Year
2011
Total Cost
$420,000
Indirect Cost
Name
University of California Berkeley
Department
Type
DUNS #
City
Berkeley
State
CA
Country
United States
Zip Code
94704