Technological advances and the information era allow the collection of massive amounts of data at unprecedented resolution. Making use of these data to gain insight into complex phenomena requires characterizing the relationships among a large number of variables. Graphical models explicitly capture the statistical relationships between the variables of interest in the form of a network. Such a representation enhances the interpretability of the model and enables computationally efficient inference. The investigator develops methodology to infer undirected and directed networks among a large number of variables from observational data. This research has broad societal impact, as it affects application domains ranging from weather forecasting to phylogenetics to personalized medicine. In addition, the PI is one of the initial faculty hires in a new MIT-wide effort in statistics. As such, the PI plays a major role in creating new undergraduate and PhD programs in statistics that train the next generation in big data analytics, preparing students to take on challenging roles in a data-rich world.

The goal of this project is to study probabilistic graphical models using an integrated approach that combines ideas from applied algebraic geometry, convex optimization, mathematical statistics, and machine learning, and to apply these models to novel, scientifically important problems. The research agenda is structured into three projects. In the first project, the investigator develops methods to infer causal relationships between variables from observational data using the framework of directed Gaussian graphical models combined with tools from optimization and algebraic geometry. The end goal is to apply this new methodology to learn tissue- and person-specific gene regulatory networks from gene expression data such as those collected by the Genotype-Tissue Expression (GTEx) project. In the second project, the investigator develops scalable methods for maximum likelihood estimation in Gaussian models with linear constraints on the covariance matrix or its inverse. Such models are important for inference of phylogenetic trees or cellular differentiation trees. The third project applies graphical models to weather forecasting: the investigator develops new parametric methods based on Gaussian copulas, as well as non-parametric methods, for the post-processing of numerical weather prediction models. Both kinds of methods take into account the complicated dependence structure of weather variables in space and time.
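
To make the Gaussian copula idea mentioned for the third project concrete, the following is a minimal sketch of a rank-based (normal-scores) estimate of a copula correlation. It is an illustration under stated assumptions, not the investigator's methodology: the function names (`normal_scores`, `pearson`, `copula_correlation`) and the toy `temperature` and `pressure` values are hypothetical, ties in the data are not handled, and only the Python standard library is used.

```python
from statistics import NormalDist
import math


def normal_scores(values):
    """Map a sample to standard-normal scores via its ranks (assumes no ties)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps each pseudo-observation strictly inside (0, 1)
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def pearson(x, y):
    """Plain Pearson correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def copula_correlation(x, y):
    """Correlation of the normal scores: a simple Gaussian copula parameter estimate."""
    return pearson(normal_scores(x), normal_scores(y))


# Hypothetical toy data: the second variable is a monotone, nonlinear
# function of the first, so the marginals differ but the dependence is perfect.
temperature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pressure = [math.exp(t) for t in temperature]

raw_rho = pearson(temperature, pressure)
copula_rho = copula_correlation(temperature, pressure)
print(f"raw Pearson: {raw_rho:.3f}, copula correlation: {copula_rho:.3f}")
```

Because the estimate depends on the data only through ranks, it is unchanged by monotone transformations of each variable. This separation of marginal distributions from dependence structure is what makes copulas attractive when variables have very different marginals.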

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1651995
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2017-07-01
Budget End: 2022-06-30
Support Year:
Fiscal Year: 2016
Total Cost: $315,205
Indirect Cost:
Name: Massachusetts Institute of Technology
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02139