The project brings together an interdisciplinary team of researchers from Johns Hopkins University, Carnegie Mellon University, and the University of Chicago to develop methods, theory, and algorithms for discovering hidden structure in complex scientific datasets without making strong a priori assumptions. The outcomes include practical models and provably correct algorithms that can help scientists conduct sophisticated data analysis. The application areas include genomics, cognitive neuroscience, climate science, astrophysics, and language processing.

The project has five aims:

(i) Nonparametric structure learning in high dimensions. In a standard structure learning problem, observations of a random vector X are available, and the goal is to estimate the structure of the distribution of X. When the dimension is large, nonparametric structure learning becomes challenging. The project develops new methods and establishes theoretical guarantees for this problem.
(ii) Nonparametric conditional structure learning. In many applications, it is of interest to estimate the structure of a high-dimensional random vector X conditional on another random vector Z. Nonparametric methods for estimating the structure of X given Z are being developed, building on recent approaches to graph-valued and manifold-valued regression developed by the investigators.
(iii) Regularization parameter selection. Most structure learning algorithms have at least one tuning parameter that controls the bias-variance tradeoff. Classical methods for selecting tuning parameters are not suitable for complex nonparametric structure learning problems. The project explores stability-based approaches for regularization selection.
(iv) Parallel and online nonparametric learning. Handling large-scale data is a bottleneck for many nonparametric methods. The project develops parallel and online techniques to extend nonparametric learning algorithms to large-scale problems.
(v) Minimax theory for nonparametric structure learning problems. Minimax theory characterizes the performance limits of learning algorithms. Few theoretical results are known for complex, high-dimensional nonparametric structure learning. The project develops new minimax theory in this setting.

The results of this project will be disseminated through publications in scientific journals and major conferences, and through freely available software that implements the nonparametric structure learning algorithms resulting from this research.

The broader impacts of the project include: creation of powerful data analysis techniques and software that enable a wide range of scientists and engineers to analyze and understand more complex scientific data; increased collaboration and interdisciplinary interaction between researchers at multiple institutions (Johns Hopkins University, Carnegie Mellon University, and the University of Chicago); and broad dissemination of the results of this research across different scientific communities. Additional information about the project can be found at:

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Program Officer: Vasant G. Honavar
Johns Hopkins University
United States