This project develops new statistical methodology for important applied problems in medicine and science, organized as three projects. In the first, on personalized medicine, the PI will focus on predicting whether particular treatments are suitable for a patient, based on the patient's demographics and clinical history, using large archives of past patient experiences. Biologists seek to understand the folding patterns of chromosomes in cells, a key ingredient in understanding their function. In a second project, the PI will develop novel curve-fitting methods to learn these folding patterns from indirect and noisy measurements of the three-dimensional structure. Ecologists try to learn the characteristics of environments that attract certain species, as well as shared aspects of species that inhabit the same environments; this knowledge is a critical component of species survival, pest control, and disease prevention. In a third project, the PI will develop methods, based on site-specific surveys, that scale to extremely large species populations (such as bacteria and insects). The project also provides research training opportunities for graduate students.

For treatment-effect estimation, the project develops validation methods for selecting among a collection of models that estimate heterogeneous treatment effects, even though observational data contain no direct measurements of the treatment effect. To that end, it develops adaptive nearest-neighbor matching techniques that construct a comparison set for each validation point. For high-dimensional chromosomal contact maps, the PI plans to draw on his early work on principal curves to model the three-dimensional folding structure of chromosomes. This amounts to metric scaling with side information on the local structure of the three-dimensional solution. Generalized linear latent-variable models are popular for modeling species distributions (usually Poisson models for counts and binomial models for presence/absence), but they become computationally impractical when the number of species and/or locations is very large. The PI plans to adapt earlier work on matrix completion to develop alternating maximum-likelihood fitting algorithms that scale these methods to extremely large populations.
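
As a rough illustration of the matching-based validation idea described above, the following minimal sketch scores candidate treatment-effect models against matched outcome differences. It is not the project's method: it uses plain one-nearest-neighbor matching rather than the adaptive matching the abstract refers to, and every name in it (nearest_neighbor, matched_effect_proxies, validation_score, the toy data, and both candidate models) is a hypothetical choice made for this example.

```python
import math

def nearest_neighbor(x, pool):
    """Return the index of the point in `pool` closest to `x` (Euclidean)."""
    return min(range(len(pool)), key=lambda j: math.dist(x, pool[j]))

def matched_effect_proxies(X, treated, y):
    """For each treated unit, form a proxy for its treatment effect:
    its own outcome minus the outcome of its nearest control unit."""
    controls = [i for i, t in enumerate(treated) if not t]
    control_X = [X[i] for i in controls]
    proxies = []
    for i, t in enumerate(treated):
        if t:
            j = controls[nearest_neighbor(X[i], control_X)]
            proxies.append((i, y[i] - y[j]))
    return proxies

def validation_score(model, X, proxies):
    """Mean squared gap between a model's predicted effect and the proxy."""
    return sum((model(X[i]) - d) ** 2 for i, d in proxies) / len(proxies)

# Toy data (assumed): one covariate; control outcome is x, treated outcome is 3x,
# so the effect at x is 2x. Real observational data would be noisy.
X = [[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]]
treated = [False, True, False, True, False, True]
y = [0.0, 0.3, 1.0, 3.3, 2.0, 6.3]

# Two hypothetical candidate models for the treatment effect.
candidates = {
    "linear": lambda x: 2.0 * x[0],
    "constant": lambda x: 1.0,
}

proxies = matched_effect_proxies(X, treated, y)
scores = {name: validation_score(m, X, proxies) for name, m in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

On this toy data the linear candidate receives the lower score, because the matched differences track the effect that grows with the covariate. A fixed single match is only a stand-in here; how the comparison set is chosen for each validation point is exactly what the adaptive techniques in the abstract are meant to address.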

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Standard Grant (Standard)
Program Officer: Yong Zeng
Stanford University
United States