Models with tree-structured, hierarchical autocorrelation are used when sampling units are related to each other. Their inheritance history is modeled by a tree, which parametrizes the residual correlation structure among observations. The project will develop an asymptotic theory for these autocorrelation models when they arise from an Ornstein-Uhlenbeck process along the tree. As the number of tips in the tree grows without bound, the investigators will determine which parameters are microergodic and which are not. The consistency and the rate of convergence of the maximum likelihood estimator are expected to depend substantially on the microergodicity of the parameter and on topological properties of the tree. Analogies will be built between this asymptotic framework and the infill asymptotic framework in spatial statistics, in which observations are collected on an increasingly dense set of locations within a bounded region of space. The project will refine the concept of effective sample size for hierarchically autocorrelated data and study optimal sampling designs. This work will provide important steps toward developing appropriate model selection tools for detecting possibly many Ornstein-Uhlenbeck selection regimes, in settings where the number of model parameters is large relative to the sample size.
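
To make the correlation structure concrete, the following is a minimal sketch, not the investigators' code. It assumes an ultrametric tree, a single selection regime, and a root drawn from the stationary distribution, in which case the covariance between two tips decays exponentially with the path distance separating them on the tree. The function names, the four-tip example tree, and the parameter names `alpha` (selection strength) and `sigma2` (diffusion variance) are illustrative assumptions. The effective sample size computed here is one common definition for estimating a shared mean; it is not necessarily the refined notion the project proposes.

```python
import numpy as np

# Hypothetical 4-tip ultrametric tree ((A,B),(C,D)) of height 1.0.
# Entry (i, j) is the time from the root during which tips i and j share history.
SHARED_TIME = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.0],
]


def ou_tip_covariance(shared_time, height, alpha, sigma2):
    """Covariance among tips for a stationary OU process on an ultrametric tree.

    Assumes the root value is drawn from the stationary distribution, so that
    Cov(Y_i, Y_j) = sigma2 / (2 * alpha) * exp(-alpha * d_ij),
    where d_ij is the path distance between tips i and j.
    """
    shared_time = np.asarray(shared_time, dtype=float)
    # On an ultrametric tree, the path between two tips goes up to their
    # most recent common ancestor and back down.
    dist = 2.0 * (height - shared_time)
    np.fill_diagonal(dist, 0.0)
    stationary_var = sigma2 / (2.0 * alpha)
    return stationary_var * np.exp(-alpha * dist)


def effective_sample_size(cov):
    """One common definition: 1' R^{-1} 1, with R the correlation matrix.

    It equals the number of tips when observations are uncorrelated and
    shrinks as positive correlation among tips increases.
    """
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    ones = np.ones(corr.shape[0])
    return float(ones @ np.linalg.solve(corr, ones))


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 10.0):
        cov = ou_tip_covariance(SHARED_TIME, height=1.0, alpha=alpha, sigma2=2.0)
        print(alpha, effective_sample_size(cov))
```

In this sketch, weak selection (small `alpha`) leaves the tips strongly correlated and the effective sample size well below the number of tips, while strong selection erases shared history and the effective sample size approaches the number of tips.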

Tree models with hierarchical autocorrelation first arose in evolutionary biology and ecology, in the comparison of biological species. These models are now used in many other areas, ranging from the study of rapidly evolving viruses to the study of human language evolution. The Ornstein-Uhlenbeck model is used to detect selection as opposed to neutral evolution, to discover changes in selective regime, and to determine the driving factors of selection. The project will provide a unified asymptotic framework for these models and will inform best practices for empirical studies. Computational tools will be broadly disseminated, and training opportunities will be provided at the interface between statistics and biology.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1106483
Program Officer: Gabor J. Szekely
Budget Start: 2011-07-01
Budget End: 2014-06-30
Fiscal Year: 2011
Total Cost: $206,505
Institution Name: University of Wisconsin Madison
City: Madison
State: WI
Country: United States
Zip Code: 53715