Matrix-variate models provide a way to analyze complex multivariate data sets in which meaningful relationships may exist both among the variables and among the observed units. This extends the standard setting of multivariate statistical analysis, in which the variables are dependent but the observations are treated as independent. The primary motivation for treating the data as a single high-dimensional sample is the potential for increased power when estimating parameters that are sensitive to relationships among the experimental units. Recent advances in high-dimensional non-asymptotic theory, convex analysis, and algorithms allow such models to be fit to the very large data sets that arise in critical scientific areas such as genomics and neuroscience. A major goal of this project is to assess the extent to which accounting for relationships among samples allows more accurate estimates to be made of relationships among variables. One example application is the assessment of associations between rare genetic variants and a phenotype. The Sequence Kernel Association Test (SKAT) uses a kernel matrix that is essentially a covariance matrix of the genetic features. The investigators will evaluate the use of the covariance matrix obtained from the recently proposed Gemini approach to matrix-variate analysis as a kernel for the SKAT procedure. Three scientific applications have been identified to which the Gemini framework may be usefully extended. In addition to conceptual and algorithmic challenges, these applications require the theory and methodology for the Gemini estimators to cover new settings, for example because SNP genotypes are highly non-Gaussian and because brain connectivity graphs change over time. The investigators will adapt the baseline Gemini models and algorithms to these new settings, and study their statistical and computational properties.
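
To make the role of the kernel concrete, the following is a minimal sketch of a kernel-based score statistic of the general form Q = r' G C G' r, where r holds phenotype residuals, G is the genotype matrix, and C is a covariance-like matrix over the genetic features. It is an illustration under stated assumptions, not the investigators' implementation: the function name `kernel_score_statistic`, the simulated data, the intercept-only null model, and the use of the plain sample covariance as a stand-in for a Gemini estimate are all hypothetical choices made here. No p-value is computed.

```python
import numpy as np


def kernel_score_statistic(y, G, feature_cov):
    """Return Q = r' (G C G') r for residuals r = y - mean(y).

    y           : (n,) phenotype vector
    G           : (n, p) genotype matrix (minor-allele counts)
    feature_cov : (p, p) covariance-like matrix over the genetic features
    """
    r = y - y.mean()              # residuals under an intercept-only null model
    K = G @ feature_cov @ G.T     # n x n kernel among the sampled units
    return float(r @ K @ r)


# Simulated data (assumption: 50 subjects, 8 rare variants, null phenotype).
rng = np.random.default_rng(0)
n, p = 50, 8
G = rng.binomial(2, 0.05, size=(n, p)).astype(float)
y = rng.normal(size=n)

# Identity matrix: every variant contributes equally and independently.
Q_identity = kernel_score_statistic(y, G, np.eye(p))

# Sample covariance of the variants, standing in for an estimated
# feature covariance (NOT the Gemini estimator).
Q_cov = kernel_score_statistic(y, G, np.cov(G, rowvar=False))
```

Swapping the matrix passed as `feature_cov` is the only change needed to compare kernels in this sketch, which mirrors the idea of substituting a differently estimated covariance matrix into a kernel association test.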

Recent technological breakthroughs in instrumentation allow large and detailed data sets describing living systems to be efficiently collected. Researchers in biology and health science have embraced these technologies, but have sometimes been frustrated by the fact that such data must be interpreted conservatively, to guard against making non-reproducible claims. Research progress would be accelerated if it were possible to analyze such data in a way that provides more statistical power, without expensive increases in the sample size. One path to doing this is to move beyond the paradigm of treating the observed units in a study (e.g., human research subjects or laboratory animals) as independent of one another. The investigator and his colleagues have developed a new type of statistical procedure that uses inferred relationships among the units of observation to more accurately estimate physical and biological unknowns. They plan to develop the theory behind their technique, to develop software for carrying out the analyses, and to work with scientists in cancer biology, genomics, and neuroscience to assess the potential for using this approach to obtain novel scientific insights. The project will focus on three scientific problems: identification of DNA lesions involved in cancer incidence and progression, identification of inherited genetic variants associated with human diseases, and characterization of neural connectivity changes induced by changes in consciousness state.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1316731
Program Officer: Yong Zeng
Project Start:
Project End:
Budget Start: 2013-08-01
Budget End: 2017-07-31
Support Year:
Fiscal Year: 2013
Total Cost: $220,000
Indirect Cost:
Name: Regents of the University of Michigan - Ann Arbor
Department:
Type:
DUNS #:
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109