Large-scale multiple testing has become ubiquitous in the search for disease and health risk markers using high-throughput technologies. While statistical methods for multiple testing often assume independence between the tests, many real situations exhibit dependence and an underlying structure. Examples of spatial structure are one-dimensional (1D) in the case of proteomic data, 2D in the case of environmental data, and 3D in the case of brain imaging data. Ignoring correlation in the analysis may lead to a different set and ordering of discovered features, inflating error rates and potentially missing important features. There is thus a need to characterize the effect of correlation in multiple testing and to incorporate it into the analysis. The goal of this proposal is to develop multiple testing methods that incorporate the correlation in the data in order to increase statistical power, control error rates, and obtain appropriately interpretable results. This is done in two ways. (1) In Aims 1 and 2, we assume a spatial structure and stationary ergodic correlation, where the signal of interest consists of a relatively small number of unimodal peaks. We use random field theory to compute p-values for testing the heights of local maxima of the observed data after smoothing. We develop these methods in increasing complexity, from 1D to 3D domains and from peaks of equal width to peaks of unequal width. We then adapt and apply these methods to various types of data obtained from high-throughput technologies, specifically: mass spectrometry data for identifying protein biomarkers of cancer; climate model output data for identifying geographical regions at risk of heat stress as a result of climate change; and brain imaging data for identifying anatomical regions involved in abnormal cognitive development.
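The smooth-then-test-peaks pipeline described above can be sketched in a minimal 1D simulation. This is an illustrative toy, not the proposal's method: the signal shape, bandwidth, and seed are arbitrary, and the Gaussian tail p-value in the last step is a simplified placeholder for the exact random-field-theory distribution of peak heights that the proposal develops.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import argrelmax
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical example: one unimodal peak buried in unit-variance white noise.
n = 1000
x = np.arange(n)
signal = 3.0 * np.exp(-0.5 * ((x - 500) / 20.0) ** 2)
y = signal + rng.standard_normal(n)

# Step 1: smooth the observed data with a Gaussian kernel.
sigma = 10.0
ys = gaussian_filter1d(y, sigma)

# Standardize: smoothing unit-variance white noise with a Gaussian kernel of
# width sigma yields null variance approximately 1 / (2 * sqrt(pi) * sigma).
null_sd = np.sqrt(1.0 / (2.0 * np.sqrt(np.pi) * sigma))
z = ys / null_sd

# Step 2: locate the local maxima of the smoothed field.
peaks = argrelmax(z)[0]

# Step 3: assign each local maximum a p-value based on its height.
# A plain Gaussian tail is used here as a stand-in for the RFT peak-height
# distribution; it is NOT valid for inference on local maxima.
pvals = norm.sf(z[peaks])
strongest = peaks[np.argmin(pvals)]
```

In this toy run the most significant local maximum falls near the true peak location at 500, while noise-induced maxima receive unremarkable p-values.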
(2) In Aim 3, we assume a general correlation structure, not necessarily stationary or ergodic, and propose a conditional marginal analysis, in which correlation is incorporated by conditioning on the observed marginal distribution of likely null cases. Emphasis throughout, though not exclusive, is placed on false discovery rate (FDR) inference. This proposal provides a unified view of signal detection for random fields that applies broadly to a large class of problems ranging from proteomics to medical imaging to environmental monitoring. From a statistical point of view, it provides a new answer to the problem of controlling the FDR in random fields. By taking advantage of the dependence structure, the methods developed in this proposal offer higher statistical power in the search for markers, so that fewer false markers need to be tested in follow-up studies.
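As a point of reference for the FDR inference emphasized above, the classical Benjamini-Hochberg step-up procedure (valid under independence or positive dependence, which is precisely the assumption the proposed methods aim to relax) can be sketched as follows. The p-values below are a made-up toy input.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: return a boolean mask of
    discoveries controlling the FDR at level q, assuming independent
    or positively dependent p-values."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    # Compare the i-th smallest p-value against q * i / m.
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    discoveries = np.zeros(m, dtype=bool)
    if below.any():
        # Reject all hypotheses up to the largest index passing its threshold.
        k = np.max(np.nonzero(below)[0])
        discoveries[order[: k + 1]] = True
    return discoveries

# Toy usage: a few strong signals among mostly null p-values.
pv = np.array([0.001, 0.008, 0.039, 0.041, 0.20, 0.55, 0.76, 0.90])
mask = benjamini_hochberg(pv, q=0.05)
```

Under dependence, procedures like this one remain a baseline; the conditional marginal analysis of Aim 3 instead exploits the observed correlation to improve power while maintaining FDR control.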