This is a project to develop new methods of functional data analysis directed towards important public health applications in genomics and life course epidemiology. In genome-wide expression and DNA methylation studies, it is of interest to locate genes showing activity that is associated with clinical outcomes, e.g., to use gene expression profiles from the tumors of breast cancer patients to predict estrogen receptor protein concentration, an important prognostic marker for breast tumors. In such studies, the gene expression profile across a chromosome can be regarded as a functional predictor, and a gene associated with the clinical outcome is identified by its base pair position along the chromosome. The key aim of the project is to develop new methods of statistical inference for finding such genetic loci, leading to the identification of chromosomal regions that are potentially useful for diagnosis and therapy. Although there is an extensive statistical literature on gene expression data, it is almost exclusively concerned with multiple testing procedures for detecting the presence of differentially expressed genes, and statistical methods for locating such genes based on expression profiles (interpreted as functional predictors) are not well developed. Although functional data analysis has reached a mature stage of development over the last ten years, serious problems can arise when the currently available methods are applied in situations involving functional predictors (or trajectories) that have point impact effects (as with gene expression), or in situations in which there is only sparse temporal resolution in the observation of the trajectories. The broad objective of the project is to exploit fractal behavior in the trajectories to improve statistical learning methodology in functional data analysis. The project will have important implications for understanding a wide variety of complex adaptive systems having fractal behavior.
Studies of calorie-intake trajectories and DNA methylation profiles related to cardiovascular risk outcomes, and growth rate trajectories related to neuropsychological outcomes, will be developed as applications of the new methodology. The first specific problem to be addressed is to show that the rates of learning in systems involving trajectories with fractal characteristics are determined by the Hurst parameter (i.e., the exponent of self-similarity scaling) and to show that a type of bootstrap learning can adapt to the full range of fractal behavior. The second specific problem to be addressed is to develop an imputation method for generating missing values of trajectories that have fractal properties (e.g., growth rate curves), and to find a way to carry out functional regression modeling based on the imputed trajectories.
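
As an illustration of the Hurst parameter mentioned above, the following minimal sketch estimates the self-similarity exponent H of a simulated trajectory from the scaling of its increment variances, Var[X(t+k) - X(t)] proportional to k^(2H). It is not the project's methodology: the function name `estimate_hurst`, the choice of lags, and the use of ordinary Brownian motion (for which H = 0.5) as test data are all assumptions made for this example.

```python
import numpy as np

def estimate_hurst(path, lags):
    """Estimate H from the scaling Var[X(t+k) - X(t)] ~ k^(2H).

    Hypothetical helper for illustration only: fits a straight line to
    log(variance of lag-k increments) against log(k); the slope is 2H.
    """
    variances = [np.var(path[k:] - path[:-k]) for k in lags]
    slope, _intercept = np.polyfit(np.log(lags), np.log(variances), 1)
    return slope / 2.0

# Assumed test data: ordinary Brownian motion (H = 0.5), built as the
# cumulative sum of independent standard Gaussian steps.
rng = np.random.default_rng(0)
brownian_path = np.cumsum(rng.standard_normal(20000))

lags = np.array([1, 2, 4, 8, 16, 32])
hurst_estimate = estimate_hurst(brownian_path, lags)
print(f"Estimated Hurst parameter: {hurst_estimate:.3f}")
```

For a Brownian path the estimate should land near 0.5; rougher trajectories (H below 0.5) or smoother ones (H above 0.5) would shift the fitted slope accordingly.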

Public Health Relevance

The relevance of the project to public health is that novel statistical methods will be developed for addressing important questions in genomics and life course epidemiology. In particular, rigorous statistical inference for locating genes based on gene expression and DNA methylation profiles, and for studying the effect of growth rate trajectories on adult neuropsychological outcomes, will be developed.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM095722-01
Application #
8023927
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Gaillard, Shawn R
Project Start
2011-09-01
Project End
2015-06-30
Budget Start
2011-09-01
Budget End
2012-06-30
Support Year
1
Fiscal Year
2011
Total Cost
$180,612
Indirect Cost
Name
Columbia University (N.Y.)
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
621889815
City
New York
State
NY
Country
United States
Zip Code
10032
McKeague, Ian W; Qian, Min (2018) Marginal screening of 2 × 2 tables in large-scale case-control studies. Biometrics :
Wang, Huixia Judy; McKeague, Ian W; Qian, Min (2018) Testing for Marginal Linear Effects in Quantile Regression. J R Stat Soc Series B Stat Methodol 80:433-452
Brown, Alan S; Gyllenberg, David; Hinkka-Yli-Salomäki, Susanna et al. (2017) Altered growth trajectory of head circumference during infancy and schizophrenia in a National Birth Cohort. Schizophr Res 182:115-119
Niemelä, Solja; Sourander, Andre; Surcel, Heljä-Marja et al. (2016) Prenatal Nicotine Exposure and Risk of Schizophrenia Among Offspring in a National Birth Cohort. Am J Psychiatry 173:799-806
Eck, Daniel J; McKeague, Ian W (2016) Central Limit Theorems under additive deformations. Stat Probab Lett 118:156-162
McKeague, Ian W; Levin, Bruce (2016) Convergence of empirical distributions in an interpretation of quantum mechanics. Ann Appl Probab 26:2540-2555
Qian, Min (2016) Comment. J Am Stat Assoc 111:1538-1541
Chang, Hsin-Wen; El Barmi, Hammou; McKeague, Ian W (2016) Tests for stochastic ordering under biased sampling. J Nonparametr Stat 28:659-682
McKeague, Ian W; Qian, Min (2015) An adaptive resampling test for detecting the presence of significant predictors. J Am Stat Assoc 110:1422-1433
McKeague, Ian W (2015) Central limit theorems under special relativity. Stat Probab Lett 99:149-155

Showing the most recent 10 out of 18 publications