This is a project to develop new methods of functional data analysis directed towards important public health applications in genomics and life course epidemiology. In genome-wide expression and DNA methylation studies, it is of interest to locate genes showing activity that is associated with clinical outcomes, e.g., to use gene expression profiles from the tumors of breast cancer patients to predict estrogen receptor protein concentration, an important prognostic marker for breast tumors. In such studies, the gene expression profile across a chromosome can be regarded as a functional predictor, and a gene associated with the clinical outcome is identified by its base pair position along the chromosome. The key aim of the project is to develop new methods of statistical inference for finding such genetic loci, leading to the identification of chromosomal regions that are potentially useful for diagnosis and therapy. Although there is extensive statistical literature on gene expression data, it is almost exclusively concerned with multiple testing procedures for detecting the presence of differentially expressed genes, and statistical methods for locating such genes based on expression profiles (interpreted as functional predictors) are not well developed. Although functional data analysis has reached a mature stage of development over the last ten years, serious problems can arise when the currently available methods are applied in situations involving functional predictors (or trajectories) that have point impact effects (as with gene expression), or in situations in which there is only sparse temporal resolution in the observation of the trajectories. The broad objective of the project is to exploit fractal behavior in the trajectories to improve statistical learning methodology in functional data analysis. The project will have important implications for understanding a wide variety of complex adaptive systems having fractal behavior.
Studies of calorie-intake trajectories and DNA methylation profiles related to cardiovascular risk outcomes, and growth rate trajectories related to neuropsychological outcomes, will be developed as applications of the new methodology. The first specific problem to be addressed is to show that the rates of learning in systems involving trajectories with fractal characteristics are determined by the Hurst parameter (i.e., the exponent of self-similarity scaling) and to show that a type of bootstrap learning can adapt to the full range of fractal behavior. The second specific problem to be addressed is to develop an imputation method for generating missing values of trajectories that have fractal properties (e.g., growth rate curves), and to find a way to carry out functional regression modeling based on the imputed trajectories.
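
To make the role of the Hurst parameter concrete, the following is a minimal illustrative sketch and not the project's methodology. It assumes a trajectory that is self-similar with stationary increments, so that the standard deviation of an increment over lag tau scales like tau^H, and it reads H off the slope of a log-log regression. Ordinary Brownian motion (for which H = 1/2) is used as a stand-in trajectory; the function names `simulate_brownian_path` and `estimate_hurst` are hypothetical and chosen only for this example.

```python
import math
import random


def simulate_brownian_path(n_steps, seed=0):
    """Simulate a discrete Brownian path (a stand-in trajectory with H = 1/2)."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, 1.0))
    return path


def estimate_hurst(path, lags=(1, 2, 4, 8, 16, 32)):
    """Estimate H as the slope of log(increment sd) against log(lag).

    Assumes zero-mean stationary increments, so the mean squared
    increment at a given lag estimates the increment variance.
    """
    log_lags, log_sds = [], []
    for lag in lags:
        diffs = [path[i + lag] - path[i] for i in range(len(path) - lag)]
        mean_sq = sum(d * d for d in diffs) / len(diffs)
        log_lags.append(math.log(lag))
        log_sds.append(0.5 * math.log(mean_sq))

    # Ordinary least-squares slope of log_sds on log_lags.
    mean_x = sum(log_lags) / len(log_lags)
    mean_y = sum(log_sds) / len(log_sds)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_lags, log_sds))
    sxx = sum((x - mean_x) ** 2 for x in log_lags)
    return sxy / sxx


path = simulate_brownian_path(20000)
h_hat = estimate_hurst(path)
print(f"Estimated Hurst parameter: {h_hat:.3f}")
```

For the simulated Brownian path the estimate should land close to 0.5. Rougher trajectories (H below 1/2) or smoother ones (H above 1/2) would give correspondingly smaller or larger slopes, which is the range of fractal behavior the abstract refers to.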

Public Health Relevance

The relevance of the project to public health is that novel statistical methods will be developed for addressing important questions in genomics and life course epidemiology. In particular, rigorous statistical inference for locating genes based on gene expression and DNA methylation profiles, and for studying the effect of growth rate trajectories on adult neuropsychological outcomes, will be developed.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Gaillard, Shawn R
Columbia University (N.Y.)
Biostatistics & Other Math Sci
Schools of Public Health
New York
United States
Brown, Alan S; Gyllenberg, David; Hinkka-Yli-Salomäki, Susanna et al. (2017) Altered growth trajectory of head circumference during infancy and schizophrenia in a National Birth Cohort. Schizophr Res 182:115-119
Niemelä, Solja; Sourander, Andre; Surcel, Heljä-Marja et al. (2016) Prenatal Nicotine Exposure and Risk of Schizophrenia Among Offspring in a National Birth Cohort. Am J Psychiatry 173:799-806
Eck, Daniel J; McKeague, Ian W (2016) Central Limit Theorems under additive deformations. Stat Probab Lett 118:156-162
McKeague, Ian W; Levin, Bruce (2016) Convergence of empirical distributions in an interpretation of quantum mechanics. Ann Appl Probab 26:2540-2555
Qian, Min (2016) Comment. J Am Stat Assoc 111:1538-1541
Chang, Hsin-Wen; El Barmi, Hammou; McKeague, Ian W (2016) Tests for stochastic ordering under biased sampling. J Nonparametr Stat 28:659-682
McKeague, Ian W; Qian, Min (2015) An adaptive resampling test for detecting the presence of significant predictors. J Am Stat Assoc 110:1422-1433
McKeague, Ian W (2015) Central limit theorems under special relativity. Stat Probab Lett 99:149-155
McKeague, Ian W; Brown, Alan S; Bao, Yuanyuan et al. (2015) Autism with intellectual disability related to dynamics of head circumference growth during early infancy. Biol Psychiatry 77:833-40
Li, Zhigang; McKeague, Ian W; Lumey, Lambert H (2014) Optimal design strategies for sibling studies with binary exposures. Int J Biostat 10:185-96

Showing the most recent 10 out of 16 publications