This is a project to develop new methods of functional data analysis directed towards important public health applications in genomics and life course epidemiology. In genome-wide expression and DNA methylation studies, it is of interest to locate genes showing activity that is associated with clinical outcomes, e.g., to use gene expression profiles from the tumors of breast cancer patients to predict estrogen receptor protein concentration, an important prognostic marker for breast tumors. In such studies, the gene expression profile across a chromosome can be regarded as a functional predictor, and a gene associated with the clinical outcome is identified by its base pair position along the chromosome. The key aim of the project is to develop new methods of statistical inference for finding such genetic loci, leading to the identification of chromosomal regions that are potentially useful for diagnosis and therapy. Although there is an extensive statistical literature on gene expression data, it is almost exclusively concerned with multiple testing procedures for detecting the presence of differentially expressed genes, and statistical methods for locating such genes based on expression profiles (interpreted as functional predictors) are not well developed. Although functional data analysis has reached a mature stage of development over the last ten years, serious problems can arise when the currently available methods are applied in situations involving functional predictors (or trajectories) that have point impact effects (as with gene expression), or in situations in which there is only sparse temporal resolution in the observation of the trajectories. The broad objective of the project is to exploit fractal behavior in the trajectories to improve statistical learning methodology in functional data analysis. The project will have important implications for understanding a wide variety of complex adaptive systems having fractal behavior.
Studies of calorie-intake trajectories and DNA methylation profiles related to cardiovascular risk outcomes, and growth rate trajectories related to neuropsychological outcomes, will be developed as applications of the new methodology. The first specific problem to be addressed is to show that the rates of learning in systems involving trajectories with fractal characteristics are determined by the Hurst parameter (i.e., the exponent of self-similarity scaling) and to show that a type of bootstrap learning can adapt to the full range of fractal behavior. The second specific problem to be addressed is to develop an imputation method for generating missing values of trajectories that have fractal properties (e.g., growth rate curves), and to find a way to carry out functional regression modeling based on the imputed trajectories.
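To make the role of the Hurst parameter concrete, the following is a minimal illustrative sketch and not the methodology proposed in the project. It assumes that fractal trajectories can be modeled as fractional Brownian motion, for which the mean squared increment over a lag k scales as k^(2H), and it recovers H from the slope of a log-log regression. The function names `fbm_paths` and `estimate_hurst`, the grid size, and the choice of lags are all hypothetical choices made for illustration.

```python
import numpy as np


def fbm_paths(n_paths, n_steps, hurst, rng):
    """Simulate fractional Brownian motion on an even grid in (0, 1].

    Assumption for illustration: trajectories are Gaussian with the
    standard fBm covariance, sampled exactly via a Cholesky factor.
    """
    t = np.arange(1, n_steps + 1) / n_steps
    s, u = np.meshgrid(t, t, indexing="ij")
    cov = 0.5 * (s ** (2 * hurst) + u ** (2 * hurst)
                 - np.abs(s - u) ** (2 * hurst))
    chol = np.linalg.cholesky(cov)
    # Each row of the result is one simulated trajectory.
    return rng.standard_normal((n_paths, n_steps)) @ chol.T


def estimate_hurst(paths, lags=(1, 2, 4, 8, 16)):
    """Estimate H from the scaling of mean squared increments.

    For fBm, E[(X(t + k) - X(t))^2] is proportional to k^(2H), so the
    slope of log(mean squared increment) against log(k) is about 2H.
    """
    log_lag = np.log(lags)
    log_msq = [np.log(np.mean((paths[:, k:] - paths[:, :-k]) ** 2))
               for k in lags]
    slope = np.polyfit(log_lag, log_msq, 1)[0]
    return slope / 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for true_h in (0.3, 0.5, 0.7):
        paths = fbm_paths(n_paths=200, n_steps=128, hurst=true_h, rng=rng)
        print(f"true H = {true_h:.1f}, estimated H = {estimate_hurst(paths):.2f}")
```

Smaller values of H give rougher trajectories and larger values give smoother ones; H = 0.5 corresponds to ordinary Brownian motion. The increment-scaling estimator used here is one simple choice among many and is shown only to illustrate the self-similarity exponent.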
The relevance of the project to public health is that novel statistical methods will be developed for addressing important questions in genomics and life course epidemiology. In particular, rigorous statistical inference for locating genes based on gene expression and DNA methylation profiles, and for studying the effect of growth rate trajectories on adult neuropsychological outcomes, will be developed.
McKeague, Ian W; Qian, Min (2014) Estimation of treatment policies based on functional predictors. Stat Sin 24:1461-1485
Li, Zhigang; McKeague, Ian W; Lumey, Lambert H (2014) Optimal design strategies for sibling studies with binary exposures. Int J Biostat 10:185-96
Barmi, Hammou El; McKeague, Ian W (2013) Empirical likelihood-based tests for stochastic ordering. Bernoulli (Andover) 19:295-307
Lopez-Pintado, Sara; McKeague, Ian W (2013) Recovering gradients from sparsely observed functional data. Biometrics 69:396-404