Large-scale genomic, proteomic and other "omic" research has become increasingly important and common for discovering disease genes and "omic" biomarkers for cancer prevention and intervention, and for studying gene-environment interactions in population-based studies. Such high-dimensional "omic" data present fundamental statistical and computational challenges in data analysis and result interpretation. Few statistical methods have been developed for the analysis of high-dimensional "omic" data in population-based studies, and this methodological shortage slows the use of genomic and proteomic data to advance population sciences. The purpose of this proposal is to respond to this need by developing advanced statistical methods, in conjunction with other advanced quantitative methods, for the analysis of high-dimensional genomic and proteomic data arising from population-based studies.
The specific aims are: (1) To develop regularized estimating equation-based variable selection methods for gene/biomarker discovery in the presence of a large number of SNPs or proteins and in studying gene-environment (space) interactions. The methods are developed for (a) continuous and discrete cross-sectional/case-control data, (b) longitudinal, clustered and spatial data, and (c) independent, clustered, and spatial survival data; (2) To develop penalized likelihood-based methods for multiple testing for high-dimensional genomic and proteomic data subject to moderate/high correlation, such as microarrays and proteomic mass-spectrometry data, with the goal of providing higher statistical power and better false discovery rate (FDR) estimation; (3) To develop a suite of tools using contemporary advances in signal processing based on local Fourier analysis to effectively preprocess mass spectrometry (MS) proteomic data; (4) To develop supervised clustering methods for array CGH (aCGH) data to identify aCGH profiles related to survival; (5) To develop efficient, user-friendly statistical software that implements these methods, with the goal of disseminating them freely to health science researchers. The proposed methods will be applied to data from the motivating Harvard/MGH lung cancer genetic susceptibility and progression studies, the Harvard/MGH lung cancer proteomic study, the DFCI lung cancer LBK mutation microarray study, the longitudinal HIV codon mutation study, and the Harvard/MGH brain tumor aCGH study. This project integrates closely with the spatial and surveillance projects 1 and 2 and the cores, as they share a common theme of analysis of high-dimensional observational study data, need advanced computing, and jointly provide tools for studying gene-space interactions.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Program Projects (P01)
Study Section
Special Emphasis Panel (ZCA1-RPRB-7)
Harvard University
United States