Statistical Methods for Genomics and Proteomic Data in Population-Based Studies

Lin, Xihong

Abstract

Large-scale genomic, proteomic and other "omic" research has become increasingly important and common for discovering disease genes and "omic" biomarkers for cancer prevention and intervention, and for studying gene-environment interactions in population-based studies. Such high-dimensional "omic" data present fundamental statistical and computational challenges in data analysis and result interpretation. Few statistical methods have been developed for analyzing high-dimensional "omic" data in population-based studies, and this methodological shortage slows the use of genomic and proteomic data to advance the population sciences. This proposal responds to that need by developing advanced statistical methods, in conjunction with other advanced quantitative methods, for the analysis of high-dimensional genomic and proteomic data arising from population-based studies.
The specific aims are:
(1) To develop regularized estimating equation-based variable selection methods for gene/biomarker discovery in the presence of a large number of SNPs or proteins and in studying gene-environment (space) interactions. The methods are developed for (a) continuous and discrete cross-sectional/case-control data, (b) longitudinal, clustered and spatial data, and (c) independent, clustered, and spatial survival data;
(2) To develop penalized likelihood-based methods for multiple testing for high-dimensional genomic and proteomic data subject to moderate/high correlation, such as microarrays and proteomic mass-spectrometry data, with the goal of providing higher statistical power and better false discovery rate (FDR) estimation;
(3) To develop a suite of tools using contemporary advances in signal processing based on local Fourier analysis to effectively preprocess mass spectrometry (MS) proteomic data;
(4) To develop supervised clustering methods for array CGH (aCGH) data to identify aCGH profiles related to survival;
(5) To develop efficient, user-friendly statistical software that implements these methods, with the goal of disseminating it freely to health science researchers.
The proposed methods will be applied to data from the motivating Harvard/MGH lung cancer genetic susceptibility and progression studies, the Harvard/MGH lung cancer proteomic study, the DFCI lung cancer LBK mutation microarray study, the longitudinal HIV codon mutation study, and the Harvard/MGH brain tumor aCGH study. This project integrates closely with the spatial and surveillance projects 1 and 2 and the cores: they share a common theme of analyzing high-dimensional observational study data, need advanced computing, and jointly provide tools for studying gene-space interactions.
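
For readers unfamiliar with the false discovery rate mentioned in aim (2), the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It is not the penalized likelihood method the project proposes. It is only the conventional baseline, which is usually justified under independent or positively dependent tests, and that limitation is why correlated "omic" data call for new methods. The function name `benjamini_hochberg` and the p-values are hypothetical and chosen for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where a hypothesis is rejected.

    Standard Benjamini-Hochberg step-up procedure at target FDR level q.
    Illustrative baseline only; not the method proposed in this grant.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis whose rank is at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. from eight per-gene association tests.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p_vals, q=0.05))
```

With these made-up inputs, only the two smallest p-values fall below their rank-dependent thresholds, so only those two hypotheses are rejected.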

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Program Projects (P01)
Project #: 5P01CA134294-04
Application #: 8323841
Study Section: Special Emphasis Panel (ZCA1)

Project Start
Project End
Budget Start: 2011-09-01
Budget End: 2012-08-31
Support Year: 4
Fiscal Year: 2011
Total Cost: $134,994
Indirect Cost

Institution

Name: Harvard University
Department
Type
DUNS #: 149617367

City: Boston
State: MA
Country: United States
Zip Code: 02115

Publications

Bobb, Jennifer F; Claus Henn, Birgit; Valeri, Linda et al. (2018) Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. Environ Health 17:67

Chen, Han; Cade, Brian E; Gleason, Kevin J et al. (2018) Multiethnic Meta-Analysis Identifies RAI1 as a Possible Obstructive Sleep Apnea-related Quantitative Trait Locus in Men. Am J Respir Cell Mol Biol 58:391-401

Pierce, Brandon L; Kraft, Peter; Zhang, Chenan (2018) Mendelian randomization studies of cancer risk: a literature review. Curr Epidemiol Rep 5:184-196

Barfield, Richard; Feng, Helian; Gusev, Alexander et al. (2018) Transcriptome-wide association studies accounting for colocalization using Egger regression. Genet Epidemiol 42:418-433

Liu, Zhonghua; Lin, Xihong (2018) Multiple phenotype association tests using summary statistics in genome-wide association studies. Biometrics 74:165-175

Emilsson, Louise; García-Albéniz, Xabier; Logan, Roger W et al. (2018) Examining Bias in Studies of Statin Treatment and Survival in Patients With Cancer. JAMA Oncol 4:63-70

Sun, Ryan; Carroll, Raymond J; Christiani, David C et al. (2018) Testing for gene-environment interaction under exposure misspecification. Biometrics 74:653-662

Antonelli, Joseph; Cefalu, Matthew; Palmer, Nathan et al. (2018) Doubly robust matching estimators for high dimensional confounding adjustment. Biometrics :

Wilson, Ander; Zigler, Corwin M; Patel, Chirag J et al. (2018) Model-averaged confounder adjustment for estimating multivariate exposure effects with linear regression. Biometrics 74:1034-1044

García-Albéniz, Xabier; Hsu, John; Hernán, Miguel A (2017) The value of explicitly emulating a target trial when using real world evidence: an application to colorectal cancer screening. Eur J Epidemiol 32:495-500

Showing the most recent 10 out of 192 publications
