Statistical Methods for Correlated and High-Dimensional Biomedical Data

Lin, Xihong

Abstract

Correlated and high-dimensional data arise frequently in health sciences research, especially in cancer research. Correlated data arise in longitudinal studies and familial studies, while high-dimensional data have emerged in recent years as a consequence of the rapid advance of genomic and proteomic research. We propose in this application to develop nonparametric and semiparametric regression methods for clustered/longitudinal data and high-dimensional genomic and proteomic data. Specifically, we propose to develop (1) the kernel (spline) profile EM method for generalized semiparametric mixed models for clustered/longitudinal data; (2) nonparametric and semiparametric regression models for longitudinal data with dropouts; (3) the mixed model kernel machine method for generalized semiparametric regression models and semiparametric Cox models for the analysis of gene expression pathways and tag single nucleotide polymorphisms (SNPs) within a candidate gene, and the sparse kernel machine (SKM) method for selecting genes and tag SNPs from a large pool of genes or tag SNPs; and (4) the joint modeling method using functional wavelet models and generalized semiparametric models for mass spectrometry proteomic data and disease outcomes. Asymptotic properties of the proposed methods will be investigated, and simulation studies will be conducted to evaluate their finite-sample performance. Efficient numerical algorithms and user-friendly statistical software will be developed, with the goal of disseminating these models and methods to health sciences researchers. In collaboration with biomedical investigators, we will apply the proposed models and methods to several motivating data sets from cancer research and other fields.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Method to Extend Research in Time (MERIT) Award (R37)
Project #: 5R37CA076404-16
Application #: 8248721
Study Section: Special Emphasis Panel (NSS)
Program Officer: Dunn, Michelle C

Project Start: 1997-12-15
Project End: 2016-03-31
Budget Start: 2012-04-01
Budget End: 2013-03-31
Support Year: 16
Fiscal Year: 2012
Total Cost: $308,344
Indirect Cost: $117,419

Institution

Name: Harvard University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 149617367

City: Boston
State: MA
Country: United States
Zip Code: 02115

Related projects


NIH 2014 R37 CA	Statistical Methods for Correlated and High-Dimensional Biomedical Data	Lin, Xihong / Harvard University	$299,093
NIH 2013 R37 CA	Statistical Methods for Correlated and High-Dimensional Biomedical Data	Lin, Xihong / Harvard University	$289,844
NIH 2012 R37 CA	Statistical Methods for Correlated and High-Dimensional Biomedical Data	Lin, Xihong / Harvard University	$308,344
NIH 2011 R37 CA	Statistical Methods for Correlated and High-Dimensional Biomedical Data	Lin, Xihong / Harvard University	$308,344
NIH 2010 R37 CA	Statistical Methods for Correlated and High-Dimensional Biomedical Data	Lin, Xihong / Harvard University	$312,162
NIH 2009 R37 CA	Statistical Methods for Correlated and High-Dimensional Biomedical Data	Lin, Xihong / Harvard University	$311,924
NIH 2008 R37 CA	Statistical Methods for Correlated and High-Dimensional Biomedical Data	Lin, Xihong / Harvard University	$311,208
NIH 2007 R37 CA	Statistical Methods for Correlated and High-Dimensional Biomedical Data	Lin, Xihong / Harvard University	$311,685

Publications

Sofer, Tamar; Schifano, Elizabeth D; Christiani, David C et al. (2017) Weighted pseudolikelihood for SNP set analysis with multiple secondary outcomes in case-control genetic association studies. Biometrics 73:1210-1220

Chen, Jun; Behnam, Ehsan; Huang, Jinyan et al. (2017) Fast and robust adjustment of cell mixtures in epigenome-wide association studies with SmartSVA. BMC Genomics 18:413

Barnett, Ian; Mukherjee, Rajarshi; Lin, Xihong (2017) The Generalized Higher Criticism for Testing SNP-Set Effects in Genetic Association Studies. J Am Stat Assoc 112:64-76

Chen, Jun; Just, Allan C; Schwartz, Joel et al. (2016) CpGFilter: model-based CpG probe filtering with replicates for epigenome-wide association studies. Bioinformatics 32:469-71

Chen, Han; Wang, Chaolong; Conomos, Matthew P et al. (2016) Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models. Am J Hum Genet 98:653-66

Lin, Xinyi; Lee, Seunggeun; Wu, Michael C et al. (2016) Test for rare variants by environment interactions in sequencing association studies. Biometrics 72:156-64

Yung, Godwin; Lin, Xihong (2016) Validity of using ad hoc methods to analyze secondary traits in case-control association studies. Genet Epidemiol 40:732-743

Barnett, Ian J; Lin, Xihong (2014) Analytic P-value calculation for the higher criticism test in finite d problems. Biometrika 101:964-970

Huang, Yen-Tsung; Vanderweele, Tyler J; Lin, Xihong (2014) Joint analysis of SNP and gene expression data in genetic association studies of complex diseases. Ann Appl Stat 8:352-376

Sofer, Tamar; Dicker, Lee; Lin, Xihong (2014) Variable selection for high dimensional multivariate outcomes. Stat Sin 24:1633-1654

Showing the most recent 10 out of 67 publications
