Correlated data are common in biomedical research, including cancer research, where clustered and spatial data are often observed. The correlation may arise from repeated measures over time, as in longitudinal studies; from outcomes on multiple members of the same family, as in genetic epidemiology; or from geographic proximity, as in the estimation of disease maps. Valid statistical analysis needs to account for the correlation among observations. This proposal aims to develop statistical models and methods for several emerging correlated data problems: (1) nonparametric regression, which allows flexible modeling of covariate effects using nonparametric spline and kernel techniques, and semiparametric regression, where the covariates of main interest are modeled parametrically and the nuisance covariates are modeled nonparametrically; (2) measurement error in covariates, where covariates are allowed to be measured with error; (3) case-control studies with longitudinal covariates, where some covariates collected in outcome-dependent retrospective case-control studies are measured longitudinally and retrospectively; and (4) causal inference in choice-based longitudinal intervention studies, where each subject chooses the intervention program he or she prefers and causal inference is challenged by the nonrandom nature of the design. Statistical models and methods will be developed to handle these problems, and the correlation among observations will be accounted for in these developments. Asymptotic properties of the proposed methods will be investigated, and simulation studies will be conducted to evaluate their finite-sample performance. Efficient numerical algorithms and user-friendly statistical software will be developed, with the goal of disseminating these models and methods to health sciences researchers. 
In collaboration with biomedical investigators, we will apply the proposed models and methods to several motivating data sets on cancer research and other fields of research.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project (R01)
Project #
7R01CA076404-09
Application #
7097795
Study Section
Special Emphasis Panel (ZRG1-SNEM-1 (03))
Program Officer
Tiwari, Ram C
Project Start
1997-12-15
Project End
2007-03-31
Budget Start
2005-08-01
Budget End
2006-03-31
Support Year
9
Fiscal Year
2005
Total Cost
$199,104
Indirect Cost
Name
Harvard University
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
149617367
City
Boston
State
MA
Country
United States
Zip Code
02115
Sofer, Tamar; Schifano, Elizabeth D; Christiani, David C et al. (2017) Weighted pseudolikelihood for SNP set analysis with multiple secondary outcomes in case-control genetic association studies. Biometrics 73:1210-1220
Mukherjee, Rajarshi; Pillai, Natesh S; Lin, Xihong (2015) Hypothesis testing for high-dimensional sparse binary regression. Ann Stat 43:352-381
Wang, Chaolong; Zhan, Xiaowei; Bragg-Gresham, Jennifer et al. (2014) Ancestry estimation and control of population stratification for sequence-based association studies. Nat Genet 46:409-15
Huang, Yen-Tsung; VanderWeele, Tyler J; Lin, Xihong (2014) Joint analysis of SNP and gene expression data in genetic association studies of complex diseases. Ann Appl Stat 8:352-376
Barnett, Ian J; Lee, Seunggeun; Lin, Xihong (2013) Detecting rare variant effects using extreme phenotype sampling in sequencing association studies. Genet Epidemiol 37:142-51
VanderWeele, Tyler J; Asomaning, Kofi; Tchetgen Tchetgen, Eric J et al. (2012) Genetic variants on 15q25.1, smoking, and lung cancer: an assessment of mediation and interaction. Am J Epidemiol 175:1013-20
Huang, Yen-Tsung; Lin, Xihong; Liu, Yan et al. (2011) Cigarette smoking increases copy number alterations in nonsmall-cell lung cancer. Proc Natl Acad Sci U S A 108:16345-50
Lin, Xinyi; Cai, Tianxi; Wu, Michael C et al. (2011) Kernel machine SNP-set analysis for censored survival outcomes in genome-wide association studies. Genet Epidemiol 35:620-31
Long, Qi; Little, Roderick J A; Lin, Xihong (2010) Estimating causal effects in trials involving multi-treatment arms subject to non-compliance: a Bayesian framework. J R Stat Soc Ser C Appl Stat 59:513-531
Wu, Michael C; Kraft, Peter; Epstein, Michael P et al. (2010) Powerful SNP-set analysis for case-control genome-wide association studies. Am J Hum Genet 86:929-42

Showing the most recent 10 out of 22 publications