The broad, long-term objective of this research is the development of innovative and high-impact statistical methods for the design and analysis of chronic disease studies, with an emphasis on genomics.
The specific aims of this competing renewal application include: (1) efficient estimation for general two-phase studies, in which possibly incomplete multivariate outcomes and inexpensive covariates are measured on all study subjects in the first phase and the first-phase information is used to optimally select subjects for measurement of expensive covariates in the second phase; (2) valid and efficient analysis of genetic association when all study subjects are genotyped on a SNP array but only a small subset is sequenced, or when unobserved allele-specific copy numbers are of direct interest; (3) meta-analysis under random-effects models when the number of studies is small relative to the study sample sizes, together with variable selection based on summary statistics from multiple studies under a variety of model structures. All of these problems are motivated by the principal investigator's applied research experience and are highly relevant to current genomic studies. The proposed solutions are based on likelihood and other sound statistical principles. The large-sample properties of the new methods will be established rigorously via modern empirical process theory and semiparametric efficiency theory. Efficient and stable numerical algorithms will be developed to implement the inference procedures. The proposed methods will be evaluated extensively through simulation studies mimicking real data and applied to several major genomic studies, most of which are carried out at UNC. Efficient, reliable, and user-friendly software with proper documentation will be made freely available. This research will not only advance the fields of biostatistics and statistical genetics but also influence chronic disease research at UNC and elsewhere.

Public Health Relevance

The broad, long-term objective of this research is the development of innovative and high-impact statistical methods for the design and analysis of chronic disease studies, with an emphasis on genomics. The specific aims of this competing renewal application include efficient estimation under outcome-dependent sampling, genetic association analysis with incomplete DNA data, and meta-analysis with heterogeneous effects and high-dimensional covariate data. This research will not only advance the fields of biostatistics and statistical genetics but also influence current chronic disease research.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 2R01CA082659-14A1
Application #: 8438778
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Dunn, Michelle C
Project Start: 2000-04-01
Project End: 2017-01-31
Budget Start: 2013-02-07
Budget End: 2014-01-31
Support Year: 14
Fiscal Year: 2013
Total Cost: $237,852
Indirect Cost: $76,943

Name: University of North Carolina Chapel Hill
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 608195277
City: Chapel Hill
State: NC
Country: United States
Zip Code: 27599
Li, Xiang; Xie, Shanghong; Zeng, Donglin et al. (2018) Efficient ℓ0-norm feature selection based on augmented and penalized minimization. Stat Med 37:473-486
Wong, Kin Yau; Zeng, Donglin; Lin, D Y (2018) Efficient Estimation for Semiparametric Structural Equation Models With Censored Data. J Am Stat Assoc 113:893-905
Gao, Fei; Zeng, Donglin; Lin, Dan-Yu (2018) Semiparametric regression analysis of interval-censored data with informative dropout. Biometrics :
Tao, Ran; Zeng, Donglin; Lin, Dan-Yu (2017) Efficient Semiparametric Inference Under Two-Phase Sampling, With Applications to Genetic Association Studies. J Am Stat Assoc 112:1468-1476
Tang, Zheng-Zheng; Bunn, Paul; Tao, Ran et al. (2017) PreMeta: a tool to facilitate meta-analysis of rare-variant associations. BMC Genomics 18:160
Silva, Grace O; Siegel, Marni B; Mose, Lisle E et al. (2017) SynthEx: a synthetic-normal-based DNA sequencing tool for copy number alteration detection and tumor heterogeneity profiling. Genome Biol 18:66
Mao, Lu; Lin, D Y (2017) Efficient Estimation of Semiparametric Transformation Models for the Cumulative Incidence of Competing Risks. J R Stat Soc Series B Stat Methodol 79:573-587
Zeng, Donglin; Gao, Fei; Lin, D Y (2017) Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika 104:505-525
Mao, Lu; Lin, Dan-Yu; Zeng, Donglin (2017) Semiparametric regression analysis of interval-censored competing risks data. Biometrics 73:857-865
Lin, Dan-Yu; Gong, Jianjian; Gallo, Paul et al. (2016) Simultaneous inference on treatment effects in survival studies with factorial designs. Biometrics 72:1078-1085

Showing the most recent 10 out of 104 publications