Genome-wide association and linkage studies involving hundreds or thousands of Single Nucleotide Polymorphisms (SNPs) are becoming increasingly common due to the rapid development of biotechnologies. Among the many statistical challenges arising from these studies, the typically limited sample size is of particular concern for three reasons: high genotyping costs put pressure on the number of individuals genotyped; significance thresholds must be made more stringent to guard against too many false positives from multiple comparisons; and each disease-associated variant allele confers only moderate risk in complex diseases. This application considers two strategies to address this issue: (1) to increase sample size by pooling data obtained from several sources; (2) to devise better statistical and computational tools for more efficient use of the data. Correspondingly, the first aim is to develop estimation and inference procedures for genetic association using data obtained from both population-based case-control and family-based studies, accommodating diverse ascertainment schemes of cases and controls, whereas the second aim is to develop analysis and regularization methods that improve the chance that the disease-associated variants and their interactions can actually be identified.
The second aim is also concerned with the construction of risk prediction models from these SNPs. The highly dense SNP markers also pose problems for more traditional model-based linkage analysis for gene discovery, because the methods for this analysis were developed assuming markers in linkage equilibrium, an assumption that is likely violated at this SNP density.
The third aim is to develop and evaluate estimation procedures for multipoint linkage analysis in the presence of linkage disequilibrium among SNP markers.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Program Projects (P01)
Study Section: Subcommittee G - Education (NCI)
Fred Hutchinson Cancer Research Center
United States
Howard, Barbara V; Aragaki, Aaron K; Tinker, Lesley F et al. (2018) A Low-Fat Dietary Pattern and Diabetes: A Secondary Analysis From the Women's Health Initiative Dietary Modification Trial. Diabetes Care 41:680-687
Huang, Yijian; Wang, Ching-Yun (2018) Cox regression with dependent error in covariates. Biometrics 74:118-126
Su, Yu-Ru; Di, Chongzhi; Bien, Stephanie et al. (2018) A Mixed-Effects Model for Powerful Association Tests in Integrative Functional Genomics. Am J Hum Genet 102:904-919
Liu, Dandan; Cai, Tianxi; Lok, Anna et al. (2018) Nonparametric Maximum Likelihood Estimators of Time-Dependent Accuracy Measures for Survival Outcome Under Two-Stage Sampling Designs. J Am Stat Assoc 113:882-892
Yu, Hsiang; Cheng, Yu-Jen; Wang, Ching-Yun (2018) Methods for multivariate recurrent event data with measurement error and informative censoring. Biometrics 74:966-976
Monaco, John V; Gorfine, Malka; Hsu, Li (2018) General Semiparametric Shared Frailty Model: Estimation and Simulation with frailtySurv. J Stat Softw 86:
Dai, James Y; Wang, Xiaoyu; Buas, Matthew F et al. (2018) Whole-genome sequencing of esophageal adenocarcinoma in Chinese patients reveals distinct mutational signatures and genomic alterations. Commun Biol 1:174
Dai, James Y; Peters, Ulrike; Wang, Xiaoyu et al. (2018) Diagnostics for Pleiotropy in Mendelian Randomization Studies: Global and Individual Tests for Direct Effects. Am J Epidemiol 187:2672-2680
Dai, James Y; Liang, C Jason; LeBlanc, Michael et al. (2018) Case-only approach to identifying markers predicting treatment effects on the relative risk scale. Biometrics 74:753-763
Prentice, Ross L; Zhao, Shanshan (2018) Nonparametric estimation of the multivariate survivor function: the multivariate Kaplan-Meier estimator. Lifetime Data Anal 24:3-27
