Modern genomic epidemiology has rapidly evolved beyond initial expectations, primarily because of cutting-edge genetic assays and next-generation sequencing technologies combined with large, well-characterized studies. Yet novel statistical analysis methods that combine genomic annotation with measured genotypes and phenotypes have lagged behind, with most published genome-wide association studies (GWAS) focused on single-marker analyses of single nucleotide polymorphisms (SNPs). Recognizing that the majority of common genetic variants have small effects on traits, and that there are many associated variants, the time is ripe to re-harvest the many existing GWAS data sets, and the many expected in the near future, by joining genomic annotation with GWAS results. Hence, we propose to develop new statistical and computational methods to scan all possible gene-sets using GWAS SNP data and public gene annotation. We also plan to develop penalized regression models to simultaneously model the effects of individual SNPs on a trait, the effects of genes on a trait, and the effects of gene-sets on a trait. This will allow incorporation of annotation when it is available, without losing SNPs or genes when annotation is incomplete. Rare variants are likely to have a prominent role in the etiology of complex traits, and next-generation sequencing technologies will soon be affordable for large studies. We propose new strategies to screen for the association of rare variants with traits based on both the first and second moments of generalized regression models (as well as censored survival models). Finally, incorporating annotation information into statistical models is particularly important for analyzing rare variants because they are sparse, and it has the potential to improve analyses of common SNPs, or even models that combine rare and common variants. For this, we propose novel statistical methods based on kernel matrices that provide information on how regression coefficients should be "fused" according to similarities of variants based on genomic annotation.
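As a rough illustration of what "fusing" regression coefficients through a kernel matrix can mean, the sketch below fits a ridge-type regression whose penalty is the graph Laplacian of an annotation-similarity matrix K, so that coefficients of variants with high similarity are pulled toward each other. This is only one common way to express such a penalty and is not taken from the proposal itself; the function names, the toy genotypes, the trait values, and the similarity matrix are all hypothetical and made up for the example.

```python
def laplacian(K):
    """Graph Laplacian L = D - K of a symmetric similarity matrix K."""
    p = len(K)
    return [[(sum(K[j]) - K[j][j]) if j == k else -K[j][k]
             for k in range(p)] for j in range(p)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fused_ridge(X, y, K, lam, eps=1e-8):
    """Minimize ||y - X b||^2 + lam * b'Lb + eps * ||b||^2.

    b'Lb equals the sum over pairs j<k of K[j][k] * (b[j] - b[k])^2,
    so similar variants (large K[j][k]) get similar coefficients.
    The small eps term only keeps the linear system well-posed.
    """
    n, p = len(X), len(X[0])
    L = laplacian(K)
    A = [[sum(X[i][j] * X[i][k] for i in range(n))
          + lam * L[j][k] + (eps if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(A, rhs)


# Toy data (invented): 6 subjects, 3 variants coded as allele counts.
X = [[0, 1, 0], [1, 0, 1], [2, 1, 0], [0, 2, 1], [1, 1, 2], [2, 0, 1]]
# Trait built as 1.0 * variant0 + 0.2 * variant2, with no noise.
y = [0.0, 1.2, 2.0, 0.2, 1.4, 2.2]
# Hypothetical annotation similarity: variants 0 and 1 share annotation.
K = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

b_unfused = fused_ridge(X, y, K, lam=0.0)
b_fused = fused_ridge(X, y, K, lam=1e6)
print("no fusion:    ", b_unfused)
print("strong fusion:", b_fused)
```

With no penalty the fit recovers separate coefficients for variants 0 and 1; with a large penalty the two annotated-as-similar variants are forced to share essentially one coefficient, while variant 2 is left free. In practice the penalty weight would be tuned rather than fixed, and the trait model need not be linear.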

Public Health Relevance

Our proposed plans to develop improved statistical analysis methods for genomic epidemiology are likely to have high impact on the many past and ongoing studies of the genetic etiology of common human diseases and traits. By applying our new analytic methods to existing data sets, or to future studies, new insights are expected regarding the genetic etiology of disease or, in pharmacogenomic studies, the genetic etiology of response to treatments or of toxicities. These insights should provide the basis for designing future follow-up studies, such as laboratory-based functional studies to further refine understanding of disease causation, or studies of how best to tailor treatments for optimal therapeutic benefit with fewer side effects. Hence, our research plans have broad public health implications, ranging from disease screening to diagnosis, prognosis, and treatment.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Krasnewich, Donna M
Mayo Clinic, Rochester
United States
Larson, Nicholas B; McDonnell, Shannon; Albright, Lisa Cannon et al. (2016) Post hoc Analysis for Detecting Individual Rare Variant Risk Associations Using Probit Regression Bayesian Variable Selection Methods in Case-Control Sequencing Studies. Genet Epidemiol 40:461-9
Chen, Wenan; McDonnell, Shannon K; Thibodeau, Stephen N et al. (2016) Incorporating Functional Annotations for Fine-Mapping Causal Variants in a Bayesian Framework Using Summary Statistics. Genetics :
Schaid, Daniel J; Tong, Xingwei; Larrabee, Beth et al. (2016) Statistical Methods for Testing Genetic Pleiotropy. Genetics 204:483-497
Chen, Jun; Chen, Wenan; Zhao, Ni et al. (2016) Small Sample Kernel Association Tests for Human Genetic and Microbiome Association Studies. Genet Epidemiol 40:5-19
Chen, Wenan; Larrabee, Beth R; Ovsyannikova, Inna G et al. (2015) Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics. Genetics 200:719-36
Oberg, Ann L; McKinney, Brett A; Schaid, Daniel J et al. (2015) Lessons learned in the analysis of high-dimensional data in vaccinomics. Vaccine 33:5262-70
Wang, Xuefeng; Xing, Eric P; Schaid, Daniel J (2015) Kernel methods for large-scale genomic data analysis. Brief Bioinform 16:183-92
Larson, Nicholas B; Schaid, Daniel J (2014) Regularized rare variant enrichment analysis for case-control exome sequencing data. Genet Epidemiol 38:104-13
Sinnwell, Jason P; Therneau, Terry M; Schaid, Daniel J (2014) The kinship2 R package for pedigree data. Hum Hered 78:91-3
Chen, Wenan; Schaid, Daniel J (2014) PedBLIMP: extending linear predictors to impute genotypes in pedigrees. Genet Epidemiol 38:531-41

Showing the most recent 10 out of 37 publications