Modern genomic epidemiology has rapidly evolved beyond initial expectations, primarily because of cutting-edge genetic assays and next-generation sequencing technologies combined with large, well-characterized studies. Yet novel statistical analysis methods that combine genomic annotation with measured genotypes and phenotypes have lagged behind, with most published genome-wide association studies (GWAS) focused on single-marker analyses of single nucleotide polymorphisms (SNPs). Recognizing that the majority of common genetic variants have small effects on traits, and that there are many associated variants, the time is ripe to re-harvest the many existing GWAS data sets, and the many expected in the near future, by joining genomic annotation with GWAS results. Hence, we propose to develop new statistical and computational methods to scan all possible gene-sets using GWAS SNP data and public gene annotation. We also plan to develop penalized regression models that simultaneously model the effects of individual SNPs, genes, and gene-sets on a trait. This will allow annotation to be incorporated when available, without losing SNPs or genes when annotation is incomplete. Rare variants are likely to have a prominent role in the etiology of complex traits, and next-generation sequencing technologies will soon be affordable for large studies. We propose new strategies to screen for the association of rare variants with traits based on both the first and second moments of generalized regression models (as well as censored survival models). Finally, including annotation information in statistical models is particularly important for analyzing rare variants because they are sparse, and it has the potential to improve analyses of common SNPs, or even of models that combine rare and common variants.
For this, we propose novel statistical methods based on kernel matrices that provide information on how regression coefficients should be "fused" according to similarities of variants based on genomic annotation.
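
The abstract does not state the form of the fusion penalty. As one illustrative possibility only, the sketch below assumes a quadratic penalty, lam * sum over i<j of K[i, j] * (b_i - b_j)^2, added to a least-squares loss, where K is a symmetric annotation-similarity kernel over variants. Under that assumption the penalty equals b' L b, with L the graph Laplacian of K, and the estimate has a closed form. The function names `annotation_laplacian` and `fused_ridge`, the small `ridge` stabilizer, and the toy data are hypothetical and are not taken from the proposal.

```python
import numpy as np


def annotation_laplacian(K):
    """Graph Laplacian L = D - W of a symmetric similarity kernel K.

    Self-similarities (the diagonal of K) are dropped because they do not
    affect differences between coefficients.
    """
    K = np.asarray(K, dtype=float)
    W = K - np.diag(np.diag(K))
    return np.diag(W.sum(axis=1)) - W


def fused_ridge(X, y, K, lam, ridge=1e-8):
    """Minimize ||y - X b||^2 + lam * sum_{i<j} K[i, j] * (b_i - b_j)^2.

    Assumed quadratic fusion penalty (an illustration, not the authors'
    method). `ridge` is a tiny stabilizer so the linear system stays
    solvable, since the Laplacian alone is singular.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    L = annotation_laplacian(K)
    A = X.T @ X + lam * L + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


# Toy data: 4 variants, where annotation says variants 0 and 1 are similar
# (for example, same gene) and variants 2 and 3 are similar.
rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(200, 4)).astype(float)  # genotype counts 0/1/2
X -= X.mean(axis=0)                                    # center each variant
y = X @ np.array([0.5, 0.5, -0.3, -0.3]) + rng.normal(size=200)
K = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

for lam in (0.0, 10.0, 1000.0):
    print(lam, np.round(fused_ridge(X, y, K, lam), 3))
```

With `lam = 0` the estimate reduces to ordinary least squares. As `lam` grows, coefficients of variants that the kernel marks as similar are pulled toward a common value, which is the sense of "fusing" assumed here. A quadratic penalty shrinks differences but does not set them exactly to zero; an absolute-value penalty would, at the cost of losing the closed-form solution.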

Public Health Relevance

Our proposed plans to develop improved statistical analysis methods for genomic epidemiology are likely to have high impact on the many past and ongoing studies of the genetic etiology of common human diseases and traits. Applying our new analytic methods to existing data sets, or to future studies, is expected to yield new insights into the genetic etiology of disease or, in pharmacogenomic studies, the genetic etiology of response to treatments or of toxicities. These insights should provide the basis for designing follow-up studies, such as laboratory-based functional studies to further refine understanding of disease causation, or studies of how best to tailor treatments for optimal therapeutic benefit with reduced side effects. Hence, our research plans have broad public health implications, ranging from disease screening to diagnosis, prognosis, and treatment.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Krasnewich, Donna M
Mayo Clinic, Rochester
United States
Schaid, Daniel J; Chen, Wenan; Larson, Nicholas B (2018) From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet 19:491-504
Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa et al. (2017) gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels. Genet Epidemiol 41:297-308
Chen, Jun; Chen, Wenan; Zhao, Ni et al. (2016) Small Sample Kernel Association Tests for Human Genetic and Microbiome Association Studies. Genet Epidemiol 40:5-19
Larson, Nicholas B; McDonnell, Shannon; Albright, Lisa Cannon et al. (2016) Post hoc Analysis for Detecting Individual Rare Variant Risk Associations Using Probit Regression Bayesian Variable Selection Methods in Case-Control Sequencing Studies. Genet Epidemiol 40:461-9
Chen, Wenan; McDonnell, Shannon K; Thibodeau, Stephen N et al. (2016) Incorporating Functional Annotations for Fine-Mapping Causal Variants in a Bayesian Framework Using Summary Statistics. Genetics 204:933-958
Schaid, Daniel J; Tong, Xingwei; Larrabee, Beth et al. (2016) Statistical Methods for Testing Genetic Pleiotropy. Genetics 204:483-497
Wang, Xuefeng; Xing, Eric P; Schaid, Daniel J (2015) Kernel methods for large-scale genomic data analysis. Brief Bioinform 16:183-92
Chen, Wenan; Larrabee, Beth R; Ovsyannikova, Inna G et al. (2015) Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics. Genetics 200:719-36
Wu, Lang; Schaid, Daniel J; Sicotte, Hugues et al. (2015) Case-only exome sequencing and complex disease susceptibility gene discovery: study design considerations. J Med Genet 52:10-6
Oberg, Ann L; McKinney, Brett A; Schaid, Daniel J et al. (2015) Lessons learned in the analysis of high-dimensional data in vaccinomics. Vaccine 33:5262-70

Showing the most recent 10 out of 41 publications