Modern genomic epidemiology has evolved rapidly beyond initial expectations, primarily because of cutting-edge genetic assays and next-generation sequencing technologies combined with large, well-characterized studies. Yet statistical analysis methods that combine genomic annotation with measured genotypes and phenotypes have lagged behind, with most published genome-wide association studies (GWAS) focused on single-marker analyses of single nucleotide polymorphisms (SNPs). Because most common genetic variants have small effects on traits, and many variants are associated with any given trait, the time is ripe to re-harvest the many existing GWAS data sets, and the many expected in the near future, by joining genomic annotation with GWAS results. Hence, we propose to develop new statistical and computational methods to scan all possible gene-sets using GWAS SNP data and public gene annotation. We also plan to develop penalized regression models that simultaneously model the effects of individual SNPs, genes, and gene-sets on a trait. This will allow annotation to be incorporated when available, without losing SNPs or genes when annotation is incomplete. Rare variants are likely to have a prominent role in the etiology of complex traits, and next-generation sequencing technologies will soon be affordable for large studies. We propose new strategies to screen for the association of rare variants with traits based on both the first and second moments of generalized regression models (as well as censored survival models). Finally, incorporating annotation information into statistical models is particularly important for analyzing rare variants because they are sparse, and it has the potential to improve analyses of common SNPs, or even of models that combine rare and common variants. For this, we propose novel statistical methods based on kernel matrices that provide information on how regression coefficients should be "fused" according to similarities of variants based on genomic annotation.
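
The idea of "fusing" regression coefficients through a kernel matrix can be illustrated with a minimal sketch. It assumes a quadratic (graph-Laplacian) fusion penalty, which is only one plausible form; the abstract does not specify the penalty actually proposed. The function names (`laplacian`, `solve`, `fused_fit`), the toy genotype matrix, and the annotation kernel `K` are all hypothetical.

```python
# Hypothetical sketch (not the proposal's actual method): least-squares
# regression with a quadratic penalty that pulls together the coefficients
# of variants that an annotation kernel K marks as similar.
#
# Objective: ||y - X b||^2 + lam * sum_{j<k} K[j][k] * (b[j] - b[k])^2
# Minimizer: (X'X + lam * L) b = X'y, where L = D - K is the graph Laplacian.

def laplacian(K):
    """Graph Laplacian of K, using only off-diagonal similarities."""
    p = len(K)
    L = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            if j != k:
                L[j][k] = -K[j][k]
                L[j][j] += K[j][k]
    return L

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fused_fit(X, y, K, lam):
    """Coefficients minimizing the kernel-fused least-squares objective."""
    p = len(X[0])
    L = laplacian(K)
    A = [[sum(row[j] * row[k] for row in X) + lam * L[j][k]
          for k in range(p)] for j in range(p)]
    rhs = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(A, rhs)

# Toy data: 6 subjects, 3 variants coded as minor-allele counts (0/1/2).
X = [[0, 1, 0], [1, 0, 2], [2, 1, 1], [1, 2, 0], [0, 0, 1], [2, 2, 2]]
# Trait generated without noise as 1.0*v0 + 0.0*v1 + 0.5*v2.
y = [0.0, 2.0, 2.5, 1.0, 0.5, 3.0]
# Assumed annotation kernel: variants 0 and 1 are similar (e.g. same gene).
K = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]

ols = fused_fit(X, y, K, lam=0.0)    # no fusion: ordinary least squares
fused = fused_fit(X, y, K, lam=1e6)  # strong fusion of variants 0 and 1
print("unpenalized:", ols)
print("fused:      ", fused)
```

With `lam=0` the fit reduces to ordinary least squares; as `lam` grows, the coefficients of variants 0 and 1 are driven toward a common value, while variant 2, which the kernel treats as unrelated, is not tied to either. A sparse variant can thereby borrow strength from annotated neighbors.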

Public Health Relevance

Our plans to develop improved statistical analysis methods for genomic epidemiology are likely to have a high impact on the many past and ongoing studies of the genetic etiology of common human diseases and traits. Applying our new analytic methods to existing data sets, or to future studies, is expected to yield new insights into the genetic etiology of disease or, in pharmacogenomic studies, the genetic etiology of response to treatments or of toxicities. These insights should provide the basis for designing follow-up studies, such as laboratory-based functional studies that further refine understanding of disease causation, or studies of how best to tailor treatments for optimal therapeutic benefit with fewer side effects. Hence, our research plans have broad public health implications, ranging from disease screening to diagnosis, prognosis, and treatment.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Krasnewich, Donna M
Mayo Clinic, Rochester
United States
Wang, Xuefeng; Xing, Eric P; Schaid, Daniel J (2015) Kernel methods for large-scale genomic data analysis. Brief Bioinform 16:183-92
Larson, Nicholas B; Schaid, Daniel J (2014) Regularized rare variant enrichment analysis for case-control exome sequencing data. Genet Epidemiol 38:104-13
Sinnwell, Jason P; Therneau, Terry M; Schaid, Daniel J (2014) The kinship2 R package for pedigree data. Hum Hered 78:91-3
Chen, Wenan; Schaid, Daniel J (2014) PedBLIMP: extending linear predictors to impute genotypes in pedigrees. Genet Epidemiol 38:531-41
Schaid, Daniel J; McDonnell, Shannon K; Sinnwell, Jason P et al. (2013) Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genet Epidemiol 37:409-18
Wen, Yalu; Schaid, Daniel J; Lu, Qing (2013) A bivariate Mann-Whitney approach for unraveling genetic variants and interactions contributing to comorbidity. Genet Epidemiol 37:248-55
Schaid, Daniel J; Jenkins, Gregory D; Ingle, James N et al. (2013) Two-phase designs to follow-up genome-wide association signals with DNA resequencing studies. Genet Epidemiol 37:229-38
Schaid, Daniel J; Sinnwell, Jason P; McDonnell, Shannon K et al. (2013) Detecting genomic clustering of risk variants from sequence data: cases versus controls. Hum Genet 132:1301-9
Larson, Nicholas B; Schaid, Daniel J (2013) A kernel regression approach to gene-gene interaction detection for case-control studies. Genet Epidemiol 37:695-703
Schaid, Daniel J; McDonnell, Shannon K; Riska, Shaun M et al. (2010) Estimation of genotype relative risks from pedigree data by retrospective likelihoods. Genet Epidemiol 34:287-98
