Modern genomic epidemiology has rapidly evolved beyond initial expectations, primarily because of cutting-edge genetic assays and next-generation sequencing technologies combined with large, well-characterized studies. Yet novel statistical analysis methods that combine genomic annotation with measured genotypes and phenotypes have lagged behind, with most published genome-wide association studies (GWAS) focused on single-marker analyses of single nucleotide polymorphisms (SNPs). Recognizing that the majority of common genetic variants have small effects on traits, and that there are many associated variants, the time is ripe to re-harvest the many existing GWAS data sets, and the many expected in the near future, by joining genomic annotation with GWAS results. Hence, we propose to develop new statistical and computational methods to scan all possible gene-sets using GWAS SNP data and public gene annotation. We also plan to develop penalized regression models to simultaneously model the effects of individual SNPs on a trait, the effects of genes on a trait, and the effects of gene-sets on a trait. This will allow incorporation of annotation when available, without losing SNPs or genes when annotation is incomplete. Rare variants are likely to have a prominent role in the etiology of complex traits, and next-generation sequencing technologies will soon be affordable for large studies. We propose new strategies to screen for the association of rare variants with traits based on both the first and second moments of generalized regression models (as well as censored survival models). Finally, including annotation information in statistical models is particularly important for analyzing rare variants because they are sparse, and it has the potential to improve analyses of common SNPs, or even of models that combine both rare and common variants.
To this end, we propose novel statistical methods based on kernel matrices that specify how regression coefficients should be fused according to the similarity of variants in their genomic annotation.
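The idea of fusing regression coefficients according to a kernel matrix can be illustrated with a minimal sketch. It is not the method proposed here, only one simple instance of the general idea, under several assumptions: a continuous trait, a quadratic (ridge-type) fusion penalty of the form lam * sum over pairs of K[j, k] * (b_j - b_k)^2, and a hypothetical kernel that sets similarity to 1 when two variants share a gene label. The function names `annotation_kernel` and `kernel_fused_ridge` and the parameters `lam` and `ridge` are illustrative and do not come from the proposal.

```python
import numpy as np


def annotation_kernel(gene_labels):
    """Hypothetical annotation kernel: K[j, k] = 1 if variants j and k
    carry the same gene label, else 0."""
    labels = np.asarray(gene_labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def kernel_fused_ridge(X, y, K, lam=1.0, ridge=1e-6):
    """Minimize ||y - X b||^2 + lam * sum_{j<k} K[j, k] * (b_j - b_k)^2
    + ridge * ||b||^2.

    The fusion term equals lam * b' L b, where L is the graph Laplacian
    of the symmetric similarity matrix with its diagonal removed, so the
    minimizer has a closed form.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    K = np.asarray(K, dtype=float)

    W = K - np.diag(np.diag(K))        # drop self-similarity
    L = np.diag(W.sum(axis=1)) - W     # graph Laplacian of the annotation graph

    p = X.shape[1]
    # A small ridge term keeps the system solvable when X'X + lam * L is singular.
    A = X.T @ X + lam * L + ridge * np.eye(p)
    return np.linalg.solve(A, X.T @ y)
```

With `lam = 0` the estimate reduces to an ordinary (lightly ridged) least-squares fit; as `lam` grows, coefficients of variants that the kernel treats as similar are pulled toward a common value, which is how annotation can lend strength to sparse rare variants. An absolute-value (fused-lasso-type) penalty would be a different choice and would need an iterative solver.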
Our plans to develop improved statistical analysis methods for genomic epidemiology are likely to have a high impact on the many past and ongoing studies of the genetic etiology of common human diseases and traits. Applying our new analytic methods to existing data sets, or to future studies, is expected to yield new insights into the genetic etiology of disease or, in pharmacogenomic studies, the genetic etiology of response to treatments or toxicities. These insights should provide the basis for designing follow-up studies, such as laboratory-based functional studies to further refine understanding of disease causation, or studies of how best to tailor treatments for optimal therapeutic benefit with reduced side effects. Hence, our research plans have broad public health implications, ranging from disease screening to diagnosis, prognosis, and treatment.