Variable Selection in Genetic Epidemiological Studies of Cardiovascular Diseases

Gu, C

Abstract

Cardiovascular diseases (CVD) affect millions of people in the US and across the world. There is strong evidence of a genetic component in CVD and related traits. An emerging consensus is that genes, environment, and, perhaps more importantly, their interactions are responsible for this complex disease. As a result, many genetic epidemiological (GE) studies of CVD use a design that tests hundreds of thousands of genetic predictors (e.g., single nucleotide polymorphism (SNP) markers) along with hundreds of related disease phenotypes and environmental covariates. This design poses tremendous analytical challenges, particularly the high dimensionality of the data and the obscure interactions among the many variables. Consequently, searching for CVD genes has become a task of selecting important variables from a vast number of SNPs and other predictors. Our analyses of real data from several ongoing large-scale CVD-related studies motivated us to consider new methodological solutions to this variable selection problem, and this application builds on the positive preliminary findings from those analyses. Our main idea is to develop a strategy for selecting important predictors of CVD by integrating multiple sources of information through statistical learning (i.e., optimizing the selection by repeated learning from examples). In this strategy, we will first develop an integrated classifier for selecting significant SNPs in moderate-dimensional data (e.g., a few thousand SNPs, as in candidate-gene studies). The method will build on existing techniques that assess the information SNPs carry through haplotype similarity, imputed functional potential, and gene-gene interactions.
We will then scale up the new method to the high-dimensional setting of genome-wide association studies (e.g., hundreds of thousands of SNPs or more) in two ways: by dimension reduction that exploits the local linkage disequilibrium (LD) structure among SNPs, and by combining latent factor analysis of correlated CVD traits with pathway-based analysis to account for gene-environment (GxE) interactions. We will also develop a fast-search algorithm based on an existing search heuristic that has been applied successfully to high-dimensional gene expression data and genomic sequence analysis. The new methods and algorithms will be implemented in R and distributed as a tool set for an association analysis pipeline. We will evaluate the new methods through intensive simulation studies and by applying them to existing datasets from ongoing studies of CVD and related diseases. Results from the evaluation studies, together with ancillary databases generated by the study (such as imputed functional scores of potential or known CVD SNPs), will be distributed on a dedicated project website. We believe that the tools resulting from the proposed research will make a significant contribution to many ongoing genetic epidemiological studies of CVD and related traits.
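The abstract does not specify how the LD-based dimension reduction will work. Purely as an illustration, and not as the project's method, one common way to exploit local LD is greedy pruning. SNPs are visited in chromosomal order, and any SNP whose squared genotype correlation (r²) with an already-retained SNP inside a nearby window exceeds a threshold is dropped. The function name `ld_prune`, the window of 50 SNPs, the 0.8 threshold, and the 0/1/2 genotype coding are all assumptions made for this sketch. It is written in Python with NumPy, although the abstract says the project's own tools would be written in R.

```python
import numpy as np


def ld_prune(genotypes, window=50, r2_threshold=0.8):
    """Hypothetical greedy LD-pruning sketch (not the project's method).

    genotypes: array of shape (n_samples, n_snps), coded 0/1/2 (assumed),
    with columns in chromosomal order.
    Returns the indices of retained SNPs, in ascending order.
    """
    g = np.asarray(genotypes, dtype=float)
    kept = []
    for j in range(g.shape[1]):
        x = g[:, j]
        if x.std() == 0:
            continue  # a monomorphic SNP carries no association information
        redundant = False
        # Check retained SNPs from nearest to farthest.
        for k in reversed(kept):
            if j - k > window:
                break  # kept is ascending, so all earlier SNPs are farther away
            r = np.corrcoef(x, g[:, k])[0, 1]
            if r * r > r2_threshold:
                redundant = True  # SNP j is in strong LD with retained SNP k
                break
        if not redundant:
            kept.append(j)
    return kept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.integers(0, 3, size=(200, 1))
    # Column 1 duplicates column 0 (perfect LD); column 2 is independent.
    G = np.hstack([base, base, rng.integers(0, 3, size=(200, 1))])
    print(ld_prune(G))
```

Here the duplicated SNP is discarded and the independent one is kept. Greedy pruning depends on the order in which SNPs are visited. Keeping tagging SNPs or haplotype-block summaries instead are other ways to use local LD structure.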

Public Health Relevance

This project aims at the timely development of computational tools for emerging large-scale genome-wide association studies of cardiovascular diseases (CVD), which affect millions of people in the US and across the world. The new methods address the analytical challenges posed by the high dimensionality of the data and the obscure interactions among the many variables in these studies, and the tools will be applied to ongoing studies of CVD and related diseases. The results, together with the computer programs and ancillary databases, will make a significant contribution to many ongoing and new genetic epidemiological studies of CVD and related diseases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Heart, Lung, and Blood Institute (NHLBI)
Type: Research Project (R01)
Project #: 5R01HL091028-02
Application #: 7663792
Study Section: Cardiovascular and Sleep Epidemiology (CASE)
Program Officer: Wolz, Michael

Project Start: 2008-08-01
Project End: 2011-07-30
Budget Start: 2009-08-01
Budget End: 2010-07-31
Support Year: 2
Fiscal Year: 2009
Total Cost: $228,000
Indirect Cost

Institution

Name: Washington University
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 068552207

City: Saint Louis
State: MO
Country: United States
Zip Code: 63130

Related projects


NIH 2010 R01 HL	Variable Selection in Genetic Epidemiological Studies of Cardiovascular Diseases Gu, C Charles / Washington University	$228,000
NIH 2009 R01 HL	Variable Selection in Genetic Epidemiological Studies of Cardiovascular Diseases Gu, C Charles / Washington University	$228,000
NIH 2009 R01 HL	Variable Selection in Genetic Epidemiological Studies of Cardiovascular Diseases Gu, C Charles / Washington University	$229,243
NIH 2008 R01 HL	Variable Selection in Genetic Epidemiological Studies of Cardiovascular Diseases Gu, C Charles / Washington University	$228,000

Publications

Barve, Ruteja A; Gu, C Charles; Yang, Wei et al. (2016) Genetic association of left ventricular mass assessed by M-mode and two-dimensional echocardiography. J Hypertens 34:88-96

Climer, Sharlee; Yang, Wei; de las Fuentes, Lisa et al. (2014) A custom correlation coefficient (CCC) approach for fast identification of multi-SNP association patterns in genome-wide SNPs data. Genet Epidemiol 38:610-21

Yang, Wei; Gu, C Charles (2014) Random forest fishing: a novel approach to identifying organic group of risk factors in genome-wide association studies. Eur J Hum Genet 22:254-9

Yang, Wei; Gu, C Charles (2013) A whole-genome simulator capable of modeling high-order epistasis for complex disease. Genet Epidemiol 37:686-94

de las Fuentes, Lisa; Yang, Wei; Dávila-Román, Victor G et al. (2012) Pathway-based genome-wide association analysis of coronary heart disease identifies biologically important gene sets. Eur J Hum Genet 20:1168-73

Yang, Wei; de las Fuentes, Lisa; Dávila-Román, Victor G et al. (2011) Variable set enrichment analysis in genome-wide association studies. Eur J Hum Genet 19:893-900

Ray, Monika; Ruan, Jianhua; Zhang, Weixiong (2008) Variations in the transcriptome of Alzheimer's disease reveal molecular networks involved in cardiovascular diseases. Genome Biol 9:R148
