Dimension Reduction Approaches for Genome-wide Association Testing

Clark, Andrew

Abstract

Whole-genome association testing is widely cited as having promise for identification of genetic variants that are causal to elevated risk of complex disorders like cardiovascular disease, diabetes, and cancers. The technology for genotyping at the requisite scale is becoming practical and affordable, but we lag behind in having the analytical tools needed to make the most reliable inferences from these data. This implies that we cannot yet design optimal studies, because we do not know what aspects of experimental designs erode the power of the studies.
Specific Aim 1 will develop Bayesian classification models, a promising approach for inference when the number of predictors (SNPs) is large but the prior expectation is that most SNPs will have zero effect. The model will have a three-component mixture prior with a high point mass at zero (no effect) as well as components for positive and negative effects on risk. Fitting will be done by Markov chain Monte Carlo and by stochastic variable selection. We will apply the model to BeadArray data, providing transcript abundance for 700 genes in cell lines from the 270 subjects of the HapMap project (each having more than 4 M SNP genotypes). The Bayesian classification approach will be contrasted with linear-model-based approaches. Both case-control and random cohort data will be addressed. Performance of the methods in the face of missing and erroneous data will be quantified.
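To make the three-component mixture prior concrete, the following is a minimal sketch that draws SNP effects from such a prior. The mixing weights (`p_zero`, `p_pos`), the half-normal shape of the nonzero components, and the `scale` value are illustrative assumptions, not values taken from the proposal, and the sketch covers only sampling from the prior, not model fitting.

```python
import random

def sample_effects(n_snps, p_zero=0.98, p_pos=0.01, scale=0.5, seed=0):
    """Draw SNP effects from a three-component mixture prior.

    Components: a point mass at zero, a positive half-normal, and a
    negative half-normal. All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    effects = []
    for _ in range(n_snps):
        u = rng.random()
        if u < p_zero:
            effects.append(0.0)                         # no effect
        elif u < p_zero + p_pos:
            effects.append(abs(rng.gauss(0.0, scale)))  # raises risk
        else:
            effects.append(-abs(rng.gauss(0.0, scale))) # lowers risk
    return effects

effects = sample_effects(10_000)
n_zero = sum(e == 0.0 for e in effects)
n_pos = sum(e > 0 for e in effects)
n_neg = sum(e < 0 for e in effects)
print(n_zero, n_pos, n_neg)
```

With most of the prior mass at zero, only a small fraction of the sampled SNPs carry any effect, which is the sparsity assumption the aim describes.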
Specific Aim 2 will explore the effects of ascertainment bias and of departures from neutrality of the marker variation on association testing. The HapMap SNPs were discovered in small samples, resulting in a bias toward SNPs that are more common than are found in the full population. There is a pressing need to explore the impact of such ascertainment bias on inference of association. Most methods of association testing assume that the markers follow neutral expectations, but we know that many regions of the genome show marked departures from this pattern. We will show through theory and simulation how these distortions impact standard approaches to association testing, and devise accommodations to the test.
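The bias toward common SNPs can be illustrated with a simple calculation. Assuming a SNP is "discovered" only if both alleles appear in a discovery panel of `n_chromosomes` chromosomes drawn independently from the population (a simplifying assumption, not the actual HapMap discovery protocol), the discovery probability for an allele at frequency `freq` is 1 - freq^n - (1 - freq)^n. The panel size of 8 below is hypothetical.

```python
def discovery_probability(freq, n_chromosomes):
    """Probability that a SNP is seen as polymorphic in the discovery panel.

    Assumes independent sampling of chromosomes; the SNP is missed when
    the panel carries only one of the two alleles.
    """
    return 1.0 - freq ** n_chromosomes - (1.0 - freq) ** n_chromosomes

# Hypothetical panel of 8 chromosomes: rare variants are far less likely
# to be discovered than common ones.
for freq in (0.01, 0.05, 0.20, 0.50):
    print(f"allele frequency {freq:.2f}: P(discovered) = "
          f"{discovery_probability(freq, 8):.3f}")
```

Because the discovery probability rises with allele frequency, the frequency spectrum of discovered SNPs is shifted toward common variants relative to the full population.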
Specific Aim 3 will apply data reduction methods to both the SNP and the phenotype data. SNP data consist of discrete factors that arise through a well-understood process (the coalescent), and explicit modeling of this process is likely to identify better methods for SNP dimension reduction. Some beginnings of this have appeared in the literature as the "tag SNP" approach. The phenotype data can be reduced by combining methods like clustering and sparse principal components. These methods will be applied to the Sanger gene expression data, and will be tested by simulation.
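As a point of reference for SNP dimension reduction, the sketch below shows one common tag SNP heuristic: greedily pick the SNP that covers the most not-yet-tagged SNPs, where "covers" means a squared genotype correlation of at least `threshold`. This is a correlation-based illustration only, not the coalescent-based modeling the aim proposes; the 0.8 threshold, the function names, and the toy genotypes are assumptions.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def greedy_tag_snps(genotypes, threshold=0.8):
    """Pick tag SNPs so every SNP is itself a tag or correlated with one.

    genotypes: dict mapping SNP name -> list of genotype codes per subject.
    """
    names = list(genotypes)
    # Each SNP covers itself plus every SNP it is strongly correlated with.
    covers = {
        a: {b for b in names
            if a == b or r_squared(genotypes[a], genotypes[b]) >= threshold}
        for a in names
    }
    untagged = set(names)
    tags = []
    while untagged:
        # Choose the SNP that covers the most SNPs still untagged.
        best = max(names, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags

# Toy data: snp1, snp2, and snp3 are perfectly correlated; snp4 is not.
toy = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [2, 1, 0, 2, 1, 0],
    "snp4": [0, 0, 1, 2, 2, 1],
}
tags = greedy_tag_snps(toy)
print(tags)
```

In the toy data, one tag stands in for the three mutually correlated SNPs and a second tag is needed for the uncorrelated one, reducing four predictors to two.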
Specific Aim 4 will employ simulations to assess the power of association tests under violations of model assumptions. Of particular interest will be tuning model parameters to optimize the balance of false positive and false negative inferences.
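A simulation-based power estimate of the kind described can be sketched as follows. The example uses a single-SNP allelic chi-square test in a balanced case-control design, which is an assumed stand-in for the association tests in the proposal; the allele frequency, odds ratio, sample size, and replicate count are all hypothetical. The `critical` argument (3.841 is the 5% critical value for chi-square with 1 degree of freedom) is the tuning parameter that trades false positives against false negatives.

```python
import random

def allelic_chi2(case_alt, case_n, ctrl_alt, ctrl_n):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts."""
    total = case_n + ctrl_n
    alt = case_alt + ctrl_alt
    ref = total - alt
    if alt == 0 or ref == 0:
        return 0.0  # no variation observed, nothing to test
    stat = 0.0
    for obs_alt, n in ((case_alt, case_n), (ctrl_alt, ctrl_n)):
        exp_alt = n * alt / total
        exp_ref = n * ref / total
        stat += (obs_alt - exp_alt) ** 2 / exp_alt
        stat += ((n - obs_alt) - exp_ref) ** 2 / exp_ref
    return stat

def estimate_power(freq_ctrl, odds_ratio, n_per_group,
                   critical=3.841, reps=500, seed=1):
    """Fraction of simulated studies in which the test rejects the null."""
    rng = random.Random(seed)
    # Case allele frequency implied by the allelic odds ratio.
    odds = odds_ratio * freq_ctrl / (1.0 - freq_ctrl)
    freq_case = odds / (1.0 + odds)
    chromosomes = 2 * n_per_group
    hits = 0
    for _ in range(reps):
        case_alt = sum(rng.random() < freq_case for _ in range(chromosomes))
        ctrl_alt = sum(rng.random() < freq_ctrl for _ in range(chromosomes))
        if allelic_chi2(case_alt, chromosomes, ctrl_alt, chromosomes) > critical:
            hits += 1
    return hits / reps

# Odds ratio 1.0 estimates the false positive rate; 1.5 estimates power.
null_rate = estimate_power(freq_ctrl=0.3, odds_ratio=1.0, n_per_group=200)
power = estimate_power(freq_ctrl=0.3, odds_ratio=1.5, n_per_group=200)
print(f"false positive rate: {null_rate:.3f}, power: {power:.3f}")
```

Raising `critical` lowers the false positive rate at the cost of power; rerunning the simulation with data generated under a violated assumption would show how much of that balance is lost.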

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Heart, Lung, and Blood Institute (NHLBI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 1U01HL084706-01
Application #: 7103138
Study Section: Special Emphasis Panel (ZHG1-HGR-P (J1))
Program Officer: Paltoo, Dina

Project Start: 2006-06-15
Project End: 2009-05-31
Budget Start: 2006-06-15
Budget End: 2007-05-31
Support Year: 1
Fiscal Year: 2006
Total Cost: $316,000
Indirect Cost

Institution

Name: Cornell University
Department: Biochemistry
Type: Schools of Arts and Sciences
DUNS #: 872612445

City: Ithaca
State: NY
Country: United States
Zip Code: 14850

Related projects


NIH 2008 U01 HL	Dimension Reduction Approaches for Genome-wide Association Testing	Clark, Andrew G. / Cornell University	$301,129
NIH 2007 U01 HL	Dimension Reduction Approaches for Genome-wide Association Testing	Clark, Andrew G. / Cornell University	$304,079
NIH 2006 U01 HL	Dimension Reduction Approaches for Genome-wide Association Testing	Clark, Andrew G. / Cornell University	$316,000

Publications

Boyko, Adam R; Quignon, Pascale; Li, Lin et al. (2010) A simple genetic architecture underlies morphological variation in dogs. PLoS Biol 8:e1000451

Pool, John E; Hellmann, Ines; Jensen, Jeffrey D et al. (2010) Population genetic inference from genomic sequence variation. Genome Res 20:291-300

Hunter-Zinck, Haley; Musharoff, Shaila; Salit, Jacqueline et al. (2010) Population genetic structure of the people of Qatar. Am J Hum Genet 87:17-25

Jiang, Rong; Tavare, Simon; Marjoram, Paul (2009) Population genetic inference from resequencing data. Genetics 181:187-97

Manolio, Teri A; Collins, Francis S; Cox, Nancy J et al. (2009) Finding the missing heritability of complex diseases. Nature 461:747-53

Dermitzakis, Emmanouil T; Clark, Andrew G (2009) Genetics. Life after GWA studies. Science 326:239-40

Ramírez-Soriano, Anna; Nielsen, Rasmus (2009) Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process. Genetics 181:701-10

Gray, Melissa M; Granka, Julie M; Bustamante, Carlos D et al. (2009) Linkage disequilibrium and demographic history of wild and domestic canids. Genetics 181:1493-505

Torgerson, Dara G; Boyko, Adam R; Hernandez, Ryan D et al. (2009) Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet 5:e1000592

Pool, John E; Nielsen, Rasmus (2009) Inference of historical changes in migration rate from the lengths of migrant tracts. Genetics 181:711-9

Showing the most recent 10 out of 15 publications
