Whole-genome association testing is widely cited as a promising way to identify genetic variants that causally elevate the risk of complex disorders such as cardiovascular disease, diabetes, and cancers. Genotyping at the requisite scale is becoming practical and affordable, but the analytical tools needed to draw the most reliable inferences from these data lag behind. As a result, we cannot yet design optimal studies, because we do not know which aspects of experimental design erode the power of the studies.
Specific Aim 1 will develop Bayesian classification models, a promising approach for inference when the number of predictors (SNPs) is large but the prior expectation is that most SNPs will have zero effect. The model will have a three-component mixture prior, with a large point mass at zero (no effect) and components for positive and negative effects on risk. Fitting will be done by Markov chain Monte Carlo and by stochastic variable selection. We will apply the model to BeadArray data, providing transcript abundance for 700 genes in cell lines from the 270 subjects of the HapMap project (each having more than 4 million SNP genotypes). The Bayesian classification approach will be contrasted with linear-model-based approaches. Both case-control and random cohort data will be addressed. Performance of the methods in the face of missing and erroneous data will be quantified.
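As a rough illustration of the three-component mixture prior described in Aim 1, the following sketch draws per-SNP effects from such a prior. It is not the project's model: the function name `sample_effects`, the mixture weights, the half-normal shape of the nonzero components, and the scale `sigma` are all assumptions chosen for illustration.

```python
import random


def sample_effects(n_snps, p_zero=0.98, p_pos=0.01, sigma=0.5, seed=0):
    """Draw per-SNP effects from a three-component mixture prior.

    Assumed components (illustrative, not from the abstract):
      - point mass at zero with probability p_zero
      - positive half-normal with probability p_pos
      - negative half-normal with the remaining probability
    """
    if not 0.0 <= p_zero + p_pos <= 1.0:
        raise ValueError("p_zero + p_pos must lie in [0, 1]")
    rng = random.Random(seed)
    effects = []
    for _ in range(n_snps):
        u = rng.random()
        if u < p_zero:
            effects.append(0.0)                          # no effect
        elif u < p_zero + p_pos:
            effects.append(abs(rng.gauss(0.0, sigma)))   # raises risk
        else:
            effects.append(-abs(rng.gauss(0.0, sigma)))  # lowers risk
    return effects


effects = sample_effects(10_000)
n_zero = sum(e == 0.0 for e in effects)
n_pos = sum(e > 0 for e in effects)
n_neg = sum(e < 0 for e in effects)
print(n_zero, n_pos, n_neg)
```

Under these assumed weights nearly all draws are exactly zero, which is the sparsity the prior is meant to encode; a fitting procedure would then update these component memberships from the data.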
Specific Aim 2 will explore the effects of ascertainment bias and of departures from neutrality of the marker variation on association testing. The HapMap SNPs were discovered in small samples, which biases the set toward SNPs that are more common than those typical of the full population. There is a pressing need to explore the impact of such ascertainment bias on inference of association. Most methods of association testing assume that the markers follow neutral expectations, but we know that many regions of the genome show marked departures from this pattern. We will show through theory and simulation how these distortions affect standard approaches to association testing, and devise accommodations to the tests.
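The ascertainment effect described in Aim 2 can be seen in a small simulation. The sketch below assumes population allele frequencies follow a density proportional to 1/x (a standard neutral approximation) and that a SNP is "discovered" only if it is polymorphic in a small panel of chromosomes. The function name, panel size, and frequency bounds are hypothetical choices, not values from the abstract.

```python
import random


def simulate_ascertainment(n_sites=20_000, panel_size=8, seed=1):
    """Compare allele frequencies of all sites with those of sites
    that appear polymorphic in a small discovery panel."""
    rng = random.Random(seed)
    lo, hi = 0.001, 0.999  # assumed bounds on population frequency
    all_freqs, found_freqs = [], []
    for _ in range(n_sites):
        # Inverse-CDF draw from a density proportional to 1/x on [lo, hi].
        x = lo * (hi / lo) ** rng.random()
        all_freqs.append(x)
        # Number of panel chromosomes carrying the allele.
        copies = sum(rng.random() < x for _ in range(panel_size))
        if 0 < copies < panel_size:  # polymorphic in the panel
            found_freqs.append(x)
    return all_freqs, found_freqs


all_freqs, found_freqs = simulate_ascertainment()
mean_all = sum(all_freqs) / len(all_freqs)
mean_found = sum(found_freqs) / len(found_freqs)
print(f"mean frequency, all sites:        {mean_all:.3f}")
print(f"mean frequency, ascertained SNPs: {mean_found:.3f}")
```

Rare alleles are usually absent from a small panel, so the ascertained set is shifted toward intermediate frequencies relative to the full set of sites.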
Specific Aim 3 will apply data reduction methods to both the SNP and the phenotype data. SNP data consist of discrete factors that arise through a well-understood process (the coalescent), and explicit modeling of this process is likely to identify better methods for SNP dimension reduction. Some beginnings of this have appeared in the literature as the "tag SNP" approach. The phenotype data can be reduced by combining methods like clustering and sparse principal components. These methods will be applied to the Sanger gene expression data, and will be tested by simulation.
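For readers unfamiliar with tag SNPs, the sketch below shows one common heuristic: greedily choosing SNPs so that every SNP is in high pairwise linkage disequilibrium (r² at or above a threshold) with some chosen tag. This is a generic illustration, not the coalescent-based reduction proposed in Aim 3; the function names, the 0.8 threshold, and the toy haplotypes are assumptions.

```python
def r_squared(a, b):
    """Squared correlation (r^2) between two 0/1 allele columns."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    if pa in (0.0, 1.0) or pb in (0.0, 1.0):
        return 0.0  # monomorphic column: r^2 is undefined, treat as 0
    pab = sum(x * y for x, y in zip(a, b)) / n
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Pick tag SNPs so each SNP has r^2 >= threshold with some tag."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    covers = {
        j: {k for k in range(n_snps)
            if k == j or r_squared(columns[j], columns[k]) >= threshold}
        for j in range(n_snps)
    }
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Choose the SNP that covers the most still-untagged SNPs.
        best = max(sorted(untagged), key=lambda j: len(covers[j] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy haplotypes (rows) over four SNPs (columns); values are alleles 0/1.
haplotypes = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
]
tags = greedy_tag_snps(haplotypes)
print(tags)
```

In the toy data the first three SNPs are perfectly correlated, so one of them can stand in for all three, while the fourth SNP needs its own tag.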
Specific Aim 4 will employ simulations to assess the power of association tests under violations of model assumptions. Of particular interest will be the tuning of model parameters to optimize the balance of false positive and false negative inferences.
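A minimal sketch of the kind of simulation Aim 4 describes is given below. It estimates the rejection rate of a simple allelic chi-square test in a case-control design, once under the null (giving the false positive rate) and once under an assumed odds ratio (giving power). It covers only the baseline case with no assumption violations; the test statistic, allele frequency, odds ratio, sample size, and function names are illustrative assumptions.

```python
import math
import random


def allelic_test_pvalue(case_alt, case_n, ctrl_alt, ctrl_n):
    """P-value of a 1-df chi-square test on a 2x2 table of allele counts."""
    total_alt = case_alt + ctrl_alt
    total_n = case_n + ctrl_n
    if total_alt == 0 or total_alt == total_n:
        return 1.0  # no variation, nothing to test
    chi2 = 0.0
    for obs_alt, n in ((case_alt, case_n), (ctrl_alt, ctrl_n)):
        exp_alt = n * total_alt / total_n
        exp_ref = n - exp_alt
        chi2 += (obs_alt - exp_alt) ** 2 / exp_alt
        chi2 += ((n - obs_alt) - exp_ref) ** 2 / exp_ref
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def rejection_rate(freq_ctrl, odds_ratio, n_per_group, alpha,
                   n_reps=500, seed=2):
    """Fraction of simulated studies in which the test rejects at alpha."""
    rng = random.Random(seed)
    odds = odds_ratio * freq_ctrl / (1 - freq_ctrl)
    freq_case = odds / (1 + odds)
    n_alleles = 2 * n_per_group  # two alleles per diploid subject
    hits = 0
    for _ in range(n_reps):
        case_alt = sum(rng.random() < freq_case for _ in range(n_alleles))
        ctrl_alt = sum(rng.random() < freq_ctrl for _ in range(n_alleles))
        p = allelic_test_pvalue(case_alt, n_alleles, ctrl_alt, n_alleles)
        if p < alpha:
            hits += 1
    return hits / n_reps


false_positive_rate = rejection_rate(0.3, 1.0, 200, alpha=0.05)
power = rejection_rate(0.3, 1.5, 200, alpha=0.05)
print(f"false positive rate: {false_positive_rate:.3f}")
print(f"power at OR = 1.5:   {power:.3f}")
```

Repeating the two calls over a grid of `alpha` values traces the trade-off between false positives and false negatives; assumption violations such as genotyping error or population structure could be added to the data-generating step.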

Agency
National Institutes of Health (NIH)
Institute
National Heart, Lung, and Blood Institute (NHLBI)
Type
Research Project--Cooperative Agreements (U01)
Project #
1U01HL084706-01
Application #
7103138
Study Section
Special Emphasis Panel (ZHG1-HGR-P (J1))
Program Officer
Paltoo, Dina
Project Start
2006-06-15
Project End
2009-05-31
Budget Start
2006-06-15
Budget End
2007-05-31
Support Year
1
Fiscal Year
2006
Total Cost
$316,000
Indirect Cost
Name
Cornell University
Department
Biochemistry
Type
Schools of Arts and Sciences
DUNS #
872612445
City
Ithaca
State
NY
Country
United States
Zip Code
14850
Boyko, Adam R; Quignon, Pascale; Li, Lin et al. (2010) A simple genetic architecture underlies morphological variation in dogs. PLoS Biol 8:e1000451
Pool, John E; Hellmann, Ines; Jensen, Jeffrey D et al. (2010) Population genetic inference from genomic sequence variation. Genome Res 20:291-300
Hunter-Zinck, Haley; Musharoff, Shaila; Salit, Jacqueline et al. (2010) Population genetic structure of the people of Qatar. Am J Hum Genet 87:17-25
Pool, John E; Nielsen, Rasmus (2009) Inference of historical changes in migration rate from the lengths of migrant tracts. Genetics 181:711-9
Auton, Adam; Bryc, Katarzyna; Boyko, Adam R et al. (2009) Global distribution of genomic diversity underscores rich complex history of continental human populations. Genome Res 19:795-803
Jiang, Rong; Tavare, Simon; Marjoram, Paul (2009) Population genetic inference from resequencing data. Genetics 181:187-97
Manolio, Teri A; Collins, Francis S; Cox, Nancy J et al. (2009) Finding the missing heritability of complex diseases. Nature 461:747-53
Dermitzakis, Emmanouil T; Clark, Andrew G (2009) Genetics. Life after GWA studies. Science 326:239-40
Ramírez-Soriano, Anna; Nielsen, Rasmus (2009) Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process. Genetics 181:701-10
Gray, Melissa M; Granka, Julie M; Bustamante, Carlos D et al. (2009) Linkage disequilibrium and demographic history of wild and domestic canids. Genetics 181:1493-505
