Whole-genome association testing is widely cited as a promising way to identify genetic variants that elevate the risk of complex disorders such as cardiovascular disease, diabetes, and cancer. The technology for genotyping at the requisite scale is becoming practical and affordable, but the analytical tools needed to make the most reliable inferences from these data lag behind. As a result, we cannot yet design optimal studies, because we do not know which aspects of experimental design erode study power.
Specific Aim 1 will develop Bayesian classification models, a promising approach for inference when the number of predictors (SNPs) is large but the prior expectation is that most SNPs have zero effect. The model will have a three-component mixture prior: a large point mass at zero (no effect), plus components for positive and negative effects on risk. Fitting will be done by Markov chain Monte Carlo and by stochastic variable selection. We will apply the model to BeadArray data, which provide transcript abundance for 700 genes in cell lines from the 270 subjects of the HapMap project (each having more than 4 million SNP genotypes). The Bayesian classification approach will be contrasted with linear-model-based approaches. Both case-control and random cohort data will be addressed. Performance of the methods in the face of missing and erroneous data will be quantified.
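To make the three-component prior in Aim 1 concrete, the following is a minimal sketch of drawing SNP effects from such a mixture. The abstract does not specify the mixture weights or the form of the nonzero components, so the half-normal shape, the function name `sample_effects`, and every numeric default here are illustrative assumptions only.

```python
import random


def sample_effects(n_snps, p_zero=0.98, p_pos=0.01, slab_sd=0.5, seed=0):
    """Draw SNP effects from a three-component mixture prior.

    With probability p_zero the effect is exactly 0 (point mass);
    with probability p_pos it is positive (half-normal, an assumption);
    otherwise it is negative (mirrored half-normal).
    All numeric defaults are hypothetical, not values from the project.
    """
    if not (0.0 <= p_zero <= 1.0 and 0.0 <= p_pos <= 1.0 - p_zero):
        raise ValueError("mixture weights must be valid probabilities")
    rng = random.Random(seed)
    effects = []
    for _ in range(n_snps):
        u = rng.random()
        if u < p_zero:
            # Point mass at zero: this SNP has no effect on risk.
            effects.append(0.0)
        else:
            magnitude = abs(rng.gauss(0.0, slab_sd))
            # Positive component if u falls in its slice, negative otherwise.
            effects.append(magnitude if u < p_zero + p_pos else -magnitude)
    return effects
```

Calling `sample_effects(10000)` returns a list in which the large majority of entries are exactly zero, which is the sparsity that the prior is meant to encode.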
Specific Aim 2 will explore the effects of ascertainment bias and of departures from neutrality of the marker variation on association testing. The HapMap SNPs were discovered in small samples, resulting in a bias toward SNPs that are more common than are found in the full population. There is a pressing need to explore the impact of such ascertainment bias on inference of association. Most methods of association testing assume that the markers follow neutral expectations, but we know that many regions of the genome show marked departures from this pattern. We will show through theory and simulation how these distortions impact standard approaches to association testing, and devise accommodations to the test.
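The direction of the ascertainment bias described in Aim 2 can be illustrated with a short calculation. This sketch assumes a simplified discovery scheme, not necessarily the one used for HapMap: a biallelic SNP counts as discovered when both alleles appear in a random panel of chromosomes drawn independently from the population. The function name `discovery_probability` is hypothetical.

```python
def discovery_probability(freq, panel_size):
    """Probability that a biallelic SNP is polymorphic in a discovery panel.

    freq is the population frequency of one allele; panel_size is the
    number of chromosomes sampled independently (a simplifying assumption).
    The SNP is missed only if the panel carries a single allele.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie in [0, 1]")
    if panel_size < 1:
        raise ValueError("panel_size must be at least 1")
    all_one_allele = freq ** panel_size
    all_other_allele = (1.0 - freq) ** panel_size
    return 1.0 - all_one_allele - all_other_allele
```

Under these assumptions the probability rises with minor allele frequency for a fixed panel size, so SNPs found in small panels are enriched for common variants relative to the full population.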
Specific Aim 3 will apply data reduction methods to both the SNP and the phenotype data. SNP data consist of discrete factors that arise through a well-understood process (the coalescent), and explicit modeling of this process is likely to identify better methods for SNP dimension reduction. Some beginnings of this have appeared in the literature as the "tag SNP" approach. The phenotype data can be reduced by combining methods such as clustering and sparse principal components. These methods will be applied to the Sanger gene expression data, and will be tested by simulation.
Specific Aim 4 will employ simulations to assess the power of association tests under violations of model assumptions. Of particular interest will be the tuning of model parameters to optimize the balance of false positive and false negative inferences.
Pool, John E; Hellmann, Ines; Jensen, Jeffrey D et al. (2010) Population genetic inference from genomic sequence variation. Genome Res 20:291-300
Hunter-Zinck, Haley; Musharoff, Shaila; Salit, Jacqueline et al. (2010) Population genetic structure of the people of Qatar. Am J Hum Genet 87:17-25
Boyko, Adam R; Quignon, Pascale; Li, Lin et al. (2010) A simple genetic architecture underlies morphological variation in dogs. PLoS Biol 8:e1000451
Jiang, Rong; Tavare, Simon; Marjoram, Paul (2009) Population genetic inference from resequencing data. Genetics 181:187-97
Manolio, Teri A; Collins, Francis S; Cox, Nancy J et al. (2009) Finding the missing heritability of complex diseases. Nature 461:747-53
Dermitzakis, Emmanouil T; Clark, Andrew G (2009) Genetics. Life after GWA studies. Science 326:239-40
Ramírez-Soriano, Anna; Nielsen, Rasmus (2009) Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process. Genetics 181:701-10
Gray, Melissa M; Granka, Julie M; Bustamante, Carlos D et al. (2009) Linkage disequilibrium and demographic history of wild and domestic canids. Genetics 181:1493-505
Torgerson, Dara G; Boyko, Adam R; Hernandez, Ryan D et al. (2009) Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet 5:e1000592
Pool, John E; Nielsen, Rasmus (2009) Inference of historical changes in migration rate from the lengths of migrant tracts. Genetics 181:711-9