This project develops better statistical methods to dissect complex trait variation and to predict outcome from genome-wide marker data. It anticipates that individual risk prediction for disease will become an integral part of Genomic Medicine in the USA and elsewhere. To predict an individual's risk of disease from genetic data, it is not necessary to have identified the causal variants or to fully understand the biology; all that is needed is a predictor that is correlated with outcome. The statistically best predictor depends on the genetic architecture of the trait: the distribution of effect sizes of causal variants, the distribution of their allele frequencies, and the correlation between the two. Therefore, methods to better understand the genetic architecture of complex traits will lead to better statistical prediction methods, and the performance of prediction methods will in turn allow new inference on genetic architecture. We will develop, test and apply statistical genetic methods that use whole-genome genotype or sequence data from population-based samples that have also been phenotyped for one or more complex traits. These methods will estimate locus-specific, chromosome-wide and whole-genome matrices of genetic covariance between all pairs of individuals, and the variance components associated with these matrices. We will use these results, together with those from large genome-wide association studies, to estimate the distribution of SNP and chromosome-segment effects by fitting mixture models with an EM algorithm. We will use simulation models to calibrate the observed distribution of risk allele frequencies for disease against evolutionary models whose parameters include the mode of natural selection and the pleiotropic relationships between effects on fitness and on disease. We will develop and test Bayesian and non-Bayesian linear mixed models that use all available genetic data simultaneously to predict an individual's risk of disease.
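As an illustration of the whole-genome matrix of genetic covariance between all pairs of individuals, the following is a minimal Python sketch on simulated genotypes. It assumes SNPs coded as allele counts (0, 1, 2) and one common standardisation by sample allele frequency; the function name, the simulated data and the scaling choice are assumptions made for this example and are not taken from the project itself.

```python
import numpy as np

def genetic_relationship_matrix(genotypes):
    """Return an n x n matrix of marker-based genetic covariance.

    genotypes: (n_individuals, n_snps) array of allele counts coded 0, 1, 2.
    The scaling by 2p(1-p) is one common convention, assumed here.
    """
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0                # sample allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)            # drop monomorphic SNPs (zero variance)
    G, p = G[:, keep], p[keep]
    Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # centre and scale each SNP
    return Z @ Z.T / Z.shape[1]             # average cross-product over SNPs

# Simulated example data (hypothetical): 20 individuals, 500 SNPs.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=500)
genotypes = rng.binomial(2, freqs, size=(20, 500))
A = genetic_relationship_matrix(genotypes)
```

Restricting the columns of `genotypes` to the SNPs of one chromosome or one locus before calling the function would give the chromosome-wide or locus-specific analogue of the same matrix.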
We will implement prediction methods using data from the Program Grant investigators, from large international research consortia and from data in the public domain, and test their efficiency by correlating outcome with predictors in independent data sets.
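The evaluation step described above, correlating outcome with a predictor in an independent data set, can be sketched in Python as follows. This is a hedged illustration on simulated data, not the project's method: it fits all SNP effects simultaneously by ridge regression, which corresponds to best linear unbiased prediction of SNP effects under a simple linear mixed model when the heritability is treated as known. Every setting here (sample sizes, number of SNPs, `h2`, the normal distribution of effects) is an assumption chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, m, h2 = 400, 200, 300, 0.5   # assumed simulation settings

# Simulate genotypes (allele counts 0, 1, 2) for training and test individuals.
freqs = rng.uniform(0.1, 0.9, size=m)
G = rng.binomial(2, freqs, size=(n_train + n_test, m)).astype(float)
G_tr, G_te = G[:n_train], G[n_train:]

# Standardise SNPs using training-set means and standard deviations only,
# so that no information from the test set enters the predictor.
mu, sd = G_tr.mean(axis=0), G_tr.std(axis=0)
Z_tr, Z_te = (G_tr - mu) / sd, (G_te - mu) / sd

# Simulate a quantitative trait in which every SNP has a small effect.
beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)
noise = rng.normal(0.0, np.sqrt(1.0 - h2), size=n_train + n_test)
y = np.concatenate([Z_tr, Z_te]) @ beta + noise
y_tr, y_te = y[:n_train], y[n_train:]

# Ridge / BLUP estimate of all SNP effects; the shrinkage parameter is the
# ratio of residual variance to per-SNP effect variance.
lam = m * (1.0 - h2) / h2
y_mean = y_tr.mean()
beta_hat = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(m),
                           Z_tr.T @ (y_tr - y_mean))

# Predict in the independent test set and correlate predictor with outcome.
pred = y_mean + Z_te @ beta_hat
accuracy = np.corrcoef(pred, y_te)[0, 1]
```

The correlation `accuracy` is computed only on individuals that played no part in estimating `beta_hat`, which is what makes it a measure of out-of-sample prediction rather than of model fit.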