One of the most pressing problems in identifying and localizing genes that influence disease is the need to model linkage disequilibrium between single nucleotide polymorphisms in dense genotyping assays. Currently available assays determine genotypes at over 500,000 loci per sample, and these data are being used in multiple study designs. Statistical models and methods are needed that account appropriately for linkage disequilibrium, as well as for observational error and population admixture. Naive approaches that rely on single-locus analyses are swamped by the need to correct for multiple and correlated tests. Other approaches, such as thinning out the loci to reduce linkage disequilibrium or assuming that alleles occur in blocks defined by physical location, are sensible and reasonably efficient, but they do not exploit all of the statistical power and resolution that this kind of data makes possible. Graphical models are a class of statistical models that can represent the joint distributions of multivariate observations. In preliminary work by the principal investigator under a current R21 grant, these models have been shown to give both accurate and tractable representations of the patterns of allelic association that occur between proximal genetic loci in a variety of problems. Results have been consistent with those of other sophisticated modeling methods, such as ancestral recombination graphs. In contrast, models that make strong assumptions based on the physical location of loci, such as low-order Markov models, have been shown to be inappropriate for this problem. The purpose of this proposal is to further develop graphical modeling methods for linkage disequilibrium in association studies, identity by descent mapping, and linkage analysis.
In particular, we focus on model restrictions that will give an order-of-magnitude improvement in computational efficiency; a new formulation of the linkage analysis problem that should improve the mixing properties of Markov chain Monte Carlo methods; and a novel, general method for approximating complex graphical models with simpler ones. In addition, we pursue an approach to identity by descent mapping that incorporates linkage disequilibrium and is scalable to the whole-genome level. For this aim, we intend to apply the methods developed to dense genotype data obtained for distantly related breast cancer cases in extended Utah pedigrees.