Genome-wide association studies (GWAS) are an effective tool for identifying common genetic variants that contribute to disease and heritable traits. These studies use high-density oligonucleotide arrays to assay hundreds of thousands of diallelic genetic markers in each individual. However, genome-wide association studies can also produce hundreds of spurious disease-gene associations caused by genotyping error. This research will develop statistical and computational methods that use inter-marker correlation to substantially improve genotype accuracy. All existing methods for calling genotypes for large-scale data ignore the correlation between genetic markers. This correlation is highly informative, but exploiting it is computationally difficult because it requires inference of the marker alleles inherited from a single parent (the haplotype phase). We recently developed a novel method of haplotype phase inference for large-scale data sets of unrelated individuals that is orders of magnitude faster and more accurate than competing methods. The next step will be to improve haplotype phase inference and genotype calling by performing both tasks simultaneously. This will enable genotype uncertainty to be taken into account when inferring haplotype phase, and inter-marker correlation to be taken into account when calling genotypes. Our methods will improve genotype accuracy, improve haplotype phase inference accuracy, decrease false positive associations due to genotyping error, and increase power to detect true genetic associations. We will extend these novel methods to call genotypes and phase haplotypes for parent-offspring trios, where the additional relatedness information will lead to even larger gains in accuracy. The improved genotype accuracy and phased haplotypes from our methods will contribute to improved understanding of the genetic contribution to human disease.
Our research will also address one of the main impediments to haplotypic analysis: the difficulty in interpreting analysis results. We will develop interactive methods for visualizing haplotype structure and haplotype-trait associations. These new data exploration methods will greatly simplify the task of identifying sequences of genetic variants that are associated with a trait.
Heritable genetic variants contribute to many common diseases, such as cardiovascular disease and diabetes. This research will develop new methods and tools that improve the accuracy of genetic data and our ability to identify genetic variants that increase the risk of disease. These methods and tools will contribute to the prevention, diagnosis, and treatment of heritable diseases in the United States and throughout the world.