The overall goal of our research is to develop and extend efficient exact statistical tools for testing genetic association, and to incorporate these methods into existing, widely used software packages that will serve the needs of data analysts in pharmaceuticals, epidemiology, public health, and other fields seeking to better understand the genetic causes of complex disease. The demand in this research area for greater statistical and computational innovation is rising dramatically, as rapid progress in genotyping technology is making it easier and less costly to genotype sampled subjects at ever-larger numbers of genetic markers. Such markers now predominantly consist of individual base pair mutations (referred to as single nucleotide polymorphisms, or SNPs) along strands of cellular DNA. Marker panels of 1-2 million SNPs are now common for genome-wide studies, and developing technologies (such as exome or whole-genome sequencing) will allow routine comparisons over marker sets that are orders of magnitude larger. With so many hypothesis tests, the need to control the rate of false positive findings presents critical statistical and computational difficulties. Existing methods and their implementations often perform poorly under common conditions. The procedures developed during both phases of our project will significantly improve the efficiency, accuracy, and statistical power of genetic association tests, both for current genome-wide association study (GWAS) panels and for next-generation technologies that are yielding even greater volumes of data. This project represents the joint efforts of investigators who are at the forefront of methodological research into genetic association, and software developers who have extensive experience in making cutting-edge exact statistical methods available in user-friendly software.
In this project, we will extend the work begun during Phase 1 by (1) implementing a battery of exact multiple testing procedures for genetic association studies with case-control data, and making them significantly more efficient through parallel processing; (2) developing and implementing new multiple testing procedures for family-based association studies; (3) providing a framework that will allow our parallel processing programs to be as widely compatible as possible with modern personal computing hardware; and (4) additionally incorporating the procedures within a SAS PROC, and developing an interface that will allow users to access R functions and objects while using StatXact.
Studies of complex disease and genetics now commonly use thousands or even millions of different genetic markers. Conventional statistical analyses in such studies face a variety of challenges, chiefly in controlling the rate of false positive findings when carrying out so many individual hypothesis tests. We propose to develop commercial software with more computationally efficient and robust procedures for modern genetic association studies.