The number of research efforts seeking to find genetic variants that predispose to human disease via genetic association studies has grown significantly since the completion of both the Human Genome Project and the International HapMap Project. In this research we consider alternative sample design methodologies for genetic association studies, with the goal of maximizing statistical power for testing genotype-phenotype association. Maximizing statistical power will allow researchers to more quickly and efficiently identify genetic variants predisposing individuals to complex human diseases. We will start by evaluating the cost-effectiveness of gathering duplicate genotype data. Duplicate genotype data are collected by genotyping some portion of the individuals in a study twice, using a method that may make classification errors (e.g., single nucleotide polymorphism (SNP) genotyping). Current recommendations are for genetic association studies to duplicate genotype 5-10% of the individuals in the study. Recently, methods were proposed to incorporate duplicate genotype data into genetic tests of association. However, no effort was made to evaluate whether gathering duplicate genotype data is cost-effective. We will evaluate the cost-effectiveness of gathering duplicate genotype data by comparing, on a fixed budget, the power of sample designs that gather duplicate (or higher replicate) genotype data with the power of those that do not. In a similar manner we will consider the cost-effectiveness of obtaining conditional duplicate genotype data. Conditional duplicate genotype data are obtained by duplicate genotyping individuals at different rates, depending on the first observed genotype. We will also evaluate conditional double sampling, whereby fractions of individuals are sequenced (a near-perfect method of genotyping) at rates that depend on the observed SNP genotype.
We will synthesize these design recommendations with recommendations for the cost-effective implementation of double sampling. Double sampling involves sequencing a random fraction of individuals. Additionally, we will consider the cost-effectiveness of using classification methods which create informative missing data and demonstrate how informative missing data can be utilized in related tests of association. All design recommendations will be integrated into freely available web-tools so that researchers can quickly assess the cost-effectiveness of these alternative design strategies for their study. Research conclusions will be developed mathematically, confirmed via computer simulation and demonstrated on data from actual genetic association studies. Additionally, all research will be conducted with the active involvement of undergraduate research students.
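The fixed-budget comparison of designs with and without duplicate genotyping rests on a simple trade-off: every duplicate genotype paid for leaves less money to enroll additional individuals. The sketch below illustrates only that sample-size side of the trade-off; it does not compute power and is not the project's own method. The function name, the cost parameters (`cost_recruit`, `cost_genotype`), and the simple per-person cost model are assumptions made for illustration.

```python
import math


def affordable_sample_size(budget, cost_recruit, cost_genotype, dup_fraction):
    """Largest number of individuals a fixed budget can cover.

    Assumed (hypothetical) cost model: each individual costs `cost_recruit`
    to enroll plus `cost_genotype` for one genotype call, and a fraction
    `dup_fraction` of individuals is genotyped a second time.
    """
    if not 0.0 <= dup_fraction <= 1.0:
        raise ValueError("dup_fraction must lie between 0 and 1")
    # Expected cost per enrolled individual, including the duplicated share.
    cost_per_person = cost_recruit + cost_genotype * (1.0 + dup_fraction)
    return math.floor(budget / cost_per_person)


if __name__ == "__main__":
    # Illustrative numbers only; real costs vary by platform and study.
    for frac in (0.0, 0.05, 0.10, 1.0):
        n = affordable_sample_size(100_000, 50.0, 10.0, frac)
        print(f"duplicate fraction {frac:.2f}: n = {n}")
```

A cost-effectiveness assessment would then compare the power of the test of association at each affordable sample size, given the genotyping error rates, which this sketch leaves out.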