Nearly a decade ago, Risch and Merikangas suggested the possibility of conducting genome-wide association scans. Although the cost was prohibitive at the time, they predicted that these technological barriers would eventually be overcome. With the advent of 500K chip-based or bead technologies, at a cost of about 0.2 cents per genotype, that prediction has now become a reality. Nevertheless, such studies will still be expensive to conduct, and numerous methodological challenges to their efficient and valid design remain. To address these issues, we convened a panel of 165 investigators from around the world at USC in April 2005. These discussions highlighted a number of study design and statistical analysis problems that we propose to continue working on as part of this Cooperative Agreement. Our team is also involved in conducting and planning several such studies of conditions including breast, colon, and prostate cancer and age-related macular degeneration. We anticipate that this research will both inform the conduct of these studies and be motivated by their needs (as well as those of the many similar projects at other institutions). In particular, we propose to focus on the following methodological issues: (1) tag SNP selection and haplotype-based methods incorporating both case-control association and case-case sharing comparisons; (2) multiple testing procedures for multistage sampling designs, including hierarchical models for prioritizing SNPs for further consideration using external genomic data; (3) family- vs. population-based studies and allowance for population stratification and admixture; and (4) gene-gene and gene-environment interactions. To investigate these problems, we will apply the methods to real data from our own studies (the Multiethnic Cohort and the Los Angeles Latino Eye Study of age-related macular degeneration), as well as data available in public databases such as the HapMap Project.
Because most genome-wide datasets are limited to relatively small samples and are not linked to any phenotype information, we will develop ways of using these real data to generate large populations with realistic degrees of genetic diversity resembling those seen in the small samples. We will then sample from these populations to simulate replicate case-control data sets under known phenotype models, in order to investigate the statistical performance of alternative study designs and analysis methods.
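The simulation strategy described above can be illustrated with a minimal sketch. It is not the method proposed in this project: the mosaic-copying rule used to expand the reference panel, the logistic phenotype model, the toy reference haplotypes, and all function names and parameter values (expand_population, assign_phenotypes, sample_case_control, switch_prob, log_or) are assumptions chosen only to make the three steps concrete: expand a small panel into a large population, assign disease status under a known model, and draw a case-control sample.

```python
import math
import random


def expand_population(ref_haplotypes, n_individuals, rng, switch_prob=0.05):
    """Build diploid genotypes (0/1/2 allele counts) for a large population.

    Each new haplotype is copied from the reference panel as a mosaic:
    at every SNP the copying source switches to a random reference
    haplotype with probability switch_prob (an assumed, simplistic rule).
    """
    n_snps = len(ref_haplotypes[0])
    population = []
    for _ in range(n_individuals):
        genotype = [0] * n_snps
        for _ in range(2):  # two haplotypes per individual
            src = rng.randrange(len(ref_haplotypes))
            for j in range(n_snps):
                if rng.random() < switch_prob:
                    src = rng.randrange(len(ref_haplotypes))
                genotype[j] += ref_haplotypes[src][j]
        population.append(genotype)
    return population


def assign_phenotypes(population, causal_snp, baseline_log_odds, log_or, rng):
    """Assign case (1) or control (0) status under an assumed logistic model."""
    statuses = []
    for genotype in population:
        logit = baseline_log_odds + log_or * genotype[causal_snp]
        risk = 1.0 / (1.0 + math.exp(-logit))
        statuses.append(1 if rng.random() < risk else 0)
    return statuses


def sample_case_control(population, statuses, n_cases, n_controls, rng):
    """Draw cases and controls without replacement from the population."""
    case_idx = [i for i, s in enumerate(statuses) if s == 1]
    control_idx = [i for i, s in enumerate(statuses) if s == 0]
    cases = rng.sample(case_idx, n_cases)
    controls = rng.sample(control_idx, n_controls)
    return [population[i] for i in cases], [population[i] for i in controls]


# Toy reference panel: 4 haplotypes over 5 SNPs (illustrative values only).
rng = random.Random(0)
ref = [
    [0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
]
pop = expand_population(ref, 5000, rng)
status = assign_phenotypes(pop, causal_snp=2, baseline_log_odds=-2.0,
                           log_or=0.5, rng=rng)
cases, controls = sample_case_control(pop, status, 200, 200, rng)
```

Repeating the last two steps with different seeds would yield replicate case-control data sets from the same simulated population, which is the setting in which alternative designs and analysis methods could be compared against a known phenotype model.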