To understand the genetic factors underlying complex diseases, researchers perform disease association studies, in which cases and controls are collected and their DNA variants (SNPs) are compared. A growing concern in such studies is that population substructure may produce spurious discoveries, especially now that advances in technology allow thousands of individuals to be genotyped across the whole genome. In particular, if the cases and controls are drawn from populations with different ethnic composition, differences in SNP variation between the two groups may be due to population structure rather than to the disease. The main goal of this project is to develop efficient and accurate tools for population stratification under different scenarios, and to integrate those tools into case-control studies. Existing methods are either too slow or do not accurately predict the population substructure, and they lack rigorous analysis proving their correctness. The new algorithms will handle cases in which the population is a collection of populations and cases in which individuals may have mixed ancestry, and they will use haplotype correlations to improve their results. The impact of the proposed project stems from the fact that many current results reported in association studies are spurious due to population substructure, resulting in an incorrect understanding of the biological mechanisms causing a disease. The algorithms developed in this project will help to avoid such spurious results, thus improving our understanding of human biology and disease. The project involves the training of a graduate student and six summer students. The collaborative nature of the project will expose the students to the medical and genetics worlds and, at the same time, improve their ability to design and implement solutions to complex algorithmic problems.
The software developed in this project will be integrated with the existing publicly available webserver HAP, which was developed by the PIs and has been used more than 9000 times by geneticists worldwide.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0713254
Program Officer: Sylvia J. Spengler
Project Start:
Project End:
Budget Start: 2007-08-15
Budget End: 2010-07-31
Support Year:
Fiscal Year: 2007
Total Cost: $449,962
Indirect Cost:

Name: International Computer Science Institute
Department:
Type:
DUNS #:
City: Berkeley
State: CA
Country: United States
Zip Code: 94704