Computationally comparing variation among genomic sequences sampled from a large number of unrelated individuals in a population is a powerful way to address both fundamental and applied biological questions. The best-known questions concern the location of genes and mutations that contribute to disease incidence and to variation in economically important traits. Nature and history have created a large variety of mosaic genomes among individuals and populations that can be studied today. The grand challenge is to exploit these natural experiments by finding patterns in and among the different mosaic genomes (the genotypes) that have significant and biologically meaningful associations with important traits (the phenotypes). With genomic-level technologies, the needed data on population-level variation are becoming available, but challenging problems remain in analyzing the data.

This proposal focuses on novel, critical computational problems that arise in population-scale genomic data acquisition and analysis. The algorithmic problems of concern are divided into biology-based problems and technology-based problems, but the interplay of technology and biology is critical. This research will be conducted by an interdisciplinary group of computer scientists, mathematicians, and geneticists. The main biology-based algorithmic problems concern the computational deduction of the frequency, location, and full temporal structure of historical recombination, gene conversion, and lateral gene transfer. The main technology-based problems concern missing or error-prone data and the deduction of haplotype data from genotype data. The problems of missing or error-prone data are approached through optimization techniques. The haplotype deduction problem is approached through a variety of techniques that exploit more complete and realistic biological models of how the underlying haplotypes have evolved. One element is the incorporation of recombination into the models, connecting previous work on constructing histories of recombinations with work on deducing haplotypes from genotypes.
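
To make the haplotype deduction problem concrete, the following is a minimal illustrative sketch, not the project's method. It assumes a common encoding that the abstract itself does not specify: each site of a genotype is 0 or 1 when the individual is homozygous for that allele and 2 when heterozygous. The function name `explaining_pairs` is hypothetical. The sketch enumerates every unordered pair of haplotypes that could produce a given genotype, which shows why genotype data alone leave the haplotypes ambiguous.

```python
from itertools import product


def explaining_pairs(genotype):
    """Return all unordered haplotype pairs consistent with a genotype.

    Assumed encoding (for this sketch only): 0 or 1 means the site is
    homozygous for that allele; 2 means the site is heterozygous.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    # Homozygous sites are fixed in both haplotypes; heterozygous
    # sites are placeholders that get overwritten below.
    base = [g if g != 2 else 0 for g in genotype]

    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het_sites, bits):
            h1[site] = b        # one haplotype carries allele b ...
            h2[site] = 1 - b    # ... the other carries the opposite allele
        # Sort within the pair so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in explaining_pairs([0, 2, 1, 2]):
        print(pair)
```

A genotype with k heterozygous sites (k at least 1) has 2^(k-1) such pairs, so exhaustive enumeration quickly becomes impractical. Choosing among the candidates requires a model of how the haplotypes evolved, which is where the biological models described above come in.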

The algorithms and software will allow biologists to better understand the history and role of recombination, gene conversion, and lateral gene transfer, and to cope with problems in the data. As just two examples, the tools could facilitate gene finding by association mapping and the study of how lateral gene transfer helps bacteria rapidly develop antibiotic resistance. The impact will be enhanced by our associated educational and outreach efforts.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0513910
Program Officer: Sylvia J. Spengler
Budget Start: 2005-08-01
Budget End: 2010-07-31
Fiscal Year: 2005
Total Cost: $699,989
Name: University of California Davis
City: Davis
State: CA
Country: United States
Zip Code: 95618