Our long-term objective is to develop an efficient, extensible, modular, and accessible software toolbox that facilitates statistical methods for analyzing complex pedigrees. The toolbox will consist of novel algorithms that extend state-of-the-art algorithms from graph theory, statistics, artificial intelligence, and genetics. This toolbox will enhance the ability to analyze the genetic components of inherited diseases.
The specific aim of this project is to develop an extensible software system for efficiently computing pedigree likelihood for complex diseases in the presence of multiple polymorphic markers and SNP markers, in fully general pedigrees, taking into account qualitative (discrete) and quantitative traits and a variety of disease models. Our experience shows that by building on the insight gained within the last decade from the study of computational probability, in particular from the theory of probabilistic networks, we can construct a software system whose functionality, speed, and extensibility are unmatched by current linkage software. We plan to integrate these new methods into an existing linkage analysis program, Superlink, which is already gaining momentum for analyzing large pedigrees. We will also continue to work with several participating genetic units in research hospitals and improve the software quality and reliability as we proceed with algorithmic improvements. In this project we will develop novel algorithms for more efficient likelihood calculations and more efficient maximization algorithms for the most general pedigrees. These algorithms will remove redundancy due to determinism, use caching of partial results effectively, and determine a close-to-optimal order of operations taking these enhancements into account. Time-space trade-offs will be computed that allow memory to be used in the most effective way and that automatically determine on which portions of a complex pedigree exact computations are infeasible. In such cases, a combination of exact computations with intelligent use of approximation techniques, such as variational methods and sampling, will be employed. In particular, we will focus on advancing sampling schemes such as the MCMC used in the Morgan program and integrating them with exact computation. A serious effort will be devoted to quality control, interface design, and integration with complementary available software, with the active help of current users of Superlink and Morgan.
PUBLIC SUMMARY: The availability of extensive DNA measurements and new computational techniques provides the opportunity to decipher genetic components of inherited diseases. The main aim of this project is to deliver a fully tested, robust software package that brings the best computational techniques to genetics researchers.
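As an illustrative aside, the pedigree likelihood that the project aims to compute can be sketched on a toy nuclear family. The Python below is not the algorithm used by Superlink or Morgan. It is a minimal sketch that assumes a single biallelic disease locus, Hardy-Weinberg founder frequencies, and Mendelian transmission, and every function and parameter name in it is hypothetical. It compares a brute-force sum over all genotype assignments with a version that sums out each child first and caches that partial result per parental genotype pair, a small instance of the operation ordering and caching ideas mentioned above.

```python
from functools import lru_cache
from itertools import product

GENOTYPES = (0, 1, 2)  # number of copies of the disease allele


def founder_prior(g, q):
    """Hardy-Weinberg genotype probability for disease-allele frequency q."""
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)[g]


def child_given_parents(gc, gf, gm):
    """Mendelian transmission P(child genotype | father, mother)."""
    pf, pm = gf / 2.0, gm / 2.0  # chance each parent passes the disease allele
    return ((1 - pf) * (1 - pm), pf * (1 - pm) + (1 - pf) * pm, pf * pm)[gc]


def penetrance(affected, g, f):
    """P(phenotype | genotype); f[g] is the probability of being affected."""
    return f[g] if affected else 1 - f[g]


def likelihood_brute_force(father, mother, children, q, f):
    """Sum the joint probability over every genotype assignment."""
    total = 0.0
    for genos in product(GENOTYPES, repeat=2 + len(children)):
        gf, gm, kids = genos[0], genos[1], genos[2:]
        term = (founder_prior(gf, q) * penetrance(father, gf, f)
                * founder_prior(gm, q) * penetrance(mother, gm, f))
        for affected, gc in zip(children, kids):
            term *= child_given_parents(gc, gf, gm) * penetrance(affected, gc, f)
        total += term
    return total


def likelihood_eliminate(father, mother, children, q, f):
    """Sum out each child first, caching the result per parental pair."""
    @lru_cache(maxsize=None)
    def child_message(affected, gf, gm):
        # Partial result: the child's genotype is summed out here.
        return sum(child_given_parents(gc, gf, gm) * penetrance(affected, gc, f)
                   for gc in GENOTYPES)

    total = 0.0
    for gf, gm in product(GENOTYPES, repeat=2):
        term = (founder_prior(gf, q) * penetrance(father, gf, f)
                * founder_prior(gm, q) * penetrance(mother, gm, f))
        for affected in children:
            term *= child_message(affected, gf, gm)
        total += term
    return total


if __name__ == "__main__":
    q, f = 0.1, (0.01, 0.8, 0.9)  # assumed allele frequency and penetrances
    kids = (True, False, True)    # affection status of three children
    print(likelihood_brute_force(True, False, kids, q, f))
    print(likelihood_eliminate(True, False, kids, q, f))
```

With n children the brute-force sum visits 3^(n+2) genotype assignments, while the second version visits only the 9 parental genotype pairs and reuses cached child terms. Both should agree up to floating-point rounding.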

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG004175-03
Application #: 7652508
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa
Project Start: 2007-09-13
Project End: 2012-06-30
Budget Start: 2009-07-01
Budget End: 2012-06-30
Support Year: 3
Fiscal Year: 2009
Total Cost: $363,929
Indirect Cost:
Name: University of California Irvine
Department:
Type: Other Domestic Higher Education
DUNS #: 046705849
City: Irvine
State: CA
Country: United States
Zip Code: 92697
Silberstein, Mark; Weissbrod, Omer; Otten, Lars et al. (2013) A system for exact and approximate genetic linkage analysis of SNP data in large pedigrees. Bioinformatics 29:197-205
Su, Ming; Thompson, Elizabeth A (2012) Computationally efficient multipoint linkage analysis on extended pedigrees for trait models with two contributing major Loci. Genet Epidemiol 36:602-11
Markus, Barak; Birk, Ohad S; Geiger, Dan (2011) Integration of SNP genotyping confidence scores in IBD inference. Bioinformatics 27:2880-7
Weissbrod, Omer; Geiger, Dan (2011) Genetic linkage analysis in the presence of germline mosaicism. Stat Appl Genet Mol Biol 10:
Otten, Lars; Dechter, Rina (2011) Finding most likely haplotypes in general pedigrees through parallel search with dynamic load balancing. Pac Symp Biocomput :26-37
Marinescu, R; Dechter, R (2010) Evaluating the impact of AND/OR search on 0-1 integer linear programming. Constraints 15:29-63
Mateescu, Robert; Kask, Kalev; Gogate, Vibhav et al. (2010) Join-Graph Propagation Algorithms. J Artif Intell Res 37:279-328
Mateescu, Robert; Dechter, Rina (2008) Mixed deterministic and probabilistic networks. Ann Math Artif Intell 54:3-51
Thompson, E A (2008) The IBD process along four chromosomes. Theor Popul Biol 73:369-73