Our long-term objective is to develop an efficient, extensible, modular, and accessible software toolbox that facilitates statistical methods for analyzing complex pedigrees. The toolbox will consist of novel algorithms that extend state of the art algorithms from graph theory, statistics, artificial intelligence, and genetics. This tool will enhance capabilities to analyze genetic components of inherited diseases.
The specific aim of this project is to develop an extensible software system for efficiently computing pedigree likelihood for complex diseases in the presence of multiple polymorphic markers, and SNP markers, in fully general pedigrees taking into account qualitative (discrete) and quantitative traits and a variety of disease models. Our experience shows that by building on top of the insight gained within the last decade from the study of computational probability, in particular, from the theory of probabilistic networks, we can construct a software system whose functionality, speed, and extensibility is unmatched by current linkage software. We plan to integrate these new methods into an existing linkage analysis software, called superlink, which is already gaining momentum for analyzing large pedigrees. We will also continue to work with several participating genetic units in research hospitals and improve the software quality and reliability as we proceed with algorithmic improvements. In this project we will develop novel algorithms for more efficient likelihood calculations and more efficient maximization algorithms for the most general pedigrees. These algorithms will remove redundancy due to determinism, use cashing of partial results effectively, and determine close-to-optimal order of operations taking into account these enhancements. Time-space trade-offs will be computed that allow to use memory space in the most effective way, and to automatically determine on which portions of a complex pedigree exact computations are infeasible. In such cases, a combination of exact computations with intelligent use of approximation techniques, such as variational methods and sampling, will be employed. In particular we will focus on advancing sampling schemes such as MCMC used in the Morgan program and integrating it with exact computation. A serious effort will be devoted for quality control, interface design, and integration with complementing available software with the active help of current users of Superlink and Morgan. PUBLIC SUMMARY: The availability of extensive DMA measurements and new computational techniques provides the opportunity to decipher genetic components of inherited diseases. The main aim of this project is to deliver a fully tested and extremely strong software package to deliver the best computational techniques to genetics researchers.