Family-based genetic studies use a number of statistical techniques to understand how genetic information flows from parent to offspring. The investigators will develop computationally efficient algorithms for identifying genetic information from family data using a technique known as a hidden Markov model. This model has demonstrated considerable success in a broad range of scientific disciplines, including speech recognition in telephone conversations and face identification from sequences of images. As the complexity of the problem increases, computational algorithms struggle to solve the model within practical time frames. The investigators' research on novel computational and statistical approaches will provide efficient algorithms and software tools of broad scientific relevance. In the investigators' own work, these tools will yield a more powerful approach for finding genes underlying complex diseases. The collaborative team will also directly apply their techniques to forestry genetic data describing a multi-year plant-breeding program. Educationally, the trainees involved will be integrated into a vibrant interdisciplinary research team, gaining exposure to techniques for solving real-world problems.
Multi-dimensional Markov processes are ubiquitous in the real world. Dependencies in interacting particle systems, images, videos, digitized documents, and gene transmission are all examples of multi-dimensional Markov processes. A Markov process is a random process (i.e., a statistical phenomenon in which the outcomes of a sequence of events or variables vary) in which the future state is predicted using only the current state, independent of the past history. This idea generalizes to higher dimensions. For example, gene transmission from parent to child is a two-dimensional Markov process: the future state (the child) depends on its neighbors in two dimensions (the parents). Most cases of interest, however, are hidden Markov processes, in which the current state of a variable cannot be observed directly but must be inferred by considering all possible future states given all possible current states. As the number of quantities to be determined increases, the demand on both computer memory and computing time increases dramatically. Applying the concept to many important real applications therefore requires the development of novel and powerful computational algorithms. This project focuses on a novel software package designed specifically for solving hidden Markov models. The techniques developed will be relevant to problems from a broad array of disciplines, not only wider family-based genetic studies but also areas such as text analysis from images. Thus, this project will increase the computational ability to solve real-world problems across many engineering, geoscience, and biological disciplines, with commensurate potential positive societal impact. In this project, an interesting application arises in plant breeding through collaboration with experts in forestry science.
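
To make the computational issue concrete, the following is a minimal sketch of the standard forward algorithm for an ordinary one-dimensional hidden Markov model. It is not the investigators' method, and it does not cover the multi-dimensional models targeted by the project. All function names and the two-state toy probabilities are hypothetical, chosen only for illustration. The sketch compares the forward recursion with a brute-force sum over every hidden path, whose cost grows exponentially with the length of the sequence.

```python
from itertools import product


def forward_likelihood(initial, transition, emission, observations):
    """Probability of the observations under a discrete hidden Markov model.

    Uses the forward recursion, which costs O(T * N^2) for T observations
    and N hidden states.
    """
    n_states = len(initial)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emission[s][obs]
            * sum(alpha[r] * transition[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


def brute_force_likelihood(initial, transition, emission, observations):
    """Same probability, computed by enumerating all N^T hidden paths."""
    n_states = len(initial)
    total = 0.0
    for path in product(range(n_states), repeat=len(observations)):
        prob = initial[path[0]] * emission[path[0]][observations[0]]
        for prev, cur, obs in zip(path, path[1:], observations[1:]):
            prob *= transition[prev][cur] * emission[cur][obs]
        total += prob
    return total


# Hypothetical toy model: two hidden states, two observable symbols.
INITIAL = [0.6, 0.4]                    # P(first hidden state)
TRANSITION = [[0.7, 0.3], [0.4, 0.6]]   # TRANSITION[r][s] = P(next = s | current = r)
EMISSION = [[0.9, 0.1], [0.2, 0.8]]     # EMISSION[s][o] = P(observe o | hidden = s)
OBSERVATIONS = [0, 1, 0]

print(forward_likelihood(INITIAL, TRANSITION, EMISSION, OBSERVATIONS))
print(brute_force_likelihood(INITIAL, TRANSITION, EMISSION, OBSERVATIONS))
```

Both functions return the same value, but the brute-force version visits 2^3 = 8 hidden paths here and would visit 2^T paths for a sequence of length T, whereas the forward recursion grows only linearly in T. In higher-dimensional models the number of joint hidden configurations itself grows rapidly, which is the kind of cost the paragraph above describes.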