A cornerstone of research in molecular ecology is the reconstruction of family groups (kinship analysis). Understanding how individuals in free-living populations are related to each other provides the best opportunity to study many important biological processes, ranging from sexual selection to patterns of dispersal and recruitment. Recent advances in molecular DNA technologies and computational methods have made these studies possible. However, many conceptual and computational challenges remain and must be addressed to advance them. To date, research on kinship analysis has primarily focused on computational methods that address a single relationship, such as parentage assignment or reconstruction of full-sib groups. Including multiple objectives, such as half-sib reconstruction with minimum parentage assignment, or a hierarchy over multiple generations, makes formulation of the underlying computational problem extremely challenging. Simple extensions of previous methods do not provide a practical, scalable, and robust solution to kinship reconstruction for data sets that span multiple generations of a species or involve multiple optimization functions.
The goal of the proposed research is to design robust, parsimonious, and versatile computational approaches for inferring multi-generation kinship relationships in wild populations from multiallelic markers. The parsimony assumption is fundamental to these approaches because it requires no prior knowledge, assumptions about sampling methodology, or existing models, which are unavailable for most free-living populations. The tasks of this project include formulating computational kinship inference problems based on existing biological studies; analyzing the computational complexity of, and providing solutions to, the resulting combinatorial optimization problems; and designing robust, scalable, and efficient high-performance implementations. The resulting computational methods will be evaluated on data sets collected from existing biological studies and will be deployed to the biological community through the Kinalyzer web-based service, which is currently in active use for sibship inference only.
The research proposed in this project will greatly impact diverse application areas, including fundamental research in combinatorial optimization and data mining and, within biology, areas as diverse as behavioral ecology, evolutionary genetics, conservation, forensics, and epidemiology. The multidisciplinary nature of the project and the research team will enhance curriculum design in related areas and introduce new cross-disciplinary courses. This cohesive, multidisciplinary project will provide training opportunities in biology, operations research, algorithm analysis, bioinformatics, and high-performance computing within a single application framework. The project will leverage the diverse scientific expertise and extensive mentoring experience of the team to foster a true interdisciplinary collaboration and to provide a thriving environment for a new generation of interdisciplinary scientists.