A cornerstone of research in molecular ecology is the reconstruction of family groups (kinship analysis). Understanding how individuals in free-living populations are related to each other provides the best opportunity to study many important biological processes, ranging from sexual selection to patterns of dispersal and recruitment. Recent advances in molecular DNA technologies and computational methods have made these studies possible. However, many conceptual and computational challenges remain and must be addressed to advance these studies. To date, research on kinship analysis has focused primarily on computational methods that address a single relationship, such as parentage assignment or reconstruction of full-sib groups. Including multiple objectives, such as half-sib reconstruction with minimum parentage assignment, or a hierarchy over multiple generations, makes formulating the underlying computational problem extremely challenging. Simple extensions of previous methods do not provide a practical, scalable, and robust solution to kinship reconstruction for data sets that span multiple generations of a species or involve multiple optimization functions.

The goal of the proposed research is to design robust, parsimonious, and versatile computational approaches for inferring multi-generation kinship relationships in wild populations from multiallelic markers. The parsimony assumption is fundamental to these approaches because it requires no prior knowledge, assumptions about sampling methodology, or existing models, none of which is typically available for free-living populations. The tasks of this project include formulating computational kinship inference problems based on existing biological studies, analyzing the computational complexity of the resulting combinatorial optimization problems and providing solutions to them, and designing robust, scalable, and efficient high-performance implementations. The resulting computational methods will be evaluated on datasets collected from existing biological studies and will be deployed to the biological community through the Kinalyzer web-based service, which is currently in active use for sibship inference only.

The research proposed in this project will greatly impact diverse application areas, including fundamental research in combinatorial optimization and data mining and, within biology, areas as diverse as behavioral ecology, evolutionary genetics, conservation, forensics, and epidemiology. The multidisciplinary nature of the project and the research team will enhance curriculum design in related areas and introduce new cross-disciplinary courses. This cohesive, multidisciplinary project will provide training opportunities in biology, operations research, algorithm analysis, bioinformatics, and high-performance computing within a single application framework. The project will leverage the diverse scientific expertise and extensive mentoring experience of the team to foster a true interdisciplinary collaboration and to provide a thriving environment for a new generation of interdisciplinary scientists.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1064681
Program Officer: Sylvia Spengler
Budget Start: 2011-08-01
Budget End: 2017-07-31
Fiscal Year: 2010
Total Cost: $954,730
Name: University of Illinois at Chicago
City: Chicago
State: IL
Country: United States
Zip Code: 60612