Recent genetic evidence shows that the human species is a smaller family than previously thought: arbitrary groups of people include many pairs of hidden relatives who, unknown to them, share a recent ancestor a few generations back. The investigators develop novel computational methods to reveal these remote family ties from large-scale datasets that contain billions of snippets of genetic information. This research effort will use this information to compile a genealogy of thousands of otherwise-unrelated individuals.

Individuals with a common ancestor have a chance to share one or more long fragments of DNA that are identical-by-descent (IBD) over several megabases. High-throughput genetic data from commercial arrays of 300,000-1,000,000 Single Nucleotide Polymorphisms (SNPs) can therefore detect IBD with certainty. The computational challenge in large-scale data, with tens of thousands of individuals typed on these arrays, is performing the quadratic number of pairwise comparisons needed to search for IBD. The investigators develop a per-locus hashing algorithm that detects identical haplotypes across all O(n^2) sample pairs but operates in linear time. They are using this methodology to map hidden relatedness across publicly available samples, creating a useful tool for population-based linkage analysis in unrelated individuals, as well as for inferences on the population genetics of recent generations.
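
The hashing idea can be illustrated with a minimal sketch. It is not the investigators' implementation: the function name `candidate_ibd_pairs`, the `window` parameter, and the toy haplotypes are all hypothetical, and the sketch assumes phased haplotypes encoded as strings of 0/1 alleles, fixed non-overlapping windows, and exact matching only. Each sample is placed into a dictionary bucket keyed by window position and allele string, so samples that land in the same bucket are candidate IBD matches without ever being compared directly.

```python
from collections import defaultdict
from itertools import combinations


def candidate_ibd_pairs(haplotypes, window=4):
    """Map each sample pair to the window indices where their alleles are identical.

    haplotypes: dict of sample id -> string of alleles (assumed phased, equal length).
    window: number of SNPs per non-overlapping window (illustrative choice).
    """
    # Bucketing pass: one dictionary insertion per sample per window,
    # so the work grows linearly with the number of samples.
    buckets = defaultdict(list)
    for sample_id, hap in haplotypes.items():
        for start in range(0, len(hap) - window + 1, window):
            key = (start // window, hap[start:start + window])
            buckets[key].append(sample_id)

    # Reporting pass: only samples that share a bucket are paired,
    # so the cost here depends on the number of matches found.
    matches = defaultdict(list)
    for (win_idx, _alleles), samples in buckets.items():
        for a, b in combinations(sorted(samples), 2):
            matches[(a, b)].append(win_idx)
    return {pair: sorted(wins) for pair, wins in matches.items()}


# Toy data (made up for illustration): three samples, 12 SNPs each.
haplotypes = {
    "s1": "010011010111",
    "s2": "010011011000",
    "s3": "111100001000",
}
result = candidate_ibd_pairs(haplotypes, window=4)
print(result)
```

In this toy input, s1 and s2 share the first two windows and s2 and s3 share the last one, while s1 and s3 are never paired. A practical method would presumably also need to extend adjacent matching windows into long segments and tolerate genotyping errors, which this sketch does not attempt.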

Project Start:
Project End:
Budget Start: 2008-09-01
Budget End: 2012-08-31
Support Year:
Fiscal Year: 2008
Total Cost: $247,694
Indirect Cost:
Name: Columbia University
Department:
Type:
DUNS #:
City: New York
State: NY
Country: United States
Zip Code: 10027