Utilizing large-scale biobank studies to understand disease and health outcomes requires understanding the fine-scale genetic relationships between individuals. Recent, fine-scale genetic relationships can be detected using short segments that are inherited identical by descent (IBD) from a common ancestor between purportedly "unrelated" pairs of individuals in a data set. Such IBD segments are a hallmark of cryptic relatedness, which is expected to be ubiquitous in any large-scale human cohort and confounds genotype-phenotype studies by inducing subtle population stratification that leads to false-positive associations. At the same time, IBD segments resulting from these relationships capture signal from rare variants and haplotypes that are not directly assayed on genotyping arrays. Understanding IBD variation is thus critical for genome-wide association studies, analyses of heritability, and genetic risk prediction. Here, we propose novel computational methods to efficiently identify pairwise IBD segments for millions of individuals and accurately quantify their detailed coalescent distributions.
Understanding genetic relatedness between individuals has important applications for disease association studies, phenotypic prediction, and estimates of natural selection. We propose methods to efficiently identify shared segments in large-scale data.