Considerable genetic heterogeneity must be expected with any complex disease such as oral clefts, where rare variants could explain part of the so-called missing heritability. In extended families with multiple affected members, there is a high probability that several of these affected relatives carry the same rare, high-penetrance risk variant if such a variant is found in one affected individual. We recently developed a general framework for calculating rare nucleotide variant sharing probabilities when two or more affected subjects from an extended family are sequenced, and showed how information from multiple families can be combined by calculating a p-value as the sum of the probabilities of all sharing events as extreme as or more extreme than the one observed. We also examined the impact of unknown relationships (i.e. cryptic relatedness), and proposed methods to approximate sharing probabilities based on empirical estimates of kinship between family members obtained from genome-wide marker data. We applied this method to whole exome sequence data from a study of 55 multiplex families with apparently non-syndromic forms of oral clefts from four distinct populations, and identified a genome-wide significant rare variant in the gene ADAMTS9 shared by affected relatives in three Indian families. A more targeted analysis focused on 348 oral cleft candidate genes identified an additional, potentially damaging SNV in CDH1 in a single family. In this application, we propose to extend this approach to rare DNA copy number variants, to implement an open-source software package for genomic array and sequencing data that is scalable and suitable for reproducible genomic research, and to use existing oral cleft data to identify novel and rare high-penetrance genetic variants underlying oral cleft risk.
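To make the sharing calculation concrete, the following is a minimal sketch under simplifying assumptions, not the software proposed here. It assumes the variant is rare enough to have entered the pedigree through exactly one founder, that each founder is equally likely to be that carrier, and that each family is summarized by a single binary outcome (all sequenced affected relatives share the variant, or not). "More extreme" is read as "no more probable than the observed outcome". The function names and the three-family example are hypothetical illustrations.

```python
from fractions import Fraction
from itertools import product


def first_cousin_sharing_probability():
    """P(both first cousins carry the variant | at least one of them does).

    Assumption: exactly one of the four founders (two shared grandparents,
    two married-in parents) introduced the variant, each equally likely.
    """
    # For each possible founder carrier: (P(cousin A carries), P(cousin B carries)).
    # A grandparent's allele must survive two meioses to reach a cousin (1/2 * 1/2).
    founders = [
        (Fraction(1, 4), Fraction(1, 4)),  # shared grandparent 1
        (Fraction(1, 4), Fraction(1, 4)),  # shared grandparent 2
        (Fraction(1, 2), Fraction(0)),     # married-in parent of cousin A
        (Fraction(0), Fraction(1, 2)),     # married-in parent of cousin B
    ]
    prior = Fraction(1, len(founders))
    # Transmissions to the two cousins pass through distinct meioses,
    # so they are independent once the founder carrier is fixed.
    both = sum(prior * a * b for a, b in founders)
    at_least_one = sum(prior * (1 - (1 - a) * (1 - b)) for a, b in founders)
    return both / at_least_one


def combined_p_value(sharing_probs, observed_shared):
    """Sum the probabilities of all cross-family outcomes that are no more
    probable than the observed one (families assumed independent)."""

    def outcome_prob(outcome):
        prob = Fraction(1)
        for p, shared in zip(sharing_probs, outcome):
            prob *= p if shared else 1 - p
        return prob

    observed = outcome_prob(observed_shared)
    outcomes = product([True, False], repeat=len(sharing_probs))
    return sum(q for q in map(outcome_prob, outcomes) if q <= observed)


if __name__ == "__main__":
    p = first_cousin_sharing_probability()
    print("first-cousin sharing probability:", p)
    # Hypothetical example: three families, each with one sequenced
    # first-cousin pair, and the variant shared in all three.
    print("combined p-value:", combined_p_value([p] * 3, [True] * 3))
```

Exact fractions are used so that the enumeration can be checked by hand. Real pedigrees with more than two sequenced relatives have many more sharing patterns per family, and the exhaustive enumeration over families grows exponentially, so this sketch is only practical for small examples.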
In extended families with multiple affected members, there is a high probability that several affected relatives carry the same rare, high-penetrance risk variant if such a variant is found in one affected individual. We recently developed a general framework for calculating rare nucleotide variant sharing probabilities when two or more affected subjects from an extended family are sequenced (by either whole exome or whole genome sequencing), and showed how information from multiple families can be combined by calculating the sum of the probabilities of all sharing events as extreme as or more extreme than the one observed. The goal of this application is to develop new methods to infer DNA copy number variants from sequencing and array data in extended multiplex families, to implement an open-source software package based on these new methods of assessing variant sharing for the analysis of genomic data in these families, and to use existing oral cleft data to identify novel and rare high-penetrance genetic variants underlying oral cleft risk.