Rare variants may be responsible for a significant amount of the uncharacterized genetic risk underlying many diseases. Large cohorts are necessary to have sufficient power to test such variants. However, assessing rare variants with next-generation sequencing is still too costly and time-consuming to be used on a very large scale. We propose an innovative project that uses the existing and ever-growing body of whole-genome and whole-exome sequence data to impute rare variants into an extremely large cohort of 100,000 individuals who have been genotyped at over 650,000 single nucleotide polymorphisms (SNPs). It is well known that the ability to impute a variant depends on the number of individuals carrying that variant in the reference panel, but it is still not clear how well imputation can work for very rare variants. By combining all the available public reference panels, we aim to increase the number of reference subjects 10-fold beyond the 1,092 individuals typically used from the 1000 Genomes Project. We will test the validity of our approach by applying it to telomere length, which has been measured in the same 100,000 individuals who were genotyped. Telomere length is an important characteristic reflecting cellular aging. It is known to decline with age and has demonstrated associations with cardiovascular disease and its risk factors, cancer, diabetes, and mortality, but its heritability has not been fully explained. Understanding the genetic factors underlying telomere length will lead to a better understanding of telomere biology, with obvious health implications.
Rare variants may explain the missing heritability (genetic risk) of many diseases. We aim to use existing public reference genome sequence data to statistically impute rare variants into an existing cohort of 100,000 well-phenotyped individuals. We will test the imputed SNPs for association with telomere length, and other researchers may examine them for association with many other phenotypes in the cohort.