Rare variants may be responsible for a significant amount of the uncharacterized genetic risk underlying many diseases. Large cohorts are necessary to have sufficient power to test such variants. However, assessing rare variants with next-generation sequencing is still too costly and time-consuming to be used on a very large scale. We propose an innovative project that uses the existing and ever-growing body of whole-genome and whole-exome sequence data to impute rare variants into an extremely large cohort of 100,000 individuals who have been genotyped at over 650,000 single nucleotide polymorphisms (SNPs). It is well known that the ability to impute a variant depends on the number of individuals carrying that variant in the reference panel, but it is still not clear how well imputation can work for very rare variants. By combining all the available public reference panels, we aim to increase the number of reference subjects 10-fold beyond the 1,092 individuals typically used from the 1000 Genomes Project. We will test the validity of our approach by applying it to telomere length, which has been measured in the same 100,000 individuals who were genotyped. Telomere length is an important characteristic reflecting cellular aging. It is known to decline with age and has demonstrated associations with cardiovascular disease and its risk factors, cancer, diabetes, and mortality, but its heritability has not been fully explained. Understanding the genetic factors underlying telomere length will lead to a better understanding of telomere biology, with obvious health implications.

Public Health Relevance

Rare variants may explain the missing heritability (genetic risk) of many diseases. We aim to use existing public reference genome sequence data to statistically impute rare variants into an existing cohort of 100,000 well-phenotyped individuals. We will test the imputed SNPs for association with telomere length, and other researchers may examine them for association with many other phenotypes in the cohort.

National Institutes of Health (NIH)
National Institute on Aging (NIA)
Exploratory/Developmental Grants (R21)
Study Section
Genetics of Health and Disease Study Section (GHD)
Program Officer
Guo, Max
University of California San Francisco
Internal Medicine/Medicine
Schools of Medicine
San Francisco
United States
Hoffmann, Thomas J; Choquet, Hélène; Yin, Jie et al. (2018) A Large Multiethnic Genome-Wide Association Study of Adult Body Mass Index Identifies Novel Loci. Genetics 210:499-515
Hoffmann, Thomas J; Theusch, Elizabeth; Haldar, Tanushree et al. (2018) A large electronic-health-record-based genome-wide study of serum lipids. Nat Genet 50:401-413
Jorgenson, Eric; Melles, Ronald B; Hoffmann, Thomas J et al. (2016) Common coding variants in the HLA-DQB1 region confer susceptibility to age-related macular degeneration. Eur J Hum Genet 24:1049-55
Shen, Ling; Hoffmann, Thomas J; Melles, Ronald B et al. (2015) Differences in the Genetic Susceptibility to Age-Related Macular Degeneration Clinical Subtypes. Invest Ophthalmol Vis Sci 56:4290-9
Hoffmann, Thomas J; Sakoda, Lori C; Shen, Ling et al. (2015) Imputation of the rare HOXB13 G84E mutation and cancer risk in a large population-based cohort. PLoS Genet 11:e1004930
Hoffmann, Thomas J; Van Den Eeden, Stephen K; Sakoda, Lori C et al. (2015) A large multiethnic genome-wide association study of prostate cancer identifies novel risk variants and substantial ethnic differences. Cancer Discov 5:878-91
Hoffmann, Thomas J; Witte, John S (2015) Strategies for Imputing and Analyzing Rare Variants in Association Studies. Trends Genet 31:556-563