To help analyze and understand aging-related complex traits that are affected by many genes and environmental factors, we propose to develop three statistical algorithms for the analysis of genome-wide genotyping and high-throughput sequencing studies. The proposed computational tools provide means to analyze additional types of data, for example to estimate mitochondrial DNA (mtDNA) copy number efficiently from whole-genome sequences, or to identify mtDNA variants from whole-exome sequencing. To test these algorithms, we take advantage of the special features of the SardiNIA project (see Annual Report AG000675), which has collected longitudinal data for >600 quantitative traits together with whole-genome genetic data in the Sardinian founder population. In the past year, for example, this has involved us in epidemiological analyses of frailty-related traits (walking speed, grip strength, and bone density) and hearing capacity as a function of age and sex.

The mtDNA copy number is a critical determinant of mitochondrial function and a potential biomarker for disease. To study it in large-scale consortium data, we are developing an ultra-fast program to estimate mtDNA copy number from whole-genome sequencing (WGS) data. We and other groups have previously shown that the mtDNA copy number per cell can be estimated directly from WGS. The computation rests on the rationale that sequencing coverage should be proportional to the underlying DNA copy number for both autosomal and mitochondrial DNA. Most of the computing time is spent calculating the average autosomal DNA coverage across 3 billion bases, which makes analyzing tens of thousands of available samples very slow. We are developing fastMitoCalc, a program that takes advantage of the indexing of sequencing alignment files and uses a small, randomly selected subset (0.1%) of the nuclear genome to estimate autosomal DNA coverage accurately. It is more than 100 times faster than current programs.
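The coverage-ratio rationale and the random-subset idea described above can be illustrated with a minimal sketch. This is not the fastMitoCalc implementation: the function name, its parameters, and the in-memory depth lists are hypothetical, the depths are synthetic, and the factor of 2 assumes diploid autosomes. A real tool would read depths from indexed alignment files rather than from Python lists.

```python
import random
from statistics import mean


def estimate_mtdna_copy_number(mt_depths, autosomal_depths,
                               sample_fraction=0.001, seed=0):
    """Estimate mtDNA copies per cell from per-base sequencing depths.

    Assumes diploid autosomes, so copies per cell is taken as
    2 * (mean mtDNA coverage) / (mean autosomal coverage).
    Only a random subset of autosomal positions feeds the denominator.
    """
    if not mt_depths or not autosomal_depths:
        raise ValueError("both depth lists must be non-empty")
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")

    rng = random.Random(seed)
    n_sampled = max(1, round(len(autosomal_depths) * sample_fraction))
    sampled = rng.sample(autosomal_depths, n_sampled)

    autosomal_coverage = mean(sampled)
    if autosomal_coverage <= 0:
        raise ValueError("sampled autosomal coverage is zero")
    return 2.0 * mean(mt_depths) / autosomal_coverage


# Synthetic, made-up depths: low-pass autosomes near 4x, mtDNA near 400x.
sim = random.Random(42)
autosomal = [sim.randint(2, 6) for _ in range(500_000)]
# 16,569 is the length of the human mtDNA reference sequence.
mitochondrial = [sim.randint(380, 420) for _ in range(16_569)]

estimate = estimate_mtdna_copy_number(mitochondrial, autosomal)
print(f"estimated mtDNA copies per cell: {estimate:.1f}")
```

Because the denominator is an average, a small random sample of positions already gives a stable estimate, which is why subsampling the nuclear genome can save most of the computing time.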
With fastMitoCalc, a computer cluster with 50 CPUs can now finish analyzing 10,000 low-pass sequencing samples in about 3 hours rather than the 25 days originally required. This makes it much more feasible to analyze hundreds of thousands of genomes to test for association of mtDNA copy number with quantitative traits or nuclear variants.

To take advantage of the large-scale whole-exome sequencing (WES) data sets that are available, we are developing and testing algorithms that use off-target sequences from WES to identify mtDNA variants. WES technology uses exome capture kits to pre-select the protein-coding regions of the genome (the targeted regions) and then carries out sequencing reactions. Although one might expect that the mtDNA genome would not be substantially covered by the sequencing reactions, multiple studies have shown that off-target mtDNA sequences that fully cover the mtDNA genome can be reliably obtained from WES. We propose to extract all the sequence reads aligned to the mtDNA reference genome, including off-target reads, and then use our program mitoCaller to identify mtDNA variants. To validate variant calling from WES data, we will take advantage of the individuals in the 1000 Genomes Project who were sequenced by both WES and WGS, and use the variants called from WGS as the standard. We expect very high concordance between the variant genotypes identified by WES and WGS.

To investigate and improve the prediction of phenotypes, a major goal in personalized medicine, we are implementing linear mixed models in ongoing work to evaluate the prediction accuracy for a given phenotype with increasingly comprehensive genetic data (e.g., from HapMap imputed genotypes and sequencing-based genetic data), together with demographic data (e.g., family structure) and other related phenotypic traits.
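The WES-versus-WGS validation described above amounts to scoring genotype concordance with the WGS calls as the standard. The following is a minimal sketch of one way such a score could be computed. The function name and the position-to-allele dictionaries are hypothetical, the example calls are made up, and this is not the output format of mitoCaller.

```python
def compare_calls(wes_calls, wgs_calls):
    """Compare WES genotype calls against WGS calls used as the standard.

    Each argument maps an mtDNA position to the called allele.
    Returns (concordance over shared sites, sorted sites missing from WES).
    """
    shared = wes_calls.keys() & wgs_calls.keys()
    if not shared:
        raise ValueError("no sites were called in both data sets")
    matches = sum(wes_calls[pos] == wgs_calls[pos] for pos in shared)
    missing_from_wes = sorted(wgs_calls.keys() - wes_calls.keys())
    return matches / len(shared), missing_from_wes


# Made-up calls for one hypothetical individual (not real data).
wes = {73: "G", 263: "G", 750: "G", 1438: "G"}
wgs = {73: "G", 263: "G", 750: "A", 1438: "G", 8860: "G"}

concordance, missed = compare_calls(wes, wgs)
print(f"concordance: {concordance:.2f}, sites missing from WES: {missed}")
```

Reporting the sites that WGS called but WES did not, alongside the concordance at shared sites, separates genotype disagreements from gaps in off-target coverage.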

Agency: National Institutes of Health (NIH)
Institute: National Institute on Aging (NIA)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIAAG000693-06
Application #: 9345254
Support Year: 6
Fiscal Year: 2016
Name: Aging
Moore, Ann Zenobia; Ding, Jun; Tuke, Marcus A et al. (2018) Influence of cell distribution and diabetes status on the association between mitochondrial DNA copy number and aging phenotypes in the InCHIANTI study. Aging Cell 17:
Qian, Yong; Butler, Thomas J; Opsahl-Ong, Krista et al. (2017) fastMitoCalc: an ultra-fast program to estimate mitochondrial DNA copy number from whole-genome sequences. Bioinformatics 33:1399-1401
Okbay, Aysu (see original citation for additional authors) (2016) Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533:539-42
van den Berg, Stéphanie M; de Moor, Marleen H M; Verweij, Karin J H et al. (2016) Meta-analysis of Genome-Wide Association Studies for Extraversion: Findings from the Genetics of Personality Consortium. Behav Genet 46:170-82
Okbay, Aysu; Baselmans, Bart M L; De Neve, Jan-Emmanuel et al. (2016) Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat Genet 48:624-33
Ding, Jun; Sidore, Carlo; Butler, Thomas J et al. (2015) Assessing Mitochondrial DNA Variation and Copy Number in Lymphocytes of ~2,000 Sardinians Using Tailored Sequencing Analysis Tools. PLoS Genet 11:e1005306
Terracciano, Antonio; Strait, James; Scuteri, Angelo et al. (2014) Personality traits and circadian blood pressure patterns: a 7-year prospective study. Psychosom Med 76:237-43
Pelosi, Emanuele; Omari, Shakib; Michel, Marc et al. (2013) Constitutively active Foxo3 in oocytes preserves ovarian reserve in mice. Nat Commun 4:1843
Hek, Karin; Demirkan, Ayse; Lahti, Jari et al. (2013) A genome-wide association study of depressive symptoms. Biol Psychiatry 73:667-78
Meirelles, Osorio D; Ding, Jun; Tanaka, Toshiko et al. (2013) SHAVE: shrinkage estimator measured for multiple visits increases power in GWAS of quantitative traits. Eur J Hum Genet 21:673-9

Showing the most recent 10 out of 14 publications