To help analyze and understand aging-related complex traits that are affected by many genes and environmental factors, we propose to develop three statistical algorithms for the analysis of genome-wide genotyping and high-throughput sequencing studies. The proposed methods provide means to analyze additional types of data, e.g., mitochondrial DNA (mtDNA) variants from sequencing, or variants on the X chromosome in genome-wide association studies (GWAS). To test these algorithms, we take advantage of the special features of the SardiNIA Project (see Annual Report AG000675), which has collected longitudinal data for >300 quantitative traits together with whole-genome genetic data in the Sardinian founder population. The genotype-calling and analytic programs developed for nuclear DNA are not adequate for analyzing mitochondrial DNA variation and its possible effects on aging-related traits, because each cell has 100-10,000 mtDNA copies that can vary at any site (heteroplasmy), so that any of the 4 bases can be present at a given position in different copies. We have developed an algorithm targeted at identifying variants in mtDNA; it incorporates the sequencing error rate of each base in each sequence read and allows the allele fraction at a variant site to differ across individuals. Our procedure is further adapted to the circular mitochondrial genome, a key difference from the linear chromosomes assumed by most mapping algorithms. We are assessing homoplasmies and heteroplasmies in mtDNA sequences of leukocytes from whole-genome sequencing of 2,000 SardiNIA Project participants. The results to date provide information about mtDNA haplogroups and the inheritance of homo- and heteroplasmies in Sardinia. As expected, mothers and their children share essentially all homoplasmies but a smaller proportion of heteroplasmies.
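The idea of using per-base sequencing error rates while letting the allele fraction vary freely at a site can be illustrated with a minimal sketch. This is not the algorithm developed in the project; it is a simple two-allele mixture likelihood written under several assumptions: reads at a site are given as hypothetical `(base, phred_quality)` pairs, errors are spread evenly over the three other bases, and the allele fraction is estimated by a plain grid search. All function names are illustrative.

```python
import math


def phred_to_error(qual):
    """Convert a Phred quality score to an error probability.

    Capped at 0.75 (a completely uninformative base call) so that the
    likelihood below never takes the logarithm of zero.
    """
    return min(0.75, 10 ** (-qual / 10))


def site_log_likelihood(reads, ref, alt, fraction):
    """Log-likelihood of the reads at one site for a given alt allele fraction.

    reads: list of (base, phred_quality) pairs (hypothetical input format).
    Assumes an erroneous call is equally likely to be any of the 3 other bases.
    """
    total = 0.0
    for base, qual in reads:
        err = phred_to_error(qual)
        p_if_ref = 1 - err if base == ref else err / 3
        p_if_alt = 1 - err if base == alt else err / 3
        total += math.log((1 - fraction) * p_if_ref + fraction * p_if_alt)
    return total


def estimate_alt_fraction(reads, ref, alt, steps=1000):
    """Grid-search maximum-likelihood estimate of the alt allele fraction.

    Returns (fraction, likelihood_ratio_statistic), where the statistic
    compares the best fraction against a fraction of zero (no variant).
    """
    null_ll = site_log_likelihood(reads, ref, alt, 0.0)
    best_f, best_ll = 0.0, null_ll
    for i in range(1, steps + 1):
        f = i / steps
        ll = site_log_likelihood(reads, ref, alt, f)
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f, 2 * (best_ll - null_ll)
```

In such a sketch, a site would be reported as heteroplasmic only if the likelihood ratio statistic is large and the estimated fraction exceeds a chosen minor allele fraction threshold. Handling of the circular genome (for example, reads spanning the coordinate origin) is not covered here.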
Overall heteroplasmy increases with age, but the slope is small in the estimates thus far, yielding an average increase of 1 heteroplasmy between ages 20 and 80 at a minor allele fraction threshold of 4%. We have also made a sequence-based estimate of mtDNA copy number from the observed ratio of sequence coverage between mtDNA and autosomal DNA. We find that mtDNA copy number averages 110 copies/leukocyte and is 54% heritable, implying substantial genetic regulation of mtDNA levels. Copy number also decreases modestly but significantly with age, and females on average have significantly more copies than males. The mtDNA copy number is significantly associated with waist circumference and waist-hip ratio, but not with body mass index, indicating an association with central fat distribution. We are currently performing GWAS of mtDNA copy number, aiming to identify genetic variants that regulate mtDNA levels. To analyze X-linked genetic variants accurately in GWAS, we propose to use RNA-seq data to identify X-inactivated genes and genes escaping X-inactivation, and then to perform GWAS accordingly. In preliminary work, we have used mRNA-Seq data from 80 skin samples available from a genetic study conducted at the University of Michigan. Using the results of Carrel and Willard, Nature (2005), in which the inactivation status of each gene was determined experimentally, as a gold standard, we are able to predict gene inactivation status with relatively high accuracy. We also improve the prediction of escaping genes 5-fold, from the 15% estimated by Carrel and Willard, Nature (2005) with an incomplete list of X-linked genes, to 75%.
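The coverage-based mtDNA copy number estimate mentioned above can be sketched as follows. The report states only that the estimate uses the ratio of mtDNA to autosomal sequence coverage; the scaling by autosomal ploidy (two autosomal copies per cell) is an assumption of this sketch, as are the function name and the depth values in the example, which are made up for illustration.

```python
def mtdna_copies_per_cell(mt_mean_depth, autosomal_mean_depth, autosomal_ploidy=2):
    """Estimate mtDNA copies per cell from mean sequencing depths.

    Assumption of this sketch: each cell carries `autosomal_ploidy` copies
    of every autosomal position, so the coverage ratio is scaled by that
    number to obtain mtDNA copies per cell.
    """
    if autosomal_mean_depth <= 0:
        raise ValueError("autosomal_mean_depth must be positive")
    return autosomal_ploidy * mt_mean_depth / autosomal_mean_depth


# Illustrative (made-up) depths: 220x over mtDNA, 4x over the autosomes.
print(mtdna_copies_per_cell(220.0, 4.0))  # prints 110.0
```

In practice the mean depths would need to come from reads that are comparable between the two genomes (for example, after excluding poorly mapped regions), which this sketch does not address.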
To investigate and improve the prediction of phenotypes, a major goal in personalized medicine, we are implementing linear mixed models in ongoing work to evaluate how the prediction accuracy for a given phenotype changes with increasingly comprehensive genetic data (e.g., HapMap-imputed genotypes and sequencing-based genetic data), together with demographic data (e.g., family structure) and other related phenotypic traits.
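For orientation, one common textbook formulation of a linear mixed model for phenotype prediction is given below. It is a generic sketch and not necessarily the model being implemented in the project. The symbols are assumptions of this sketch: y holds the observed phenotypes, X the fixed-effect covariates (e.g., demographic data or related traits), g a random genetic effect, K_obs a genetic relationship matrix among the observed individuals (from genotypes or family structure), and K_new,obs the relationships between new and observed individuals.

```latex
\begin{align}
y &= X\beta + g + \varepsilon,
  \qquad g \sim \mathcal{N}\!\left(0, \sigma_g^2 K_{\text{obs}}\right),
  \quad \varepsilon \sim \mathcal{N}\!\left(0, \sigma_e^2 I\right) \\
\hat{y}_{\text{new}} &= X_{\text{new}}\hat{\beta}
  + K_{\text{new,obs}} \left(K_{\text{obs}} + \lambda I\right)^{-1}
    \left(y - X\hat{\beta}\right),
  \qquad \lambda = \sigma_e^2 / \sigma_g^2
\end{align}
```

Under this formulation, prediction accuracy could be evaluated by comparing the predicted values with held-out phenotypes while K is rebuilt from each successively richer genetic data set.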