To help analyze and understand aging-related "complex" traits, which are affected by many genes and environmental factors, we propose to develop three statistical algorithms for the analysis of genome-wide genotyping and high-throughput sequencing studies. The proposed methods provide means to analyze additional types of data (e.g., mitochondrial DNA (mtDNA) variants from sequencing, or variants on the X chromosome in genome-wide association studies (GWAS)) and data with more complicated structures (e.g., multiple related traits). To test these algorithms, we take advantage of the special features of the SardiNIA project (see Annual Report AG000675-07), which has collected longitudinal data for >300 quantitative traits together with whole-genome genetic data in the Sardinian founder population. For analyzing mitochondrial DNA variation and its possible effects on aging-related traits, the genotype-calling and analytic programs developed for nuclear DNA are not adequate: each cell has 100-10,000 mtDNA copies that can vary at any site (heteroplasmy), so different copies can carry any of the 4 bases at a given position. We have developed an algorithm targeted at identifying variants in mtDNA; it incorporates the sequencing error rate of each base in each sequence read and allows the allele fraction at a variant site to differ across individuals. Our procedure is further adapted to the circular mitochondrial genome, a key difference from the linear chromosomes assumed by most mapping algorithms. We are assessing homoplasmies and heteroplasmies in mtDNA sequences of lymphocytes from whole-genome sequencing of 2,000 SardiNIA project participants. The results to date provide information about mtDNA haplogroups and the inheritance of homo- and heteroplasmies in Sardinia. As expected, mothers and their children share essentially all homoplasmies but a smaller proportion of heteroplasmies.
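To make the idea of using per-base error rates with a free allele fraction concrete, the following is a minimal sketch, not the project's algorithm. It assumes base calls at one site arrive as (base, Phred quality) pairs, models the site as a mixture of the two most frequent bases, and compares the best-fitting minor allele fraction against a homoplasmic fit by a log-likelihood ratio. All function names, the grid search, and the modulo treatment of circular coordinates are illustrative assumptions.

```python
import math

BASES = "ACGT"
MTDNA_LENGTH = 16569  # length of the human mtDNA reference sequence (rCRS)


def circular_position(pos, genome_length=MTDNA_LENGTH):
    """Map a 0-based coordinate that runs past the origin back onto the circle.

    Assumption: reads were aligned to a reference extended beyond its end.
    """
    return pos % genome_length


def base_error(phred):
    """Error probability implied by a Phred quality, capped at 0.75 (random call)."""
    return min(10 ** (-phred / 10), 0.75)


def log_likelihood(observations, major, minor, fraction):
    """Log-likelihood of the base calls if `fraction` of copies carry `minor`."""
    total = 0.0
    for base, phred in observations:
        e = base_error(phred)
        # A miscalled base is assumed equally likely to be any of the other 3.
        p_major = 1 - e if base == major else e / 3
        p_minor = 1 - e if base == minor else e / 3
        total += math.log((1 - fraction) * p_major + fraction * p_minor)
    return total


def call_site(observations, grid_steps=100):
    """Return (major, minor, estimated minor fraction, log-likelihood ratio)."""
    counts = {b: 0 for b in BASES}
    for base, _ in observations:
        if base in counts:
            counts[base] += 1
    # The two most frequent bases are the candidate major and minor alleles.
    major, minor = sorted(BASES, key=lambda b: -counts[b])[:2]
    null_ll = log_likelihood(observations, major, minor, 0.0)  # homoplasmic fit
    best_fraction, best_ll = 0.0, null_ll
    for i in range(1, grid_steps + 1):
        f = 0.5 * i / grid_steps  # grid over minor fractions in (0, 0.5]
        ll = log_likelihood(observations, major, minor, f)
        if ll > best_ll:
            best_fraction, best_ll = f, ll
    return major, minor, best_fraction, best_ll - null_ll
```

In this sketch a site would be reported as heteroplasmic only when the log-likelihood ratio is large and the estimated fraction exceeds a chosen threshold; a single low-quality mismatch is absorbed by its error probability instead of being counted as a variant.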
The overall number of heteroplasmies increases with age, but the slope estimated thus far is small: an average increase of 1 heteroplasmy between ages 20 and 80 at a minor allele fraction threshold of 4%. To take advantage of correlations between related traits, and hence to increase statistical power for genetic studies, we are developing a method to search for genes or variants that have pleiotropic effects on multiple quantitative traits. Our method projects a group of related traits and a set of SNPs (defined, for example, by gene boundaries) onto their respective orthogonal principal components, so that the association between traits and SNPs can be tested jointly using a new summary statistic. Because of the orthogonality, the significance of the association can be evaluated efficiently by simulation under the null hypothesis instead of by more computationally intensive permutations. We apply our method to the SardiNIA project data, focusing first on three lipid traits (HDL, LDL, and triglycerides) and using RefSeq gene boundaries to group SNPs into gene units. To show that our method identifies genes associated with more than one blood lipid trait, we use the results published in Teslovich et al., Nature (2010), in which 95 loci for blood lipids were identified and 21 of them were associated with multiple lipid traits. Our method enriches for those pleiotropic loci: for example, when the 95 loci are ranked by our method, the top 20% of loci include 40% of the pleiotropic loci in the original study. The method thus detects joint associations between multiple traits and multiple genetic variants, and it should have significant advantages when a gene has moderate effects on multiple traits that a standard GWAS is unable to detect.
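The joint test described above can be sketched as follows. This is an illustration under stated assumptions, not the method itself: the summary statistic is not specified here, so the sketch substitutes the sample size times the sum of squared correlations between trait components and SNP components, and it draws the null distribution by replacing the trait components with random orthonormal Gaussian columns. The function names and the use of NumPy are likewise assumptions.

```python
import numpy as np


def pc_scores(X):
    """Orthonormal principal-component score columns of column-standardized X."""
    X = np.asarray(X, dtype=float)
    X = X[:, X.std(axis=0) > 0]  # drop constant columns (e.g. monomorphic SNPs)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, s > 1e-8 * s[0]]  # keep components with non-negligible variance


def joint_statistic(trait_pcs, snp_pcs):
    """Stand-in statistic: n times the sum of squared PC-to-PC correlations.

    Both inputs have centered, unit-norm columns, so their cross-product
    is the matrix of sample correlations between components.
    """
    n = trait_pcs.shape[0]
    return n * np.sum((trait_pcs.T @ snp_pcs) ** 2)


def simulated_p_value(traits, genotypes, n_sim=1000, seed=0):
    """Return (observed statistic, p-value from simulation under the null)."""
    rng = np.random.default_rng(seed)
    trait_pcs, snp_pcs = pc_scores(traits), pc_scores(genotypes)
    observed = joint_statistic(trait_pcs, snp_pcs)
    n, k = trait_pcs.shape
    exceed = 0
    for _ in range(n_sim):
        # Null draw: independent Gaussian columns, centered and orthonormalized,
        # standing in for trait components unrelated to the SNPs.
        noise = rng.standard_normal((n, k))
        q, _ = np.linalg.qr(noise - noise.mean(axis=0))
        if joint_statistic(q, snp_pcs) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)
```

Because the trait components are mutually orthogonal, each null draw needs only independent normal columns and one small matrix product, whereas a permutation scheme would reshuffle individuals and recompute the components or the statistic on the full data each time.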