This project aims to explore the genetic component of cardiovascular diseases contributed by repeats in human genomes. Cardiovascular diseases, the leading cause of death in the US, have sizeable genetic components that remain unexplained by association studies. Recent studies of coronary artery disease, ischemic stroke, and atrial fibrillation explain less than a quarter of the heritability of these diseases observed in family studies. The gap between observed and explained heritability has persisted despite large increases in the sample sizes of genome-wide association studies. This "missing heritability" hinders both the understanding of the genetic basis of cardiovascular disease and the development of genetically informed therapies. A potential contributor to the missing heritability is structural variation in genomes, which is usually omitted from association studies. Genetic association studies typically focus on single nucleotide polymorphisms (SNPs), i.e., single base pair changes, and do not account for structural variants, i.e., mutations affecting large stretches of the genome. Structural variants are difficult to resolve using short-read sequencing or array-based genotyping technologies. Although structural variants are rarer than SNPs, they account for more base pairs of variation per individual because of their length. The proposed research program will quantify and characterize the cardiovascular impact of variable number tandem repeats (VNTRs), an understudied class of structural variants in which a specific nucleotide sequence is repeated a varying number of times in different individuals. The human genome contains thousands of VNTR regions, a few of which are already known to influence common diseases. The proposed research will leverage existing genotyped cohorts comprising hundreds of thousands of individuals to conduct a systematic study of the role of VNTR length variation in cardiovascular diseases.
These cohorts are genotyped using arrays that do not directly assay VNTR lengths. The PI will develop statistical methods to impute VNTR lengths into large cohorts and will characterize the contribution of VNTR variation to cardiovascular disease. Additionally, the PI will refine the genetic architecture of Lipoprotein(a), a protein encoded by a gene with a VNTR whose length is known to influence cardiovascular disease risk.
The proposed work will investigate the role that repetitive regions in the human genome play in cardiovascular disease risk. New statistical and computational approaches will be developed to estimate the lengths of genomic repeats in existing genotyped cohorts, since these lengths are hard to ascertain directly from the available DNA microarray data. The fraction of cardiovascular disease heritability explained by variation in repeat lengths will be quantified, shedding light on the genetic bases of these diseases.