The human genome exhibits extensive copy number variation (CNV). Today we understand only the simplest forms of CNV: simple deletions and duplications. A large, functionally important, and still-uncharacterized form of genome structural variation is multi-allelic copy-number variation (mCNV), involving genes and other functional elements for which three or more segregating alleles give rise to a wide range of copy numbers (such as 2 to 10) per diploid human genome. mCNVs have been refractory to widely used analysis methods and are not assessed in the genome-scale molecular or statistical approaches used to study genetically complex phenotypes in humans. In this work, we will develop approaches and supporting data sets that enable mCNVs to be routinely and rigorously analyzed for relationship to variation in human phenotypes. We will accurately analyze mCNVs in reference populations using two new approaches for measuring mCNVs in cohorts, one computational (based on analysis of available whole-genome sequence data) and one molecular (based on PCR in digitally counted microdroplets) (Aim 1). By analyzing these data in a statistical framework that incorporates information about genotypes, allele frequencies, inheritance, and haplotypes, we will place mCNV alleles onto the haplotype maps created by HapMap and 1000 Genomes, and render mCNVs accessible to genotype imputation to the fullest extent possible (Aim 2). We will deeply characterize mCNVs at ten biomedically important loci, to understand these polymorphisms at the levels of population genetics, mutational rates and histories, and relationships to clinical phenotypes (Aim 3). Finally, we will pilot inexpensive in silico genome-wide association studies for mCNVs based on statistical imputation into existing GWAS data sets (Aim 4).
The successful completion of this work will lead to the discovery of relationships between disease risk and gene dosage, helping to reveal the molecular etiology of human disease.
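As a rough illustration of how a handful of segregating alleles can produce a wide range of diploid copy numbers, the sketch below combines per-allele copy numbers into a diploid distribution. It assumes random pairing of alleles (Hardy-Weinberg proportions), and the allele frequencies are hypothetical values chosen for illustration, not data from this project; the function name `diploid_copy_number_distribution` is likewise an invention of this example.

```python
from collections import defaultdict
from itertools import product


def diploid_copy_number_distribution(allele_freqs):
    """Return P(diploid copy number) under random pairing of alleles.

    allele_freqs maps the copy number carried on one haplotype to that
    allele's population frequency; the frequencies must sum to 1.
    """
    total = sum(allele_freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")

    dist = defaultdict(float)
    # Each ordered pair of alleles (maternal, paternal) contributes the
    # product of its frequencies to the summed copy number.
    for (cn_a, f_a), (cn_b, f_b) in product(allele_freqs.items(), repeat=2):
        dist[cn_a + cn_b] += f_a * f_b
    return dict(sorted(dist.items()))


# Hypothetical frequencies for alleles carrying 1 to 5 copies of a gene.
example_freqs = {1: 0.20, 2: 0.30, 3: 0.30, 4: 0.15, 5: 0.05}
dist = diploid_copy_number_distribution(example_freqs)

for copy_number, prob in dist.items():
    print(copy_number, round(prob, 4))
```

With five alleles carrying 1 to 5 copies, the diploid copy numbers span 2 to 10, which is the kind of range described above. Real mCNV loci need not follow Hardy-Weinberg proportions, so this is only a first-order picture.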

Public Health Relevance

Variation in the human genome influences risk of disease and can be used to find the genes underlying each disease, leading to new ideas for therapies. Many genes can exist in very different numbers of copies (such as 0 to 12) in different people's genomes; this form of variation is not well understood today. Our work will help to understand this form of genome variation and enable many human geneticists to find specific genes that relate to each disease.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Brooks, Lisa
Harvard University
Schools of Medicine
United States
Genovese, Giulio; Kähler, Anna K; Handsaker, Robert E et al. (2014) Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N Engl J Med 371:2477-87
Genovese, Giulio; Handsaker, Robert E; Li, Heng et al. (2013) Mapping the human reference genome's missing sequence by three-way admixture in Latino genomes. Am J Hum Genet 93:411-21
McCarroll, Steven A; Hyman, Steven E (2013) Progress in the genetics of polygenic brain disorders: significant new challenges for neurobiology. Neuron 80:578-87
Genovese, Giulio; Handsaker, Robert E; Li, Heng et al. (2013) Using population admixture to help complete maps of the human genome. Nat Genet 45:406-14, 414e1-2