Recently, genome-wide association studies using single nucleotide polymorphisms (SNPs) have had some success in detecting genetic variants associated with diseases. Copy number variation (CNV) is another widespread feature of the human genome and has been shown to be related to various human phenotypes. The ongoing HapMap project, which is constructing a database of validated CNVs, will provide valuable information for studying associations of CNVs with disease risk, the effects of CNVs on response to drug treatment, and the role of structural variation in human evolution. However, because the available statistical methods are limited, current practice in studies of associations between human diseases and genetic variants is to call SNP genotypes and CNVs separately and then analyze them separately. Two studies published in Nature Genetics last year (Korn et al., 2008; McCarroll et al., 2008) have suggested that combining SNP allele and copy number information can lead to accurate inference of both copy numbers and genotypes, and can thus affect the results of association studies. New methods are urgently needed for simultaneous inference of SNP genotypes and CNVs and for testing their joint influences on complex diseases. We therefore propose to develop novel statistical and computational methods and software for whole-genome association studies using integrated CNV and SNP information.
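As a simplified illustration of why combining the two signals helps, the Python sketch below scores every allele-specific copy number state (nA, nB) at one SNP probe using both the probe's log R ratio (LRR, which reflects total copy number) and its B allele frequency (BAF, which reflects the allelic ratio). It is a single-probe toy in the spirit of existing LRR/BAF-based CNV callers, not the calling algorithm proposed in this project. The Gaussian noise model, the values of `SD_LRR`, `SD_BAF` and `LRR_ZERO`, the uniform prior over states, and all function names are hypothetical choices made for illustration.

```python
import math

# Hypothetical, illustrative noise parameters (not estimated from any real array platform).
SD_LRR = 0.2     # standard deviation of the log R ratio around its expected value
SD_BAF = 0.05    # standard deviation of the B allele frequency around nB / (nA + nB)
LRR_ZERO = -3.0  # assumed expected log R ratio for zero copies, since log2(0) is undefined


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def allele_specific_states(max_cn=4):
    """All allele-specific states (nA, nB) with total copy number 0..max_cn."""
    return [(a, c - a) for c in range(max_cn + 1) for a in range(c, -1, -1)]


def state_loglik(lrr, baf, n_a, n_b):
    """Joint log likelihood of one probe's (LRR, BAF) under state (nA, nB)."""
    c = n_a + n_b
    # Total copy number determines the expected log R ratio.
    mu_lrr = LRR_ZERO if c == 0 else math.log2(c / 2)
    ll = normal_logpdf(lrr, mu_lrr, SD_LRR)
    # The allelic ratio determines the expected B allele frequency. With zero
    # copies BAF carries no allele information; it is treated as uniform on
    # [0, 1], whose log density is 0, so nothing is added.
    if c > 0:
        ll += normal_logpdf(baf, n_b / c, SD_BAF)
    return ll


def call_allele_specific_cn(lrr, baf, max_cn=4):
    """Return the most likely (nA, nB) and its posterior under a uniform prior."""
    states = allele_specific_states(max_cn)
    logliks = [state_loglik(lrr, baf, a, b) for a, b in states]
    top = max(logliks)
    weights = [math.exp(v - top) for v in logliks]  # subtract the max for stability
    best = logliks.index(top)
    return states[best], weights[best] / sum(weights)


state, posterior = call_allele_specific_cn(0.58, 0.33)
print(state)  # (2, 1): three copies, two A alleles and one B allele
```

LRR alone gives total copy number but no genotype, and BAF alone cannot tell, for example, a normal-copy BB homozygote (0, 2) from a hemizygous B (0, 1), since both have BAF near 1; the joint likelihood separates them. A practical caller would also use informative priors, probe-specific calibration, and information from neighbouring probes.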
The specific aims of this project are (1) to develop calling algorithms for allele-specific copy numbers that integrate copy number and SNP allele information, (2) to develop single-locus and multi-locus methods for joint genotype and copy number association testing, (3) to develop haplotype association methods incorporating copy number information, and (4) to release a user-friendly software package in R. The proposed methods will be evaluated through simulations as well as with real data, which will include (but will not be limited to) the publicly available HapMap data and human data sets from our collaborators studying genetic effects on left ventricular hypertrophy, triglycerides, and blood pressure. The proposed methods will greatly facilitate the study of human genetic variation and its association with complex diseases.
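As a rough illustration of a single-locus joint test (aim 2), the Python sketch below takes hard allele-specific copy number calls (nA, nB) for each subject and applies the score test for a logistic model of case-control status on the B-allele count nB and the total copy number nA + nB (2 degrees of freedom). It relies on the standard identity that, with an intercept-only null model, the logistic score statistic equals N·R², where R² comes from least-squares regression of the 0/1 status on the covariates; with one covariate this is the Cochran-Armitage trend test. Using hard calls instead of call probabilities, omitting covariates, and the function names are all hypothetical simplifications; this is not the method proposed in this project.

```python
import math


def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def score_test(status, covariates):
    """Score test of no association for a logistic model of 0/1 status on one
    or two numeric covariates (null model: intercept only), via T = N * R^2.
    Returns (statistic, df, p_value)."""
    n = len(status)
    y = _centered(status)
    syy = _dot(y, y)
    if syy == 0:
        raise ValueError("status must include both cases and controls")
    xs = [_centered(col) for col in covariates]
    if len(xs) == 1:
        sxx = _dot(xs[0], xs[0])
        if sxx == 0:
            raise ValueError("covariate does not vary")
        explained = _dot(xs[0], y) ** 2 / sxx
    elif len(xs) == 2:
        s11, s22, s12 = _dot(xs[0], xs[0]), _dot(xs[1], xs[1]), _dot(xs[0], xs[1])
        b1, b2 = _dot(xs[0], y), _dot(xs[1], y)
        det = s11 * s22 - s12 * s12
        if det <= 1e-12 * s11 * s22:
            raise ValueError("covariates are collinear")
        # Quadratic form b' S^{-1} b for the 2 x 2 cross-product matrix S.
        explained = (s22 * b1 * b1 - 2 * s12 * b1 * b2 + s11 * b2 * b2) / det
    else:
        raise ValueError("this sketch handles one or two covariates")
    stat = max(0.0, n * explained / syy)
    df = len(xs)
    # Chi-square upper tail: erfc form for 1 df; 2 df is exponential with mean 2.
    p = math.erfc(math.sqrt(stat / 2)) if df == 1 else math.exp(-stat / 2)
    return stat, df, p


def joint_cn_genotype_test(status, calls):
    """Jointly test B-allele count and total copy number from (nA, nB) calls."""
    n_b = [b for _, b in calls]
    total = [a + b for a, b in calls]
    if max(total) == min(total):
        # No copy number variation observed: reduces to the 1-df allelic trend test.
        return score_test(status, [n_b])
    return score_test(status, [n_b, total])


status = [1, 1, 1, 1, 0, 0, 0, 0]
calls = [(1, 2), (2, 1), (1, 1), (0, 2), (1, 1), (2, 0), (1, 0), (1, 1)]
stat, df, p = joint_cn_genotype_test(status, calls)
print(df)  # 2, because total copy number varies across subjects
```

Testing (nB, nA + nB) gives the same statistic as testing (nA, nB), because both pairs span the same space; the reparameterization only makes it easy to detect a locus with no CNV and fall back to the 1-df test.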

Public Health Relevance

The proposed methods will aid in the discovery of genetic variants responsible for complex human diseases, will help us to better understand these diseases, and finally will enhance our ability to prevent, diagnose, and treat these diseases.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Krasnewich, Donna M
University of Pennsylvania
Biostatistics & Other Math Sci
Schools of Medicine
United States