Recently, genome-wide association studies using single nucleotide polymorphisms (SNPs) have had some success in detecting genetic variants associated with diseases. Copy number variation (CNV) is another widespread characteristic of the human genome that has been shown to be related to various human phenotypes. The ongoing HapMap project, which is constructing a database of validated CNVs, will provide valuable information for studying associations of CNVs with disease risk, the effects of CNVs on response to drug treatment, and the role of structural variation in human evolution. However, because of the limits of available statistical methods, current practice in studies of associations between human diseases and genetic variants is to call SNP genotypes and CNVs separately and then analyze them separately. Two studies published in Nature last year (Korn et al., 2008; McCarroll et al., 2008) have suggested that combining SNP allele and copy number information can lead to accurate inference of both copy numbers and genotypes and thus affect the results of association studies. New methods are greatly needed for simultaneous inference of SNPs and CNVs and for testing their joint influence on complex diseases. We therefore propose to develop novel statistical and computational methods and software for whole-genome association studies using integrated CNV and SNP information.
The specific aims of this project are (1) to develop calling algorithms for allele-specific copy numbers that integrate copy number and SNP allele information, (2) to develop single-locus and multi-locus methods for joint genotype and copy number association testing, (3) to develop haplotype association methods incorporating copy number information, and (4) to release a user-friendly software package in R. The proposed methods will be evaluated through simulations as well as with real data, which will include (but will not be limited to) the publicly available HapMap data and human data sets from our collaborators studying genetic effects on left ventricular hypertrophy, triglycerides, and blood pressure. The proposed methods will greatly facilitate the study of human genetic variation and its association with complex diseases.

Public Health Relevance

The proposed methods will aid in the discovery of genetic variants responsible for complex human diseases, help us better understand these diseases, and ultimately enhance our ability to prevent, diagnose, and treat them.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM088566-03
Application #: 8247786
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Krasnewich, Donna M
Project Start: 2010-05-07
Project End: 2014-03-31
Budget Start: 2012-04-01
Budget End: 2013-03-31
Support Year: 3
Fiscal Year: 2012
Total Cost: $347,652
Indirect Cost: $123,689

Name: University of Pennsylvania
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 042250712
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Lin, Dongyu; Weinberg, Clarice R; Feng, Rui et al. (2013) A multi-locus likelihood method for assessing parent-of-origin effects using case-control mother-child pairs. Genet Epidemiol 37:152-62
Xu, Yaji; Wu, Yinghua; Song, Chi et al. (2013) Simulating realistic genomic data with rare variants. Genet Epidemiol 37:163-72
Feng, Rui; Wu, Yinghua; Jang, Gun Ho et al. (2011) A powerful test of parent-of-origin effects for quantitative traits using haplotypes. PLoS One 6:e28909