Copy number variants (CNVs) are common and can be measured on a genomic scale using high-throughput genotyping platforms. However, copy number estimates from existing algorithms are often inaccurate and imprecise. During the mentored phase of this Award, first-generation algorithms for the locus-level estimation of copy number and hidden Markov models to identify regions of copy number gain and loss were developed. During the R00 phase, these algorithms will be further improved by generalizations that broaden the scope of the problems they address and by final versions of the open source software. Specific methodological areas that will be targeted during the R00 phase of this Award include improved estimates of uncertainty that distinguish between outliers and CNVs, and statistical models that 'borrow strength' across loci and across samples. In parallel with the development of these methods, we will explore new approaches for CNV-phenotype inference that accommodate features of the study design (e.g., unrelated subjects versus case-parent trios) and biological characteristics of the disease. For instance, we expect that extensions of segmentation methods and principal component analysis will benefit the study of cancer genomes. Activities during the K99 phase of this Award will be helpful for achieving these goals. First, I successfully identified a tenure-track faculty position at the rank of Assistant Professor. Faculty in the Department of Oncology are supportive of the development of these methods. Second, I developed new collaborations during the K99 phase that will spur the development of statistical methods in the above areas. Finally, I participated in workshops and statistical conferences and completed coursework in subject-relevant areas that will be helpful as I transition to the independent phase. I look forward to competing for R01 funding opportunities as new technologies and applications for genomic research emerge.

Public Health Relevance

Approximately 10% of the genome is thought to be variable in copy number. Statistical approaches to measure this variation on a genomic scale and assess its contribution to physical traits are needed.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Transition Award (R00)
Study Section
Special Emphasis Panel (NSS)
Program Officer
Brooks, Lisa
Johns Hopkins University
Internal Medicine/Medicine
Schools of Medicine
United States
Scharpf, Robert B; Mireles, Lynn; Yang, Qiong et al. (2014) Copy number polymorphisms near SLC2A9 are associated with serum uric acid concentrations. BMC Genet 15:81
Scharpf, Robert B; Ruczinski, Ingo; Carvalho, Benilton et al. (2011) A multilevel model to address batch effects in copy number estimation using SNP arrays. Biostatistics 12:33-50
Halper-Stromberg, Eitan; Frelin, Laurence; Ruczinski, Ingo et al. (2011) Performance assessment of copy number microarray platforms using a spike-in experiment. Bioinformatics 27:1052-60