Fine-scale nucleotide changes, along with genetic recombination, are often cited as the major source of human genetic variation [1, 13, 14]. Less is known about larger-scale (>10 kb) genomic structural variations. As genomic technologies improve, we are detecting structural variation in ever-increasing numbers, including genomic inversions [24, 48, 71, 65, 31]; insertion/deletion polymorphisms [12, 26, 42]; and copy number polymorphisms [28, 59, 60]. These large variations can disrupt coding and regulatory sites and alter gene copy number, and can thereby have a large impact on human phenotypes and disease susceptibility [23, 61]. Deleterious effects have indeed been observed in cancer and other diseases [70, 43]. Our understanding of the scale and impact of these variations can be enhanced by improving computational tools for mining the data from these technologies. Here, I propose the development of algorithms and computational tools to improve detection and resolution (location of breakpoints) of structural variation. Specifically, I will develop algorithms for (a) experimental design of sequencing projects for detecting and resolving structural variations; (b) fine-mapping of breakpoints using end sequence profiling, to detect gene disruptions and gene fusions; (c) reconstructing tumor genome architectures; (d) detection of targeted genomic variations in a heterogeneous mix of normal versus mutated cells via multiplex PCR; and (e) detection of balanced structural variation in genotype data. The tools will be designed using techniques from statistical machine learning and combinatorial algorithms. Validation will be performed using known structural variations, simulation studies, and extensive experimental collaborations with technology developers and early technology adopters. All data and software will be freely available for academic and non-commercial uses.

Public Health Relevance

The proposed computational tools will be used to detect structural variations in human populations as a starting point for understanding their role in normal evolution and in disease, specifically cancer. The architecture of tumor genomes will help reveal genes that are disrupted and differentially expressed in tumor cells. The targeted detection of genomic lesions in a heterogeneous mix of mutated and wild-type cells will find application as an early diagnostic for cancer. Thus, our computational methods will have both an immediate and a long-term effect on human health.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Brooks, Lisa
University of California San Diego
Biostatistics & Other Math Sci
Schools of Arts and Sciences
La Jolla
United States
Kinsella, Marcus; Patel, Anand; Bafna, Vineet (2014) The elusive evidence for chromothripsis. Nucleic Acids Res 42:8231-42
Patel, Anand; Schwab, Richard; Liu, Yu-Tsueng et al. (2014) Amplification and thrifty single-molecule sequencing of recurrent somatic structural variations. Genome Res 24:318-28
Ronen, Roy; Zhou, Dan; Bafna, Vineet et al. (2014) The genetic basis of chronic mountain sickness. Physiology (Bethesda) 29:403-12
Kozanitis, Christos; Heiberg, Andrew; Varghese, George et al. (2014) Using Genome Query Language to uncover genetic variation. Bioinformatics 30:1-8
Zakov, Shay; Kinsella, Marcus; Bafna, Vineet (2013) An algorithmic approach for breakage-fusion-bridge detection in tumor genomes. Proc Natl Acad Sci U S A 110:5546-51
Zhou, Dan; Udpa, Nitin; Ronen, Roy et al. (2013) Whole-genome sequencing uncovers the genetic basis of chronic mountain sickness in Andean highlanders. Am J Hum Genet 93:452-62
Lo, Christine; Kim, Sangwoo; Zakov, Shay et al. (2013) Evaluating genome architecture of a complex region via generalized bipartite matching. BMC Bioinformatics 14 Suppl 5:S13
Ronen, Roy; Udpa, Nitin; Halperin, Eran et al. (2013) Learning natural selection from the site frequency spectrum. Genetics 195:181-93
Kim, Sangwoo; Medvedev, Paul; Paton, Tara A et al. (2013) Reprever: resolving low-copy duplicated sequences using template driven assembly. Nucleic Acids Res 41:e128
Hon, Gary C; Hawkins, R David; Caballero, Otavia L et al. (2012) Global DNA hypomethylation coupled to repressive chromatin domain formation and gene silencing in breast cancer. Genome Res 22:246-58

Showing the most recent 10 out of 21 publications