Fine-scale nucleotide changes, along with genetic recombination, are often cited as the major source of human genetic variation [1, 13, 14]. Less is known about larger-scale (>10 kb) genomic structural variation. As genomic technologies improve, structural variants are being detected in ever-increasing numbers, including genomic inversions [24, 48, 71, 65, 31]; insertion/deletion polymorphisms [12, 26, 42]; and copy number polymorphisms [28, 59, 60]. These large variations can disrupt coding and regulatory sites and alter the copy number of genes, and can thereby strongly affect human phenotypes and disease susceptibility [23, 61]. Deleterious effects have indeed been observed in cancer and other diseases [70, 43]. Our understanding of the scale and impact of these variations can be enhanced by improving the computational tools that mine the data from these technologies. Here, I propose the development of algorithms and computational tools to improve the detection and resolution (location of breakpoints) of structural variation. Specifically, I will develop algorithms for (a) experimental design of sequencing projects for detecting and resolving structural variations; (b) fine-mapping of breakpoints using end sequence profiling, to detect gene disruptions and gene fusions; (c) reconstructing tumor genome architectures; (d) detection of targeted genomic variations in a heterogeneous mix of normal and mutated cells via multiplex PCR; and (e) detection of balanced structural variation in genotype data. The tools will be designed using techniques from statistical machine learning and combinatorial algorithms. Validation will be performed using known structural variations, simulation studies, and extensive experimental collaborations with technology developers and early technology adopters. All data and software will be freely available for academic and non-commercial use.

Public Health Relevance

The proposed computational tools will be used to detect structural variations in human populations as a starting point for understanding their role in normal evolution and disease, specifically cancer. The architecture of tumor genomes will help reveal genes that are disrupted and differentially expressed in tumor cells. The targeted detection of genomic lesions in a heterogeneous mix of mutated and wild-type cells will find application as an early diagnostic for cancer. Thus, our computational methods will have both an immediate and a long-term effect on human health.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG004962-03
Application #
8035949
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Brooks, Lisa
Project Start
2009-04-01
Project End
2013-02-28
Budget Start
2011-03-01
Budget End
2012-02-29
Support Year
3
Fiscal Year
2011
Total Cost
$321,670
Indirect Cost
Name
University of California San Diego
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
804355790
City
La Jolla
State
CA
Country
United States
Zip Code
92093

Publications
Azad, Priti; Zhao, Huiwen W; Cabrales, Pedro J et al. (2016) Senp1 drives hypoxia-induced polycythemia via GATA1 and Bcl-xL in subjects with Monge's disease. J Exp Med 213:2729-2744
Zakov, Shay; Bafna, Vineet (2015) Reconstructing breakage fusion bridge architectures using noisy copy numbers. J Comput Biol 22:577-94
Kozanitis, Christos; Heiberg, Andrew; Varghese, George et al. (2014) Using Genome Query Language to uncover genetic variation. Bioinformatics 30:1-8
Ronen, Roy; Zhou, Dan; Bafna, Vineet et al. (2014) The genetic basis of chronic mountain sickness. Physiology (Bethesda) 29:403-12
Patel, Anand; Schwab, Richard; Liu, Yu-Tsueng et al. (2014) Amplification and thrifty single-molecule sequencing of recurrent somatic structural variations. Genome Res 24:318-28
Kinsella, Marcus; Patel, Anand; Bafna, Vineet (2014) The elusive evidence for chromothripsis. Nucleic Acids Res 42:8231-42
Udpa, Nitin; Ronen, Roy; Zhou, Dan et al. (2014) Whole genome sequencing of Ethiopian highlanders reveals conserved hypoxia tolerance genes. Genome Biol 15:R36
Lo, Christine; Liu, Rui; Lee, Jehyuk et al. (2013) On the design of clone-based haplotyping. Genome Biol 14:R100
Bafna, Vineet; Kozanitis, Christos; Deutsch, Alin et al. (2013) Abstractions for Genomics. Commun ACM 56:83-93
Lo, Christine; Kim, Sangwoo; Zakov, Shay et al. (2013) Evaluating genome architecture of a complex region via generalized bipartite matching. BMC Bioinformatics 14 Suppl 5:S13

Showing the most recent 10 out of 28 publications