Fine-scale nucleotide changes, along with genetic recombination, are often cited as the major sources of human genetic variation [1, 13, 14]. Less is known about larger-scale (>10 kb) genomic structural variation. As genomic technologies improve, we are detecting structural variation in ever-increasing numbers, including genomic inversions [24, 48, 71, 65, 31]; insertion/deletion polymorphisms [12, 26, 42]; and copy number polymorphisms [28, 59, 60]. These large variants can disrupt coding and regulatory sequences and alter gene copy number, and can thereby substantially affect human phenotypes and disease susceptibility [23, 61]. Deleterious effects have indeed been observed in cancer and other diseases [70, 43]. Better computational tools for mining the data produced by these technologies can improve our understanding of the scale and impact of such variation. Here, I propose to develop algorithms and computational tools that improve the detection and resolution (localization of breakpoints) of structural variation. Specifically, I will develop algorithms for (a) the experimental design of sequencing projects for detecting and resolving structural variation; (b) fine-mapping of breakpoints using end sequence profiling, to detect gene disruptions and gene fusions; (c) reconstructing tumor genome architectures; (d) detecting targeted genomic variations in a heterogeneous mix of normal and mutated cells via multiplex PCR; and (e) detecting balanced structural variation in genotype data. The tools will be designed using techniques from statistical machine learning and combinatorial algorithms. Validation will be performed using known structural variations, simulation studies, and extensive experimental collaborations with technology developers and early adopters. All data and software will be freely available for academic and non-commercial use.
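To make aim (a) concrete, the sketch below shows one textbook starting point for designing a paired-end (clone end) sequencing experiment. It is not the method proposed here. It assumes a simple Lander-Waterman-style model in which N clones of length L are sampled uniformly from a genome of length G. Under that model, the number of clones spanning a fixed breakpoint is approximately Poisson with mean c = N·L/G. The breakpoint counts as detected when at least `min_support` clones span it. The function names are hypothetical. The model ignores clone-length variance, mapping error, and tumor heterogeneity, all of which a real design method would need to handle.

```python
import math


def clone_coverage(num_clones: int, clone_len: float, genome_len: float) -> float:
    """Expected number of clones spanning a fixed position: c = N * L / G."""
    return num_clones * clone_len / genome_len


def detection_probability(coverage: float, min_support: int = 1) -> float:
    """P(at least `min_support` clones span a breakpoint), assuming the
    number of spanning clones is Poisson(coverage) (simplifying assumption)."""
    # Sum the Poisson pmf for counts 0 .. min_support-1, then take the complement.
    below = sum(
        math.exp(-coverage) * coverage**i / math.factorial(i)
        for i in range(min_support)
    )
    return 1.0 - below


def clones_needed(target: float, clone_len: float, genome_len: float,
                  min_support: int = 1) -> int:
    """Smallest number of clones whose detection probability reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")

    def ok(n: int) -> bool:
        return detection_probability(
            clone_coverage(n, clone_len, genome_len), min_support) >= target

    # Exponential search for an upper bound, then binary search
    # (valid because detection probability increases with coverage).
    hi = 1
    while not ok(hi):
        hi *= 2
    lo = hi // 2
    while lo < hi:
        mid = (lo + hi) // 2
        if ok(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    # Hypothetical design: 40 kb fosmid-sized clones on a 3 Gb genome.
    n = clones_needed(0.95, 40_000, 3e9)
    print(n, clone_coverage(n, 40_000, 3e9))
```

Under this model, requiring more supporting clones (`min_support` > 1) reduces false positives from chimeric clones, but it raises the number of clones needed for the same detection probability.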
The proposed computational tools will be used to detect structural variation in human populations as a starting point for understanding its role in normal evolution and in disease, specifically cancer. Reconstructing the architecture of tumor genomes will help reveal genes that are disrupted and differentially expressed in tumor cells. Targeted detection of genomic lesions in a heterogeneous mix of mutated and wild-type cells is expected to find application in early cancer diagnosis. Thus, our computational methods will have both immediate and long-term effects on human health.