Cancer genomes present amazing structural puzzles for genomicists to solve. The data pieces available for assembling a complete view are both very large (cytogenetic) and very small (sequence data), often differing in scale by more than a thousand-fold. Add to this genomic dispersity within a tumor and breakpoints within interspersed repeats, and the puzzle grows much more difficult. The aims of this application are therefore to scale data piece size seamlessly through a hierarchical framework: new algorithms and computational pipelines will combine sequence data with long-range physical maps, constructed through significant advancements to Nanocoding, to create scalable views of cancer genomes that span from nucleotide to chromosome. This multipronged project will involve synergistic advancements to DNA labeling, presentation of very large genomic DNA molecules, scanners for single molecule analytes, and machine vision. All of these system components will be informed by advanced bioinformatic analysis techniques developed for single molecule analysis, and by cutting-edge computer simulations of DNA conformations within the devices that will be the foundry for large datasets. This highly integrated system will be aimed at discovering novel structural variants within four paired multiple myeloma/normal samples, tabulating previously undetectable events as candidates for validation and further study. The resulting platform, which melds new single molecule technologies with advanced bioinformatics techniques, portends scalable, comprehensive, fast genome analysis for navigating cancer genomes.

Public Health Relevance

Cancer genomes are very difficult to view as a whole in fine detail. We propose to build an integrated system that analyzes very large pieces of genetic material, allowing entire cancer genomes to be viewed as in Google Earth, with the ability to zoom seamlessly from a planet view to a street corner.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Exploratory/Developmental Grants Phase II (R33)
Project #
5R33CA182360-03
Application #
9318471
Study Section
Special Emphasis Panel (ZCA1)
Program Officer
Li, Jerry
Project Start
2015-08-01
Project End
2019-07-31
Budget Start
2017-08-01
Budget End
2019-07-31
Support Year
3
Fiscal Year
2017
Total Cost
Indirect Cost
Name
University of Wisconsin-Madison
Department
Miscellaneous
Type
Graduate Schools
DUNS #
161202122
City
Madison
State
WI
Country
United States
Zip Code
53715
Rajaraman, Ashok; Ma, Jian (2018) Toward Recovering Allele-specific Cancer Genome Graphs. J Comput Biol 25:624-636
Kounovsky-Shafer, Kristy L; Hernandez-Ortiz, Juan P; Potamousis, Konstantinos et al. (2017) Electrostatic confinement and manipulation of DNA molecules for genome analysis. Proc Natl Acad Sci U S A 114:13400-13405
Hou, Jack P; Emad, Amin; Puleo, Gregory J et al. (2016) A new correlation clustering method for cancer mutation analysis. Bioinformatics 32:3717-3728
Tian, Dechao; Gu, Quanquan; Ma, Jian (2016) Identifying gene regulatory network rewiring using latent differential graphical models. Nucleic Acids Res 44:e140
He, Feifei; Li, Yang; Tang, Yu-Hang et al. (2016) Identifying micro-inversions using high-throughput sequencing reads. BMC Genomics 17 Suppl 1:4
Li, Yang; Zhou, Shiguo; Schwartz, David C et al. (2016) Allele-Specific Quantification of Structural Variations in Cancer Genomes. Cell Syst 3:21-34
Lee, Seonghyun; Oh, Yeeun; Lee, Jungyoon et al. (2016) DNA binding fluorescent proteins for the direct visualization of large DNA molecules. Nucleic Acids Res 44:e6
Gupta, Aditya; Place, Michael; Goldstein, Steven et al. (2015) Single-molecule analysis reveals widespread structural variation in multiple myeloma. Proc Natl Acad Sci U S A 112:7689-7694