This proposal combines an extensive mentored training program for the PI with a research project that aims to develop novel approaches for visualization and exploration that will accelerate the identification and validation of disease-associated variants in large and complex genomics and epigenomics data sets. An increasing number of such variants are discovered in studies that generate and analyze a wide range of molecular data types for thousands of patients or samples. This progress is enabled by the availability of computational analysis pipelines that employ sophisticated statistical methods for next-generation sequencing (NGS) data. Interpretation of analysis results by biological and clinical domain experts, however, is emerging as a major bottleneck due to the amount and complexity of the pipeline outputs. To address this, we propose to develop interactive visualization methods and a web-based infrastructure that will enable domain experts to identify disease-associated variants in large (epi)genomic data sets through visual exploration of computational predictions and the underlying data. This will have a significant impact on the rate at which predictions can be verified, interpreted and translated into clinically actionable findings. Our first priority is the design of methods and tools to visualize (epi)genomic data in a range of different contexts, for instance by grouping and representing features based on their function, chromatin state, transcriptional activity or genomic coordinates. We will also develop new non-linear genome representations to compare structural variants across genomes, complementing the functionality of the highly successful genome browsers. We will then investigate how information external to the primary data, for instance from other studies or from drug target or biomarker databases, can be applied to guide investigators through the data set.
Finally, we will implement a web-based exploration system for biological and clinical domain experts that combines our interactive visualizations with large-scale public (epi)genomic data sets. The methods and tools developed under this proposal will be generally applicable; driving biological examples are chosen from The Cancer Genome Atlas (TCGA) and the Encyclopedia of DNA Elements (ENCODE and modENCODE).
The visualization methods and tools developed under this proposal will accelerate the identification and verification of disease-associated variants in large genomic and epigenomic data sets, thereby reducing the effort required to translate findings into clinically actionable results. Furthermore, under this proposal, the PI will acquire the skills required to be a productive independent investigator in the biomedical field through further mentored training in genomics and epigenomics as well as in research management.
Lekschas, Fritz; Gehlenborg, Nils (2018) SATORI: a system for ontology-guided visual exploration of biomedical data repositories. Bioinformatics 34:1200-1207
Nobre, Carolina; Gehlenborg, Nils; Coon, Hilary et al. (2018) Lineage: Visualizing Multivariate Clinical Data in Genealogy Graphs. IEEE Trans Vis Comput Graph
Lekschas, Fritz; Bach, Benjamin; Kerpedjiev, Peter et al. (2018) HiPiler: Visual Exploration of Large Genome Interaction Matrices with Interactive Small Multiples. IEEE Trans Vis Comput Graph 24:522-531
Kerpedjiev, Peter; Abdennur, Nezar; Lekschas, Fritz et al. (2018) HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol 19:125
Conway, Jake R; Lex, Alexander; Gehlenborg, Nils (2017) UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics 33:2938-2940
Kern, Michael; Lex, Alexander; Gehlenborg, Nils et al. (2017) Interactive visual exploration and refinement of cluster assignments. BMC Bioinformatics 18:406
Manrai, Arjun K; Patel, Chirag J; Gehlenborg, Nils et al. (2016) Methods to Enhance the Reproducibility of Precision Medicine. Pac Symp Biocomput 21:180-182
Stitz, H; Luger, S; Streit, M et al. (2016) AVOCADO: Visualization of Workflow-Derived Data Provenance for Reproducible Biomedical Research. Comput Graph Forum 35:481-490