This proposal combines an extensive mentored training program for the PI with a research project that aims to develop novel approaches for visualization and exploration that will accelerate the identification and validation of disease-associated variants in large and complex genomics and epigenomics data sets. An increasing number of such variants are discovered in studies that generate and analyze a wide range of molecular data types for thousands of patients or samples. This progress is enabled by the availability of computational analysis pipelines that employ sophisticated statistical methods for next-generation sequencing (NGS) data. Interpretation of analysis results by biological and clinical domain experts, however, is emerging as a major bottleneck due to the amount and complexity of the pipeline outputs. To address this, we propose to develop interactive visualization methods and a web-based infrastructure that will enable domain experts to identify disease-associated variants in large (epi)genomic data sets through visual exploration of computational predictions and the underlying data. This will have a significant impact on the rate at which predictions can be verified, interpreted and translated into clinically actionable findings. Our first priority is the design of methods and tools to visualize (epi)genomic data in a range of different contexts, for instance by grouping and representing features based on their function, chromatin state, transcriptional activity or genomic coordinates. We will also develop new non-linear genome representations to compare structural variants across genomes, complementing the functionality of existing, highly successful genome browsers. We will then investigate how information external to the primary data, for instance from other studies or from drug target and biomarker databases, can be applied to guide investigators through the data set. Finally, we will implement a web-based exploration system for biological and clinical domain experts that combines our interactive visualizations with large-scale public (epi)genomic data sets. The methods and tools developed under this proposal will be generally applicable; driving biological examples will be drawn from The Cancer Genome Atlas (TCGA) and the Encyclopedia of DNA Elements (ENCODE and modENCODE).

Public Health Relevance

The visualization methods and tools developed under this proposal will accelerate the identification and verification of disease-associated variants in large genomic and epigenomic data sets, thereby reducing the effort required to translate findings into clinically actionable results. Furthermore, under this proposal, the PI will acquire the skills required to be a productive independent investigator in the biomedical field through further mentored training in genomics and epigenomics as well as in research management.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Transition Award (R00)
Project #: 4R00HG007583-03
Application #: 9123773
Study Section: Special Emphasis Panel (NSS)
Program Officer: Gilchrist, Daniel A
Project Start: 2015-08-18
Project End: 2018-07-31
Budget Start: 2015-08-18
Budget End: 2016-07-31
Support Year: 3
Fiscal Year: 2015
Total Cost: $248,995
Indirect Cost: $99,840

Name: Harvard Medical School
Department: Miscellaneous
Type: Schools of Medicine
DUNS #: 047006379
City: Boston
State: MA
Country: United States
Zip Code: 02115

Lekschas, Fritz; Gehlenborg, Nils (2018) SATORI: a system for ontology-guided visual exploration of biomedical data repositories. Bioinformatics 34:1200-1207
Nobre, Carolina; Gehlenborg, Nils; Coon, Hilary et al. (2018) Lineage: Visualizing Multivariate Clinical Data in Genealogy Graphs. IEEE Trans Vis Comput Graph :
Lekschas, Fritz; Bach, Benjamin; Kerpedjiev, Peter et al. (2018) HiPiler: Visual Exploration of Large Genome Interaction Matrices with Interactive Small Multiples. IEEE Trans Vis Comput Graph 24:522-531
Kerpedjiev, Peter; Abdennur, Nezar; Lekschas, Fritz et al. (2018) HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol 19:125
Conway, Jake R; Lex, Alexander; Gehlenborg, Nils (2017) UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics 33:2938-2940
Kern, Michael; Lex, Alexander; Gehlenborg, Nils et al. (2017) Interactive visual exploration and refinement of cluster assignments. BMC Bioinformatics 18:406
Manrai, Arjun K; Patel, Chirag J; Gehlenborg, Nils et al. (2016) Methods to Enhance the Reproducibility of Precision Medicine. Pac Symp Biocomput 21:180-182
Stitz, H; Luger, S; Streit, M et al. (2016) AVOCADO: Visualization of Workflow-Derived Data Provenance for Reproducible Biomedical Research. Comput Graph Forum 35:481-490