This proposal combines an extensive mentored training program for the PI with a research project that aims to develop novel visualization and exploration approaches to accelerate the identification and validation of disease-associated variants in large and complex genomic and epigenomic data sets. An increasing number of such variants are being discovered in studies that generate and analyze a wide range of molecular data types for thousands of patients or samples. This progress is enabled by computational analysis pipelines that apply sophisticated statistical methods to next-generation sequencing (NGS) data. However, interpretation of the analysis results by biological and clinical domain experts is emerging as a major bottleneck because of the volume and complexity of the pipeline outputs. To address this, we propose to develop interactive visualization methods and a web-based infrastructure that will enable domain experts to identify disease-associated variants in large (epi)genomic data sets through visual exploration of computational predictions and the underlying data. This will significantly increase the rate at which predictions can be verified, interpreted and translated into clinically actionable findings. Our first priority is the design of methods and tools to visualize (epi)genomic data in a range of different contexts, for instance by grouping and representing features based on their function, chromatin state, transcriptional activity or genomic coordinates. We will also develop new non-linear genome representations to compare structural variants across genomes, complementing the functionality of the highly successful genome browsers. We will then investigate how information external to the primary data, for instance from other studies or from drug target and biomarker databases, can be used to guide investigators through the data set.
Finally, we will implement a web-based exploration system for biological and clinical domain experts that combines our interactive visualizations with large-scale public (epi)genomic data sets. The methods and tools developed under this proposal will be generally applicable; driving biological examples are drawn from The Cancer Genome Atlas (TCGA) and the Encyclopedia of DNA Elements (ENCODE) and model organism ENCODE (modENCODE) projects.

Public Health Relevance

The visualization methods and tools developed under this proposal will accelerate the identification and verification of disease-associated variants in large genomic and epigenomic data sets, thereby reducing the effort required to translate findings into clinically actionable results. Furthermore, the PI will acquire the skills required to become a productive independent investigator in the biomedical field through further mentored training in genomics and epigenomics as well as in research management.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Career Transition Award (K99)
Project #
5K99HG007583-02
Application #
8788052
Study Section
Ethical, Legal, Social Implications Review Committee (GNOM)
Program Officer
Gilchrist, Daniel A
Project Start
2014-01-01
Project End
2015-12-31
Budget Start
2015-01-01
Budget End
2015-12-31
Support Year
2
Fiscal Year
2015
Total Cost
$87,062
Indirect Cost
$6,264
Name
Harvard Medical School
Department
Miscellaneous
Type
Schools of Medicine
DUNS #
047006379
City
Boston
State
MA
Country
United States
Zip Code
02115
Publications

Kern, Michael; Lex, Alexander; Gehlenborg, Nils et al. (2017) Interactive visual exploration and refinement of cluster assignments. BMC Bioinformatics 18:406
Lex, Alexander; Gehlenborg, Nils; Strobelt, Hendrik et al. (2014) UpSet: Visualization of Intersecting Sets. IEEE Trans Vis Comput Graph 20:1983-92
Gratzl, Samuel; Gehlenborg, Nils; Lex, Alexander et al. (2014) Domino: Extracting, Comparing, and Manipulating Subsets Across Multiple Tabular Datasets. IEEE Trans Vis Comput Graph 20:2023-32
Streit, Marc; Lex, Alexander; Gratzl, Samuel et al. (2014) Guided visual exploration of genomic stratifications in cancer. Nat Methods 11:884-885