Integrative visual and computational exploratory analysis of genomics data

High-throughput genomics is now shifting from a data generation field to a data analysis field. Rapid advances in sequencing technologies and their use in large consortium projects like ENCODE, the 1000 Genomes Project and the Human Epigenome Roadmap, among others, hold promise for biomedical scientists to posit and test hypotheses on complex mechanisms of development and disease by integrating massive publicly available data as context for their own experimental data.
The R/Bioconductor project is a success story in the field of high-throughput genomics data analysis, with a large software repository, well-established software development and dissemination practices, and an extensive user base. The core project provides infrastructure for leading-edge analysis of a wide range of genomics data, chiefly high-throughput sequencing and microarrays. Bioconductor is well-suited to primary and integrative analysis of, e.g., RNA-seq differential expression, copy number, SNP and other variants, and methylation and other epigenetic data.
Significant opportunity exists to develop integrative and interactive visualization facilities based on the infrastructure provided by Bioconductor. Such tools would be immediately accessible to the large number of international software developers using Bioconductor to implement analytic methods, and to established and nascent user communities hungry for effective, flexible, statistically informed visualization tools.
Our group has extensive experience in the development of statistical, computational and visualization tools for genomics data. It also collaborates closely with biomedical researchers on substantive, cutting-edge research, which provides first-hand knowledge of the needs of this community. Our group has consistently demonstrated a commitment to the public dissemination of tools as open-source, publicly available software.
In this project we will develop interactive visualization methods and systems that are tightly coupled with computational and statistical modeling and data analysis. We will use this framework to transition and implement cutting-edge methods for visualizing large datasets, and apply them to three important areas of genomics: epigenomics, transcriptomics and metagenomics, all of which hold great promise for understanding human development and disease.

Public Health Relevance

High-throughput genomics is now shifting from a data generation field to a data analysis field. Biomedical scientists need software tools that support flexible data integration and analysis, collaboration and dissemination, so that the full promise of genomics as a data analysis field is met. We propose to leverage our extensive experience in building computational, statistical and visualization systems to create tools that support exploratory, nimble and creative data analysis workflows in an integrative, reproducible, collaborative environment.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM114267-02
Application #
9132321
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2015-09-01
Project End
2019-08-31
Budget Start
2016-09-01
Budget End
2017-08-31
Support Year
2
Fiscal Year
2016
Total Cost
$446,711
Indirect Cost
$133,374
Name
University of Maryland College Park
Department
Biostatistics & Other Math Sci
Type
Schools of Earth Sciences/Natur
DUNS #
790934285
City
College Park
State
MD
Country
United States
Zip Code
20742
Park, Deokgun; Drucker, Steven Mark; Fernandez, Roland et al. (2018) ATOM: A Grammar for Unit Visualizations. IEEE Trans Vis Comput Graph 24:3032-3043
Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N et al. (2018) Smooth quantile normalization. Biostatistics 19:185-198
Kumar, M Senthil; Slud, Eric V; Okrah, Kwame et al. (2018) Analysis and correction of compositional bias in sparse sequencing count data. BMC Genomics 19:799
Kancherla, Jayaram; Zhang, Alexander; Gottfried, Brian et al. (2018) Epiviz Web Components: reusable and extensible component library to visualize functional genomic datasets. F1000Res 7:1096
Wagner, Justin; Chelaru, Florin; Kancherla, Jayaram et al. (2018) Metaviz: interactive statistical and visual analysis of metagenomic data. Nucleic Acids Res 46:2777-2787
Kim, Minjeong; Kang, Kyeongpil; Park, Deokgun et al. (2017) TopicLens: Efficient Multi-Level Visual Topic Exploration of Large-Scale Document Collections. IEEE Trans Vis Comput Graph 23:151-160
Braid, Susan M; Okrah, Kwame; Shetty, Amol et al. (2017) DNA Methylation Patterns in Cord Blood of Neonates Across Gestational Age: Association With Cell-Type Proportions. Nurs Res 66:115-122
Manimaran, Solaiappan; Selby, Heather Marie; Okrah, Kwame et al. (2016) BatchQC: interactive software for evaluating sample and batch effects in genomic data. Bioinformatics 32:3836-3838
Sharmin, Mahfuza; Bravo, Héctor Corrada; Hannenhalli, Sridhar (2016) Heterogeneity of transcription factor binding specificity models within and across cell lines. Genome Res 26:1110-23
Fernandes, Maria Cecilia; Dillon, Laura A L; Belew, Ashton Trey et al. (2016) Dual Transcriptome Profiling of Leishmania-Infected Human Macrophages Reveals Distinct Reprogramming Signatures. MBio 7:
