Compressive Structural BioInformatics: High Efficiency 3D Structure Compression

Rose, Peter

Abstract

The Protein Data Bank (PDB) archive has doubled in size since 2008 and exceeded 100,000 entries in 2014. At the same time, the size and complexity of structures are increasing dramatically; for example, the recently determined structure of the HIV capsid contains about 2.5 million atoms. The emerging techniques of integrative Structural Biology are starting to determine structures of molecular machines in the mega-Dalton range at increasingly high resolution by combining cryo-Electron Microscopy, Small-Angle X-ray Scattering, X-ray crystallography, and NMR. Interactive visualization of large complexes exceeds the available network bandwidth and memory of typical scientists' desktops, laptops, or mobile devices. Large-scale structural analyses and queries of the archive have become a Big Data challenge. To make these structures accessible to all scientists, educators, and students, new ways of representing these data are required. In domains such as high-definition television, satellite communication, and video or audio streaming, high-efficiency compression has been key to delivering interactive media to phones, tablets, laptops, and desktops. A similar trend has emerged in the handling of whole-genome sequence data: an entire discipline, Compressive Genomics, has been developed to deal with data compression and the development of algorithms to process these data. This proposal introduces the concept of Compressive Structural Bioinformatics, a set of compression algorithms, applications, and workflows that analyze and visualize large structures and large sets of structures at an unprecedented speed (100- to 1000-fold speedup) and with minimal client-side overhead.
The aims of this project are to: (1) develop a compact and extensible representation of 3-D biomolecular structures; (2) enable interactive visualization of large complexes by reducing network bandwidth requirements and enabling data streaming; (3) enable large-scale analyses of the PDB archive for I/O-bound workflows; and (4) develop open-source software libraries. Through collaboration with developers of widely used visualization applications and distributed data-parallel workflow systems, the new techniques will be implemented and benchmarked, and reference implementations will be provided in several programming languages for easy adoption. It is expected that these new Compressive Structural Bioinformatics tools will enable transformative research as intended by the NIH's Big Data to Knowledge initiative.
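
The abstract does not say which encoding the project uses for its compact representation. As a rough illustration only, one common way to shrink coordinate data is to quantize each value to a fixed precision, store differences between consecutive values, and pass the result through a general-purpose compressor. The sketch below assumes that scheme; the function names `encode_coords` and `decode_coords` and the `precision` parameter are hypothetical and are not taken from the project's software.

```python
import struct
import zlib


def encode_coords(values, precision=1000):
    """Quantize, delta-encode, and deflate a list of coordinates.

    Assumption: three decimal places (precision=1000) are enough,
    so the encoding is lossy below that precision.
    """
    # Quantize each float to an integer number of 1/precision units.
    ints = [round(v * precision) for v in values]
    # Keep the first value as-is, then store successive differences.
    # Neighbouring atoms are close in space, so deltas tend to be small.
    deltas = []
    previous = 0
    for current in ints:
        deltas.append(current - previous)
        previous = current
    # Pack as big-endian signed 32-bit integers, then compress.
    packed = struct.pack(f">{len(deltas)}i", *deltas)
    return zlib.compress(packed)


def decode_coords(blob, precision=1000):
    """Invert encode_coords: inflate, undo the deltas, rescale to floats."""
    packed = zlib.decompress(blob)
    count = len(packed) // 4
    deltas = struct.unpack(f">{count}i", packed)
    values = []
    running = 0
    for delta in deltas:
        running += delta
        values.append(running / precision)
    return values
```

Calling `decode_coords(encode_coords(xs))` should return values within half a quantization step (0.0005 at the default precision) of the inputs. How much space this saves depends on the structure, and no compression ratio is claimed here.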

Public Health Relevance

The 3-D structures (shapes) of proteins and nucleic acids, the building blocks of life, are fundamental to the understanding of disease processes, the mechanisms of drug action, and the development of new medicines. We develop data compression and streaming techniques for large 3-D structures, similar to what YouTube does for videos, to enable access, large-scale analysis, and interactive visualization of very large biomolecules by scientists, educators, and students.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 1U01CA198942-01
Application #: 8870891
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Li, Jerry

Project Start: 2015-06-01
Project End: 2018-05-31
Budget Start: 2015-06-01
Budget End: 2016-05-31
Support Year: 1
Fiscal Year: 2015
Total Cost
Indirect Cost

Institution

Name: University of California San Diego
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 804355790

City: La Jolla
State: CA
Country: United States
Zip Code: 92093

Related projects


NIH 2017 U01 CA	Compressive Structural BioInformatics: High Efficiency 3D Structure Compression Rose, Peter W. / University of California San Diego	$445,075
NIH 2016 U01 CA	Compressive Structural BioInformatics: High Efficiency 3D Structure Compression Rose, Peter W. / University of California San Diego
NIH 2016 U01 CA	Compressive Structural BioInformatics: High Efficiency 3D Structure Compression Rose, Peter W. / University of California San Diego	$252,044
NIH 2015 U01 CA	Compressive Structural BioInformatics: High Efficiency 3D Structure Compression Rose, Peter W. / University of California San Diego

Publications

Rose, Alexander S; Bradley, Anthony R; Valasatava, Yana et al. (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics 34:3755-3758

Rose, Peter W; Prlić, Andreas; Altunkaya, Ali et al. (2017) The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res 45:D271-D281

Valasatava, Yana; Bradley, Anthony R; Rose, Alexander S et al. (2017) Towards an efficient compression of 3D coordinates of macromolecular structures. PLoS One 12:e0174846
