Database-centric data analysis of molecular simulations

Tu, Yicheng

Abstract

Molecular simulations (MS) have become an integral part of molecular and structural biology. By providing model descriptions of biochemical and biophysical processes at the nanoscopic scale, MS can provide a fundamental understanding of diseases and aid the discovery of drugs. MS, by their nature, generate large amounts of data. Although many MS software packages are carefully designed to achieve maximum computational performance in simulation, they fall seriously short in storing and handling the large-scale data output. The objective of the proposed research is to use database technologies to improve the efficiency, ease of maintenance, and security of MS data analysis. We propose to accomplish this by developing novel data structures and query processing algorithms in the kernel of the database management system (DBMS), in addition to leveraging the advantages of such systems in their current forms. Building on these database-centric techniques, we will also develop automatic feedback control mechanisms in MS to improve the online tuning of simulations that is needed in studying many biochemical processes. The project has three specific aims:

1. Development of a Database-centric MS (DCMS) data analysis framework that stores simulation data collected from various sources, provides standard application programming interfaces (APIs) for data retrieval, and allows global data access to the research community while enforcing fine-grained data security policies.
2. Augmenting DCMS with novel data structures and algorithms for efficient data retrieval and query processing. We focus on creative indexing and data organization techniques, and on query processing and optimization strategies.
3. Integration of DCMS and steering-based MS programs into one unified simulation framework that can greatly improve the efficiency of the MS process. This framework will be demonstrated as part of the efforts to solve real biomedical problems.

We believe DCMS will produce a revolutionary high-throughput technique for MS researchers and accelerate the discovery process in medical research. Such innovations will bring significant intellectual merit from which both the biomedical and database management communities will benefit.
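
To make the database-centric idea concrete, the following is a minimal sketch of storing simulation frames in a relational table and pushing an analysis step into the query engine. It is not the DCMS implementation: the table name `atom_position`, its columns, the helper functions, and the four sample rows are all hypothetical, and SQLite stands in for whatever DBMS the project actually used.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a tiny, made-up trajectory.

    Hypothetical schema: one row per atom per frame. The composite primary
    key doubles as an index for per-frame lookups.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE atom_position (
               frame   INTEGER NOT NULL,
               atom_id INTEGER NOT NULL,
               mass    REAL    NOT NULL,
               x REAL NOT NULL, y REAL NOT NULL, z REAL NOT NULL,
               PRIMARY KEY (frame, atom_id)
           )"""
    )
    # Illustrative data only: two atoms over two frames.
    rows = [
        (0, 1, 12.0, 0.0, 0.0, 0.0),
        (0, 2, 4.0, 4.0, 0.0, 0.0),
        (1, 1, 12.0, 1.0, 0.0, 0.0),
        (1, 2, 4.0, 5.0, 0.0, 0.0),
    ]
    conn.executemany(
        "INSERT INTO atom_position VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def center_of_mass(conn, frame):
    """Return the mass-weighted mean position of one frame, computed in SQL."""
    cur = conn.execute(
        """SELECT SUM(mass * x) / SUM(mass),
                  SUM(mass * y) / SUM(mass),
                  SUM(mass * z) / SUM(mass)
           FROM atom_position
           WHERE frame = ?""",
        (frame,),
    )
    return cur.fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    for frame in (0, 1):
        print(frame, center_of_mass(conn, frame))
```

The point of the sketch is the division of labor: the simulation engine only writes rows, while retrieval and aggregation are expressed declaratively and can benefit from the database's indexing and query optimization.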

Public Health Relevance

We propose a novel Database-centric Molecular Simulation (DCMS) framework that connects to the high-efficiency computational power of existing molecular simulation (MS) software and augments it with the strengths of database systems for post-simulation data storage and analysis. It also significantly improves the efficiency of the MS process itself. Such technologies will produce a high-throughput platform for MS researchers to study the structure, dynamics, and thermodynamics of biomolecules. This will greatly impact many fields of medical research, such as the discovery of new medicines and the pathology and diagnosis of genetic diseases. As part of the project, the developed techniques will be used to study the structures of biolipid systems and Type I collagen fibrils.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM086707-03
Application #: 8259152
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter

Project Start: 2010-04-15
Project End: 2015-03-31
Budget Start: 2012-04-01
Budget End: 2013-03-31
Support Year: 3
Fiscal Year: 2012
Total Cost: $173,300
Indirect Cost: $44,602

Institution

Name: University of South Florida
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 069687242

City: Tampa
State: FL
Country: United States
Zip Code: 33612

Related projects


NIH 2014 R01 GM	Database-centric data analysis of molecular simulations	Tu, Yicheng / University of South Florida	$141,188
NIH 2013 R01 GM	Database-centric data analysis of molecular simulations	Tu, Yicheng / University of South Florida	$163,741
NIH 2012 R01 GM	Database-centric data analysis of molecular simulations	Tu, Yicheng / University of South Florida	$173,300
NIH 2011 R01 GM	Database-centric data analysis of molecular simulations	Tu, Yicheng / University of South Florida	$174,145
NIH 2010 R01 GM	Database-centric data analysis of molecular simulations	Tu, Yicheng / University of South Florida	$210,299

Publications

Kruczek, James; Chiu, See-Wing; Jakobsson, Eric et al. (2017) Effects of Lithium and Other Monovalent Ions on Palmitoyl Oleoyl Phosphatidylcholine Bilayer. Langmuir 33:1105-1115

Kumar, Anand; Grupcev, Vladimir; Berrada, Meryem et al. (2015) DCMS: A data analytics and management system for molecular simulation. J Big Data 2:9

Fogarty, Joseph C; Arjunwadkar, Mihir; Pandit, Sagar A et al. (2015) Atomically detailed lipid bilayer models for the interpretation of small angle neutron and X-ray scattering data. Biochim Biophys Acta 1848:662-72

Kumar, Anand; Ligatti, Jay; Tu, Yi-Cheng (2015) Query Monitoring and Analysis for Database Privacy - A Security Automata Model Approach. Proc Int Conf Web Inf Syst Eng 9419:458-472

Lu, Yin; Shen, Dan; Pietsch, Maxwell et al. (2015) A novel algorithm for analyzing drug-drug interactions from MEDLINE literature. Sci Rep 5:17357

Li, Hao; Yu, Di; Kumar, Anand et al. (2014) Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing. Proc IEEE Int Conf Big Data 2014:301-310

Kumar, Anand; Grupcev, Vladimir; Yuan, Yongke et al. (2014) Computing Spatial Distance Histograms for Large Scientific Datasets On-the-Fly. IEEE Trans Knowl Data Eng 26:2410-2424

Fogarty, Joseph C; Chiu, See-Wing; Kirby, Peter et al. (2014) Automated optimization of water-water interaction parameters for a coarse-grained model. J Phys Chem B 118:1603-11

Wu, Xindong; Zhu, Xingquan; He, Yu et al. (2013) PMBC: pattern mining from biological sequences with wildcard constraints. Comput Biol Med 43:481-92

Bennett, C Brad; Kruczek, James; Rabson, D A et al. (2013) The effect of cross-link distributions in axially-ordered, cross-linked networks. J Phys Condens Matter 25:285101

Showing the most recent 10 out of 19 publications
