Molecular simulations (MS) have become an integral part of molecular and structural biology. By providing model descriptions of biochemical and biophysical processes at the nanoscopic scale, MS can provide a fundamental understanding of diseases and aid drug discovery. MS, by their nature, generate large amounts of data. Although many MS software packages are carefully designed to achieve maximum computational performance in simulation, they fall seriously short in storing and handling the large-scale data output. The objective of the proposed research is to use database technologies to improve the efficiency, ease of maintenance, and security of MS data analysis. We propose to accomplish this by developing novel data structures and query processing algorithms in the kernel of the database management system (DBMS), in addition to leveraging the advantages of such systems in their current forms. Building on the success of these database-centric techniques, we will also develop automatic feedback control mechanisms in MS to improve the online tuning of simulations that is needed in studying many biochemical processes. The project has three specific aims:
1. Development of a Database-centric MS (DCMS) data analysis framework that stores simulation data collected from various sources, provides standard application programming interfaces (APIs) for data retrieval, and allows global data access to the research community while enforcing fine-grained data security policies.
2. Augmenting DCMS with novel data structures and algorithms for efficient data retrieval and query processing. We focus on creative indexing and data organization techniques, and on query processing and optimization strategies.
3. Integration of DCMS and steering-based MS programs into one unified simulation framework that can greatly improve the efficiency of the MS process. This framework will be demonstrated as part of the efforts to solve real biomedical problems.
We believe DCMS will produce a revolutionary high-throughput technique for MS researchers and accelerate the discovery process in medical research. Such innovations will bring significant intellectual merit from which both the biomedical and database management communities will benefit.
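
As a rough illustration of the database-centric idea described in the aims, the sketch below stores a tiny, made-up trajectory in a relational table and answers two typical analysis questions with queries. It is not the DCMS schema or API: the table name `atom_position`, the index `idx_frame_x`, and the helper functions are hypothetical, and SQLite (from the Python standard library) stands in for whatever DBMS the project builds on.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a tiny, invented trajectory."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per atom per simulation frame.
    conn.execute(
        "CREATE TABLE atom_position ("
        " frame INTEGER NOT NULL,"
        " atom_id INTEGER NOT NULL,"
        " x REAL, y REAL, z REAL,"
        " PRIMARY KEY (frame, atom_id))"
    )
    # A simple secondary index so range queries on x within a frame
    # do not have to scan the whole table.
    conn.execute("CREATE INDEX idx_frame_x ON atom_position (frame, x)")
    rows = [
        (0, 1, 0.0, 0.0, 0.0),
        (0, 2, 1.0, 0.0, 0.0),
        (0, 3, 5.0, 5.0, 5.0),
        (1, 1, 0.1, 0.0, 0.0),
        (1, 2, 1.2, 0.0, 0.0),
        (1, 3, 5.0, 5.1, 5.0),
    ]
    conn.executemany("INSERT INTO atom_position VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def atoms_in_box(conn, frame, lo, hi):
    """Return ids of atoms in `frame` with every coordinate in [lo, hi]."""
    cur = conn.execute(
        "SELECT atom_id FROM atom_position"
        " WHERE frame = ?"
        " AND x BETWEEN ? AND ?"
        " AND y BETWEEN ? AND ?"
        " AND z BETWEEN ? AND ?"
        " ORDER BY atom_id",
        (frame, lo, hi, lo, hi, lo, hi),
    )
    return [row[0] for row in cur.fetchall()]


def mean_x_per_frame(conn):
    """Return (frame, average x coordinate) pairs, ordered by frame."""
    cur = conn.execute(
        "SELECT frame, AVG(x) FROM atom_position GROUP BY frame ORDER BY frame"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(atoms_in_box(conn, 0, -0.5, 2.0))
    print(mean_x_per_frame(conn))
```

The point of the sketch is only that, once trajectory data sits in a DBMS, spatial selections and per-frame aggregates become declarative queries that the system can index and optimize, instead of custom scans over flat output files.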

Public Health Relevance

We propose a novel Database-centric Molecular Simulation (DCMS) framework that connects to the high-efficiency computational power of existing molecular simulation (MS) software and augments it with the strengths of database systems for post-simulation data storage and analysis. It also significantly improves the efficiency of the MS process itself. Such technologies will produce a high-throughput platform for MS researchers to study the structure, dynamics, and thermodynamics of biomolecules. This will greatly impact many fields of medical research, such as the discovery of new medicines and the pathology and diagnosis of genetic diseases. As part of the project, the developed techniques will be used to study the structures of biolipid systems and Type I collagen fibrils.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM086707-01A1
Application #: 7736070
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Remington, Karin A
Project Start: 2010-04-15
Project End: 2015-03-31
Budget Start: 2010-04-15
Budget End: 2011-03-31
Support Year: 1
Fiscal Year: 2010
Total Cost: $210,299
Indirect Cost:

Name: University of South Florida
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 069687242
City: Tampa
State: FL
Country: United States
Zip Code: 33612

Kruczek, James; Chiu, See-Wing; Jakobsson, Eric et al. (2017) Effects of Lithium and Other Monovalent Ions on Palmitoyl Oleoyl Phosphatidylcholine Bilayer. Langmuir 33:1105-1115
Kumar, Anand; Grupcev, Vladimir; Berrada, Meryem et al. (2015) DCMS: A data analytics and management system for molecular simulation. J Big Data 2:9
Fogarty, Joseph C; Arjunwadkar, Mihir; Pandit, Sagar A et al. (2015) Atomically detailed lipid bilayer models for the interpretation of small angle neutron and X-ray scattering data. Biochim Biophys Acta 1848:662-72
Kumar, Anand; Ligatti, Jay; Tu, Yi-Cheng (2015) Query Monitoring and Analysis for Database Privacy - A Security Automata Model Approach. Proc Int Conf Web Inf Syst Eng 9419:458-472
Lu, Yin; Shen, Dan; Pietsch, Maxwell et al. (2015) A novel algorithm for analyzing drug-drug interactions from MEDLINE literature. Sci Rep 5:17357
Li, Hao; Yu, Di; Kumar, Anand et al. (2014) Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing. Proc IEEE Int Conf Big Data 2014:301-310
Kumar, Anand; Grupcev, Vladimir; Yuan, Yongke et al. (2014) Computing Spatial Distance Histograms for Large Scientific Datasets On-the-Fly. IEEE Trans Knowl Data Eng 26:2410-2424
Fogarty, Joseph C; Chiu, See-Wing; Kirby, Peter et al. (2014) Automated optimization of water-water interaction parameters for a coarse-grained model. J Phys Chem B 118:1603-11
Wu, Xindong; Zhu, Xingquan; He, Yu et al. (2013) PMBC: pattern mining from biological sequences with wildcard constraints. Comput Biol Med 43:481-92
Bennett, C Brad; Kruczek, James; Rabson, D A et al. (2013) The effect of cross-link distributions in axially-ordered, cross-linked networks. J Phys Condens Matter 25:285101

Showing the most recent 10 out of 19 publications