Molecular simulations (MS) have become an integral part of molecular and structural biology. By providing model descriptions of biochemical and biophysical processes at the nanoscopic scale, MS can provide a fundamental understanding of diseases and aid drug discovery. MS, by their nature, generate large amounts of data. Although many MS software packages are carefully designed to achieve maximum computational performance in simulation, they fall seriously short in storing and handling the large-scale data they output. The objective of the proposed research is to use database technologies to improve the efficiency, ease of maintenance, and security of MS data analysis. We propose to accomplish this by developing novel data structures and query processing algorithms in the kernel of the database management system (DBMS), in addition to leveraging the advantages of such systems in their current forms. Building on the success of these database-centric techniques, we will also develop automatic feedback control mechanisms in MS to improve the online tuning of simulations that is needed in studying many biochemical processes. The project has three specific aims:
(1) Development of a Database-centric MS (DCMS) data analysis framework that stores simulation data collected from various sources, provides standard application programming interfaces (APIs) for data retrieval, and allows global data access to the research community while enforcing fine-grained data security policies.
(2) Augmenting DCMS with novel data structures and algorithms for efficient data retrieval and query processing. We focus on creative indexing and data organization techniques, and on query processing and optimization strategies.
(3) Integration of DCMS and steering-based MS programs into one unified simulation framework that can greatly improve the efficiency of the MS process. This framework will be demonstrated as part of the efforts to solve real biomedical problems.
We believe DCMS will produce a revolutionary high-throughput technique for MS researchers and accelerate the discovery process in medical research. Such innovations will bring significant intellectual merit from which both the biomedical and database management communities will benefit.

Public Health Relevance

We propose a novel Database-centric Molecular Simulation (DCMS) framework that connects to the high-efficiency computational power of existing molecular simulation (MS) software and augments it with the strengths of database systems for post-simulation data storage and analysis. It also significantly improves the efficiency of the MS process itself. Such technologies will produce a high-throughput platform for MS researchers to study the structure, dynamics, and thermodynamics of biomolecules. This will greatly impact many fields of medical research, such as the discovery of new medicines and the pathology and diagnosis of genetic diseases. As part of the project, the developed techniques will be used to study the structures of biolipid systems and Type I collagen fibrils.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Lyster, Peter
University of South Florida
Biostatistics & Other Math Sci
Schools of Engineering
United States
Kumar, Anand; Ligatti, Jay; Tu, Yi-Cheng (2015) Query Monitoring and Analysis for Database Privacy - A Security Automata Model Approach. Proc Int Conf Web Inf Syst Eng 9419:458-472
Fogarty, Joseph C; Arjunwadkar, Mihir; Pandit, Sagar A et al. (2015) Atomically detailed lipid bilayer models for the interpretation of small angle neutron and X-ray scattering data. Biochim Biophys Acta 1848:662-72
Lu, Yin; Shen, Dan; Pietsch, Maxwell et al. (2015) A novel algorithm for analyzing drug-drug interactions from MEDLINE literature. Sci Rep 5:17357
Kumar, Anand; Grupcev, Vladimir; Berrada, Meryem et al. (2015) DCMS: A data analytics and management system for molecular simulation. J Big Data 2:9
Kumar, Anand; Grupcev, Vladimir; Yuan, Yongke et al. (2014) Computing Spatial Distance Histograms for Large Scientific Datasets On-the-Fly. IEEE Trans Knowl Data Eng 26:2410-2424
Li, Hao; Yu, Di; Kumar, Anand et al. (2014) Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing. Proc IEEE Int Conf Big Data 2014:301-310
Fogarty, Joseph C; Chiu, See-Wing; Kirby, Peter et al. (2014) Automated optimization of water-water interaction parameters for a coarse-grained model. J Phys Chem B 118:1603-11
Wu, Xindong; Zhu, Xingquan; He, Yu et al. (2013) PMBC: pattern mining from biological sequences with wildcard constraints. Comput Biol Med 43:481-92
Bennett, C Brad; Kruczek, James; Rabson, D A et al. (2013) The effect of cross-link distributions in axially-ordered, cross-linked networks. J Phys Condens Matter 25:285101
Hewanadungodage, Chandima; Xia, Yuni; Lee, Jaehwan John et al. (2013) Hyper-structure mining of frequent patterns in uncertain data streams. Knowl Inf Syst 37:219-244
