Molecular simulations (MS) have become an integral part of molecular and structural biology. By providing model descriptions of biochemical and biophysical processes at the nanoscopic scale, MS can provide a fundamental understanding of diseases and aid the discovery of drugs. MS, by their nature, generate large amounts of data. Although many MS software packages are carefully designed to achieve maximum computational performance during simulation, they fall seriously short in storing and handling the large-scale data they produce. The objective of the proposed research is to use database technologies to improve the efficiency, ease of maintenance, and security of MS data analysis. We propose to accomplish this by developing novel data structures and query processing algorithms in the kernel of the database management system (DBMS), in addition to leveraging the advantages of such systems in their current form. Building on the success of these database-centric techniques, we will also develop automatic feedback control mechanisms in MS to improve the online tuning of simulations that is needed in studying many biochemical processes. The project has three specific aims:
(1) Development of a Database-centric MS (DCMS) data analysis framework that stores simulation data collected from various sources, provides standard application programming interfaces (APIs) for data retrieval, and allows global data access for the research community while enforcing fine-grained data security policies.
(2) Augmenting DCMS with novel data structures and algorithms for efficient data retrieval and query processing. We focus on creative indexing and data organization techniques, and on query processing and optimization strategies.
(3) Integration of DCMS and steering-based MS programs into one unified simulation framework that can greatly improve the efficiency of the MS process. This framework will be demonstrated as part of efforts to solve real biomedical problems.
We believe DCMS will produce a revolutionary high-throughput technique for MS researchers and accelerate the discovery process in medical research. Such innovations will bring significant intellectual merit from which both the biomedical and database management communities will benefit.
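
To make the database-centric idea concrete, the following is a minimal sketch of storing simulation output in a relational table and answering an analysis question with a declarative query instead of a scan over flat trajectory files. It is an illustration only and not the DCMS design: the table name atom_position, its columns, the helper function names, and the use of SQLite are all assumptions chosen to keep the example self-contained.

```python
import math
import sqlite3


def build_demo_db():
    """Create an in-memory database with a toy trajectory (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE atom_position (
            frame   INTEGER NOT NULL,  -- simulation time step
            atom_id INTEGER NOT NULL,  -- atom identifier within the system
            x REAL NOT NULL,
            y REAL NOT NULL,
            z REAL NOT NULL,
            PRIMARY KEY (frame, atom_id)
        )
        """
    )
    rows = [
        (0, 1, 0.0, 0.0, 0.0),
        (0, 2, 3.0, 4.0, 0.0),
        (0, 3, 0.0, 0.0, 1.0),
        (1, 1, 0.0, 0.0, 0.0),
        (1, 2, 6.0, 8.0, 0.0),
    ]
    conn.executemany("INSERT INTO atom_position VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def pair_distances(conn, frame):
    """Return (atom_i, atom_j, distance) for every atom pair in one frame.

    The self-join is brute force (quadratic in the number of atoms); it only
    shows the query interface, not an optimized algorithm.
    """
    cur = conn.execute(
        """
        SELECT a.atom_id, b.atom_id,
               (a.x - b.x) * (a.x - b.x)
             + (a.y - b.y) * (a.y - b.y)
             + (a.z - b.z) * (a.z - b.z) AS dist_sq
        FROM atom_position AS a
        JOIN atom_position AS b
          ON a.frame = b.frame AND a.atom_id < b.atom_id
        WHERE a.frame = ?
        ORDER BY a.atom_id, b.atom_id
        """,
        (frame,),
    )
    # Take the square root in Python so the sketch does not depend on
    # SQLite's optional math functions being compiled in.
    return [(i, j, math.sqrt(d2)) for i, j, d2 in cur]


if __name__ == "__main__":
    conn = build_demo_db()
    for i, j, d in pair_distances(conn, frame=0):
        print(f"atoms {i}-{j}: {d:.3f}")
```

In this sketch the primary key on (frame, atom_id) lets the database engine locate one frame without reading the whole trajectory, which is the kind of benefit an existing DBMS offers before any of the specialized indexing proposed in the second aim.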

Public Health Relevance

We propose a novel Database-centric Molecular Simulation (DCMS) framework that harnesses the high-efficiency computational power of existing molecular simulation (MS) software and augments it with the strengths of database systems for post-simulation data storage and analysis. It also significantly improves the efficiency of the MS process itself. Such technologies will produce a high-throughput platform for MS researchers to study the structure, dynamics, and thermodynamics of biomolecules. This will greatly impact many fields of medical research, such as the discovery of new medicines and the pathology and diagnosis of genetic diseases. As part of the project, the developed techniques will be used to study the structures of biolipid systems and Type I collagen fibrils.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM086707-03
Application #: 8259152
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter
Project Start: 2010-04-15
Project End: 2015-03-31
Budget Start: 2012-04-01
Budget End: 2013-03-31
Support Year: 3
Fiscal Year: 2012
Total Cost: $173,300
Indirect Cost: $44,602

Name: University of South Florida
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 069687242
City: Tampa
State: FL
Country: United States
Zip Code: 33612

Kumar, Anand; Grupcev, Vladimir; Yuan, Yongke et al. (2014) Computing Spatial Distance Histograms for Large Scientific Datasets On-the-Fly. IEEE Trans Knowl Data Eng 26:2410-2424
Fogarty, Joseph C; Chiu, See-Wing; Kirby, Peter et al. (2014) Automated optimization of water-water interaction parameters for a coarse-grained model. J Phys Chem B 118:1603-11
Wu, Xindong; Zhu, Xingquan; He, Yu et al. (2013) PMBC: pattern mining from biological sequences with wildcard constraints. Comput Biol Med 43:481-92
Bennett, C Brad; Kruczek, James; Rabson, D A et al. (2013) The effect of cross-link distributions in axially-ordered, cross-linked networks. J Phys Condens Matter 25:285101
Metcalf, Rainer; Pandit, Sagar A (2012) Mixing properties of sphingomyelin ceramide bilayers: a simulation study. J Phys Chem B 116:4500-9
Chen, Shaoping; Tu, Yi-Cheng; Xia, Yuni (2011) Performance analysis of a dual-tree algorithm for computing spatial distance histograms. VLDB J 20:471-494
Tumaneng, Paul W; Pandit, Sagar A; Zhao, Guijun et al. (2011) Self-consistent mean-field model for palmitoyloleoylphosphatidylcholine-palmitoyl sphingomyelin-cholesterol lipid bilayers. Phys Rev E Stat Nonlin Soft Matter Phys 83:031925