Public health research increasingly incorporates high-throughput biomedical data, opening up new areas for data-driven research. Recently, scientists have begun to realize the potential for modern biology to move 'beyond the genome' to look at the genome's complex interactions with the social and physical environments, focusing on disease etiology and the role of all cellular aspects in promoting health. To realize this potential, our scientists have been moving from individual ad hoc studies to collaborative projects intended to scale across a broad range of disciplines. In the last five years, dramatic increases in the scale of environmental and health data acquisition, sequencing, and assay technologies have coupled with increased decentralization of data generation, resulting in a growing data management and analysis bottleneck. Our long-term goal at the School of Public Health is to provide a seamless collaboratory environment in which it is possible to exploit the broad range of our expertise across shared datasets spanning investigations from the cell to the population. To achieve this aim, we need to radically improve our existing shared computer data storage, moving from its concentration on low-volume, high-stability, high-cost, high-performance storage under a model in which the user pays all costs to a tiered data storage model, subsidized by the institution, that is flexible enough to meet a broad range of requirements. We wish to: (a) co-locate genomic, genetic, environmental, epidemiological, social, and statistical data in a shared data environment; (b) apply consistent policies, access, user support, computing environments, workflows, and user interfaces; (c) provide a scalable data storage resource at low cost to accommodate the rapid increase in sizes of genomic and cohort data.
The effective management, storage, and processing of this complex experimental data is therefore crucial and requires computational infrastructure capable of providing consistent storage and organization of primary data and derived results. With scalable, shared data storage, we will directly impact studies of complex diseases, host response to infectious diseases, pathogen diversity, nutrition, and gene-environment interactions. The Harvard School of Public Health (HSPH) is requesting funding for the deployment of a centralized, tiered, high-performance data storage system to support our NIH-funded research in computational biology, genomics, and biostatistics as applied to public health.

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Biomedical Research Support Shared Instrumentation Grants (S10)
Project #: 1S10RR031865-01
Application #: 8052149
Study Section: Special Emphasis Panel (ZRG1-BST-F (30))
Program Officer: Levy, Abraham
Project Start: 2011-05-01
Project End: 2013-04-30
Budget Start: 2011-05-01
Budget End: 2013-04-30
Support Year: 1
Fiscal Year: 2011
Total Cost: $473,070
Indirect Cost:

Name: Harvard University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 149617367
City: Boston
State: MA
Country: United States
Zip Code: 02115
Sansone, Susanna-Assunta; Rocca-Serra, Philippe; Field, Dawn et al. (2012) Toward interoperable bioscience data. Nat Genet 44:121-6
Bradley, David P; Kulstad, Roger; Racine, Natalie et al. (2012) Alterations in energy balance following exenatide administration. Appl Physiol Nutr Metab 37:893-9