Investigation of Cloud Computing to Support Data-Parallel Health Research

Fox, Geoffrey

Abstract

We will form a multidisciplinary team of Indiana University computer scientists, biologists, and bioinformaticians to develop and deploy new large-scale computing infrastructure and tools that will enable fundamental health research. Our research will investigate the impact of Cloud computing architectures on large-scale computational biology, particularly widely encountered "data-parallel" problems, including but not limited to DNA sequence analysis. GO funds will be used to establish the new field of Cloud-based computational life science. Cloud computing is currently typified by Amazon Web Services, Microsoft Azure, and other commercial efforts. However, many universities (including Indiana University) are in the process of establishing research Cloud deployments that will address two general problems:
Infrastructure: Clouds provide simple Web service programming interfaces that allow scientists to create computing clusters and use highly reliable data storage. That is, Clouds provide a way to outsource computing infrastructure.
Runtimes: Cloud systems are particularly appropriate for running large-scale information retrieval problems. These data-parallel problems involve pipelines of replicated, sequential commands that process very large data sets divided into many pieces. Example technologies include Microsoft Dryad and Apache Hadoop.
In this proposal, we will partner with Microsoft Research, which is currently converting Dryad from a research project to a robust tool. We have analyzed a wide variety of health research problems and have shown that they can benefit from Cloud infrastructure and runtimes. Clouds provide research groups with a way to outsource computing, storage, and networking and to achieve high performance on data-parallel problems in health research.
Our team's research efforts (many NIH funded) represent a wide range of applications, including a) sequence-based transcriptome profiling, b) genome re-sequencing for mutation mapping, c) metagenomics analysis, d) genome annotation, e) comparative genomics, f) population genomics, and g) advanced parallel data mining in patient health records. Processing large-scale data is the common problem uniting these efforts.

Public Health Relevance

We propose to investigate and develop a unique Cloud computing research infrastructure that will have a very large impact on several different life science research areas. Our focus is on the large-scale, data-parallel analysis problems that result from the deluge of data from short-read gene sequencing devices and other sources. We will develop and demonstrate our infrastructure in collaboration with several existing biological and biomedical projects.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: High Impact Research and Research Infrastructure Programs (RC2)
Project #: 1RC2HG005806-01
Application #: 7852166
Study Section: Special Emphasis Panel (ZRG1-GGG-A (99))
Program Officer: Bonazzi, Vivien

Project Start: 2009-09-30
Project End: 2011-08-31
Budget Start: 2009-09-30
Budget End: 2010-08-31
Support Year: 1
Fiscal Year: 2009
Total Cost: $735,042
Indirect Cost

Institution

Name: Indiana University Bloomington
Department: Miscellaneous
Type: Other Domestic Higher Education
DUNS #: 006046700

City: Bloomington
State: IN
Country: United States
Zip Code: 47401

Related projects


NIH 2010 RC2 HG	Investigation of Cloud Computing to Support Data-Parallel Health Research	Fox, Geoffrey C. / Indiana University Bloomington	$756,297
NIH 2009 RC2 HG	Investigation of Cloud Computing to Support Data-Parallel Health Research	Fox, Geoffrey C. / Indiana University Bloomington	$735,042

Publications

Hawlena, Hadas; Rynkiewicz, Evelyn; Toh, Evelyn et al. (2013) The arthropod, but not the vertebrate host or its environment, dictates bacterial community composition of fleas and ticks. ISME J 7:221-3

Kuehn, Joanna S; Gorden, Patrick J; Munro, Daniel et al. (2013) Bacterial community profiling of milk samples as a means to understand culture-negative bovine clinical mastitis. PLoS One 8:e61959

Hughes, Adam; Ruan, Yang; Ekanayake, Saliya et al. (2012) Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets. BMC Bioinformatics 13 Suppl 2:S9

Wolfe, Alan J; Toh, Evelyn; Shibata, Noriko et al. (2012) Evidence of uncultivated bacteria in the adult female bladder. J Clin Microbiol 50:1376-83

Nelson, David E; Dong, Qunfeng; Van der Pol, Barbara et al. (2012) Bacterial communities of the coronal sulcus and distal urethra of adolescent males. PLoS One 7:e36298

Revanna, Kashi V; Munro, Daniel; Gao, Alvin et al. (2012) A web-based multi-genome synteny viewer for customized data. BMC Bioinformatics 13:190

Dong, Qunfeng; Nelson, David E; Toh, Evelyn et al. (2011) The microbial communities in male first catch urine are highly similar to those in paired urethral swab specimens. PLoS One 6:e19709

Revanna, Kashi V; Chiu, Chi-Chen; Bierschank, Ezekiel et al. (2011) GSV: a web-based genome synteny viewer for customized data. BMC Bioinformatics 12:316

Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina et al. (2010) Hybrid cloud and cluster computing paradigms for life science applications. BMC Bioinformatics 11 Suppl 12:S3
