Application of High-Performance Computing to the Reconstruction, Analysis, and

Stevens, Rick

Abstract

This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

The field of biology is undergoing a fundamental shift from a data-poor field to a data-rich field thanks to the advent of numerous high-throughput technologies for collecting experimental data: shotgun sequencing, pyrosequencing, microarrays, ChIP-chip, Biolog phenotyping arrays, microfluidic devices, and flow cytometry. Today simulation is struggling to keep pace with data collection in many areas, and nowhere is this more evident than in genome sequencing versus genome-scale modeling. While over 800 prokaryotic genomes have been sequenced in the past ten years, only 30 genome-scale metabolic models have been published, and the pace of genome sequencing continues to increase, threatening to widen this already massive gap. Today, the prevailing view in the genome-scale metabolic modeling community is that producing a new model of a microorganism requires a year or more of manual effort. However, technologies have emerged in recent years that make it possible to automate or expedite various steps of the genome-scale reconstruction process, and we have recently tied these technologies together into an automated genome-scale model reconstruction pipeline. While this pipeline makes it possible to construct a single model in one to five days, the reconstruction process requires extensive computation. We propose to use the computational resources of the TeraGrid to apply this reconstruction process to build new genome-scale metabolic models for every prokaryote with a completely sequenced genome.
We then plan to use these models in a number of high-impact scientific studies, including: (1) simulating the knockout of every metabolic gene to study the robustness of the metabolic networks of these organisms and identify new potential targets for future antibacterial drug development, (2) simulating growth of each microorganism in a variety of chemical conditions to identify the environments in which various communities of microorganisms are capable of surviving, (3) predicting the minimal defined media conditions required to culture each modeled organism, and (4) simulating the engineering of each modeled organism to produce organic compounds of industrial value from a variety of renewable raw materials. While these studies produce very different results and serve fundamentally different application areas, they can all be accomplished by applying the flux balance analysis (FBA) method to the genome-scale metabolic models we will be constructing. FBA is therefore the primary algorithm in the proposed reconstruction and analysis of genome-scale models. The most significant computation in this algorithm is solving a linear or mixed integer linear optimization problem. Fortunately, numerous open-source software packages are available for solving linear and mixed integer linear optimization problems. We will apply the GLPK, SCIP, and BCP solvers along with our own custom-built, MPI-ready FBA software to perform all of the proposed reconstruction and analysis calculations. In total, we expect that 10^4 distinct mixed integer linear optimization problems and 10^12 distinct linear optimization problems will need to be solved in the first year of this project, requiring a total of 1.5 million CPU hours. In the two following years, we anticipate that an equal number of simulations will be required due to the release of additional sequenced organisms and updates to the annotations of the existing organisms.
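To make the core computation concrete, the following is a minimal sketch of flux balance analysis and a single-reaction knockout scan on a toy network. It is an illustration only, not the project's MPI-ready FBA software. FBA maximizes an objective flux subject to the steady-state mass balance S v = 0 and bounds on each flux. The three-metabolite network, the reaction names, and the flux bounds are hypothetical, all reactions are assumed irreversible, and a small exact simplex routine written for this sketch stands in for a production solver such as GLPK or SCIP.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximize c.x subject to A x <= b and x >= 0, assuming every b[i] >= 0.

    Dense tableau simplex with Bland's rule, in exact rational arithmetic.
    """
    m, n = len(A), len(c)
    # Each constraint row is [A_i | slack identity | b_i]; slacks start basic.
    T = [[Fraction(x) for x in A[i]]
         + [Fraction(int(i == j)) for j in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    # Objective row holds -c, so a negative entry marks an improving column.
    T.append([Fraction(-x) for x in c] + [Fraction(0)] * (m + 1))
    basis = [n + i for i in range(m)]
    while True:
        col = next((j for j in range(n + m) if T[-1][j] < 0), None)
        if col is None:
            break  # no improving column: the current basis is optimal
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            raise ValueError("LP is unbounded")
        # Minimum-ratio test; ties go to the lowest basic variable index.
        row = min(rows, key=lambda i: (T[i][-1] / T[i][col], basis[i]))
        pivot = T[row][col]
        T[row] = [x / pivot for x in T[row]]
        for i in range(m + 1):
            if i != row and T[i][col] != 0:
                factor = T[i][col]
                T[i] = [x - factor * y for x, y in zip(T[i], T[row])]
        basis[row] = col
    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = T[i][-1]
    return T[-1][-1], x


def fba(S, upper, objective, knockout=None):
    """Maximize flux `objective` subject to S v = 0 and 0 <= v <= upper.

    A knockout is modeled by forcing the upper bound of one reaction to zero.
    """
    ub = list(upper)
    if knockout is not None:
        ub[knockout] = 0
    n = len(ub)
    # S v = 0 is written as the pair S v <= 0 and -S v <= 0.
    A = ([list(row) for row in S]
         + [[-x for x in row] for row in S]
         + [[int(i == j) for j in range(n)] for i in range(n)])
    b = [0] * (2 * len(S)) + ub
    c = [int(j == objective) for j in range(n)]
    return simplex_max(c, A, b)


# Hypothetical toy network (rows: metabolites A, B, C; columns: reactions).
REACTIONS = ["uptake", "r1", "r2", "r3", "biomass"]
S = [
    [1, -1, -1,  0,  0],   # A: made by uptake, used by r1 and r2
    [0,  1,  0, -1, -1],   # B: made by r1, used by r3 and biomass
    [0,  0,  1,  1, -1],   # C: made by r2 and r3, used by biomass
]
UPPER = [10, 10, 10, 3, 100]   # hypothetical flux capacities
BIOMASS = 4

wild_type, _ = fba(S, UPPER, BIOMASS)
knockouts = {name: fba(S, UPPER, BIOMASS, knockout=j)[0]
             for j, name in enumerate(REACTIONS[:-1])}

print("wild type biomass flux:", wild_type)
for name, growth in knockouts.items():
    print(f"knockout {name}: biomass flux {growth}")
```

In this toy network, removing `uptake` or `r1` drops the biomass flux to zero (an essential reaction), removing `r2` lowers it because the bypass `r3` has less capacity, and removing `r3` leaves it unchanged. A genome-scale study would repeat this kind of solve for every gene in every model, which is why the number of optimization problems grows so quickly.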

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Biotechnology Resource Grants (P41)
Project #: 5P41RR006009-20
Application #: 8171929
Study Section: Special Emphasis Panel (ZRG1-BCMB-Q (40))

Project Start: 2010-08-01
Project End: 2013-07-31
Budget Start: 2010-08-01
Budget End: 2013-07-31
Support Year: 20
Fiscal Year: 2010
Total Cost: $1,091
Indirect Cost

Institution

Name: Carnegie-Mellon University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 052184116

City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213
