Teragrid: the Encyclopedia of Life: a Novel Toolkit for Enabling High Throughpu

Miller, Mark

Abstract

This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.
The explosion in DNA sequencing capacity, wireless sensor technologies, and data storage capabilities makes it possible for biologists to generate and store vast amounts of data in the pursuit of biomedical discoveries. To realize the full potential of this approach, this new, data-driven biology requires a new cyberinfrastructure that is readily accessible to all scientists and that facilitates the movement, storage, and analysis of large amounts of data. To meet this need, a variety of novel tools and utilities must be created, including: 1) automated pipelining tools that allow users to analyze vast quantities of data; 2) access to highly integrated database resources, so collected data can be instantly linked to existing knowledge; 3) a software workbench where federated data can be manipulated and visualized in a user-friendly environment; 4) access to computational resources to drive the calculations through grid computing; and 5) tools to store and share the results of individual investigations. These tools must be designed to require minimal resources on the part of the user: computations must be carried out on the server side, graphics must be lightweight, data must be provided in forms that allow for interoperability, and access to computational resources must be transparent. The scientist must be able to use grid computing resources with no need for an awareness of where these resources are, or even that they are grid computing resources.
The Encyclopedia of Life (EOL) is a Grand Challenge project aimed at creating precisely this type of cyberinfrastructure for the proteomics community of the 21st century. The EOL consists of three elements. The first is a software pipeline that allows the automated annotation of sequenced genomes. This pipeline consists of protein sequence and structure prediction and annotation tools that run on whole genomes, and a workflow system that maps these calculations onto distributed resources at partner institutions throughout the world. The second is a reference database: annotations derived from the pipeline are stored in a normalized reference database that is federated with seven other major biological databases, allowing direct queries across several areas of specialization. The third element of EOL is focused on use and distribution of the data: all data generated, stored, and federated by the EOL project will be presented to the user for analysis and distribution using innovative Web services-based data sharing tools, including a web browser-based encyclopedia of annotated genomes and a virtual user notebook that allows for data storage, preservation of workflow information, and peer-to-peer data sharing. These distribution tools are currently either implemented in alpha form or under development. The cyberinfrastructure created by the EOL project is designed to yield a significant number of generic software tools and middleware that are not confined to proteomics but can be applied across the biomedical community. These tools can be implemented directly to facilitate the exchange of information and analysis of data within any given domain in biology and, importantly, between domains, with a minimum of additional effort.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Biotechnology Resource Grants (P41)
Project #: 2P41RR006009-16A1
Application #: 7358476
Study Section: Special Emphasis Panel (ZRG1-BCMB-Q (40))

Project Start: 2006-09-30
Project End: 2007-07-31
Budget Start: 2006-09-30
Budget End: 2007-07-31
Support Year: 16
Fiscal Year: 2006
Total Cost: $1,012
Indirect Cost

Institution

Name: Carnegie-Mellon University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 052184116

City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213
