Genome Annotation Pipeline Prototype Development

Murphy, Sean

Abstract

This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

We propose a startup request to create TeraGrid-enabled prototypes for several genomic analysis projects. If the prototypes are successful, we will submit the projects for independent consideration as TeraGrid allocations. The projects will share sequence analysis applications such as NCBI BLAST and HMMER, as well as datasets such as the NCBI non-redundant protein data files.

The first project is to calculate PCR primers that cover the entire exome (the protein-coding regions) of the human genome. PCR primers are used to amplify a particular small sequence of interest from a genome, for example to determine the sequence of a gene that might be involved in hypertension. A database of PCR primers for the entire genome will make it possible to automate and simplify the investigation of specific genes of interest in patient populations. PCR primer determination is computationally intensive because each primer must be verified to be unique across the genome so that the resulting PCR product is sufficiently pure.

The second project is a pipeline to annotate prokaryotic (bacterial) genomes. The pipeline will take assembled genomes as input and perform a number of analytical steps, such as identifying gene boundaries and coding regions, assigning putative functions to genes using several types of computational evidence, and identifying the presence or absence of complete biochemical pathways.

The third project is similar to the second but will annotate eukaryotic genomes (those of organisms whose cells have a nucleus, including multicellular organisms). Eukaryotic annotation is more complex than prokaryotic annotation, and its automation involves the use of AI and machine-learning techniques.

The fourth project is an annotation pipeline for metagenomic sequences. Metagenomic data is the result of sequencing DNA from complex samples such as ocean water or the human digestive tract. It typically contains fragmentary sequences from hundreds of distinct bacterial and viral species. Metagenomic analysis is useful for detecting organisms without culturing them, and also for understanding the microecology of different environments. Computationally identifying and quantifying the enzymes in a given sample leads to a better understanding of how biomolecules are processed. One application of metagenomics is to understand the global carbon cycle in enough detail to couple long-term weather prediction with carbon sequestration modeling.
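The uniqueness requirement for PCR primers described in the first project can be illustrated with a minimal sketch. The abstract does not say how uniqueness would be checked, so everything here is an assumption for illustration: the function names, the toy genome, and the use of exact string matching on both strands. A production pipeline would more likely use an indexed search or an alignment tool such as BLAST and would also consider near matches.

```python
# Hypothetical helper names and toy data; not taken from the proposal.
TOY_GENOME = "TTGACCATGGAACCTTGACC"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_binding_sites(primer: str, genome: str) -> int:
    """Count exact occurrences of the primer on either strand.

    Overlapping occurrences are counted. Searching the forward strand for
    the primer's reverse complement is equivalent to searching the reverse
    strand for the primer itself.
    """
    genome = genome.upper()
    # A set avoids double counting when a primer equals its own reverse complement.
    patterns = {primer.upper(), reverse_complement(primer)}
    count = 0
    for pattern in patterns:
        start = genome.find(pattern)
        while start != -1:
            count += 1
            start = genome.find(pattern, start + 1)
    return count


def is_unique_primer(primer: str, genome: str) -> bool:
    """A primer is treated as unique if it matches exactly one site."""
    return count_binding_sites(primer, genome) == 1
```

Even this naive check scans the whole genome once per candidate primer, which hints at why the abstract calls primer determination computationally intensive at the scale of the human genome.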

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Biotechnology Resource Grants (P41)
Project #: 5P41RR006009-20
Application #: 8171896
Study Section: Special Emphasis Panel (ZRG1-BCMB-Q (40))

Project Start: 2010-08-01
Project End: 2013-07-31
Budget Start: 2010-08-01
Budget End: 2013-07-31
Support Year: 20
Fiscal Year: 2010
Total Cost: $1,091
Indirect Cost

Institution

Name: Carnegie-Mellon University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 052184116

City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213

Publications

Simakov, Nikolay A; Kurnikova, Maria G (2018) Membrane Position Dependency of the pKa and Conductivity of the Protein Ion Channel. J Membr Biol 251:393-404

Yonkunas, Michael; Buddhadev, Maiti; Flores Canales, Jose C et al. (2017) Configurational Preference of the Glutamate Receptor Ligand Binding Domain Dimers. Biophys J 112:2291-2300

Hwang, Wonmuk; Lang, Matthew J; Karplus, Martin (2017) Kinesin motility is driven by subdomain dynamics. Elife 6:

Earley, Lauriel F; Powers, John M; Adachi, Kei et al. (2017) Adeno-associated Virus (AAV) Assembly-Activating Protein Is Not an Essential Requirement for Capsid Assembly of AAV Serotypes 4, 5, and 11. J Virol 91:

Subramanian, Sandeep; Chaparala, Srilakshmi; Avali, Viji et al. (2016) A pilot study on the prevalence of DNA palindromes in breast cancer genomes. BMC Med Genomics 9:73

Ramakrishnan, N; Tourdot, Richard W; Radhakrishnan, Ravi (2016) Thermodynamic free energy methods to investigate shape transitions in bilayer membranes. Int J Adv Eng Sci Appl Math 8:88-100

Zhang, Yimeng; Li, Xiong; Samonds, Jason M et al. (2016) Relating functional connectivity in V1 neural circuits and 3D natural scenes using Boltzmann machines. Vision Res 120:121-31

Lee, Wei-Chung Allen; Bonin, Vincent; Reed, Michael et al. (2016) Anatomy and function of an excitatory network in the visual cortex. Nature 532:370-4

Murty, Vishnu P; Calabro, Finnegan; Luna, Beatriz (2016) The role of experience in adolescent cognitive development: Integration of executive, memory, and mesolimbic systems. Neurosci Biobehav Rev 70:46-58

Ramakrishnan, N; Radhakrishnan, Ravi (2015) Phenomenology based multiscale models as tools to understand cell membrane and organelle morphologies. Adv Planar Lipid Bilayers Liposomes 22:129-175

Showing the most recent 10 out of 292 publications
