Statistical and Computational Methods for Molecular Biology and Biomedicine

Munson, Peter

Abstract

Genome projects are producing vast amounts of novel, uncharacterized sequence data that need interpretation. Protein primary sequence can often be determined far in advance of tertiary structure or function. We are developing and testing automated techniques for assigning protein structure to novel, uncharacterized sequences, a task called fold recognition. Previously, we developed new techniques for protein secondary structure prediction. Recently, we have incorporated the results of such predictions into a hidden Markov model (HMM) based approach to protein fold recognition, called FORESST. The power of such statistical techniques can only be assessed in carefully controlled retrospective statistical studies or in prospective trials.

Disseminating statistical and computational tools for very specialized analyses has been a time-consuming and costly process, and has not been well supported in the profit-driven commercial sector. Web technology is now making it possible for developers in the non-profit sector to distribute and support such analytical tools efficiently. The cost of software distribution over the Web is now insignificant, which allows developers in this area to devote more time to methods development. Furthermore, Web clients can provide a platform-independent user interface to computational tools running on central servers. A strategy that is now supplanting software distribution is to provide servers tuned to specific computational needs, for use by a working group. The working group need not be a single laboratory, but may cut across organizational and even institutional boundaries. We are developing both software distribution and software server approaches to solving a number of computational problems common to physiology, pharmacology, endocrinology and molecular biology. We are also using Web technology to provide a common interface for programs used by our own Section.

Progress in FY98: The FORESST method was tested extensively with existing protein families using a cross-validated analysis and a larger set of models of protein fold families. This study compared the method to local sequence similarity, sequence-motif recognition using HMMs and global sequence recognition using HMMs. These four methods were compared on the problem of recognizing distantly homologous proteins or protein folds, which is a critical problem facing genome annotation efforts today. Results showed that a method incorporating secondary structure propensity (FORESST) outperformed purely sequence-based methods for the most difficult remote homology detection problems, whereas local sequence homology was generally the most powerful for moderate to close homolog detection.

ABS staff also collaborated on several projects with NIH Intramural investigators to investigate particular sequences or protein families of interest using bioinformatics and structure prediction tools. Techniques include secondary structure prediction, fold assignment, determination of sequence-structure relationships using multiple sequence alignments, homology modeling, motif analysis and database searching. ABS staff also provided statistical advice and collaboration in the areas of ligand binding data analysis, dose-response curve analysis, repeated-measures ANOVA and MANOVA and, in one project, analysis of endocrine time series.

The ABS structure prediction team is also engaged in CASP3 (Critical Assessment of Structure Prediction 3), an international competition that seeks to evaluate structure prediction algorithms. Our entries include secondary structure prediction and protein fold recognition on a variety of newly solved but unpublished protein structures. Results of this competition will be announced in December 1998.

A series of programs developed in this Section over the past two decades had previously been distributed by a private service that mailed out diskettes and paper documentation. We have deployed a Web-based software download site that now makes PC and Macintosh versions of several programs (LIGAND, ALLFIT, FLEXIFIT and PULSEFIT) available to any user equipped with a Web browser. The software download site, developed by ABS personnel, keeps track of users' names and addresses in a log. Documentation, previously distributed on paper, was scanned and made available for download using Adobe Acrobat software.

The ABS Web server also provides a number of unique sequence analysis services, which were augmented to include secondary structure prediction by the GOR4 algorithm, multiple sequence alignment by CLUSTALW and MUSEQAL, and various reformatting services. A "BLAST to FASTA" conversion service was upgraded, and all services offered were upgraded to allow file uploads in addition to paste-in text boxes. A new service to produce formatted files for the CASP3 prediction contest was also implemented.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: Center for Information Technology (CIT)
Type: Intramural Research (Z01)
Project #: 1Z01CT000227-08
Application #: 6103833
Study Section: Special Emphasis Panel (MSCL)

Support Year: 8
Fiscal Year: 1998

Institution

Name: Center for Information Technology
Country: United States

Publications

Pajevic, Sinisa; Plenz, Dietmar (2009) Efficient network reconstruction from dynamical cascades identifies small-world topology of neuronal avalanches. PLoS Comput Biol 5:e1000271

McQueen, Philip G; McKenzie, F Ellis (2008) Host control of malaria infections: constraints on immune and erythropoeitic response kinetics. PLoS Comput Biol 4:e1000149

Coppey, Mathieu; Boettiger, Alistair N; Berezhkovskii, Alexander M et al. (2008) Nuclear trapping shapes the terminal gradient in the Drosophila embryo. Curr Biol 18:915-9

Sam, Vichetra; Tai, Chin-Hsien; Garnier, Jean et al. (2008) Towards an automatic classification of protein structural domains based on structural similarity. BMC Bioinformatics 9:74

Hendler, Richard W; Shrager, Richard I; Meuse, Curtis W (2008) The ability of actinic light to modify the bacteriorhodopsin photocycle revisited: heterogeneity vs photocooperativity. Biochemistry 47:5406-16

Williams, Ruth R E; Azuara, Veronique; Perry, Pascale et al. (2006) Neural induction promotes large-scale chromatin reorganisation of the Mash1 locus. J Cell Sci 119:132-40

Nishizuka, Satoshi; Washburn, Newell R; Munson, Peter J (2006) Evaluation method of ordinary flatbed scanners for quantitative density analysis. Biotechniques 40:442, 444, 446 passim

McQueen, Philip G; McKenzie, F Ellis (2006) Competition for red blood cells can enhance Plasmodium vivax parasitemia in mixed-species malaria infections. Am J Trop Med Hyg 75:112-25

Sam, Vichetra; Tai, Chin-Hsien; Garnier, Jean et al. (2006) ROC and confusion analysis of structure comparison methods identify the main causes of divergence from manual protein classification. BMC Bioinformatics 7:206

Knodler, Leigh A; Steele-Mortimer, Olivia (2005) The Salmonella effector PipB2 affects late endosome/lysosome distribution to mediate Sif extension. Mol Biol Cell 16:4108-23

Showing the most recent 10 out of 15 publications
