Genome projects are producing vast amounts of novel, uncharacterized sequence data that need interpretation. Protein primary sequence can often be determined far in advance of tertiary structure or function. We are developing and testing automated techniques for assigning protein structure to novel, uncharacterized sequences, an approach called fold recognition. Previously, we developed new techniques for protein secondary structure prediction. Recently, we have incorporated the results of such predictions into a hidden Markov model (HMM) based approach to protein fold recognition, called FORESST. The power of such statistical techniques can only be assessed in carefully controlled retrospective statistical studies or in prospective trials.
Disseminating statistical and computational tools for very specialized analyses has been a time-consuming and costly process, and has not been well supported in the profit-driven commercial sector. Web technology is now making it possible for developers in the non-profit sector to distribute and support such analytical tools efficiently. The cost of software distribution over the Web is now insignificant, which allows developers in this area to devote more time to methods development. Furthermore, Web clients can provide a platform-independent user interface to computational tools running on central servers. A strategy that is now supplanting software distribution is to provide servers tuned to specific computational needs, for use by a working group. The working group need not be a single laboratory, but may efficiently cut across organizational and even institutional bounds. We are developing both software distribution and software server approaches to solving a number of computational problems common to physiology, pharmacology, endocrinology and molecular biology.
We are also using web technology to provide a common interface for programs used by our own Section.
Progress in FY98: The FORESST method was tested extensively with existing protein families using a cross-validated analysis and a larger set of models of protein fold families. This study compared FORESST with three other methods: local sequence similarity, sequence-motif recognition using HMMs, and global sequence recognition using HMMs. The four methods were compared on the problem of recognizing distantly homologous proteins or protein folds, which is a critical problem facing genome annotation efforts today. Results showed that a method incorporating secondary structure propensity (FORESST) outperformed purely sequence-based methods for the most difficult remote homology detection problems, whereas local sequence homology was generally the most powerful for moderate to close homolog detection.
ABS staff also collaborated with NIH Intramural investigators on several projects that used bioinformatics and structure prediction tools to investigate particular sequences or protein families of interest. Techniques include secondary structure prediction, fold assignment, determination of sequence-structure relationships using multiple sequence alignments, homology modeling, motif analysis and database searching. ABS staff also provided statistical advice and collaboration in the areas of ligand binding data analysis, dose-response curve analysis, repeated-measures ANOVA and MANOVA and, in one project, analysis of endocrine time series.
The ABS structure prediction team is also engaged in CASP3 (Critical Assessment of Structure Prediction 3), an international competition that seeks to evaluate structure prediction algorithms. Our entries include secondary structure prediction and protein fold recognition on a variety of newly solved but unpublished protein structures.
Results of this competition will be announced in December 1998.
A series of programs developed in this Section over the past two decades had previously been distributed by a private service, which mailed out diskettes and paper documentation. We have deployed a web-based software download site that now makes PC and Macintosh versions of several programs (LIGAND, ALLFIT, FLEXIFIT and PULSEFIT) available to any user equipped with a web browser. The software download site, developed by ABS personnel, keeps track of users' names and addresses in a log. Documentation, previously distributed on paper, was scanned and made available for download using Adobe Acrobat software.
The ABS web server also provides a number of unique sequence analysis services, which were augmented to include secondary structure prediction by the GOR4 algorithm, multiple sequence alignment by CLUSTALW and MUSEQAL, and various reformatting services. A "BLAST to FASTA" conversion service was upgraded, and all services offered were upgraded to allow file uploads in addition to paste-in text boxes. A new service to produce formatted files for the CASP3 prediction contest was also implemented.