Multiple sequence alignment (MSA) is a core element of bioinformatics, comparative genomics, phylogenetics, molecular biology, and structural biology. Several excellent software tools for MSA construction are heavily used by the biomedical community; however, essentially all automatically constructed MSAs require significant editing (error correction) prior to their analysis and use by downstream applications. The editing and analysis process is carried out using specialized software tools called MSA editors. Although several MSA editors are currently available, none of them offers the comprehensive set of features that is essential for effective MSA manipulation, editing, analysis, and export. The goal of the proposed research is to build a commercially viable MSA editor, termed AlignShop, which streamlines the production of high-quality MSAs and thus accelerates the derivation of biological knowledge from sequence data. The central idea is to integrate several recently developed approaches for improving MSA quality, and to develop advanced visualization and handling, in a single tool. This will be achieved by accomplishing the following Specific Aims: 1) Develop an approach to rapidly predict secondary structure for each protein sequence in an MSA. 2) Incorporate analytical capabilities to facilitate extracting higher-order knowledge encoded within an MSA. 3) Develop a comprehensive, modern interface for interacting with an MSA. During Phase I we will develop AlignShop into a fully functional, robust MSA editor and analysis tool that outperforms existing free and commercial editors. AlignShop is not intended to be an MSA building tool or a comprehensive bioinformatics workbench. Rather, we are focusing solely on building the best stand-alone tool for editing, analyzing, utilizing, and publishing MSAs, one that can be used with all popular MSA building programs and downstream applications.
This program will be useful to the broad community of computational and experimental biomedical scientists.
The results of this research will provide scientists with the most capable and effective tool for editing and analyzing multiple sequence alignments, which are foundational for understanding the human genome, genetic diseases, and the properties of important microbial organisms, including human pathogens and agents of infectious disease. Because of the critical role of multiple sequence alignments in virtually every area of biomedical science, this project will significantly impact and accelerate biomedical discovery.