The hepatitis C virus (HCV) infects approximately 4 million people in the U.S. The high mutation rate of HCV generates vast numbers of new genetic sequences and associated biological data in the daily conduct of laboratory research and clinical trials, creating serious data management problems. Investigators currently rely upon homespun databases, generic software products, and tools from public web repositories to sort, organize and analyze their genomic and biological data. These tools are not tailored to the HCV genome, and moving data from one program to the next is labor intensive and vulnerable to error. We are developing a desktop software product tailored for the rapid, efficient and flexible management of HCV data. The product consists of graphical user interface (GUI) tools and a data storage and retrieval system, both designed specifically for HCV analysis. It also includes a commercial relational database engine. The most notable technical innovation is our annotation tool, which simplifies the capture, storage and management of crucial experimental data points and brings these user-defined data points (annotations) into the same searchable context as data that are inherently systemic and structured. Other innovations include our alignment, phylogenetics and mutation analysis tools, which are specifically tailored to the mathematics of the HCV replication rate and its error-prone polymerase. In preliminary work we designed, built and successfully unit-tested a prototype software system. The software architecture for that product consists of three tiers: presentation (GUI), middleware (Domain), and a relational database management system (RDBMS). In Phase I we will develop an alignment tool that will be linked to the existing query tool and include a contig assembler (Aim I) for analyzing complete and partial genomic sequences. 
We will also develop a phylogeny tool that assembles alignments into evolutionary trees and color-codes and time-stamps the input sequences (Aim II), and a graphics tool that presents the raw electropherogram data (traces) and assembles line and bar graphs plotting up to two variables (Aim III). In Phase II we will develop additional tools for mutation tracking, report generation and entropy measurement, and we will develop statistical routines and security and installation packages.
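As an illustration of the kind of entropy measurement planned for Phase II, the sketch below computes the Shannon entropy of each column of a set of aligned sequences, a common way to quantify sequence variability at each position. It is a minimal sketch under stated assumptions, not the product's implementation: the function name `column_entropies`, the gap symbol, the choice to exclude gaps from the counts, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter


def column_entropies(aligned_seqs, gap="-"):
    """Return the Shannon entropy (in bits) of each alignment column.

    Assumes all sequences are already aligned to the same length.
    Gap characters are excluded from the counts (an assumption of this
    sketch); a column containing only gaps is assigned entropy 0.0.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for i in range(length):
        # Count the residues observed at this position, ignoring gaps.
        counts = Counter(seq[i] for seq in aligned_seqs if seq[i] != gap)
        total = sum(counts.values())
        if total == 0:
            entropies.append(0.0)
            continue
        # H = sum over residues of p * log2(1 / p)
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies


# Toy example (hypothetical sequences, not HCV data).
toy_alignment = ["ACGT", "ACGA", "ACTA", "AC-A"]
toy_entropies = column_entropies(toy_alignment)
```

In this toy alignment the first two columns are fully conserved and have zero entropy, while the last two columns vary and have positive entropy; positions with high entropy are candidates for closer mutation analysis.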
Specific Aims:
Aim I. Develop an alignment tool.
Aim II. Develop a phylogeny tool.
Aim III. Develop a graphics tool.
Aim IV. Unit test the software application.
We are seeking funds to build our product, addressing a scientific need with a marketable bioinformatics approach. In this way, we will merge informatics with basic research for rapid discovery. We believe that our disease-specific software products will serve the rapidly developing market for HCV research. The result will be software that greatly improves analysis capabilities and reduces data processing time. These goals fall well within the scope of the NIH's efforts to promote basic research in bioinformatics and the information sciences, and could lead to enormous public benefit.
The hepatitis C virus is difficult to study and is not effectively treated with antiviral drugs: fewer than 50% of patients respond favorably to the current therapies. Efficacious options are years away. A major problem that investigators face is the rapid mutation rate of the virus and the difficult data management problems that result from it. We are developing a powerful software product that will make it easier for scientists to overcome these problems. Moreover, our design will relieve the serious bottleneck of data management, significantly compressing the time between data collection and cure discovery.