The hepatitis C virus (HCV) infects approximately 4 million people in the U.S. and 170 million people worldwide. The high mutation rate of HCV generates vast numbers of new genetic sequences and associated biological data in the daily conduct of laboratory research and clinical trials, creating serious data management problems. Investigators currently rely on databases developed in-house, generic software products, and tools from public web repositories to sort, organize, and analyze their genomic and biological data. These tools are not tailored to the HCV genome, and moving data from one program to the next is labor-intensive and error-prone. We are developing a web-based (rather than local-server-only) software product tailored for the rapid, efficient, and flexible management of HCV data. The product consists of graphical user interface (GUI) tools and a data storage and retrieval system, both designed specifically for HCV analysis. It also includes a commercial relational database engine. The most notable technical innovation is our annotation tool, which simplifies the capture, storage, and management of crucial experimental data points and brings these user-defined data points (annotations) into the same searchable context as data that are inherently systemic and structured. Other innovations include our alignment, phylogenetics, and mutation analysis tools, which are specifically tailored to the mathematics of the HCV replication rate and its error-prone polymerase.
In preliminary and Phase I work, we designed, built, and successfully unit tested a prototype software system consisting of three tiers: presentation (GUI), middleware (Domain), and a relational database management system (RDBMS). The prototype includes tools for conducting HCV-tailored alignments and contig assemblies that are linked to a highly flexible query tool, tools for assembling and viewing phylogenetic trees, and graphics tools that present the raw electropherogram data (traces) and assemble line and bar graphs plotting up to two variables. In this Phase II work, we will develop additional tools for mutation tracking, report generation, and entropy measurement, as well as statistical routines and security and installation packages.
Specific Aims:
Aim I. Transition the software platform to a cloud computing and hosting environment;
Aim II. Develop a suite of tools for conducting full mutation analysis;
Aim III. Develop statistical routines;
Aim IV. Unit test the software application.

We are seeking funds to build our product, addressing a scientific need with a marketable bioinformatics approach. In this way, we will merge informatics with basic research for rapid discovery. We believe that our disease-specific software products will aid the rapidly developing market of HCV research. The result will be software that greatly improves analysis capabilities and reduces data processing time. These goals fall well within the NIH's scope of promoting basic research in bioinformatics and the information sciences, and could lead to enormous public benefit.

Public Health Relevance

The hepatitis C virus (HCV) is difficult to study and is not effectively treated by the current antiviral drug combination. Effective treatment options are years away. A major problem that HCV investigators must contend with is the rapid mutation rate of the viral genes, which creates both a need to test patients continuously and a massive data accumulation problem. We are developing a powerful, game-changing software application that will make it easier for scientists to overcome these problems and focus on discovering treatments and cures.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Study Section: Special Emphasis Panel (ZRG1-HDM-R (11))
Program Officer: Koshy, Rajen
Gataca, LLC
United States