Substantial commercial potential exists for software tools that allow a biomedical research scientist to use genomic data to form experimentally testable hypotheses. Such tools will be used to exploit genomic sequence data to understand the aetiology of disease, to improve diagnostic tools, and to develop more effective therapies. The Master Catalog, a commercial product developed jointly by EraGen Biosciences and the Benner laboratory at the University of Florida, provides a convenient framework for implementing heuristics that generate such hypotheses. The Master Catalog is a naturally organized database that contains evolutionary trees, multiple sequence alignments, and reconstructed evolutionary intermediates for all of the proteins in the GenBank database. The Benner laboratory has developed, and tested anecdotally, heuristics that date events in molecular history, provide evidence for and against functional recruitment within a protein family, detect distant homologs, map individual residues important for functional change onto a crystal structure, find metabolic and regulatory pathways, and correlate events in the molecular record with the history of life on Earth. This Phase I proposal seeks to validate a set of these heuristics more broadly to determine whether they are suitable for database-wide application. In Phase II, we will implement them within the Master Catalog and launch a commercial bioinformatics product to support functional analysis of genomic databases.
In its present version, the Master Catalog is a successful commercial product within a niche, where it is "best in class" among bioinformatics databases. Adding a validated set of heuristics for extracting functional information from genome databases will make it the software of choice for most functional genomics work and a central tool in the pharmaceutical and biotechnology industries. Academic and student versions will find markets in most universities.