Neomorphic Software, Inc. intends to develop new algorithmic analysis methods, software implementations, and annotated biosequence data sets for the elucidation of new genes and of genomic and proteomic relationships. The motivation behind this SBIR grant is to develop new methods for the improved annotation of genomic, cDNA, and protein data, given some or all of these types of data in concert. Faced with the massive efforts to sequence expressed sequence tag (EST) and genomic data in both the public and private sectors, Neomorphic aims to derive knowledge from data through the use of new statistical analysis techniques. The SBIR Phase II research project will build on the success of Phase I, in which new Hidden Markov Model (HMM) based algorithmic methods were invented for the alignment, error correction, and homology identification of ESTs and for the identification of genes in genomic DNA. Phase II research will focus on the annotation of nucleic acid sequences, with specific emphasis on: 1. the identification of protein motifs, domains, and remote homologies that would aid in the classification of EST sequences that include relatively high rates of indels and substitutions and are currently unclassified; and 2. the identification and functional characterization of new genes using EST and protein homology information from preliminary consensus genomic DNA obtained from low-coverage shotgun sequencing. The new analysis methods will aid scientists in assimilating evidence for the precise annotation of genomic or transcriptional (cDNA) data, including intron/exon boundaries, UTRs, transcription start sites and other regulatory elements, codon structure, frameshifts, base call corrections, single nucleotide polymorphisms (SNPs), alternative splicing, putative protein prediction, and associations with homologous protein sequences, families, and motifs.
Our software will allow biotechnology and pharmaceutical companies to mine EST databases for critical new lead targets and, as further human genomic sequence becomes available and new functional genomics platforms are developed, to fully characterize human genes involved in critical disease pathways. We will contribute substantial value-added information to both public and private biosequence databases, greatly enhancing the value of this vital data.