Identifying residues of importance in the protein products of genes is a challenging and important problem for informatics, genome annotation, molecular biology, biochemistry and drug discovery. Functional annotation of genes is inherently hierarchical: genes can be annotated at the level of genome sequence, transcript variant, protein product, protein domain, nucleotide or amino acid. Only a few resources annotate protein function at the level of the amino acid, and formulating language that relates residue function to gene product sequence, structure and expression is challenging. To address this, I am investigating how sequence, evolutionary and structural descriptors can be used to quantify function. I am applying this knowledge to develop methods that can associate residues with known functional annotations, perform annotation transfer onto an experimentally determined or modeled protein structure, and determine the likely molecular effects of mutation, thus creating a framework for residue annotation. One of the greatest challenges for the computational biologist is identifying features (or attributes) that are useful for classification of genomic data. With this effort, we will continue our work describing novel features for classification of functional sites, and we will test them using supervised machine learning tools. We will do this by 1) testing the power of several diverse functional features for classification of catalytic residues in proteins, 2) applying these features to other important residue functional annotation problems, and 3) evaluating features based on homologous sequences. This research is important for understanding the molecular basis of diseases such as cancer and for interpreting pharmacogenetics data from a molecular perspective. When completed, this work will give scientists a rich set of data and tools for basic health research.