Gene function is known for only about half of the roughly 30,000 to 35,000 human genes. Many diverse experimental sources of data can help determine gene function (e.g., sequence, expression, and proteomic data). Analysis methods for each data source have their individual strengths and are maturing, but the returns from these existing algorithms are diminishing. To expand the scope of discovery, this effort brings together researchers from Computer Science and Biology to develop and apply data mining methods that jointly analyze multiple data sources spanning multiple experimental data types. The goal is to discover gene networks for human and yeast genes. These methods can identify and support biological hypotheses that are overlooked when data from a single experimental methodology are analyzed in isolation. Discovery of these gene networks promises to address such grand-challenge problems as determining the fundamental organization of genes in the cell and creating a theoretical framework for interpreting high-throughput biological data, moving ultimately toward predictive theoretical models of biology and an understanding of disease at the cellular level.
The project will also help establish a broad center of excellence in computational biology at the University of Texas. In addition to the dissemination of new algorithms through the project Web site (http://bioinformatics.icmb.utexas.edu) and scientific publications, newly derived gene functions will be submitted to public biological databases such as BIND (Biomolecular Interaction Network Database) and DIP (Database of Interacting Proteins).