This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

The focus of this project is to develop a family of supervised learning techniques for promoter prediction. Publicly available databases and experimentally observed promoter information for a small subset of genes in the human genome serve as training information, which is used to predict promoter regions in test genes that have no current promoter information. The training information is represented by features that record the presence or absence of core promoter elements in promoter regions, expressed either in binary form or as fuzzy data. The goal is to filter redundant information out of this training set and express the promoter information in the form of non-reducible descriptors (NRDs). Once the NRDs for the known genes have been determined, they can serve as templates for predictions on unclassified genes. Such techniques have been used previously in other fields and are known to yield high-accuracy classification. Variations of these techniques are suitable for several areas of bioinformatics research, including gene function prediction and cancer classification.
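The abstract does not define NRDs formally, so the following is only a minimal sketch under a stated assumption: that an NRD of a known promoter pattern is a smallest set of binary features on which that pattern differs from every non-promoter pattern, so that no feature can be dropped without losing the distinction. The feature order (TATA box, Inr, GC box, CAAT box), the toy data, and all function names are hypothetical and chosen for illustration; the search is brute force and is not the project's actual method.

```python
from itertools import combinations

def is_descriptor(pattern, others, features):
    """True if `pattern` differs from every row in `others` on at least one of `features`."""
    return all(any(pattern[f] != o[f] for f in features) for o in others)

def non_reducible_descriptors(pattern, others):
    """Return all minimal feature subsets that separate `pattern` from `others`.

    Subsets are visited in order of increasing size, so any subset that
    contains an already-found descriptor is reducible and is skipped.
    """
    found = []
    n = len(pattern)
    for size in range(1, n + 1):
        for features in combinations(range(n), size):
            if any(set(prev) <= set(features) for prev in found):
                continue  # reducible: a smaller descriptor is contained in it
            if is_descriptor(pattern, others, features):
                found.append(features)
    return found

def matches_template(gene, template, features):
    """True if `gene` agrees with the known `template` on every feature of the NRD."""
    return all(gene[f] == template[f] for f in features)

# Hypothetical binary features: (TATA box, Inr, GC box, CAAT box); 1 = present.
promoters = [(1, 1, 0, 1), (1, 0, 1, 1)]
non_promoters = [(0, 0, 0, 1), (0, 1, 0, 0), (0, 0, 1, 0)]

# Build (template, NRD) pairs from the known promoters.
templates = [
    (p, nrd) for p in promoters for nrd in non_reducible_descriptors(p, non_promoters)
]

def predict_promoter(gene):
    """Label an unclassified gene as promoter-like if it matches any NRD template."""
    return any(matches_template(gene, p, nrd) for p, nrd in templates)
```

In this toy data, the first promoter is separated from all non-promoters by feature 0 alone or by features 1 and 3 together, so the remaining features are redundant for that pattern. A fuzzy-valued variant would need a different notion of "differs", which this sketch does not attempt.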