This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.
Machine learning is the subfield of computer science concerned with programs that improve their performance on a task with experience and that find patterns in data. In the 2007-08 grant year, we continued work on GAMI, an approach to motif inference that uses a genetic algorithm (GA) search. In this work, we use a computational approach to search for regions in non-coding DNA sequences that may affect gene function, under the hypothesis that patterns (motifs) conserved across evolutionary time are more likely to be functional. GAMI identifies the motifs that are most strongly represented in the data, so that researchers can study them in the lab to assess functionality.
We have been working with several genes, including ABCC7, the cystic fibrosis transmembrane conductance regulator (CFTR), and have found many highly conserved patterns that merit additional study. We have also assessed the ability of our scoring metric to capture highly conserved regions, and have demonstrated that it outperforms the metric typically used in the literature for motif inference. In addition, we have ascertained that motifs identified by GAMI correlate with known functional regions cataloged in the TRANSFAC database, indicating further promise for GAMI as an approach for identifying functional regions.
To date, we have demonstrated GAMI to be an effective tool for searching large datasets (long sequences, and possibly many of them) drawn from divergent species.
The system has been validated on small problems, where it recovers known transcription factor binding sites (TFBSs) referenced in other published work and finds the best motifs identified by exhaustive search. We have also compared the CFTR motifs found by GAMI to the full human genome and to known TFBSs. These motifs are non-promiscuous; some represent known TFBSs for other genes, while others may represent novel discoveries.