A principal goal of the NHGRI is to develop methods for comprehensively identifying functional elements in genome sequences, in order to establish genomic parts lists as foundations for large-scale biology. The long-term objective of the research program described in this proposal is to develop new computational approaches for identifying genomic features using probabilistic modeling methods. This proposal focuses specifically on identifying and characterizing the numerous genes that produce structural, regulatory, and catalytic RNAs. Current methodology is not yet up to the task of systematically enumerating the RNA genes in any genome. Noncoding RNAs pose interesting challenges for computational sequence analysis, and motivate approaches substantially different from standard primary sequence alignment methods. The proposed methods use comparative sequence analysis and a class of probabilistic models called stochastic context-free grammars (SCFGs), which are well suited to modeling the evolutionary conservation of both RNA secondary structure and RNA sequence.
Five specific aims are proposed. In the first, the human genome and the genomes of two major model animal systems, the worm Caenorhabditis and the fly Drosophila, will be screened computationally for new RNA genes using comparative genome sequence information and an SCFG-based structural RNA genefinding program, QRNA.
Three aims propose improvements in the speed of SCFG-based RNA structural homology searches: an a priori banded dynamic programming alignment method, extension of the BLAST algorithm to RNA structure alignment, and a constrained Sankoff algorithm for simultaneous alignment and folding of two homologous RNAs.
A final aim proposes a method for identifying the mRNA targets of regulatory RNAs (such as the newly discovered microRNAs) by comparative genome analysis.
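As a rough illustration of why SCFGs suit RNA secondary structure, the sketch below runs a CYK (Viterbi) parse over a toy three-rule grammar, S -> x S y S | x S | ε, in which the first rule emits a base pair and the second an unpaired base. It is not QRNA or any method from this proposal: the function name `cyk_fold`, the rule probabilities, and the emission tables are all invented for illustration, and the toy omits features a real model would need, such as a minimum hairpin loop length.

```python
import math

# Hypothetical rule probabilities for the toy grammar (they sum to 1).
P_PAIR, P_SINGLE, P_END = 0.45, 0.25, 0.30

# Hypothetical emission tables; input is assumed to be uppercase A/C/G/U.
SINGLE_EMIT = {base: 0.25 for base in "ACGU"}
PAIR_EMIT = {
    ("A", "U"): 0.22, ("U", "A"): 0.22,
    ("G", "C"): 0.22, ("C", "G"): 0.22,
    ("G", "U"): 0.06, ("U", "G"): 0.06,
}


def cyk_fold(seq):
    """Return (log-probability, dot-bracket string) of the best parse of seq."""
    n = len(seq)
    # best[i][j] is the best log-probability of deriving seq[i:j] from S.
    best = [[float("-inf")] * (n + 1) for _ in range(n + 1)]
    back = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i] = math.log(P_END)  # S -> epsilon

    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Rule S -> x S: seq[i] is unpaired.
            best[i][j] = math.log(P_SINGLE * SINGLE_EMIT[seq[i]]) + best[i + 1][j]
            back[i][j] = ("single", None)
            # Rule S -> x S y S: seq[i] pairs with seq[k].
            for k in range(i + 1, j):
                p = PAIR_EMIT.get((seq[i], seq[k]))
                if p is None:
                    continue  # this pair has zero probability in the toy model
                score = math.log(P_PAIR * p) + best[i + 1][k] + best[k + 1][j]
                if score > best[i][j]:
                    best[i][j] = score
                    back[i][j] = ("pair", k)

    # Trace back the best parse into dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if i == j:
            continue
        kind, k = back[i][j]
        if kind == "single":
            stack.append((i + 1, j))
        else:
            structure[i], structure[k] = "(", ")"
            stack.append((i + 1, k))
            stack.append((k + 1, j))
    return best[0][n], "".join(structure)


if __name__ == "__main__":
    logp, structure = cyk_fold("GGGAAACCC")
    print(structure, round(logp, 3))
```

Because the pair rule emits two positions at once, the grammar can reward covarying positions that a primary-sequence model treats independently. The cubic-time inner loop over `k` is the kind of cost that banding or other search constraints aim to reduce.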