We developed a suite of four transcript prediction algorithms collectively called "FEAST" (Fast Empirical Algorithms Suggesting Transcripts), which are conceptually independent of the two established classes of gene discovery algorithms, namely "ab initio" and database search methods. The main goals of this proposal are (1) to further develop this independent third class of gene prediction algorithms, (2) to apply them to the identification of novel genes in the genome, and (3) to test the hypothesis that non-coding transcripts are prevalent in the genome and are the medium for the expression of small RNA genes and other functional genomic elements. We will extend the statistical model and develop the software towards a fully integrated gene prediction tool capable of discovering genes in genomic sequences of one species, or in multiple species simultaneously for higher precision. We will use the new tool to produce a comprehensive catalog of predicted genes. This is the genetic "parts list" that is required for the construction of metabolic and regulatory models of cell function. We will correlate the transcript predictions with expression data from hybridization array technology, and validate novel genes experimentally by RT-PCR and sequencing. We identified an unusual class of genes (which we call "stencil" genes) in which the exons play no other role than the production of introns as precursor material for deriving one or more functional RNA molecules, such as miRNAs and snoRNAs. We will place special emphasis on obtaining a comprehensive catalog of such "stencil" genes and will computationally study their prevalence, their modes of regulation, and how they evolve. We expect many of the novel transcripts to be central to the genetic regulation of development, and therefore of direct importance to cancer research.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM081083-03
Application #: 7668499
Study Section: Special Emphasis Panel (ZRG1-BST-E (51))
Program Officer: Lyster, Peter
Project Start: 2007-08-01
Project End: 2011-10-31
Budget Start: 2009-08-01
Budget End: 2011-10-31
Support Year: 3
Fiscal Year: 2009
Total Cost: $414,466
Indirect Cost:
Name: Institute for Systems Biology
Department:
Type:
DUNS #: 135646524
City: Seattle
State: WA
Country: United States
Zip Code: 98109
Caballero, Juan; Smit, Arian F A; Hood, Leroy et al. (2014) Realistic artificial DNA sequences as negative controls for computational genomics. Nucleic Acids Res 42:e99
Roach, Jared C; Glusman, Gustavo; Hubley, Robert et al. (2011) Chromosomal haplotypes by genetic phasing of human families. Am J Hum Genet 89:382-97
Glusman, Gustavo; Caballero, Juan; Mauldin, Denise E et al. (2011) Kaviar: an accessible system for testing SNV novelty. Bioinformatics 27:3216-7
Roach, Jared C; Glusman, Gustavo; Smit, Arian F A et al. (2010) Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328:636-9
Dishaw, Larry J; Mueller, M Gail; Gwatney, Natasha et al. (2008) Genomic complexity of the variable region-containing chitin-binding proteins in amphioxus. BMC Genet 9:78