A fundamental challenge in decoding the information stored in a genome is to describe the transcripts read from it and their structure. The nematode C. elegans offers an extraordinary opportunity among eukaryotes to accomplish this goal now. The small, compact genome is completely sequenced. The simple anatomy, fixed cell lineage and transparent body through the full life span make every cell available for observation and analysis at any time. Already more than 1,300 noncoding RNAs and 17,000 of the estimated 21,000 protein-coding genes, along with 2,500 alternative splice forms, have been fully or at least partially defined experimentally. The present proposal seeks to complete the definition of the transcribed genome of C. elegans. We will do this by assembling all the available experimental data with a variety of gene models to define accurately the extent of the known transcribed genome. From this base, we will extend our knowledge of the transcribed genome through systematic application of genome tiling arrays across various stages and cells of the life cycle, including targeted analysis of microRNAs. In turn, we will integrate these new data, along with any other new data from the community, with the gene models and any new models that develop. We will attempt directed confirmation of unconfirmed gene models through RT-PCR and custom arrays, starting with the initial set of gene models and adding new data as they become available. We will also use mass spectrometry to distinguish protein-coding transcripts from noncoding transcripts for small potential open reading frames. The result will be a set of transcripts that will approach completion for protein-coding genes and their UTRs and alternative splice forms, as well as noncoding RNAs. The experience gained with this modest genome should be of value in interpreting more complex genomes, such as the human genome.