The goal of this proposal is to comprehensively identify all sequence-based functional elements associated with transcribed sequences, both protein coding and non-protein coding, and to characterize gene structures, including transcription start sites (TSS), polyadenylation sites, and alternative transcripts, detected in a representative and diverse panel of human cells and tissues. Based on the empirically determined characteristics of the transcripts detected in this work, a classification system for the protein coding and non-protein coding portions of the human transcriptome will be established.
Our aims include, first, generating a comprehensive set of subcellular compartment-specific long (>200 nucleotides, nts) and short (<200 nts) polyadenylated (polyA+) and non-polyadenylated (polyA-) RNA samples from each of the cell types studied. These RNA samples will be analyzed using: a) high density tiling arrays (5-nucleotide [nt] interrogation resolution for long and short RNAs), b) sequencing (pyrosequencing [454] and clonal single molecule sequencing for short RNAs [Solexa]), c) sequenced paired-end ditags (PETs) for 5' TSS and 3' termination locations of polyA+ transcripts, and d) sequenced cap analysis of gene expression (CAGE) tags for 5' TSS of polyA- RNAs. Characterization of full length subcellular compartment-specific transcripts will also be carried out using: 1) a combination of rapid amplification of cDNA ends (RACE), RT-PCR, and sequencing, 2) RNA immunoprecipitation (RIP), and 3) in situ immunohistochemistry. These characterization steps will provide additional information about the annotated and unannotated RNAs found to be associated with known functional, compartment-specific proteins, and about their localization in subcellular organelles of known function.
The research and health-care communities are well positioned to take advantage of a detailed catalog of classified transcribed regions in the human genome. For example, the identification of millions of single nucleotide polymorphisms (SNPs) and the ability to genetically alter the expression of specific transcripts with small inhibitory (si-) and micro (mi-) RNAs are highly useful for the molecular characterization of diseases associated with the transcribed regions. However, the utility of these and other genomic resources depends on having a complete and high quality catalog of transcribed regions.
Dobin, Alexander; Gingeras, Thomas R (2015) Mapping RNA-seq Reads with STAR. Curr Protoc Bioinformatics 51:11.14.1-19
Pervouchine, Dmitri D (2014) IRBIS: a systematic search for conserved complementarity. RNA 20:1519-31
Abdelhamid, Rehab F; Plessy, Charles; Yamauchi, Yoshio et al. (2014) Multiplicity of 5' cap structures present on short RNAs. PLoS One 9:e102895
Dobin, Alexander; Davis, Carrie A; Schlesinger, Felix et al. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15-21
Zhang, Yubo; Wong, Chee-Hong; Birnbaum, Ramon Y et al. (2013) Chromatin connectivity maps reveal dynamic promoter-enhancer long-range associations. Nature 504:306-310
Steijger, Tamara; Abril, Josep F; Engström, Pär G et al. (2013) Assessment of transcript reconstruction methods for RNA-seq. Nat Methods 10:1177-84
Engström, Pär G; Steijger, Tamara; Sipos, Botond et al. (2013) Systematic evaluation of spliced alignment programs for RNA-seq data. Nat Methods 10:1185-91
Schlesinger, Felix; Smith, Andrew D; Gingeras, Thomas R et al. (2013) De novo DNA demethylation and noncoding transcription define active intergenic regulatory elements. Genome Res 23:1601-14
Tilgner, Hagen; Knowles, David G; Johnson, Rory et al. (2012) Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res 22:1616-25
Kowalczyk, Monika S; Higgs, Douglas R; Gingeras, Thomas R (2012) Molecular biology: RNA discrimination. Nature 482:310-1