Development of an organ as complex as the brain depends on an intricate interplay of thousands of signaling proteins, orchestrated by an interacting web of regulatory factors. Recent ENCODE data reveal that the large tracts of so-called 'junk' DNA in introns and between genes are, in fact, actively transcribed. The functions of these non-coding transcripts, which number in the millions, are virtually unknown, although one small sub-class - the microRNAs - is receiving close attention as regulatory RNAs. One process that may be controlled by the non-coding transcripts is alternative exon use - a mechanism that adds significantly to the diversity of cellular proteins (particularly in the CNS), and whose regulation is little understood at this time. Recent advances that generate whole-transcriptome data have provided the means to systematically explore both of these factors. However, the bioinformatic challenges are substantial, and the lack of comprehensive, well-designed, and easily used software to manage, visualize, analyze and interpret the data will likely be the limiting factor in this field of research. We propose to develop a bioinformatics toolkit specifically to integrate whole-transcriptome data from two different technologies: the Affymetrix All Exon microarray, which provides 1.4 million distinct transcript measurements, and RNASeq, which provides for 'digital' expression analysis of the whole transcriptome. The toolkit will also integrate epigenomic data that will be essential to understanding the regulation of transcription. The design and development of the software will be guided by prominent scientists engaged in the study of the brain, and the software will be applied to sample datasets derived from neurological tissues, to ensure that the program incorporates functions and annotations relevant to this field.
While large-scale array technologies have provided an unprecedented capability to model cellular processes in the brain, both in normal functioning and in disease states, this capability is utterly dependent on the availability of complex data management, computational, statistical and informatics software tools. The utility of the next generation of arrays and sequencing technologies - which focus on critical regulation and control functions of the cell - will be stymied by an initial lack of suitable bioinformatic tools. This proposal initiates an accelerated development of an integrated software package intended to empower biologists in the application and analysis of these powerful new technologies, with broad impact at all levels of biological and clinical research, and across every discipline.