The widespread use of RNA sequencing technology over the past decade has allowed scientists to discover a far larger and richer repertoire of genes and transcripts encoded by the human genome than was previously known. At least 90% of human genes have multiple isoforms, arising from splicing variants, alternative sites of transcription initiation and termination, exon skipping events, and more. The number of human transcripts in standard gene databases has grown enormously, from ~40,000 in the late 2000s to over 200,000 today, but the catalog is still likely far from complete. Our previous work using exon-exon splice junctions and other fragmentary transcripts has demonstrated the clinical relevance of unannotated but expressed genes in the human brain, including associations with schizophrenia and its genetic risk. This project will attempt to discover and characterize novel gene isoforms expressed in both healthy and diseased brains, using the latest computational methods for transcriptome assembly and an extensive collection of brain RNA-seq datasets. The project is organized into three aims. In the first aim, we will develop new algorithms designed to assemble RNA-seq data from samples sequenced using ribosomal RNA depletion, a technique that is widely used in human brain studies but not in most other RNA-seq experiments, which instead use polyA+ enrichment. We will implement these methods as extensions to the HISAT and StringTie systems for RNA-seq alignment and assembly, both of which were developed in the PI's and co-PI's labs. We will then apply these improved methods to thousands of publicly available RNA-seq samples from human brain tissue to create a new CHESS-BRAIN (Comprehensive Human Expressed Sequences in Brain) gene annotation database.
This effort will also determine which transcripts are tissue-specific and brain-region-specific, i.e., expressed at significantly higher or lower levels in brain tissues and in various brain regions than in other tissues. In the second aim, we will use these methods to quantify gene expression levels in hundreds of post-mortem brain RNA-seq samples from subjects diagnosed with schizophrenia (SCZD), major depression (MDD), bipolar disorder (BPD), autism spectrum disorder (ASD), and post-traumatic stress disorder (PTSD), whom we will compare to matched controls to identify the contribution of unannotated transcription in these disorders. In the third aim, we will perform expression quantitative trait loci (eQTL) mapping across the entire CHESS-BRAIN dataset, both within and across brain regions and diagnoses, to identify genetic regulation of unannotated transcripts, including both coding and noncoding transcripts. This analysis will identify genes and transcripts whose expression levels differ significantly across tissues and diseases. We will combine these results to identify novel transcripts associated with genetic risk for each of the psychiatric disorders.

Public Health Relevance

RNA sequencing experiments focused on brain disorders have generated enormous datasets that are being studied to determine which genetic variants are associated with diseases such as schizophrenia, depression, bipolar disorder, and autism spectrum disorder. Most analyses to date have relied on already-known gene isoforms, but recent work has shown that many human gene isoforms are not yet captured in standard gene databases. This proposal will create novel software and databases for better understanding unannotated transcription in the human brain and its potentially important role in serious brain disorders.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Mental Health (NIMH)
Type
Research Project (R01)
Project #
1R01MH123567-01A1
Application #
10205617
Study Section
Behavioral Genetics and Epidemiology Study Section (BGES)
Program Officer
Arguello, Alexander
Project Start
2021-03-02
Project End
2025-12-31
Budget Start
2021-03-02
Budget End
2021-12-31
Support Year
1
Fiscal Year
2021
Name
Johns Hopkins University
Department
Biomedical Engineering
Type
Schools of Medicine
DUNS #
001910777
City
Baltimore
State
MD
Country
United States
Zip Code
21218