In the years since the completion of the Human Genome Project, our ability to read the genome has improved tremendously. Frustratingly, our ability to interpret and comprehend what we can now easily read has lagged behind. Genetic variants that lie outside protein-coding regions are particularly challenging to interpret because the rules that govern their regulatory function are far less well understood than the principles of protein-coding sequence. Addressing this challenge is particularly urgent because the drastic fall in whole genome sequencing costs will bring with it a wave of newly discovered non-coding and potentially causal variants. The recent completion of several genome-scale projects, and the release of pilot data from new and ongoing projects, have made it possible to begin building models whose goal is to predict the function of non-coding genetic variation. We propose the development of a brain-centric variant annotation framework that integrates temporal and spatial expression information from existing data sets, regulatory relationships established by eQTL studies, and chromatin state information uncovered by ENCODE and other studies. For any arbitrary input variant, the framework aims to provide an estimate of the magnitude of the effect, the systems or tissues most likely affected by the variant, and the stage of development at which the variant is most likely to produce a phenotype. Models will be trained on variants from whole genome sequencing studies of diagnosed and undiagnosed individuals.
Development of this framework will proceed in three stages: 1) the above lines of genomic evidence will be combined with other features as predictors of enrichment for variants identified in individuals with diagnosed neuropsychiatric conditions, producing a score indicative of the variant's phenotype-shaping potential; 2) spatiotemporal gene expression matrices will be integrated to provide estimates of the tissues and time points most likely affected by variation at the non-coding query locus; 3) by combining the estimates produced in stages 1 and 2, we will create a single weighted context matrix that represents the individual's aggregate regulatory variant burden in space (i.e., brain tissue/region) and time. The framework will be demonstrated on previously unpublished variants in autism and bipolar disorder. To our knowledge, the proposed framework would be the first non-coding variant annotation system that focuses on effects on the brain and is able to guide the user as to when and where the effects of potentially functional variants are likely to emerge in an individual. A further novel aspect of the proposed system is that it will provide an integrated estimate of the overall burden context for an individual in space and time. The proposed project will provide a valuable resource for scientists performing research in the genomics of psychiatric and neurological conditions. Perhaps more importantly, the lessons learned in the course of this project will provide the foundation for developing tools that may one day make interpreting non-coding variation in the clinic a reality.
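
As a purely illustrative sketch of how stage 3 might combine the outputs of stages 1 and 2, the Python below sums, over an individual's variants, each variant's stage-1 score multiplied by the normalized region-by-stage expression matrix of the gene it is assumed to regulate. This is not the project's implementation: the function name `burden_context`, the region and stage labels, the gene and variant identifiers, the normalization, and all numbers are hypothetical and chosen only to make the idea concrete.

```python
import numpy as np

# Hypothetical brain regions (rows) and developmental stages (columns).
REGIONS = ["cortex", "striatum", "cerebellum"]
STAGES = ["prenatal", "childhood", "adult"]


def burden_context(variant_scores, variant_genes, expression):
    """Aggregate per-variant scores into one region-by-stage matrix.

    variant_scores: variant id -> stage-1 score (assumed non-negative)
    variant_genes:  variant id -> gene the variant is assumed to regulate
    expression:     gene -> (regions x stages) expression matrix (stage 2)
    """
    total = np.zeros((len(REGIONS), len(STAGES)))
    for variant_id, score in variant_scores.items():
        expr = np.asarray(expression[variant_genes[variant_id]], dtype=float)
        expr_sum = expr.sum()
        if expr_sum == 0:
            continue  # gene not expressed anywhere: contributes nothing
        # Spread the variant's score over regions and stages in
        # proportion to where and when its target gene is expressed.
        total += score * (expr / expr_sum)
    return total


# Toy inputs (invented for illustration only).
expression = {
    "GENE_A": [[6, 2, 0], [2, 0, 0], [0, 0, 0]],  # mostly prenatal cortex
    "GENE_B": [[0, 0, 0], [0, 0, 1], [0, 1, 2]],  # mostly adult cerebellum
}
variant_genes = {"var1": "GENE_A", "var2": "GENE_B"}
variant_scores = {"var1": 0.5, "var2": 0.8}

burden = burden_context(variant_scores, variant_genes, expression)

# Locate the region and stage carrying the largest aggregate burden.
row, col = np.unravel_index(burden.argmax(), burden.shape)
peak_region, peak_stage = REGIONS[row], STAGES[col]
print(burden)
print(peak_region, peak_stage)
```

Because each gene's expression matrix is normalized to sum to one in this sketch, the entries of the resulting matrix sum to the individual's total variant score, so the matrix can be read as a redistribution of that burden across brain regions and developmental stages. Other weighting schemes are equally plausible.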

Public Health Relevance

Non-protein coding sequence accounts for about 97% of the human genome, and genetic variants in these regions can contribute to diseases and other traits. Our proposed computer algorithms and methods will make it easier for scientists and clinicians to focus on the non-coding variants that are most likely to play a role in disease.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Mental Health (NIMH)
Type
Research Project (R01)
Project #
5R01MH105527-03
Application #
9174092
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Arguello, Alexander
Project Start
2014-12-01
Project End
2018-07-31
Budget Start
2016-11-01
Budget End
2018-07-31
Support Year
3
Fiscal Year
2017
Name
University of Iowa
Department
Psychiatry
Type
Schools of Medicine
DUNS #
062761671
City
Iowa City
State
IA
Country
United States
Zip Code
52242
Vervier, Kévin; Michaelson, Jacob J (2018) TiSAn: estimating tissue-specific effects of coding and non-coding variants. Bioinformatics 34:3061-3068
Michaelson, Jacob J (2017) Genetic Approaches to Understanding Psychiatric Disease. Neurotherapeutics 14:564-581
Bahl, Ethan; Koomar, Tanner; Michaelson, Jacob J (2017) cerebroViz: an R package for anatomical visualization of spatiotemporal brain data. Bioinformatics 33:762-763
Geisheker, Madeleine R; Heymann, Gabriel; Wang, Tianyun et al. (2017) Hotspots of missense mutation identify neurodevelopmental disorder genes and functional domains. Nat Neurosci 20:1043-1051
Vervier, Kévin; Michaelson, Jacob J (2016) SLINGER: large-scale learning for predicting gene expression. Sci Rep 6:39360