In the years since the completion of the Human Genome Project, our ability to read the genome has improved tremendously. Frustratingly, our ability to interpret what we can now easily read has lagged behind. Genetic variants that lie outside protein-coding regions are particularly challenging to interpret because the rules that govern their regulatory function are far less well understood than the principles of protein-coding sequence. Addressing this challenge is urgent because the drastic fall in whole genome sequencing costs will bring with it a wave of newly discovered non-coding and potentially causal variants. The recent completion of several genome-scale projects, and the release of pilot data from new and ongoing projects, have made it possible to begin building models that predict the function of non-coding genetic variation. We propose the development of a brain-centric variant annotation framework that integrates temporal and spatial expression information from existing data sets, regulatory relationships established by eQTL studies, and chromatin state information uncovered by ENCODE and other studies. For any arbitrary input variant, the framework aims to provide an estimate of the magnitude of the effect, the systems or tissues most likely affected by the variant, and the stage of development at which the variant is most likely to produce a phenotype. Models will be trained on variants from whole genome sequencing studies of diagnosed and undiagnosed individuals.
Development of this framework will proceed in three stages: 1) the above lines of genomic evidence will be combined with other features as predictors of enrichment for variants identified in individuals with diagnosed neuropsychiatric conditions, producing a score indicative of the variant's phenotype-shaping potential; 2) spatiotemporal gene expression matrices will be integrated to provide estimates of the tissues and time points most likely affected by variation at the non-coding query locus; 3) by combining the estimates produced in stages 1 and 2, we will create a single weighted context matrix that represents the individual's aggregate regulatory variant burden in space (i.e., brain tissue/region) and time. The framework will be demonstrated on previously unpublished variants in autism and bipolar disorder. The proposed framework would, to our knowledge, be the first non-coding variant annotation system that focuses on effects on the brain and guides the user as to when and where the effects of potentially functional variants are likely to emerge in an individual. A further novel aspect of the proposed system is that it will provide an integrated estimate of the overall burden context for an individual in space and time. The proposed project will provide a valuable resource for scientists performing research in the genomics of psychiatric and neurological conditions. Perhaps more importantly, the lessons learned in the course of this project will provide the foundation for developing tools that may one day make interpreting non-coding variation in the clinic a reality.
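As a rough illustration of how stage 3 might combine the outputs of stages 1 and 2, the sketch below weights each variant's region-by-time context matrix by its score and sums over variants. This is only one plausible reading of "weighted context matrix", not the project's actual method: the function name `burden_context_matrix`, the per-variant normalization, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def burden_context_matrix(scores, contexts):
    """Hypothetical stage-3 combination: score-weighted sum of context matrices.

    scores   : shape (n_variants,), stage-1 scores (assumed non-negative).
    contexts : shape (n_variants, n_regions, n_stages), stage-2 estimates of
               where (brain region) and when (developmental stage) each
               variant is most likely to act.
    """
    scores = np.asarray(scores, dtype=float)
    contexts = np.asarray(contexts, dtype=float)
    if contexts.ndim != 3 or contexts.shape[0] != scores.shape[0]:
        raise ValueError("contexts must have shape (n_variants, n_regions, n_stages)")

    # Assumption: rescale each variant's context so its entries sum to 1,
    # so a variant's total contribution equals its stage-1 score.
    totals = contexts.sum(axis=(1, 2), keepdims=True)
    normalized = np.divide(
        contexts, totals, out=np.zeros_like(contexts), where=totals > 0
    )

    # Weighted sum over the variant axis gives a (n_regions, n_stages) matrix.
    return np.tensordot(scores, normalized, axes=1)

# Toy example: two variants, two brain regions (rows), two stages (columns).
scores = [0.8, 0.2]
contexts = [
    [[1.0, 1.0], [0.0, 0.0]],  # variant 1: region 0, both stages
    [[0.0, 0.0], [0.0, 4.0]],  # variant 2: region 1, late stage only
]
burden = burden_context_matrix(scores, contexts)
print(burden)
```

Under this normalization, the entries of the resulting matrix sum to the total of the stage-1 scores, so the matrix can be read as a distribution of an individual's aggregate variant burden across regions and developmental stages.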
Non-protein-coding sequence accounts for about 97% of the human genome, and genetic variants in these regions can contribute to diseases and other traits. Our proposed computer algorithms and methods will make it easier for scientists and clinicians to focus on the non-coding variants that are most likely to play a role in disease.
Vervier, Kévin; Michaelson, Jacob J (2018) TiSAn: estimating tissue-specific effects of coding and non-coding variants. Bioinformatics 34:3061-3068
Geisheker, Madeleine R; Heymann, Gabriel; Wang, Tianyun et al. (2017) Hotspots of missense mutation identify neurodevelopmental disorder genes and functional domains. Nat Neurosci 20:1043-1051
Michaelson, Jacob J (2017) Genetic Approaches to Understanding Psychiatric Disease. Neurotherapeutics 14:564-581
Bahl, Ethan; Koomar, Tanner; Michaelson, Jacob J (2017) cerebroViz: an R package for anatomical visualization of spatiotemporal brain data. Bioinformatics 33:762-763
Vervier, Kévin; Michaelson, Jacob J (2016) SLINGER: large-scale learning for predicting gene expression. Sci Rep 6:39360