Mental illnesses are some of the most devastating diseases affecting human populations, placing a huge burden on individuals, families and society. Genome-wide association studies (GWAS) have identified dozens of common single nucleotide polymorphisms (SNPs) that are associated with psychiatric diseases, but a majority of those SNPs map to intergenic or intronic regions and are functionally unclassified. Existing software and algorithms merely query multiple databases and return lists of hits without integrating them, ignoring much valuable regulatory information. The overall goal of this proposal is to integrate all available genetic, genomic and epigenomic data to generate a probability-based prediction of a SNP's influence on gene expression levels in the brain. Our previous studies have shown that psychiatric GWAS signals are enriched for brain eQTL SNPs (eSNPs), and that these brain eSNPs are likely to be functional and to contribute to disease susceptibility. We will use SNPs in eQTLs to anchor a chain of evidence incorporating histone marks, conserved sequences, transcription factor binding sites, DNA methylation, accessible chromatin, non-coding RNA, and other data. We will use a machine learning method to predict regulatory SNPs based on known relationships between these epigenetic marks and their target genes, as well as their distinct patterns in the genome. We will also use our novel unsupervised deconvolution algorithm to extract cell-type-specific (i.e., neuron vs. non-neuron) measures from heterogeneous brain tissue data to improve our predictions. We will use both statistical and experimental methods to validate the predictions. Quantitative PCR and CRISPR-Cas9 will be used on induced pluripotent cell lines to compare gene expression levels between alleles of predicted functional SNPs. Both the algorithm and the predicted functional variants will be made public via a website and a standalone application.
The novel algorithm will significantly improve our understanding of psychiatric disease genetics by uncovering the gene-regulatory functions of disease-associated, non-coding SNPs.
Genome-wide association studies (GWAS) have identified thousands of common SNPs associated with major complex diseases, but the majority of those SNPs are located in non-coding regions, leaving those genetic associations functionally unexplained. Existing functional prediction software and algorithms only query some databases without statistical or biological integration, and disregard much valuable regulatory information. We propose to integrate all available genetic, genomic and epigenomic data and use machine learning to produce a probability-based prediction of a SNP's influence on gene expression levels in the brain. We will also use a novel unsupervised deconvolution algorithm to extract cell-type-specific measures from heterogeneous brain tissue data to improve our predictions. We will use both statistical and experimental methods to validate the predictions. Both the algorithm and the predicted functional variants will be made public via a website and a standalone application. The novel algorithm will significantly improve our understanding of psychiatric disease genetics by assigning meaningful biological functions to those non-coding, disease-associated SNPs.