Although affected members of multiplex schizophrenia pedigrees have substantially elevated recurrence risk compared to singleton cases, mean polygenic risk scores do not differ between these groups, suggesting that one source of the higher familial recurrence risk is rare, higher-impact variation. We will collect whole genome sequence (WGS) from 600 affected members of multiplex schizophrenia pedigrees to identify rare variation shared by affected individuals within and between pedigrees. Such shared variation potentially accounts for the increased recurrence risk, and focusing on it reduces the "variant space" under consideration. After QC and variant calling in our existing pipeline, a) familial sequence variants in the exome will be directly analyzed in 2000 Irish cases and 2000 Irish controls with 30X exome sequence data currently in production, and b) variants outside the exome will be imputed into 3600 Irish singleton schizophrenia or bipolar disorder cases and 3000 Irish population controls with GWAS framework data; 3781 additional UK10K controls with 10X WGS are available to increase analysis power. This imputed dataset will be analyzed using recently developed kernel-based methods for testing variation aggregated over a defined interval (such as a gene) while avoiding inflation of type-1 error. We use multiple sources of genomic information to develop weights for each position in the genome (indexing the prior probability that a change at the site has functional consequence) and for each variant detected (indexing the probability that the observed change has functional consequence), and we propose to improve the existing genomic information sources for this weighting in a number of ways.
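As an illustration of how functional weights can enter an interval-level kernel test, the following is a minimal sketch of a weighted linear-kernel score statistic of the SKAT type, Q = sum over variants j of (w_j * S_j)^2, where S_j is the genotype-by-centered-phenotype score for variant j. It is not the proposal's actual pipeline: the function names, the toy genotypes, and the weights are hypothetical, no covariates are modeled, and a simple phenotype-permutation p-value is swapped in for the analytic null distribution that published kernel methods typically use.

```python
import random


def skat_q(genotypes, phenotypes, weights):
    """Weighted linear-kernel score statistic for one interval (e.g. a gene).

    genotypes:  one row per individual, one minor-allele count per variant
    phenotypes: 1 for case, 0 for control
    weights:    per-variant weights (hypothetical functional priors)
    """
    n = len(phenotypes)
    y_bar = sum(phenotypes) / n
    resid = [y - y_bar for y in phenotypes]  # centered phenotype, no covariates
    q = 0.0
    for j, w in enumerate(weights):
        # Score for variant j: allele counts weighted by phenotype residuals.
        s_j = sum(genotypes[i][j] * resid[i] for i in range(n))
        q += (w * s_j) ** 2
    return q


def permutation_p(genotypes, phenotypes, weights, n_perm=1000, seed=0):
    """Empirical p-value from shuffling case/control labels (illustrative only)."""
    rng = random.Random(seed)
    observed = skat_q(genotypes, phenotypes, weights)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if skat_q(genotypes, shuffled, weights) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (invented for illustration): 4 cases, 4 controls, 3 variants.
toy_genotypes = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
toy_phenotypes = [1, 1, 1, 1, 0, 0, 0, 0]
toy_weights = [1.0, 0.5, 0.0]  # a zero weight removes a variant from the test

q_stat = skat_q(toy_genotypes, toy_phenotypes, toy_weights)
p_emp = permutation_p(toy_genotypes, toy_phenotypes, toy_weights, n_perm=200)
```

Under this form, a variant with a higher functional weight contributes more to Q for the same case/control imbalance, which is one way position-level and variant-level priors can focus the test on variation more likely to have functional consequence.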
In Aim 3, prioritized variants from Aim 2a/2b will be directly genotyped in the case/control sample by custom microarray; individual genes or genesets showing enrichment of variation in cases (if any are observed) will be resequenced in the same case/control sample.
In Aim 4, the directly assessed genotypic and sequence data from Aim 3 will be analyzed using standard methods to identify individual associated variants and variant-enriched genes, genesets, or other functional sequences. We seek to unambiguously identify 1) individual variants that are significantly more common in cases, 2) individual genes or other functional sequences enriched for variation in cases, or 3) sets of genes or functional sequences enriched for variation in cases. Such findings would provide critical information about the brain systems perturbed in schizophrenia and the mechanisms by which such alleles increase risk.
Rare sequence variation has been implicated in many human complex traits, including schizophrenia. It has been studied in unrelated cases and controls and in parent-offspring trios, but remains unstudied in multiplex families. Sequencing the genomes of such families will allow comprehensive identification of variation in protein-coding genes, non-coding expressed loci, regulatory sequences, and evolutionarily conserved regions, as well as detection of structural variation. Testing these alleles in a large case/control series of the same ethnic and geographic origin offers significant advantages over prior study designs, and has the potential to identify individual alleles, variant-enriched genes, variant-enriched non-genic sequences, and/or variant-enriched genesets contributing to schizophrenia risk in the Irish population. Such variants offer great potential for understanding the functional impact of risk alleles and improving mechanistic understanding of schizophrenia and related disorders.