each individual genome is likely to carry a large number of rare variants that confer susceptibility to disease. With the advent of next-generation sequencing technologies, it is now possible to comprehensively analyze individual patient genomes and connect genetic variants with their respective phenotypes. To search for highly deleterious variants that cause autoimmune diseases, including rheumatoid arthritis and systemic lupus erythematosus, whole-genome sequences were obtained in a separate project. Here we propose a further analysis of this dataset, which starts with the genome sequences of 39 autoimmune disease patients and 39 matched controls. By searching for functional modules of genes that are enriched for disease-relevant variants, this alternative strategy will exploit weak-effect signals that originate from many different loci.
In specific aim 1, we will identify variant patterns that distinguish cases from controls. Sequence variants will be scored for their potential effects on gene expression or protein function. In parallel, disease candidate genes will be scored for their likelihood of playing a molecular role in disease etiology. Sets of candidate genes will then be jointly tested for differences in patterns of functional variation between cases and controls.
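The abstract does not name the test statistic, so the following is only a minimal sketch of one common way such a joint test could be set up: a weighted burden score per individual, compared between cases and controls by label permutation. All names (`individual_burden`, `permutation_test`, `variant_score`, `gene_weight`) and the choice of a mean-difference statistic are illustrative assumptions, not the method of the proposed project.

```python
import random


def individual_burden(carried, variant_score, gene_weight, gene_set):
    """Weighted burden of one individual over a gene set (hypothetical scheme).

    carried:       iterable of (variant_id, gene) pairs the individual carries
    variant_score: dict variant_id -> predicted functional effect (assumed scale)
    gene_weight:   dict gene -> prior likelihood of disease relevance (assumed)
    gene_set:      set of genes tested jointly
    """
    return sum(
        variant_score[v] * gene_weight[g]
        for v, g in carried
        if g in gene_set  # variants outside the set are ignored
    )


def permutation_test(burdens, is_case, n_perm=10000, seed=0):
    """One-sided permutation p-value for mean(cases) - mean(controls)."""

    def mean_diff(labels):
        cases = [b for b, c in zip(burdens, labels) if c]
        ctrls = [b for b, c in zip(burdens, labels) if not c]
        return sum(cases) / len(cases) - sum(ctrls) / len(ctrls)

    observed = mean_diff(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between burden and status
        if mean_diff(labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With 39 cases and 39 controls, `burdens` would hold 78 set-level scores and `is_case` the matching status flags. Permuting labels rather than relying on an asymptotic distribution is a design choice that suits small samples, at the cost of one permutation run per gene set tested.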
In specific aim 2, we will seek to replicate the observed case-control differences in independent datasets. To this end, we will first impute genome-wide genotyping data against the 1000 Genomes reference haplotypes. Although this strategy is unlikely to be informative about very rare variants, it has the advantage that large samples can be tested. We furthermore plan to use additional sequencing data that will become available during the course of this project. As a result, we expect to find biologically meaningful sets of genes that differ in their joint variant distribution between cases and controls.
We will search whole-genome sequencing data for rare variant patterns that are correlated with autoimmune diseases such as rheumatoid arthritis and systemic lupus erythematosus. Testing sets of candidate genes jointly, instead of testing each gene separately, will increase the statistical power to find patterns of variation that are correlated with disease status. We hope to find sets of gene variants that can predict the manifestation of disease.
Li, Wentian; Freudenberg, Jan; Miramontes, Pedro (2014) Diminishing return for increased mappability with longer sequencing reads: implications of the k-mer distributions in the human genome. BMC Bioinformatics 15:2
Li, Wentian; Freudenberg, Jan (2014) Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. Comput Biol Chem 53 Pt A:108-17