The imminent influx of high-throughput sequencing data poses both a unique opportunity to identify disease susceptibility loci (DSLs) for complex diseases and their pathways, and a challenge for statistical analysis. Many of the variants recorded by high-throughput sequencing studies will be rare, so that single-locus analyses have insufficient statistical power. For studies of unrelated cases and controls, a number of collapsing approaches have been proposed. However, no such methodology exists for family-based studies, although they are by design well suited to rare-variant analysis: they have higher statistical power for rare variants and are robust against population admixture. For population-based designs, by contrast, no statistical approaches exist that adjust the analysis for such confounding when the variants are rare. Constructing collapsing methods for family-based designs, however, requires estimating the linkage disequilibrium (LD) between the loci, which is a non-trivial task for rare variants. In population-based designs, this issue can be avoided by using permutation tests that randomly reassign the phenotypes while keeping each subject's genetic data fixed. This is not possible in family-based designs. In this grant application, we will develop an analytical approach to the LD-estimation problem in family-based designs, which will enable the construction of rare-variant tests for these designs. The major goal of sequence analysis is the identification of DSLs. The significance of a single-locus association test is determined by both the genetic effect size and the allele frequency. Because non-DSLs in LD with the true DSL can have higher allele frequencies than the DSL but smaller observed genetic effect sizes, they can reach comparable significance, so the significance of the test alone cannot be used to identify DSLs.
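For population-based designs, the permutation strategy mentioned above can be sketched as follows. This is a minimal illustration, not the project's method: it assumes unrelated subjects, a simple burden-style collapsing that counts minor alleles at sites below a hypothetical MAF threshold of 0.05, and a one-sided difference in mean burden as the statistic. All function names and the toy data are hypothetical. Because only the phenotype labels are shuffled and each subject's genotypes stay fixed, the LD among the collapsed loci is identical in every permuted data set and never has to be estimated. The approach relies on subjects being exchangeable, which does not hold for relatives; that is one reason it does not carry over directly to family-based designs.

```python
import random

MAF_THRESHOLD = 0.05  # hypothetical cut-off for calling a site "rare"


def burden_scores(genotypes, maf_threshold=MAF_THRESHOLD):
    """Collapse rare variants: per-subject count of minor alleles at rare sites.

    genotypes: one row per subject, minor-allele counts (0/1/2) per site.
    Rare sites are chosen without looking at phenotypes, so this step
    is unaffected by permuting the labels and can be done once.
    """
    n_subj = len(genotypes)
    rare_sites = [
        j for j in range(len(genotypes[0]))
        if 0 < sum(row[j] for row in genotypes) / (2 * n_subj) < maf_threshold
    ]
    return [sum(row[j] for j in rare_sites) for row in genotypes]


def burden_statistic(scores, phenotypes):
    """Mean burden in cases (1) minus mean burden in controls (0)."""
    cases = [s for s, y in zip(scores, phenotypes) if y == 1]
    ctrls = [s for s, y in zip(scores, phenotypes) if y == 0]
    return sum(cases) / len(cases) - sum(ctrls) / len(ctrls)


def permutation_pvalue(genotypes, phenotypes, n_perm=1000, seed=0):
    """One-sided permutation p-value for a higher rare-variant burden in cases."""
    rng = random.Random(seed)
    # Scores stay attached to their subjects, so the LD between the
    # collapsed sites is preserved in every permuted data set.
    scores = burden_scores(genotypes)
    observed = burden_statistic(scores, phenotypes)
    labels = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # randomly reassign phenotypes only
        if burden_statistic(scores, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def toy_data():
    """Hypothetical data: 20 cases, 20 controls, 5 rare sites + 1 common site."""
    genotypes = []
    for i in range(40):
        row = [0] * 6
        if i < 12:       # 12 cases each carry one rare allele
            row[i % 5] = 1
        if i == 39:      # one control carries a rare allele
            row[4] = 1
        row[5] = i % 2   # common site (MAF 0.25), dropped by the collapsing step
        genotypes.append(row)
    return genotypes, [1] * 20 + [0] * 20


if __name__ == "__main__":
    g, y = toy_data()
    print("observed statistic:", burden_statistic(burden_scores(g), y))
    print("permutation p-value:", permutation_pvalue(g, y))
```

On the hypothetical data, 12 of 20 cases but only 1 of 20 controls carry a rare allele, so the permutation p-value is small, while the common site is excluded before collapsing.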
To distinguish the true DSLs from SNPs in LD with them, statistical approaches are required that assess differences in LD patterns across multiple loci between subjects. We will develop such methodology for both studies of unrelated individuals and family-based studies. The new analysis approaches will be integrated into our software packages. They will support the search for disease loci in the human genome, which will lead to a better understanding of the pathways of complex diseases and, ultimately, to their treatment.
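Why significance alone is an unreliable guide to the DSL can be illustrated with a toy simulation. This sketch is an assumption-laden illustration, not the project's approach: it uses a single causal locus D, a marker M carried on every D haplotype, hypothetical haplotype frequencies, a multiplicative haplotype relative risk of 3, and a rare disease so that control haplotypes follow population frequencies. All names and numbers are illustrative. It computes the LD (r²) between D and M, their expected allele frequencies and odds ratios, and the fraction of simulated case-control samples in which the marker's allelic chi-square exceeds that of the causal locus.

```python
import random
from collections import Counter

# Hypothetical two-locus haplotype frequencies (illustrative assumption).
# D/d: causal locus (the DSL); M/m: nearby marker (a non-DSL).
# There is no "Dm" haplotype, so every D allele sits on an M background.
HAPLO_FREQ = {"DM": 0.020, "dM": 0.005, "dm": 0.975}
RELATIVE_RISK = 3.0  # assumed multiplicative risk for each D-carrying haplotype


def allele_sum(table, allele):
    """Sum over haplotypes carrying the allele: a frequency or a count."""
    return sum(v for h, v in table.items() if allele in h)


def ld_r2(freqs):
    """Squared allelic correlation r^2 between D and M."""
    p_d, p_m = allele_sum(freqs, "D"), allele_sum(freqs, "M")
    d = freqs.get("DM", 0.0) - p_d * p_m
    return d * d / (p_d * (1 - p_d) * p_m * (1 - p_m))


def case_haplotype_freqs(freqs, rr):
    """Case haplotype frequencies under a rare-disease multiplicative risk model."""
    weights = {h: f * (rr if "D" in h else 1.0) for h, f in freqs.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def odds_ratio(p_case, p_ctrl):
    return (p_case / (1 - p_case)) / (p_ctrl / (1 - p_ctrl))


def allelic_chi2(a, n_case, c, n_ctrl):
    """Pearson chi-square for a 2x2 table: a of n_case vs c of n_ctrl alleles."""
    b, d = n_case - a, n_ctrl - c
    denom = n_case * n_ctrl * (a + c) * (b + d)
    return (n_case + n_ctrl) * (a * d - b * c) ** 2 / denom if denom else 0.0


def fraction_marker_more_significant(n_hap=1000, n_rep=500, seed=1):
    """Share of simulated samples in which the marker beats the causal locus."""
    rng = random.Random(seed)
    case_freq = case_haplotype_freqs(HAPLO_FREQ, RELATIVE_RISK)
    wins = 0
    for _ in range(n_rep):
        # Controls are drawn from population frequencies (rare-disease assumption).
        cases = Counter(rng.choices(list(case_freq), weights=list(case_freq.values()), k=n_hap))
        ctrls = Counter(rng.choices(list(HAPLO_FREQ), weights=list(HAPLO_FREQ.values()), k=n_hap))
        chi_d = allelic_chi2(allele_sum(cases, "D"), n_hap, allele_sum(ctrls, "D"), n_hap)
        chi_m = allelic_chi2(allele_sum(cases, "M"), n_hap, allele_sum(ctrls, "M"), n_hap)
        wins += chi_m > chi_d
    return wins / n_rep


if __name__ == "__main__":
    case_freq = case_haplotype_freqs(HAPLO_FREQ, RELATIVE_RISK)
    print(f"r^2(D, M) = {ld_r2(HAPLO_FREQ):.3f}")
    for allele in ("D", "M"):
        p_case, p_ctrl = allele_sum(case_freq, allele), allele_sum(HAPLO_FREQ, allele)
        print(f"{allele}: control freq {p_ctrl:.3f}, odds ratio {odds_ratio(p_case, p_ctrl):.2f}")
    print("marker more significant in", fraction_marker_more_significant(), "of replicates")
```

Under these assumptions the marker has the higher allele frequency (0.025 vs 0.020) and the smaller odds ratio (2.6 vs 3.0), with r² of about 0.80. Its expected chi-square is lower than that of the causal locus, yet in a nontrivial share of finite samples it is the more significant of the two, so ranking loci by p-value alone does not reliably single out the DSL.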
Sequencing data contain the information needed to identify the causal genetic loci for complex diseases and phenotypes. However, translating this wealth of information into the discovery of disease loci requires novel statistical analysis approaches. While the current analysis methodology remains valid, it is not optimally designed for rare variants and sequence data. We will develop statistical tools that are robust against confounding in rare-variant data and that can identify the locations of disease loci in sequencing data. This information will support the search for disease pathways and their cure.