The imminent influx of high-throughput sequencing datasets poses both a unique opportunity to identify the disease susceptibility loci (DSLs) for complex diseases and their pathways, and a challenge in terms of the statistical analysis. Many of the variants that are recorded by high-throughput sequencing studies will be rare, providing insufficient power for the statistical analysis. For studies with unrelated cases and controls, a number of collapsing approaches have been suggested. However, such methodology does not exist for family-based studies, which are by design well suited for rare-variant analysis: they have higher statistical power for rare variants and are robust against population admixture. For population-based designs, statistical approaches that adjust the analysis for such confounding do not exist if the variants are rare. However, for the construction of a collapsing method for family-based designs, the linkage disequilibrium (LD) between the loci has to be estimated, which is a non-trivial task for rare variants. In population-based designs, this issue can be avoided by utilizing permutation tests that randomly reassign the phenotype but keep the genetic data of each subject fixed. This is not possible in family-based designs. In this grant application, we will develop an analytical approach to the LD-estimation problem in family-based designs. This will enable the construction of rare-variant tests for family-based designs.

The major goal of sequence analysis is the identification of the DSLs. The significance of single-locus association tests is determined by the genetic effect size and the allele frequency. Since non-DSLs that are in LD with the true DSL can have higher allele frequencies than the DSL but smaller observed genetic effect sizes, the significance of the test cannot be used to identify DSLs. In order to distinguish the true DSLs from SNPs that are in LD with the DSLs, we will develop statistical approaches that assess differences in LD patterns across multiple loci between subjects. Such methodology will be proposed for designs of unrelated individuals and for family-based studies. The new analysis approaches will be integrated into our software packages. They will support the search for disease loci in the human genome, which will lead to a better understanding of the pathways for complex diseases and ultimately to their treatment.
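To illustrate the permutation strategy mentioned above for population-based designs, the following is a minimal sketch and not the method proposed in this application. It assumes a simple carrier-indicator collapsing rule (a subject scores 1 if they carry any minor allele in the region) and a difference in carrier proportions as the test statistic; the function names `burden_scores`, `carrier_difference`, and `permutation_pvalue` are hypothetical. The phenotype labels are shuffled while each subject's genotype row stays intact, so the LD between the loci never has to be estimated.

```python
import random


def burden_scores(genotypes):
    """Collapse each subject's rare variants into a single carrier indicator.

    genotypes: one row per subject, one minor-allele count per locus.
    Assumed collapsing rule: 1 if the subject carries any minor allele.
    """
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


def carrier_difference(scores, phenotypes):
    """Carrier proportion in cases (1) minus carrier proportion in controls (0)."""
    cases = [s for s, y in zip(scores, phenotypes) if y == 1]
    controls = [s for s, y in zip(scores, phenotypes) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_pvalue(genotypes, phenotypes, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the collapsed statistic.

    Only the phenotype labels are shuffled; the collapsed scores are
    computed once from the unshuffled genotype rows, so the within-subject
    correlation between loci is preserved under every permutation.
    """
    rng = random.Random(seed)
    scores = burden_scores(genotypes)
    observed = abs(carrier_difference(scores, phenotypes))
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(carrier_difference(scores, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy data (made up for illustration): 10 cases who all carry a rare
# allele at one of three loci, and 10 controls who carry none.
toy_genotypes = [[1, 0, 0]] * 5 + [[0, 0, 1]] * 5 + [[0, 0, 0]] * 10
toy_phenotypes = [1] * 10 + [0] * 10
p_value = permutation_pvalue(toy_genotypes, toy_phenotypes)
```

This label-shuffling scheme relies on subjects being exchangeable under the null hypothesis, which holds for unrelated individuals but not for relatives within a family; that is why the abstract calls for an analytical LD estimate in family-based designs.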

Public Health Relevance

Sequencing data contain the information that is needed to identify the causal genetic loci for complex diseases and phenotypes. However, to translate this wealth of information into the discovery of disease loci, novel statistical analysis approaches are required. While current analysis methods remain valid, they are not optimally designed for rare variants and sequence data. We will develop statistical tools that are robust against confounding in rare-variant data and that can identify the locations of the disease loci in sequencing data. This information will support the search for disease pathways and for cures.

National Institutes of Health (NIH)
National Institute of Mental Health (NIMH)
Research Project (R01)
Study Section
Behavioral Genetics and Epidemiology Study Section (BGES)
Program Officer
Senthil, Geetha
Harvard University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Qiao, Dandi; Cho, Michael H; Fier, Heide et al. (2014) On the simultaneous association analysis of large genomic regions: a massive multi-locus association test. Bioinformatics 30:157-64
Erk, Susanne; Meyer-Lindenberg, Andreas; Schmierer, Phöbe et al. (2014) Hippocampal and frontolimbic function as intermediate phenotype for psychosis: evidence from healthy relatives and a common risk variant in CACNA1C. Biol Psychiatry 76:466-75
Erk, Susanne; Meyer-Lindenberg, Andreas; Linden, David E J et al. (2014) Replication of brain function effects of a genome-wide supported psychiatric risk variant in the CACNA1C gene and new multi-locus effects. Neuroimage 94:147-54
Qiao, Dandi; Mattheisen, Manuel; Lange, Christoph (2013) On association analysis of rare variants under population substructure: an approach for the detection of subjects that can cause bias in the analysis--T opt: an outlier detection method. Genet Epidemiol 37:431-9
Erk, S; Meyer-Lindenberg, A; Schmierer, P et al. (2013) Functional impact of a recently identified quantitative trait locus for hippocampal volume with genome-wide support. Transl Psychiatry 3:e287
Ersland, Kari M; Christoforou, Andrea; Stansberg, Christine et al. (2012) Gene-based analysis of regionally enriched cortical genes in GWAS data sets of cognitive traits and psychiatric disorders. PLoS One 7:e31687
Christoforou, Andrea; Dondrup, Michael; Mattingsdal, Morten et al. (2012) Linkage-disequilibrium-based binning affects the interpretation of GWASs. Am J Hum Genet 90:727-33
Degenhardt, Franziska; Priebe, Lutz; Herms, Stefan et al. (2012) Association between copy number variants in 16p11.2 and major depressive disorder in a German case-control sample. Am J Med Genet B Neuropsychiatr Genet 159B:263-73
Muhleisen, Thomas W; Mattheisen, Manuel; Strohmaier, Jana et al. (2012) Association between schizophrenia and common variation in neurocan (NCAN), a genetic risk factor for bipolar disorder. Schizophr Res 138:69-73
Frank, Josef; Cichon, Sven; Treutlein, Jens et al. (2012) Genome-wide significant association between alcohol dependence and a variant in the ADH gene cluster. Addict Biol 17:171-80
