The imminent influx of high-throughput sequencing datasets poses both a unique opportunity to identify the disease susceptibility loci (DSLs) for complex diseases and their pathways, and a challenge in terms of the statistical analysis. Many of the variants recorded by high-throughput sequencing studies will be rare, providing insufficient power for single-variant statistical analysis. For studies with unrelated cases and controls, a number of collapsing approaches have been suggested. However, such methodology does not exist for family-based studies, which are by design well suited for rare-variant analysis: family-based designs have higher statistical power for rare variants and are robust against population admixture. For population-based designs, statistical approaches that adjust the analysis for such confounding do not exist if the variants are rare. To construct collapsing methods for family-based designs, however, the linkage disequilibrium (LD) between the loci has to be estimated, which is a non-trivial task for rare variants. In population-based designs, this issue can be avoided by utilizing permutation tests that randomly reassign the phenotype but keep the genetic data of each subject fixed. This is not possible in family-based designs. In this grant application, we will develop an analytical approach to the LD-estimation problem in family-based designs. This will enable the construction of rare-variant tests for family-based designs. The major goal of sequence analysis is the identification of the DSLs. The significance of a single-locus association test is determined by the genetic effect size and the allele frequency. Since non-DSLs that are in LD with the true DSL can have higher allele frequencies than the DSL but smaller observed genetic effect sizes, the significance of the test cannot be used to identify DSLs.
To distinguish the true DSLs from SNPs that are in LD with them, we will develop statistical approaches that assess differences in LD patterns across multiple loci between subjects. Such methodology will be proposed for designs of unrelated individuals and for family-based studies. The new analysis approaches will be integrated into our software packages. They will support the search for disease loci in the human genome, which will lead to a better understanding of the pathways for complex diseases and ultimately to their treatment.
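
For illustration, the population-based strategy mentioned above (collapse the rare variants of a region, then permute the phenotype while leaving each subject's genotypes intact) can be sketched as follows. This is a minimal sketch for unrelated cases and controls only, not the methodology proposed in this application. The carrier-indicator collapsing rule is one of several possible collapsing schemes, and the function names and toy data are assumptions made for this example.

```python
import random


def collapse(genotypes):
    """Collapse each subject's rare-variant genotypes into a carrier indicator.

    genotypes: list of per-subject lists of minor-allele counts (0, 1, or 2).
    Returns 1 for a subject carrying at least one minor allele, else 0.
    """
    return [1 if any(g > 0 for g in subject) else 0 for subject in genotypes]


def carrier_difference(carrier, phenotype):
    """Carrier frequency in cases (1) minus carrier frequency in controls (0)."""
    cases = [c for c, y in zip(carrier, phenotype) if y == 1]
    controls = [c for c, y in zip(carrier, phenotype) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_pvalue(genotypes, phenotype, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the collapsed carrier statistic.

    Only the phenotype labels are shuffled; every subject keeps its own
    genotype vector, so the LD among the variants is preserved under the
    null and never has to be estimated explicitly.
    """
    rng = random.Random(seed)
    carrier = collapse(genotypes)
    observed = abs(carrier_difference(carrier, phenotype))
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(carrier_difference(carrier, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data invented for illustration: 3 rare variants, 20 cases, 20 controls.
    cases = [[1, 0, 0]] * 5 + [[0, 0, 1]] * 4 + [[0, 0, 0]] * 11
    controls = [[0, 1, 0]] * 1 + [[0, 0, 0]] * 19
    genotypes = cases + controls
    phenotype = [1] * 20 + [0] * 20
    print(permutation_pvalue(genotypes, phenotype))
```

Shuffling phenotype labels is only valid when subjects are exchangeable under the null hypothesis. Relatives within a family are not exchangeable, which is why this shortcut is unavailable in family-based designs and an analytical LD estimate is needed instead.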

Public Health Relevance

Sequencing data contain the information that is needed to identify the causal genetic loci for complex diseases and phenotypes. However, translating this wealth of information into the discovery of disease loci requires novel statistical analysis approaches. While current analysis methods remain valid, they are not optimally designed for rare variants and sequence data. We will develop statistical tools that are robust against confounding in rare-variant data and that can identify the locations of the disease loci in sequencing data. This information will support the search for disease pathways and their cures.

National Institutes of Health (NIH)
National Institute of Mental Health (NIMH)
Research Project (R01)
Study Section
Behavioral Genetics and Epidemiology Study Section (BGES)
Program Officer
Senthil, Geetha
Harvard University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Hecker, Julian; Prokopenko, Dmitry; Lange, Christoph et al. (2015) On the Recombination Rate Estimation in the Presence of Population Substructure. PLoS One 10:e0145152
Prokopenko, Dmitry; Hecker, Julian; Silverman, Edwin et al. (2015) Using Network Methodology to Infer Population Substructure. PLoS One 10:e0130708
Qiao, Dandi; Cho, Michael H; Fier, Heide et al. (2014) On the simultaneous association analysis of large genomic regions: a massive multi-locus association test. Bioinformatics 30:157-64
Naylor, Melissa G; Cardenas, Valerie A; Tosun, Duygu et al. (2014) Voxelwise multivariate analysis of multimodality magnetic resonance imaging. Hum Brain Mapp 35:831-46
Erk, Susanne; Meyer-Lindenberg, Andreas; Schmierer, Phöbe et al. (2014) Hippocampal and frontolimbic function as intermediate phenotype for psychosis: evidence from healthy relatives and a common risk variant in CACNA1C. Biol Psychiatry 76:466-75
Erk, Susanne; Meyer-Lindenberg, Andreas; Linden, David E J et al. (2014) Replication of brain function effects of a genome-wide supported psychiatric risk variant in the CACNA1C gene and new multi-locus effects. Neuroimage 94:147-54
Lutz, Sharon M; Vansteelandt, Stijn; Lange, Christoph (2013) Testing for direct genetic effects using a screening step in family-based association studies. Front Genet 4:243
Erk, S; Meyer-Lindenberg, A; Schmierer, P et al. (2013) Functional impact of a recently identified quantitative trait locus for hippocampal volume with genome-wide support. Transl Psychiatry 3:e287
Qiao, Dandi; Mattheisen, Manuel; Lange, Christoph (2013) On association analysis of rare variants under population substructure: an approach for the detection of subjects that can cause bias in the analysis--T opt: an outlier detection method. Genet Epidemiol 37:431-9
Fernandes, Carla P D; Christoforou, Andrea; Giddaluru, Sudheer et al. (2013) A genetic deconstruction of neurocognitive traits in schizophrenia and bipolar disorder. PLoS One 8:e81052
