The ability to generate sequence data is rapidly becoming a reality. Sequencing efforts are already underway in candidate gene regions surrounding association peaks identified by genome-wide association studies (GWAS), paving the way for "whole-exome" and, ultimately, whole-genome sequencing studies. Comprehensive sequencing has the potential to reveal a vast trove of low-frequency variants, but most statistical association methods used for GWAS are likely to be inadequate: they are targeted towards common variants and optimized to test one variant at a time, and therefore do not account for multiple variants acting at the same locus. For sequencing studies to attain their full potential, the development of new statistical methods will be critical. We propose to develop such methods for both targeted and genome-wide sequencing approaches.
In Specific Aim 1, we will develop statistical methods for identifying causal variants within a targeted region, such as a GWAS peak or candidate gene. Because DNA sequencing provides a complete picture of genetic variation in the region, it enables association signals to be localized, so that true causal alleles can be distinguished from the background of variants correlated with them through linkage disequilibrium. We will design statistical strategies for finding the causal variants underlying association peaks, allowing for the presence of multiple causal alleles at a locus.
In Specific Aim 2, we will develop statistical methods for sequencing studies that optimally capture the association signal arising from multiple rare variants acting within the same disease gene. The initial focus will be on candidate gene sequencing, with an eye towards whole-exome and even whole-genome sequencing. Associations between individual rare alleles and disease are difficult to detect because single-variant association tests have limited power at low allele frequencies. We will therefore develop methods that combine multiple rare variants from the same gene (or pathway) and treat genes (or pathways), rather than individual alleles, as the unit of the association test. Recent studies support this approach, demonstrating that genes underlying certain quantitative phenotypes display an excess of rare coding variation in individuals at one phenotypic extreme. In addition to combining multiple rare variants in a single test, we will develop methods that incorporate both rare and common variants, which will be important once whole-genome sequencing becomes practical.
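To illustrate what treating the gene rather than the single allele as the unit of test means, the following is a minimal sketch of a generic collapsing (burden) test. It is not the method proposed in this project. It assumes genotypes coded as counts (0, 1 or 2) of the minor allele, a hypothetical minor-allele-frequency cutoff for calling a variant "rare", and a simple carrier/non-carrier 2×2 chi-square test with 1 degree of freedom; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rare_carrier_indicators(genotypes: Sequence[Sequence[int]],
                            maf_threshold: float = 0.01) -> List[int]:
    """Return 1 for each individual carrying at least one rare allele in the gene, else 0.

    Assumption: genotypes[i][j] is the count (0, 1 or 2) of the minor allele
    of variant j in individual i.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0]) if n else 0
    # Estimate each variant's minor-allele frequency from the pooled sample.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_variants)]
    # Keep only variants that are observed but rarer than the (hypothetical) cutoff.
    rare = [j for j, f in enumerate(freqs) if 0 < f < maf_threshold]
    # Collapse all rare variants into a single carrier indicator per individual.
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def collapsing_test(genotypes: Sequence[Sequence[int]],
                    is_case: Sequence[bool],
                    maf_threshold: float = 0.01) -> Tuple[float, float]:
    """Gene-level test: compare rare-allele carrier rates in cases vs. controls.

    Returns (chi-square statistic, p-value) for the 2x2 carrier table, 1 df.
    """
    carriers = rare_carrier_indicators(genotypes, maf_threshold)
    pairs = list(zip(carriers, is_case))
    a = sum(1 for carrier, affected in pairs if affected and carrier)          # case carriers
    b = sum(1 for carrier, affected in pairs if affected and not carrier)      # case non-carriers
    c = sum(1 for carrier, affected in pairs if not affected and carrier)      # control carriers
    d = sum(1 for carrier, affected in pairs if not affected and not carrier)  # control non-carriers
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin (e.g. no rare-allele carriers at all) leaves the test undefined.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Collapsing the rare variants into one carrier indicator pools their signal into a single test, so a gene can reach significance even when no individual variant has enough carriers to be tested on its own. The trade-off is that this simple form implicitly assumes all included variants push risk in the same direction; variants with opposite effects can cancel out.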
In Specific Aim 3, we will assess the power of both targeted and genome-wide approaches and generate study design recommendations, using a population genetic model based on allele frequency distributions from empirical sequencing data sets. We will make recommendations on sequencing strategies, sample sizes, and the inclusion of specific populations. Because all power calculations and recommendations depend critically on assumptions about allele frequency distributions, we will model these distributions rigorously using empirical sequence data. Our population genetic model will incorporate complex demographic histories, recombination, and natural selection in addition to mutation and genetic drift.
RESEARCH NARRATIVE: The study of human genetic variation has already begun to pay big dividends: genome-wide association studies (GWAS) focusing on common genetic variation have identified risk variants for numerous complex diseases. However, for most diseases the fraction of heritability explained by these findings is extremely small, motivating deep resequencing studies that will be able to identify rare risk variants. These resequencing studies will require new statistical methods, which have great potential to further our understanding of disease etiology and to reveal possible drug targets, and which may also prove useful for diagnostic testing in healthy individuals.

Agency: National Institutes of Health (NIH)
Institute: National Institute of Mental Health (NIMH)
Type: Research Project (R01)
Project #: 5R01MH084676-02
Application #: 7692276
Study Section: Special Emphasis Panel (ZMH1-ERB-C (06))
Program Officer: Yao, Yin Y
Project Start: 2008-09-26
Project End: 2011-06-30
Budget Start: 2009-07-01
Budget End: 2010-06-30
Support Year: 2
Fiscal Year: 2009
Total Cost: $444,264
Organization: Brigham and Women's Hospital
DUNS #: 030811269
City: Boston
State: MA
Country: United States
Zip Code: 02115
Chun, Sung; Casparino, Alexandra; Patsopoulos, Nikolaos A et al. (2017) Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat Genet 49:600-605
Agarwala, Vineeta; Flannick, Jason; Sunyaev, Shamil et al. (2013) Evaluating empirical bounds on complex disease genetic architecture. Nat Genet 45:1418-27
Coste, Bertrand; Houge, Gunnar; Murray, Michael F et al. (2013) Gain-of-function mutations in the mechanically activated ion channel PIEZO2 cause a subtype of Distal Arthrogryposis. Proc Natl Acad Sci U S A 110:4667-72
Kiezun, Adam; Pulit, Sara L; Francioli, Laurent C et al. (2013) Deleterious alleles in the human genome are on average younger than neutral alleles of the same frequency. PLoS Genet 9:e1003301
Sunyaev, Shamil R (2012) Inferring causality and functional significance of human coding DNA variants. Hum Mol Genet 21:R10-7
Liu, Dajiang J; Leal, Suzanne M (2012) SEQCHIP: a powerful method to integrate sequence and genotype data for the detection of rare variant associations. Bioinformatics 28:1745-51
Pasaniuc, Bogdan; Rohland, Nadin; McLaren, Paul J et al. (2012) Extremely low-coverage sequencing and imputation increases power for genome-wide association studies. Nat Genet 44:631-5
Neale, Benjamin M; Kou, Yan; Liu, Li et al. (2012) Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485:242-5
Leshchiner, Ignaty; Alexa, Kristen; Kelsey, Peter et al. (2012) Mutation mapping and identification by whole-genome sequencing. Genome Res 22:1541-8
Liu, Dajiang J; Leal, Suzanne M (2012) Estimating genetic effects and quantifying missing heritability explained by identified rare-variant associations. Am J Hum Genet 91:585-96

Showing the most recent 10 out of 20 publications