Robust Methods for the Efficient Analysis and Integration of DNA Sequence Data

Allen, Andrew

Abstract

Human genetics research is on the cusp of a major transformation in how genetic variation is captured, moving from a marker-based approach to one based on a complete characterization of an individual's genome by sequencing. This is an exciting prospect but not without its challenges. The imminent production of large amounts of sequence data raises several questions about how best to use these data. For example, because of the sheer scale of the data, approaches for associating sequence variants with human disease need to be efficient, both statistically and computationally. In addition, most genetic association experiments in the near term will not rely solely on sequence data; instead, a sub-sample of individuals will have sequence data while the rest of the sample remains unsequenced but has genotype information. Alternatively, sequence data may be available on a separate, external sample. Thus it will be important to develop statistical methods that can appropriately integrate these various types of data into a unified inferential framework. This research project addresses these issues by developing a novel class of sequence-based haplotype sharing statistics that exploit the implications of DNA sequence evolution in testing for variant/disease association (specific aim 1). Further, we propose to develop a statistical framework that allows for the unified analysis of DNA sequence and genotype data (specific aim 2). Throughout, we will leverage our previous work on robust methods for haplotype inference to develop computationally and statistically efficient procedures that remain robust to population genetic assumptions. A stratified analytic approach will be emphasized to allow adjustment for confounding due to population stratification. Efficient Monte Carlo procedures will be proposed to account for the large number of sequence variants investigated.
We will develop a suite of software tools that fully implement the methodology developed and make them freely available to the general research community (specific aim 3). Finally, using these tools, we will analyze a publicly available DNA sequence dataset with the goal of better localizing disease-associated sequence variants (specific aim 4). The methods developed through this proposal represent a unified and statistically rigorous framework for developing powerful tests that exploit evolutionary relationships between DNA sequences while allowing disparate data types to be incorporated into a unified analysis. These procedures will give researchers the tools to more finely localize disease-associated sequence variants, allowing variants to be better prioritized for subsequent investigation via functional studies.
Human genetics research is on the cusp of a major transformation in how genetic variation is captured, moving from a marker-based approach to one based on a complete characterization of an individual's genome by sequencing. The imminent production of large amounts of sequencing data, however, leads to questions concerning their statistical analysis and incorporation into the larger experiment. We address these questions by proposing a unified and statistically rigorous framework for developing powerful tests that exploit evolutionary relationships between DNA sequences and that allow disparate data types to be incorporated into a unified analysis.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Mental Health (NIMH)
Type: Research Project (R01)
Project #: 1R01MH084680-01
Application #: 7554820
Study Section: Special Emphasis Panel (ZMH1-ERB-C (06))
Program Officer: Yao, Yin Y

Project Start: 2008-09-26
Project End: 2011-06-30
Budget Start: 2008-09-26
Budget End: 2009-06-30
Support Year: 1
Fiscal Year: 2008
Total Cost: $234,000
Indirect Cost

Institution

Name: Duke University
Department: Other Clinical Sciences
Type: Schools of Medicine
DUNS #: 044387793

City: Durham
State: NC
Country: United States
Zip Code: 27705

Related projects


NIH 2010 R01 MH	Robust Methods for the Efficient Analysis and Integration of DNA Sequence Data Allen, Andrew S. / Duke University	$234,000
NIH 2010 R01 MH	Robust Methods for the Efficient Analysis and Integration of DNA Sequence Data Allen, Andrew S. / Duke University	$209,939
NIH 2009 R01 MH	Robust Methods for the Efficient Analysis and Integration of DNA Sequence Data Allen, Andrew S. / Duke University	$234,000
NIH 2008 R01 MH	Robust Methods for the Efficient Analysis and Integration of DNA Sequence Data Allen, Andrew S. / Duke University	$234,000

Publications

Xing, Chuanhua; McCarthy, Janice M; Dupuis, Josée et al. (2016) Robust analysis of secondary phenotypes in case-control genetic association studies. Stat Med 35:4226-37

Satten, Glen A; Allen, Andrew S; Ikeda, Morna et al. (2014) Robust regression analysis of copy number variation data based on a univariate score. PLoS One 9:e86272

Epstein, Michael P; Duncan, Richard; Broadaway, K Alaine et al. (2012) Stratification-score matching improves correction for confounding by population stratification in case-control association studies. Genet Epidemiol 36:195-205

Epstein, Michael P; Duncan, Richard; Jiang, Yunxuan et al. (2012) A permutation procedure to correct for confounders in case-control studies, including tests of rare variation. Am J Hum Genet 91:215-23

Allen, Andrew S; Satten, Glen A (2011) Control for confounding in case-control studies using the stratification score, a retrospective balancing score. Am J Epidemiol 173:752-60

Xing, Chuanhua; Satten, Glen A; Allen, Andrew S (2011) A weighted accumulation test for associating rare genetic variation with quantitative phenotypes. BMC Proc 5 Suppl 9:S6

Allen, Andrew; Epstein, Michael P; Satten, Glen A (2010) Score-based adjustment for confounding by population stratification in genetic association studies. Genet Epidemiol 34:383-5

Laje, Gonzalo; Cannon, Dara M; Allen, Andrew S et al. (2010) Genetic variation in HTR2A influences serotonin transporter binding potential as measured using PET and [11C]DASB. Int J Neuropsychopharmacol 13:715-24

Allen, Andrew S; Satten, Glen A; Bray, Sarah L et al. (2010) Fast and robust association tests for untyped SNPs in case-control studies. Hum Hered 70:167-76

Allen, Andrew S; Satten, Glen A (2010) SNPs in CAST are associated with Parkinson disease: a confirmation study. Am J Med Genet B Neuropsychiatr Genet 153B:973-9

Showing the most recent 10 out of 13 publications
