Statistical Methods for Next-Generation Sequencing in Disease Association Studies

Through this project we propose to develop statistical approaches and software for genotype calling and association testing in next-generation sequence data. The field is driven by molecular advances that allow for affordable, massively parallel sequencing. The rapid development of statistical methods for next-generation sequence data in disease studies is necessary to keep pace with the advancing molecular technology. Next-generation sequencing is based on random, short-read technology; thus the coverage of any nucleotide is highly variable and subject to error. Distinguishing random error from truly variable sites is required for "SNP-calling". One step beyond this is identifying the individual's actual genotype at the site. This is a highly statistical problem and we have yet to see this problem addressed in a statistically rigorous manner.

The solution that we propose, and what makes our approach novel, assumes that we have a sample of individuals, each with next-generation sequence data. We anticipate that sequencing may ultimately replace GWAS SNP arrays for disease-association studies. While this may be several years away for whole-genome sequencing, sequencing enough people individually for a small association study is already becoming practical with target capture arrays. We can leverage the information from a sample of individuals with next-generation sequence data to more accurately estimate an individual's genotype and the position-specific error rate. Our approach is to express the genotype probabilities and error rate in a likelihood framework. We can then use standard statistical theory to help us call genotypes. This approach should perform better than calling genotypes for a single individual at a time based on an arbitrary filter as is currently done.
A distinct advantage of this statistical framework is that the uncertainty in the genotype calls can be incorporated directly into our disease-association tests (e.g., case-control and rare variant analysis). In this way we will increase the power of our association tests and reduce bias due to error or systematic missingness. Incorporation of next-generation sequence data into the association tests provides a complete analysis pipeline from sequence to association.
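
As an illustration of the likelihood framework described above, the following is a minimal sketch of how genotype frequencies and a position-specific error rate might be estimated jointly from a sample of individuals, using an EM algorithm at a single biallelic site. It is not the project's software. The function names (`e_step`, `em_genotype_caller`, `expected_dosage`), the symmetric read-error model, the unconstrained genotype frequencies, and the example read counts are all assumptions made for illustration.

```python
import math

def e_step(counts, freqs, err):
    """Posterior genotype probabilities for each individual.

    counts: list of (variant_reads, total_reads) at one site.
    Assumed model: a read shows the variant allele with probability
    err (0 copies), 0.5 (1 copy), or 1 - err (2 copies).
    """
    read_p = [err, 0.5, 1.0 - err]
    posteriors = []
    for k, n in counts:
        # Log scale avoids underflow at high coverage; the binomial
        # coefficient is constant across genotypes and cancels.
        logw = [
            math.log(freqs[g])
            + k * math.log(read_p[g])
            + (n - k) * math.log(1.0 - read_p[g])
            for g in range(3)
        ]
        top = max(logw)
        w = [math.exp(x - top) for x in logw]
        total = sum(w)
        posteriors.append([x / total for x in w])
    return posteriors

def em_genotype_caller(counts, n_iter=100, err_init=0.01):
    """Estimate genotype frequencies and error rate by EM (illustrative)."""
    freqs = [1.0 / 3.0] * 3
    err = err_init
    for _ in range(n_iter):
        post = e_step(counts, freqs, err)
        # M-step for genotype frequencies: average posterior weight,
        # floored to keep the logarithm defined, then renormalized.
        raw = [max(sum(p[g] for p in post) / len(counts), 1e-12) for g in range(3)]
        freqs = [r / sum(raw) for r in raw]
        # M-step for the error rate: only homozygotes are informative.
        # Variant reads in a 0-copy genotype and reference reads in a
        # 2-copy genotype count as errors.
        err_reads = sum(p[0] * k + p[2] * (n - k) for p, (k, n) in zip(post, counts))
        hom_reads = sum((p[0] + p[2]) * n for p, (k, n) in zip(post, counts))
        if hom_reads > 0:
            err = min(max(err_reads / hom_reads, 1e-6), 0.49)
    # Final E-step so the posteriors match the returned parameters.
    return freqs, err, e_step(counts, freqs, err)

def expected_dosage(posterior):
    """Expected number of variant alleles under the posterior."""
    return posterior[1] + 2.0 * posterior[2]

# Hypothetical read counts for ten individuals: (variant reads, total reads).
counts = [(0, 20), (1, 25), (0, 18), (10, 21), (9, 19),
          (12, 22), (20, 20), (0, 30), (24, 25), (0, 15)]
freqs, err, post = em_genotype_caller(counts)
calls = [max(range(3), key=lambda g: p[g]) for p in post]
dosages = [expected_dosage(p) for p in post]
```

Here `calls` holds the most probable genotype for each individual, while `dosages` retains the uncertainty in each call. Passing a quantity such as the expected dosage, rather than a hard call, to a downstream case-control test is one simple way the uncertainty could be carried into association testing.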

Public Health Relevance

Our project meets the goals of the GO grant program because of its potential for high impact in the short term. Methods development is particularly well-suited for, and in need of, a short-term infusion of support. The area of next-generation sequencing is rapidly growing, yet statistical methods to use these data effectively lag far behind molecular advances. Our project will provide the rapid acceleration needed to quickly provide statistical approaches to meet the coming data from these new technologies.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
High Impact Research and Research Infrastructure Programs (RC2)
Project #
5RC2HG005605-02
Application #
7943996
Study Section
Special Emphasis Panel (ZHG1-HGR-N (O1))
Program Officer
Brooks, Lisa
Project Start
2009-09-30
Project End
2012-07-31
Budget Start
2010-08-01
Budget End
2012-07-31
Support Year
2
Fiscal Year
2010
Total Cost
$500,000
Indirect Cost
Name
University of Miami School of Medicine
Department
Genetics
Type
Schools of Medicine
DUNS #
052780918
City
Coral Gables
State
FL
Country
United States
Zip Code
33146
Kinnamon, Daniel D; Martin, Eden R (2014) Valid Monte Carlo permutation tests for genetic case-control studies with missing genotypes. Genet Epidemiol 38:325-44
Nuytemans, Karen; Bademci, Guney; Inchausti, Vanessa et al. (2013) Whole exome sequencing of rare variants in EIF4G1 and VPS35 in Parkinson disease. Neurology 80:982-9
Kinnamon, Daniel D; Hershberger, Ray E; Martin, Eden R (2012) Reconsidering association testing methods using single-variant test statistics as alternatives to pooling tests for sequence data with rare variants. PLoS One 7:e30238
Hedges, Dale J; Guettouche, Toumy; Yang, Shan et al. (2011) Comparison of three targeted enrichment strategies on the SOLiD sequencing platform. PLoS One 6:e18595
Rampersaud, Evadnie; Kinnamon, Daniel D; Hamilton, Kara et al. (2010) Common susceptibility variants examined for association with dilated cardiomyopathy. Ann Hum Genet 74:110-6
Martin, E R; Kinnamon, D D; Schmidt, M A et al. (2010) SeqEM: an adaptive genotype-calling approach for next-generation sequencing studies. Bioinformatics 26:2803-10
Hedges, Dale J; Burges, Dan et al. (2009) Exome sequencing of a multigenerational human pedigree. PLoS One 4:e8232