Statistical Methods for Population Genomics and "Next-gen" Sequencing Data

Scheet, Paul

Abstract

Massively parallel ("next-generation") shotgun DNA sequencing projects will provide the highest resolution to date of genetic variation in human populations. This new technology offers great promise for interrogating the genetic etiology of complex disease. However, with this promise come challenges. These sequencing methods have nontrivial error rates and often produce sparse coverage of mapped reads, which confounds polymorphism discovery and genotyping. Copy number variation must often be inferred indirectly. The massive size of these data sets requires fast and scalable analytic approaches. In this proposal, we present statistical methods to address these challenges directly, using computationally tractable models for population genetic variation. Our methods account for the dependence among nearby alleles (linkage disequilibrium) with a cluster-based model for haplotype variation, and use this information to aid inferences about the underlying genetic architecture of the samples. Specifically, we propose to call genotypes and detect novel polymorphic loci from next-generation shotgun sequence data, to detect rare disease risk alleles for follow-up sequencing studies, and to model single nucleotide and copy number polymorphism simultaneously in population data to facilitate studies of association between phenotype and genotype. Our experienced team of medical and statistical geneticists has the technical expertise and access to the data sets necessary to achieve these aims. We will implement our methods in our widely used software package fastPHASE.
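To illustrate why sequencing error and sparse coverage confound genotyping, the following is a minimal Python sketch of a textbook single-site genotype likelihood. It is not the method proposed here. It assumes a biallelic site, independent reads, and an error model in which a read reports the other allele with probability `error`. It ignores errors to a third base, and it combines the likelihoods with a Hardy-Weinberg prior at an assumed allele frequency. The proposal instead draws on linkage disequilibrium through a cluster-based haplotype model, which this sketch does not attempt. The function names `genotype_likelihoods` and `genotype_posterior` are hypothetical.

```python
from math import comb


def genotype_likelihoods(n_ref: int, n_alt: int, error: float) -> dict:
    """P(read counts | genotype) at one biallelic site (hypothetical helper).

    Genotypes are coded as the number of alternate alleles (0, 1, 2).
    Assumes independent reads and that a sequencing error reports the
    other allele with probability `error` (third-base errors ignored).
    """
    depth = n_ref + n_alt
    likelihoods = {}
    for g in (0, 1, 2):
        # Chance one read shows the alternate allele: draw a chromosome
        # (alternate with probability g/2), then apply the error model.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods[g] = comb(depth, n_alt) * p_alt**n_alt * (1 - p_alt) ** n_ref
    return likelihoods


def genotype_posterior(n_ref: int, n_alt: int, error: float, alt_freq: float) -> dict:
    """Posterior genotype probabilities under a Hardy-Weinberg prior.

    `alt_freq` is an assumed population alternate-allele frequency;
    no linkage disequilibrium information is used here.
    """
    lik = genotype_likelihoods(n_ref, n_alt, error)
    prior = {0: (1 - alt_freq) ** 2, 1: 2 * alt_freq * (1 - alt_freq), 2: alt_freq**2}
    joint = {g: lik[g] * prior[g] for g in lik}
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}


if __name__ == "__main__":
    # A single read covers the site and shows the alternate allele.
    print(genotype_likelihoods(0, 1, error=0.01))
    print(genotype_posterior(0, 1, error=0.01, alt_freq=0.1))
```

With one alternate read, the heterozygous and homozygous-alternate likelihoods are 0.5 and 0.99. These differ by only about a factor of two, so the call rests largely on the prior. One additional reference read (likelihoods 0.0198, 0.5, 0.0198) would make the heterozygote clearly favored. This small example shows how sparse coverage leaves individual genotypes uncertain.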

Public Health Relevance

High-throughput DNA sequencing technology is providing unparalleled detail of human genetic variation. This will allow finer resolution in locating genes that affect human health and disease. Both the large quantity and the uneven quality of data from this new technology demand new statistical methods for inference, risk assessment, and eventually clinical translation.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG005859-02
Application #: 8288682
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa

Project Start: 2011-09-01
Project End: 2016-05-31
Budget Start: 2012-06-01
Budget End: 2013-05-31
Support Year: 2
Fiscal Year: 2012
Total Cost: $384,477
Indirect Cost: $114,482

Institution

Name: University of Texas MD Anderson Cancer Center
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 800772139

City: Houston
State: TX
Country: United States
Zip Code: 77030

Related projects


NIH 2015 R01 HG	Statistical Methods for Population Genomics and "Next-gen" Sequencing Data Scheet, Paul A. / University of Texas MD Anderson Cancer Center
NIH 2014 R01 HG	Statistical Methods for Population Genomics and "Next-gen" Sequencing Data Scheet, Paul A. / University of Texas MD Anderson Cancer Center	$376,477
NIH 2013 R01 HG	Statistical Methods for Population Genomics and "Next-gen" Sequencing Data Scheet, Paul A. / University of Texas MD Anderson Cancer Center	$367,026
NIH 2012 R01 HG	Statistical Methods for Population Genomics and "Next-gen" Sequencing Data Scheet, Paul A. / University of Texas MD Anderson Cancer Center	$384,477
NIH 2011 R01 HG	Statistical Methods for Population Genomics and "Next-gen" Sequencing Data Scheet, Paul A. / University of Texas MD Anderson Cancer Center	$399,128

Publications

Yu, Yao; Hu, Hao; Chen, Jiun-Sheng et al. (2018) Integrated case-control and somatic-germline interaction analyses of melanoma susceptibility genes. Biochim Biophys Acta Mol Basis Dis 1864:2247-2254

Deshpande, Aditya; Lang, Wenhua; McDowell, Tina et al. (2018) Strategies for identification of somatic variants using the Ion Torrent deep targeted sequencing platform. BMC Bioinformatics 19:5

Peng, Bo; Wang, Gao; Ma, Jun et al. (2018) SoS Notebook: an interactive multi-language data analysis environment. Bioinformatics 34:3768-3770

Liu, Yihua; Weber, Zachary; San Lucas, F Anthony et al. (2018) Assessing inter-component heterogeneity of biphasic uterine carcinosarcomas. Gynecol Oncol 151:243-249

Gausachs, Mireia; Borras, Ester; Chang, Kyle et al. (2017) Mutational Heterogeneity in APC and KRAS Arises at the Crypt Level and Leads to Polyclonality in Early Colorectal Tumorigenesis. Clin Cancer Res 23:5936-5947

Huang, Jing; Liu, Yulun; Vitale, Steve et al. (2017) On meta- and mega-analyses for gene-environment interactions. Genet Epidemiol 41:876-886

Sivakumar, Smruthy; San Lucas, F Anthony; McDowell, Tina L et al. (2017) Genomic Landscape of Atypical Adenomatous Hyperplasia Reveals Divergent Modes to Lung Adenocarcinoma. Cancer Res 77:6119-6130

San Lucas, F Anthony; Sivakumar, Smruthy; Vattathil, Selina et al. (2016) Rapid and powerful detection of subtle allelic imbalance from exome sequencing data with hapLOHseq. Bioinformatics 32:3015-3017

Liu, Yulun; Chen, Yong; Scheet, Paul (2016) A meta-analytic framework for detection of genetic interactions. Genet Epidemiol 40:534-543

Jakubek, Yasminka; Lang, Wenhua; Vattathil, Selina et al. (2016) Genomic Landscape Established by Allelic Imbalance in the Cancerization Field of a Normal Appearing Airway. Cancer Res 76:3676-3683

The 10 most recent of 22 publications are listed.