Statistical Methods to Analyze Resequencing Data

Qin, Zhaohui

Abstract

Identification of genetic factors that contribute to complex diseases is one of the grand challenges in the post-genomic era. A series of exciting new findings have recently been made using the genome-wide association study (GWAS) design. However, moving from a confirmed association signal to the collection of causal variants at a given locus poses significant challenges. A desirable follow-up strategy to GWAS is to conduct a comprehensive resequencing analysis of the genomic regions of interest. This will allow scientists to discover and study all sequence variants in those regions, which greatly increases the chance of identifying new disease-causing mutations. Rapid advances in next-generation sequencing technologies are making such a strategy increasingly feasible. Novel statistical methods need to be developed to analyze data generated by these new sequencing instruments. In this proposal, we focus on identifying single nucleotide polymorphisms (SNPs) from resequencing data generated on the Illumina Genome Analyzer platform. First, we will develop a probability-based model that allows us to simultaneously resolve the mapping of multi-mapped short sequencing reads, identify sequencing errors, and call SNPs and their genotypes. Since our method will be developed under the Bayesian framework, additional information such as genotypes obtained from GWAS can be incorporated as informative priors to improve our inference. Second, we will develop a probability-based approach that combines sequencing read data at selected loci from multiple individuals to improve SNP and genotype calling. The goal is to borrow strength across a pool of samples to resolve ambiguity at loci with low sequencing depth. We will implement our statistical methods in freely available software tools to facilitate the analysis of targeted resequencing studies.
Finally, through collaborations, we plan to apply our methods to data from real targeted resequencing studies that are being planned for psoriasis and type 2 diabetes.
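The abstract describes the Bayesian model only at a high level. The following is a minimal Python sketch of the two ideas it names, not the project's actual model: per-sample genotype posteriors that combine read likelihoods with a genotype prior (the slot where GWAS genotypes could supply an informative prior), and an EM estimate of a shared allele frequency that lets a pool of samples inform calls at low-depth loci. It assumes a single biallelic site, independent reads, one constant symmetric error rate, and Hardy-Weinberg equilibrium, and it does not model multi-mapped reads. All function names are hypothetical.

```python
import math

# Hypothetical sketch: Bayesian genotype calling at one biallelic site.
# Reads are coded "A" (reference base) or "B" (alternate base).


def genotype_log_likelihoods(reads, error_rate):
    """Return log P(reads | g) for g = AA, AB, BB (0, 1, 2 copies of B)."""
    lls = []
    for n_alt in (0, 1, 2):
        frac_alt = n_alt / 2
        # A read shows "B" if it comes from a B chromosome and is read correctly,
        # or from an A chromosome and is misread (assumed symmetric error model).
        p_b = frac_alt * (1 - error_rate) + (1 - frac_alt) * error_rate
        lls.append(sum(math.log(p_b if base == "B" else 1 - p_b) for base in reads))
    return lls


def genotype_posterior(reads, prior, error_rate=0.01):
    """Combine read likelihoods with a genotype prior (all entries must be > 0)."""
    lls = genotype_log_likelihoods(reads, error_rate)
    logs = [ll + math.log(p) for ll, p in zip(lls, prior)]
    top = max(logs)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]


def hwe_prior(f):
    """Genotype prior from alternate-allele frequency f under Hardy-Weinberg equilibrium."""
    return [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]


def pooled_allele_frequency(samples, error_rate=0.01, f=0.1, n_iter=50):
    """EM estimate of the alternate-allele frequency shared by a pool of samples."""
    for _ in range(n_iter):
        # E-step: expected number of B alleles per sample given the current f.
        expected_b = 0.0
        for reads in samples:
            post = genotype_posterior(reads, hwe_prior(f), error_rate)
            expected_b += post[1] + 2 * post[2]
        # M-step: new frequency = expected B alleles / total alleles (2 per sample).
        f = expected_b / (2 * len(samples))
        f = min(max(f, 1e-6), 1 - 1e-6)  # keep every prior entry strictly positive
    return f


if __name__ == "__main__":
    pool = [["A"] * 8] * 5 + [["A", "B", "A", "B"]]
    f_hat = pooled_allele_frequency(pool)
    low_depth = ["B"]  # a single read: ambiguous on its own
    print(f"pooled f = {f_hat:.3f}")
    print("flat prior:  ", [round(p, 3) for p in genotype_posterior(low_depth, [1 / 3] * 3)])
    print("pooled prior:", [round(p, 3) for p in genotype_posterior(low_depth, hwe_prior(f_hat))])
```

In this toy pool, a sample covered by a single alternate read is called homozygous alternate under a flat prior but heterozygous under the pooled Hardy-Weinberg prior, which illustrates how borrowing strength across samples can change calls at low-depth loci.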

Public Health Relevance

Next-generation sequencing technologies facilitate large-scale resequencing studies, which offer better chances of identifying disease-causing mutations. In this proposal, we will develop novel statistical methods for identifying genetic variants from so-called ultra-high-throughput sequencing data. When completed, the software tools and methods will be made freely available to allow better analysis of data generated by resequencing studies.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21HG004751-02
Application #: 8149999
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Brooks, Lisa

Project Start: 2010-09-27
Project End: 2014-07-31
Budget Start: 2011-08-01
Budget End: 2014-07-31
Support Year: 2
Fiscal Year: 2011
Total Cost: $192,719
Indirect Cost

Institution

Name: Emory University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 066469933

City: Atlanta
State: GA
Country: United States
Zip Code: 30322

Related projects


NIH 2011 R21 HG	Statistical Methods to Analyze Resequencing Data Qin, Zhaohui Steve / Emory University	$192,719
NIH 2010 R21 HG	Statistical Methods to Analyze Resequencing Data Qin, Zhaohui Steve / Emory University	$192,719

Publications

Johnston, Henry Richard; Hu, Yi-Juan; Gao, Jingjing et al. (2017) Identifying tagging SNPs for African specific genetic variation from the African Diaspora Genome. Sci Rep 7:46398

Mathias, Rasika Ann; Taub, Margaret A; Gignoux, Christopher R et al. (2016) A continuum of admixture in the Western Hemisphere revealed by the African Diaspora genome. Nat Commun 7:12522

Kessler, Michael D; Yerges-Armstrong, Laura; Taub, Margaret A et al. (2016) Challenges and disparities in the application of personalized genomic medicine to populations with African ancestry. Nat Commun 7:12521

Yuan, Shuai; Johnston, H Richard; Zhang, Guosheng et al. (2015) One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies. PLoS Comput Biol 11:e1004448

Yang, Rendong; Chen, Li; Newman, Scott et al. (2014) Integrated analysis of whole-genome paired-end and mate-pair sequencing data for identifying genomic structural variations in multiple myeloma. Cancer Inform 13:49-53

Yuan, Shuai; Qin, Zhaohui (2012) Read-mapping using personalized diploid reference genome for RNA sequencing data reduced bias for detecting allele-specific expression. IEEE Int Conf Bioinform Biomed Workshops 2012:718-724
