Integrated Variation Detection Annotation and Analysis

Wang, Kai

Abstract

High-throughput sequencing (HTS) data on the genomes of a diverse range of species are being produced at an unprecedented rate. However, the development of computational and statistical approaches for handling these data lags behind, creating a gap between the massive data being generated and the biological knowledge that could be gleaned. Here we propose to develop an integrated system for genetic variation detection, annotation and analysis for HTS data, thereby reducing the critical gap faced by the community.
In Aim 1, we will develop a hidden Markov model (HMM) based computational algorithm that incorporates multiple sources of information, including sequence depth, allelic dosage, population allele frequency and paired-end read distance, for reliable yet efficient detection of copy number variations (CNVs).
Given a large list of SNPs, indels and CNVs, researchers face the challenge of identifying the subset of functionally important variants. In Aim 2, we will develop a comprehensive functional annotation pipeline to annotate the functional importance of coding and non-coding variants, utilizing database information from many large-scale genomics projects, and generate a 'functional vector' for each variant. These functional vectors can help biologists interpret sequencing results and help statistical geneticists develop informed association tests using sequencing data.
Appropriate statistical methods are needed to analyze population-level sequencing data, in order to identify genomic variants that may contribute to disease susceptibility or phenotypic variability. In Aim 3, we will develop a hierarchical modeling strategy, which utilizes the functional vector information for each variant, to perform association tests on genes, genomic regions, or biological pathways, such as ontology categories and gene regulatory/metabolic pathways.
Finally, in Aim 4, we will test the properties of each approach via simulation and real data analysis, and develop, distribute and support freely available software packages implementing the proposed methods. We believe that well-documented and supported software implementations will allow other researchers to extract the maximum information from the methodological and scientific advances that result from this project. Successful completion of the aims will enable researchers to fully investigate the massive amounts of sequencing data that have been or will be generated, thus contributing to our understanding of how genetic variants influence phenotype variability.
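To make the HMM idea in Aim 1 concrete, the following is a minimal illustrative sketch, not the project's algorithm. It uses only one of the information sources named in the abstract (sequence depth) and decodes the most likely copy-number state per genomic window with the Viterbi algorithm. The three states, the expected depth ratios, the noise level and the transition probability are all assumed values chosen for illustration; allelic dosage, population allele frequency and paired-end read distance are not modeled here.

```python
import math

# Hypothetical three-state model: copy-number loss, normal, gain.
STATES = ("loss", "normal", "gain")
# Assumed expected depth ratio (observed depth / genome-wide mean) per state.
EXPECTED_RATIO = {"loss": 0.5, "normal": 1.0, "gain": 1.5}
SIGMA = 0.15      # assumed standard deviation of the depth ratio
STAY_PROB = 0.98  # assumed probability of staying in the same state


def log_emission(state, ratio):
    """Gaussian log-density of a depth ratio given a copy-number state."""
    mu = EXPECTED_RATIO[state]
    return -0.5 * ((ratio - mu) / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(prev, curr):
    """Log-probability of moving between states in adjacent windows."""
    if prev == curr:
        return math.log(STAY_PROB)
    return math.log((1 - STAY_PROB) / (len(STATES) - 1))


def viterbi(ratios):
    """Return the most likely state path for a list of per-window depth ratios."""
    if not ratios:
        return []
    # Uniform initial state distribution (an assumption).
    score = {s: math.log(1 / len(STATES)) + log_emission(s, ratios[0]) for s in STATES}
    backpointers = []
    for ratio in ratios[1:]:
        new_score, pointers = {}, {}
        for curr in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, curr))
            new_score[curr] = (score[best_prev] + log_transition(best_prev, curr)
                               + log_emission(curr, ratio))
            pointers[curr] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    depth_ratios = [1.0, 1.02, 0.98, 0.5, 0.52, 0.48, 0.5, 1.0, 1.01]
    print(viterbi(depth_ratios))
```

With these assumed parameters, a run of several windows at roughly half the mean depth is called as a loss, while a single noisy window is not enough to outweigh the cost of switching states. Combining the additional signals listed in Aim 1 would amount to extending the emission term, which this sketch does not attempt.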

Public Health Relevance

Despite the rapid advancement of high-throughput sequencing (HTS) techniques, the development of computational and statistical approaches for handling these data lags behind, creating a gap between the massive data being generated and the biological knowledge that could be gleaned. Here we propose to develop an integrated system to detect variants, annotate them, and analyze them for genotype-phenotype associations. Successful completion of the aims will enable researchers to fully investigate the massive amounts of sequencing data that have been or will be generated, thus contributing to our understanding of how genetic variants influence phenotype variability.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 7R01HG006465-06
Application #: 9402354
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa

Project Start: 2016-12-15
Project End: 2017-02-28
Budget Start: 2016-12-15
Budget End: 2017-02-28
Support Year: 6
Fiscal Year: 2016
Total Cost: $454,596
Indirect Cost: $160,826

Institution

Name: Columbia University (N.Y.)
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 621889815

City: New York
State: NY
Country: United States
Zip Code: 10032

Related projects


NIH 2016 R01 HG	Integrated variation detection annotation and analysis for high-throughout seque	Wang, Kai / University of Southern California	$363,000
NIH 2016 R01 HG	Integrated Variation Detection Annotation and Analysis	Wang, Kai / Columbia University (N.Y.)	$454,596
NIH 2015 R01 HG	Integrated variation detection annotation and analysis for high-throughout seque	Wang, Kai / University of Southern California
NIH 2014 R01 HG	Integrated variation detection annotation and analysis for high-throughout seque	Wang, Kai / University of Southern California	$354,302
NIH 2013 R01 HG	Integrated variation detection annotation and analysis for high-throughout seque	Wang, Kai / University of Southern California	$344,564
NIH 2012 R01 HG	Integrated variation detection annotation and analysis for high-throughout seque	Wang, Kai / University of Southern California	$360,067

Publications

Khan, Atlas; Liu, Qian; Wang, Kai (2018) iMEGES: integrated mental-disorder GEnome score by deep neural network for prioritizing the susceptibility genes for mental disorders in personal genomes. BMC Bioinformatics 19:501

Son, Jung Hoon; Xie, Gangcai; Yuan, Chi et al. (2018) Deep Phenotyping on Electronic Health Records Facilitates Genetic Diagnosis by Clinical Exomes. Am J Hum Genet 103:58-73

Li, Quan; Wang, Kai (2017) InterVar: Clinical Interpretation of Genetic Variants by the 2015 ACMG-AMP Guidelines. Am J Hum Genet 100:267-280

Liu, Qian; Zhang, Peng; Wang, Depeng et al. (2017) Interrogating the "unsequenceable" genomic trinucleotide repeat disorders by long-read sequencing. Genome Med 9:65

Fang, Han; Wu, Yiyang; Yang, Hui et al. (2017) Whole genome sequencing of one complex pedigree illustrates challenges with genomic medicine. BMC Med Genomics 10:10

de Araújo Lima, Leandro; Wang, Kai (2017) PennCNV in whole-genome sequencing data. BMC Bioinformatics 18:383

Liu, Zehua; Lou, Huazhe; Xie, Kaikun et al. (2017) Reconstructing cell cycle pseudo time-series via single-cell transcriptome data. Nat Commun 8:22

Zhao, Jian; Song, Xiaofeng; Wang, Kai (2016) lncScore: alignment-free identification of long noncoding RNA from assembled novel transcripts. Sci Rep 6:34838

Ding, Xiao-Lei; Yang, Xiaojing; Liang, Gangning et al. (2016) Isoform switching and exon skipping induced by the DNA methylation inhibitor 5-Aza-2'-deoxycytidine. Sci Rep 6:24545

Dong, Chengliang; Guo, Yunfei; Yang, Hui et al. (2016) iCAGES: integrated CAncer GEnome Score for comprehensively prioritizing driver genes in personal cancer genomes. Genome Med 8:135

Showing the most recent 10 out of 42 publications
