Next-Generation Bioinformatics for Next-Generation Sequencing

Yu, Fuli; Zhang, Kui; Zhi, Degui

Abstract

The genetics and genomics communities are advancing rapidly in the Next-Generation Sequencing (NGS) era. The identification of both common and rare genetic variants from large-cohort studies and Mendelian studies provides new opportunities to elucidate disease etiologies and underlying molecular mechanisms. That ultimately will lead to novel and personalized diagnostics, prognostics and therapeutic treatments. However, significant analytical challenges remain: (1) the discovery and haplotype phasing of rare variants remain difficult; (2) data analysis is fragmented when multiple datasets [SNP arrays, whole-exome sequencing (WES), and/or low-coverage whole-genome sequencing (WGS)] are available; and (3) bioinformatics methods and software are difficult to use for average users: there is no unified bioinformatics framework and many different tool sets are needed for an end-to-end process. Advanced computational and statistical methods and friendly software are urgently needed to meet the demand of the community. The overall goal of this application is to develop an integrative and novel analytical framework that can significantly improve the sensitivity and accuracy of rare variant discovery and haplotype phasing and harmonize multiple datasets in genomics studies. In order to do so, the following specific aims will be pursued: 1) Develop a framework for improvement of rare variant discovery and haplotype phasing using read information. 2) Develop a framework for integrating multiple genetic variation datasets. 3) Validate genotyping and phasing of rare variants for pipeline optimization and cross-evaluation between different methods using simulated and experimental data. 4) Develop software packages with Cloud deployment for the community. The approaches are innovative because they utilize novel concepts and methods to improve the accuracy of genotype calling and haplotype phasing from NGS data and to integrate multiple types of genotype data. 
Successful accomplishment of our proposed aims will dramatically improve the sensitivity and accuracy of rare variant discovery and phasing, expediting the understanding of the genetic architecture of human diseases.

Public Health Relevance

Next-generation sequencing technologies hold great promise for identifying causal genetic variants for human diseases but also pose daunting challenges for analytical and bioinformatics development. In this application, we will develop comprehensive statistical methods to improve the accuracy of genotype calling and phasing of rare variants, develop a comprehensive framework for integrating multiple types of genotype data and sequencing data, and deploy Cloud-based software tools as a cyberinfrastructure to serve the community. The proposed research is relevant to public health and the mission of NIH because the accomplishment of our proposed work is expected to facilitate the identification of genetic variants underlying human diseases and to help us understand, prevent, diagnose, and treat these diseases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG008115-03
Application #: 9097730
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Brooks, Lisa

Project Start: 2014-09-10
Project End: 2017-06-30
Budget Start: 2016-07-01
Budget End: 2017-06-30
Support Year: 3
Fiscal Year: 2016
Total Cost
Indirect Cost

Institution

Name: Baylor College of Medicine
Department: Genetics
Type: Schools of Medicine
DUNS #: 051113330

City: Houston
State: TX
Country: United States
Zip Code: 77030

Related projects


NIH 2016 R01 HG	Next-Generation Bioinformatics for Next-Generation Sequencing Yu, Fuli; Zhang, Kui; Zhi, Degui / Baylor College of Medicine
NIH 2015 R01 HG	Next-Generation Bioinformatics for Next-Generation Sequencing Yu, Fuli; Zhang, Kui; Zhi, Degui / Baylor College of Medicine
NIH 2014 R01 HG	Next-Generation Bioinformatics for Next-Generation Sequencing Yu, Fuli; Zhang, Kui; Zhi, Degui / Baylor College of Medicine	$465,000

Publications

Yang, Xinyu; Li, Jiani; Fang, Yabo et al. (2018) Rho Guanine Nucleotide Exchange Factor ARHGEF17 Is a Risk Gene for Intracranial Aneurysms. Circ Genom Precis Med 11:e002099

Rasmy, Laila; Wu, Yonghui; Wang, Ningtao et al. (2018) A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set. J Biomed Inform 84:11-16

Chen, Yiyun; Bartanus, Justin; Liang, Desheng et al. (2017) Characterization of chromosomal abnormalities in pregnancy losses reveals critical genes and loci for human early development. Hum Mutat 38:669-677

Dai, Hongying; Wu, Guodong; Wu, Michael et al. (2016) An Optimal Bahadur-Efficient Method in Detection of Sparse Signals with Applications to Pathway Analysis in Sequencing Association Studies. PLoS One 11:e0152667

Xue, Cheng; Chen, Hua; Yu, Fuli (2016) Base-Biased Evolution of Disease-Associated Mutations in the Human Genome. Hum Mutat 37:1209-1214

Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan et al. (2016) A hybrid computational strategy to address WGS variant analysis in >5000 samples. BMC Bioinformatics 17:361

Zhi, Degui; Liu, Nianjun; Zhang, Kui (2015) On the design and analysis of next-generation sequencing genotyping for a cohort with haplotype-informative reads. Methods 79-80:41-6

Geng, Xin; Sha, Jin; Liu, Shikai et al. (2015) A genome-wide association study in catfish reveals the presence of functional hubs of related genes within QTLs for columnaris disease resistance. BMC Genomics 16:196

Challis, Danny; Antunes, Lilian; Garrison, Erik et al. (2015) The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes. BMC Genomics 16:143
