Emerging sequencing technologies have made whole-genome sequencing available to researchers studying a wide range of phenotypes and diseases, with a particular focus on rare variants. Although the first sequencing projects focused mainly on unrelated individuals, numerous sequencing studies that include related individuals have recently been carried out or launched as sequencing costs fall rapidly. However, methods for analyzing family-based sequence data lag behind, partly because of the complexity of family structures and the computational burden involved. In this study, our primary goals are to efficiently and accurately infer individual genotypes and haplotypes, the key component of any sequencing project, by combining information at both the family and population levels, and to study how differential sequencing errors affect downstream association analysis. To achieve these goals, we propose the following specific aims: 1) We will propose a novel statistical framework for genotype calling and haplotype inference from sequence data that include related individuals. The new method takes advantage of both the short stretches shared between unrelated individuals and the long stretches shared between family members in a computationally feasible manner while retaining a high degree of accuracy, through the synergy between two classic approaches: the hidden Markov model (HMM) for linkage disequilibrium information and the Lander-Green algorithm for inheritance vectors; 2) We will develop an exact algorithm for HMM computation to speed up a class of widely used genetics programs, including the method developed in Aim 1, without any sacrifice of accuracy; 3) We will assess the impact of sequencing errors on family-based association methods for rare variants and use the intrinsic stochastic nature of the methods proposed in Aim 1 to reduce false positives under a multiple-imputation framework; 4) We will test and recalibrate our methods in collaboration with ongoing sequencing projects and systematically investigate different study designs. Successful completion of these aims will yield state-of-the-art statistical methods and software that will facilitate the fast-growing number of sequencing projects that include family members and will guide the design and analysis of future studies.
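The HMM computation that Aims 1 and 2 refer to can be illustrated with the standard scaled forward recursion. The sketch below is a generic textbook version only; it is not the exact algorithm proposed in Aim 2, and it does not implement the Lander-Green algorithm. The function name, the argument layout, and the two-state toy values are assumptions made for illustration.

```python
import math


def forward_log_likelihood(init, trans, emit):
    """Scaled forward algorithm for a discrete-state HMM (illustrative sketch).

    init[k]     : prior probability of hidden state k at the first marker
    trans[i][j] : probability of moving from state i to state j between
                  adjacent markers
    emit[t][k]  : probability of the observed data at marker t given state k
    Returns (log-likelihood of the data, posterior over states at the last marker).
    """
    n_states = len(init)

    # Initialisation: combine the prior with the evidence at the first marker.
    alpha = [init[k] * emit[0][k] for k in range(n_states)]
    scale = sum(alpha)
    alpha = [a / scale for a in alpha]
    loglik = math.log(scale)

    # Recursion: propagate through the transitions, then weight by the evidence.
    for t in range(1, len(emit)):
        alpha = [
            emit[t][j] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        # Rescale at every marker to avoid underflow on long sequences.
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        loglik += math.log(scale)

    return loglik, alpha


# Hypothetical two-state, two-marker toy input (values chosen for illustration).
toy_init = [0.5, 0.5]
toy_trans = [[0.9, 0.1], [0.1, 0.9]]
toy_emit = [[0.8, 0.2], [0.8, 0.2]]
toy_loglik, toy_posterior = forward_log_likelihood(toy_init, toy_trans, toy_emit)
```

In a genotype-calling setting the hidden states would typically index reference haplotypes or inheritance patterns and the emission terms would come from sequencing read likelihoods; here they are left abstract. Each step of this recursion costs time quadratic in the number of states, which is one reason faster HMM computation matters for programs of this kind.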

Public Health Relevance

Next-generation sequencing studies have been widely conducted to identify rare variants associated with complex diseases. We will develop several statistical and computational methods, including methods for genotype calling and association analysis, to facilitate the analysis of both population-based and family-based sequence data for ongoing and future sequencing projects.

National Institutes of Health (NIH)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Brooks, Lisa
University of Pittsburgh
Schools of Medicine
United States
Chen, Han; Wang, Chaolong; Conomos, Matthew P et al. (2016) Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models. Am J Hum Genet 98:653-66
Yan, Qi; Chen, Rui; Sutcliffe, James S et al. (2016) The impact of genotype calling errors on family-based studies. Sci Rep 6:28323
Wang, Ting; Ren, Zhao; Ding, Ying et al. (2016) FastGGM: An Efficient Algorithm for the Inference of Gaussian Graphical Model in Biological Networks. PLoS Comput Biol 12:e1004755
Fan, Ruzong; Wang, Yifan; Yan, Qi et al. (2016) Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions. Genet Epidemiol 40:133-43
Fan, Ruzong; Wang, Yifan; Chiu, Chi-Yang et al. (2016) Meta-analysis of Complex Diseases at Gene Level with Generalized Functional Linear Models. Genetics 202:457-70
Zeng, Zhen; Weeks, Daniel E; Chen, Wei et al. (2016) A Pipeline for Classifying Relationships Using Dense SNP/SNV Data and Putative Pedigree Information. Genet Epidemiol 40:161-71
Chang, Lun-Ching; Li, Bingshan; Fang, Zhou et al. (2016) A computational method for genotype calling in family-based sequencing data. BMC Bioinformatics 17:37
Fan, Ruzong; Wang, Yifan; Boehnke, Michael et al. (2015) Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models. Genetics 200:1089-104
Xu, Zheng; Duan, Qing; Yan, Song et al. (2015) DISSCO: direct imputation of summary statistics allowing covariates. Bioinformatics 31:2434-42
Wei, Qiang; Zhan, Xiaowei; Zhong, Xue et al. (2015) A Bayesian framework for de novo mutation calling in parents-offspring trios. Bioinformatics 31:1375-81
