Although numerous common variants have been identified for human complex traits in the past few years, a large proportion of the heritability of these traits remains unexplained. Next-generation sequencing is currently being employed to uncover the full spectrum of genetic variation, with a particular focus on identifying low-frequency variants (e.g. minor allele frequency (MAF) between 1% and 5%) and rare variants (e.g. MAF < 1%) associated with complex traits. However, identifying associated rare variants is extremely challenging because of their low frequencies and allelic heterogeneity. It is therefore crucial to develop effective study designs and efficient analytical and computational tools to address these difficulties. Although case-control studies have been used extensively in association studies of common variants, family designs provide an effective alternative for rare variant analysis because causal rare variants are enriched in pedigrees. In addition, family studies are robust to population stratification, a property that is essential because methods routinely used to correct for stratification in common-variant studies may fail to do so for rare variants. In family sequencing studies, a critical step is inferring the underlying genotypes from sequence data: inaccurate genotype calls can lead to Mendelian inconsistencies and reduce the power of association studies. To address these challenges, in this application we propose to develop a comprehensive suite of statistical and computational methods for genotype/haplotype inference from family sequencing data and for rare variant association analysis in families. Using these methods, we will carry out simulations comparing the cost efficiency of various family designs with that of case-control studies, with the goal of improving power to detect rare variant associations. We will also apply our methods to sequence data from our Amish family study and to datasets from our collaborators on multiple complex traits.
User-friendly and well-documented software packages will be released for public use.
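To make the link between genotype-calling errors and Mendelian inconsistencies concrete, here is a minimal Python sketch (not the project's method) that checks whether a child's genotype call at a biallelic autosomal site can be explained by its parents' calls. It assumes genotypes coded as alternate-allele counts (0, 1, 2), correct pedigree relationships, and no de novo mutation; the function names and the example calls are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Alleles (0 = reference, 1 = alternate) that a parent with a given
# alternate-allele count can transmit at a biallelic autosomal site.
_TRANSMISSIBLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """Return True if the child's alternate-allele count can arise from one
    allele transmitted by each parent (hypothetical helper; ignores de novo
    mutation and genotype uncertainty)."""
    for g in (father, mother, child):
        if g not in (0, 1, 2):
            raise ValueError(f"genotype must be 0, 1 or 2, got {g}")
    possible = {a + b for a, b in product(_TRANSMISSIBLE[father],
                                          _TRANSMISSIBLE[mother])}
    return child in possible


def count_mendelian_errors(trio_calls: list[tuple[int, int, int]]) -> int:
    """Count sites whose (father, mother, child) calls are inconsistent."""
    return sum(not is_mendelian_consistent(f, m, c) for f, m, c in trio_calls)


if __name__ == "__main__":
    # Hypothetical calls at four sites for one parent-offspring trio.
    calls = [(0, 1, 1), (0, 0, 2), (1, 1, 0), (2, 2, 1)]
    print(count_mendelian_errors(calls))  # 2: sites 2 and 4 are inconsistent
```

Such a check flags only a subset of calling errors: when both parents are heterozygous, every child genotype is consistent, so a miscalled child genotype at such a site goes undetected.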
Large-scale sequencing studies are widely being carried out to identify less common and rare variants associated with complex diseases and disease-related traits. However, this strategy is challenging, and little is known about effective approaches for discovering rare variants from sequence data and for identifying disease-associated rare variants. In this application, we aim to develop a comprehensive suite of statistical methods and computational tools for variant calling and rare variant association studies using sequencing data from families, and to apply these methods to studies of multiple complex diseases.
|Yan, Qi; Chen, Rui; Sutcliffe, James S et al. (2016) The impact of genotype calling errors on family-based studies. Sci Rep 6:28323|
|Liu, Yongzhuang; Liu, Jian; Lu, Jianguo et al. (2016) Joint detection of copy number variations in parent-offspring trios. Bioinformatics 32:1130-7|
|Chang, Lun-Ching; Li, Bingshan; Fang, Zhou et al. (2016) A computational method for genotype calling in family-based sequencing data. BMC Bioinformatics 17:37|
|Yan, Qi; Weeks, Daniel E; Celedón, Juan C et al. (2015) Associating Multivariate Quantitative Phenotypes with Genetic Variants in Family Samples with a Novel Kernel Machine Regression Method. Genetics 201:1329-39|
|Wei, Qiang; Zhan, Xiaowei; Zhong, Xue et al. (2015) A Bayesian framework for de novo mutation calling in parents-offspring trios. Bioinformatics 31:1375-81|
|Chen, Rui; Wei, Qiang; Zhan, Xiaowei et al. (2015) A haplotype-based framework for group-wise transmission/disequilibrium tests for rare variant association analysis. Bioinformatics 31:1452-9|
|Li, Bingshan; Wei, Qiang; Zhan, Xiaowei et al. (2015) Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. PLoS Genet 11:e1005271|
|Li, Bingshan; Liu, Dajiang J; Leal, Suzanne M (2013) Identifying rare variants associated with complex traits via sequencing. Curr Protoc Hum Genet Chapter 1:Unit 1.26|