Genome-wide association studies (GWAS) provide a new and powerful approach to investigate the effect of inherited genetic variation on the risk of complex diseases. With recent advances in genotyping technology, genome-wide association studies are now becoming a reality, and GWAS data are expected to arrive at an accelerating rate. Despite tremendous efforts to develop efficient algorithms for mapping complex diseases and traits, single-locus approaches are still the primary method for GWAS. However, it is known that multiple genetic factors, environmental factors, and their interactions usually play an important role in the etiology of complex diseases. Novel and practical approaches that simultaneously model multiple variables and their interactions across hundreds of thousands of single nucleotide polymorphisms (SNPs) are greatly needed. In this project, we propose to develop efficient algorithms and practical statistical tools to address two important problems in the context of genome-wide association studies: multi-point analysis and multi-locus analysis. For multi-point analysis, our Dynamic Hidden Chain Markov Model (DHCMM) can jointly model historical recombination and mutations, haplotype structures and frequencies, and associations, and is expected to be more effective than existing approaches. For multi-locus analysis, we propose to use an advanced machine learning approach to jointly screen SNPs that are predictive of diseases. Our integrated software system MAVEN will facilitate management, analysis, visualization, and results sharing of GWA data using cutting-edge technologies. The true value of GWAS depends on the development of effective computational models and tools. We anticipate that this research project will greatly accelerate the understanding of the genetic architecture of complex diseases.

Public Health Relevance

Li, Jing. Title: Multi-point and multi-locus analysis of genomic association data

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
2R01LM008991-04
Application #
7652746
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2006-03-15
Project End
2011-07-31
Budget Start
2009-08-01
Budget End
2010-07-31
Support Year
4
Fiscal Year
2009
Total Cost
$951,009
Indirect Cost
Name
Case Western Reserve University
Department
Engineering (All Types)
Type
Schools of Engineering
DUNS #
077758407
City
Cleveland
State
OH
Country
United States
Zip Code
44106
Wang, Wenhui; Yang, Sen; Zhang, Xiang et al. (2014) Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics 30:2923-30
Wang, Wei-Bung; Jiang, Tao; Gardner, Shea (2013) Detection of homologous recombination events in bacterial genomes. PLoS One 8:e75230
Wang, Wenhui; Yin, Xiaolin; Soo Pyon, Yoon et al. (2013) Rare variant discovery and calling by sequencing pooled samples with overlaps. Bioinformatics 29:29-38
Wang, Wenhui; Yang, Sen; Li, Jing (2013) Drug target predictions based on heterogeneous graph inference. Pac Symp Biocomput :53-64
Hayes, Matthew; Li, Jing (2013) Bellerophon: a hybrid method for detecting interchromosomal rearrangements at base pair resolution using next-generation sequencing data. BMC Bioinformatics 14 Suppl 5:S6
Azad, Rajeev K; Li, Jing (2013) Interpreting genomic data via entropic dissection. Nucleic Acids Res 41:e23
Xie, Minzhu; Li, Jing; Jiang, Tao (2012) Detecting genome-wide epistases based on the clustering of relatively frequent items. Bioinformatics 28:5-12
Hayes, Matthew; Pyon, Yoon Soo; Li, Jing (2012) A model-based clustering method for genomic structural variant prediction and genotyping using paired-end sequencing data. PLoS One 7:e52881
Chen, Yixuan; Li, Jing (2012) Generation of synthetic data and experimental designs in evaluating interactions for association studies. J Bioinform Comput Biol 10:1240005
Pirola, Yuri; Bonizzoni, Paola; Jiang, Tao (2012) An efficient algorithm for haplotype inference on pedigrees with recombinations and mutations. IEEE/ACM Trans Comput Biol Bioinform 9:12-25

Showing the most recent 10 out of 54 publications