Whole-genome sequencing projects for human and other vertebrates have greatly advanced comparative genomics, leading to novel biological discoveries. Our long-term research goal is to use comparative genomics to elucidate the trajectory of vertebrate genome evolution and the origin of complex traits in different species. Such insights will in turn help us better understand the biology of the human genome. Advances in next-generation sequencing (NGS) technologies have provided unprecedented opportunities to tackle this problem. However, the large number of genomes being sequenced and the limited quality of genomes produced by NGS have underscored the urgent need for new computational methods that address several pressing challenges for the new generation of comparative genomic analysis. The objective of this application is to develop new computational methods that improve the accuracy of whole-genome comparisons for vertebrate genomes. We have two specific aims: (1) to develop a comparative assembly algorithm to improve vertebrate genomes assembled from NGS data; (2) to develop a probabilistic framework to improve the quality of multiple sequence alignments for vertebrate genomes. Our research plan is innovative because it provides novel algorithms and software tools to systematically improve the foundations for genome comparisons. The research is significant because the methods to be developed will allow researchers to use new genome sequencing data more effectively. The proposed research will have sustained impact as the number of sequenced genomes grows and sequencing technology advances. By improving the general methodology for next-generation comparative genomics, our work will have a high impact on large-scale genome projects such as G10K and ENCODE. As a result, this project in computational biology will enable advances in biomedical research.

Public Health Relevance

The proposed research in computational biology is expected to improve comparative genomic analysis and thereby help us better understand human biology and disease mechanisms. This project is therefore relevant to the NIH's mission of seeking fundamental knowledge that will help enhance health.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Wellington, Christopher
University of Illinois Urbana-Champaign
Engineering (All Types)
Biomed Engr/Col Engr/Engr Sta
United States
Tian, Dechao; Gu, Quanquan; Ma, Jian (2016) Identifying gene regulatory network rewiring using latent differential graphical models. Nucleic Acids Res 44:e140
Hou, Jack P; Emad, Amin; Puleo, Gregory J et al. (2016) A new correlation clustering method for cancer mutation analysis. Bioinformatics :
He, Feifei; Li, Yang; Tang, Yu-Hang et al. (2016) Identifying micro-inversions using high-throughput sequencing reads. BMC Genomics 17 Suppl 1:4
Li, Yang; Zhou, Shiguo; Schwartz, David C et al. (2016) Allele-Specific Quantification of Structural Variations in Cancer Genomes. Cell Syst 3:21-34
Heo, Yun; Ramachandran, Anand; Hwu, Wen-Mei et al. (2016) BLESS 2: accurate, memory-efficient and fast error correction method. Bioinformatics 32:2369-71
Kim, Young-Chae; Byun, Sangwon; Zhang, Yang et al. (2015) Liver ChIP-seq analysis in FGF19-treated mice reveals SHP as a global transcriptional partner of SREBP-2. Genome Biol 16:268
Earl, Dent; Nguyen, Ngan; Hickey, Glenn et al. (2014) Alignathon: a competitive assessment of whole-genome alignment methods. Genome Res 24:2077-89