Phylogenetic analysis is crucial to a wide range of biological and medical research. A new type of data based on gene order and gene content within whole genomes has attracted increasing interest from researchers in the past several years.
Specific Aims: The complexity of genome evolution poses many exciting challenges to developers of mathematical models and algorithms. However, the current tools can only be applied to small genomes (such as organelle genomes) evolving via very simple rearrangement events, so their range of application is limited. We will address these problems by: (1) mathematical modeling and theoretical analysis of complex evolutionary events such as gene duplication and loss; (2) algorithm design and implementation for phylogenetics and gene order reconstruction; (3) performance assessment of these new algorithms through extensive testing on simulated and biological datasets; (4) high-performance implementation of the algorithms using algorithm engineering techniques and a flexible approach to parallelization.
Contributions and Broader Impact: The broader impacts of the proposed project are several. (1) The development of new theories and algorithms for the efficient reconstruction of phylogenies and inference of ancestral genomes based on complex genome rearrangements will considerably enlarge the scope of research in the field and give rise to interesting new problems in mathematical and computational biology. (2) Efficient and accurate software for phylogenetic analysis and genome comparison, tested on a large variety of real datasets and on an extensive range of simulations, is expected to reveal new evolutionary patterns and to enable the investigation of novel biological questions. (3) A web server hosted by our group (or by our collaborators) will enable biologists to submit their datasets through a user-friendly web interface and get results back within a reasonable amount of time, without the burden of installing software or learning parallel computation. (4) The project team combines expertise in mathematical modeling, algorithm design, high-performance computing, comparative genomics, and phylogenetics. Students (both undergraduate and graduate) and postdocs on this project will receive valuable interdisciplinary training experience. (5) Both universities have established programs to boost research in computational biology. This project will enable the PIs to establish close interdisciplinary collaborations among departments from both universities and recruit graduate students (especially minorities) to this fast-growing research field.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 3R01GM078991-03S1
Application #: 7942522
Study Section: Special Emphasis Panel (ZGM1-CBCB-5 (BM))
Program Officer: Remington, Karin A
Project Start: 2009-09-30
Project End: 2010-05-31
Budget Start: 2009-09-30
Budget End: 2010-05-31
Support Year: 3
Fiscal Year: 2009
Total Cost: $88,299
Indirect Cost:
Name: University of South Carolina at Columbia
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 041387846
City: Columbia
State: SC
Country: United States
Zip Code: 29208
Flagel, Lex E; Willis, John H; Vision, Todd J (2014) The standing pool of genomic structural variation in a natural population of Mimulus guttatus. Genome Biol Evol 6:53-64
Lin, Yu; Hu, Fei; Tang, Jijun et al. (2013) Maximum likelihood phylogenetic reconstruction from high-resolution whole-genome data and a tree of 68 eukaryotes. Pac Symp Biocomput :285-96
Czabarka, E; Erdős, P L; Johnson, V et al. (2013) Generating Functions for Multi-labeled Trees. Discrete Appl Math 161:107-117
Luo, Haiwei; Arndt, William; Zhang, Yiwei et al. (2012) Phylogenetic analysis of genome rearrangements among five mammalian orders. Mol Phylogenet Evol 65:871-82
Székely, L A; Wang, Hua; Wu, Taoyang (2011) The sum of the distances between the leaves of a tree and the 'semi-regular' property. Discrete Math 311:1197-1203
Burleigh, J Gordon; Bansal, Mukul S; Eulenstein, Oliver et al. (2011) Genome-scale phylogenetics: inferring the plant tree of life from 18,896 gene trees. Syst Biol 60:117-25
Luo, Haiwei; Tang, Jijun; Friedman, Robert et al. (2011) Ongoing purifying selection on intergenic spacers in group A streptococcus. Infect Genet Evol 11:343-8
Asbury, Thomas M; Mitman, Matt; Tang, Jijun et al. (2010) Genome3D: a viewer-model framework for integrating and visualizing multi-scale epigenomic information within a three-dimensional genome. BMC Bioinformatics 11:444
Shi, Jian; Zhang, Yiwei; Luo, Haiwei et al. (2010) Using jackknife to assess the quality of gene order phylogenies. BMC Bioinformatics 11:168
Yue, Feng; Shi, Jian; Tang, Jijun (2009) Simultaneous phylogeny reconstruction and multiple sequence alignment. BMC Bioinformatics 10 Suppl 1:S11

Showing the most recent 10 out of 26 publications