In recent years, genome sequencing has become easier, cheaper, and significantly faster. Not surprisingly, the number of fully sequenced genomes continues to grow at an exponential rate. To keep pace with this flood of new data, researchers need efficient comparative genomics resources. Central to comparative genomics methods is the ability to identify functionally equivalent genes (orthologs). Yet, to date, no resource maintains an inventory of orthologs for all available genomes. Of the databases that do host orthology data, the largest covers fewer than 25% of the genomes sequenced to date. This proposal aims to fill this gap by building a comprehensive, expandable ortholog repository that keeps pace with the rate of genome sequencing.
The specific aims of the research are to:
1. Build an improved algorithm for ortholog detection, and use it to amass a database of orthologs and associated evolutionary distances that spans all genomes presently available;
2. Develop an interrogation platform that lets users explore this inventory of comparative genomics data in detail to address hypotheses in comparative genomics and other fields;
3. Develop a graphical user interface that gives any research biologist worldwide easy access via the Web.

The research will result in a powerful and flexible comparative genomics framework that enables biologists to explore, at a whole-genome level, patterns of genetic diversification across many organisms. In addition to being a massive repository of orthologous sequences and evolutionary distances, the resource will be an active research tool capable of calculating new sets of orthologs between two genomes that have not yet been compared, or that have not been compared using a particular set of parameters. This will keep the resource dynamic and expandable, so that it keeps pace with the rapid rate of genome sequencing, and 'evolvable', so that users can explore parameter space in ways that may yield novel results of interest, such as putative orthologs that are highly divergent except for a small domain. As a publicly expandable repository, the tool will also quicken the pace of research by preventing duplication of effort: orthologs between two species need be computed only once for a particular set of parameters. Finally, the tool will support a diversity of query types to address a variety of biological hypotheses, potentially leading to discoveries in both the medical and basic sciences.
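The compute-once idea above (orthologs between two species computed only once per parameter set, with new comparisons run on demand) can be sketched as a store keyed by genome pair and parameter set. This is a minimal sketch under stated assumptions, not the proposal's algorithm: it assumes pairwise evolutionary distances between genes are already available, and it stands in a simple reciprocal-smallest-distance rule with a single hypothetical `max_distance` cutoff for the "improved algorithm" named in Aim 1. The names `OrthologParams`, `reciprocal_smallest_distance`, and `OrthologRepository` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (gene in genome A, gene in genome B) -> evolutionary distance (assumed precomputed)
Distances = Dict[Tuple[str, str], float]
# (gene in genome A, gene in genome B, distance)
OrthologPair = Tuple[str, str, float]


@dataclass(frozen=True)  # frozen -> hashable, so it can be part of a cache key
class OrthologParams:
    # Hypothetical parameter: discard gene pairs more divergent than this.
    max_distance: float


def reciprocal_smallest_distance(dist: Distances, max_distance: float) -> List[OrthologPair]:
    """Keep (a, b) when b is a's closest gene in B and a is b's closest gene in A."""
    best_a: Dict[str, Tuple[float, str]] = {}  # gene in A -> (distance, closest gene in B)
    best_b: Dict[str, Tuple[float, str]] = {}  # gene in B -> (distance, closest gene in A)
    for (a, b), d in dist.items():
        if d > max_distance:
            continue
        if a not in best_a or d < best_a[a][0]:
            best_a[a] = (d, b)
        if b not in best_b or d < best_b[b][0]:
            best_b[b] = (d, a)
    pairs = [(a, b, d) for a, (d, b) in best_a.items() if best_b[b][1] == a]
    return sorted(pairs)


class OrthologRepository:
    """Caches ortholog sets so each (genome pair, parameters) is computed only once."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str, OrthologParams], List[OrthologPair]] = {}
        self.computations = 0  # how many times the detection rule actually ran

    def get_orthologs(self, genome_a: str, genome_b: str,
                      params: OrthologParams, dist: Distances) -> List[OrthologPair]:
        # Treat the genome pair as unordered: store it under a canonical (sorted) order.
        swapped = genome_a > genome_b
        key = (genome_b, genome_a, params) if swapped else (genome_a, genome_b, params)
        if key not in self._store:
            self.computations += 1
            # Re-orient the distance keys so they match the canonical genome order.
            oriented = {(b, a): d for (a, b), d in dist.items()} if swapped else dist
            self._store[key] = reciprocal_smallest_distance(oriented, params.max_distance)
        pairs = self._store[key]
        # Return gene pairs in the caller's (genome_a, genome_b) orientation.
        return [(b, a, d) for a, b, d in pairs] if swapped else pairs
```

For example, querying the same two genomes twice with the same parameters runs the detection rule once; changing `max_distance` triggers a fresh computation that is stored alongside the first. Keying on the parameter set as well as the genome pair is what allows the same genomes to be re-explored under different settings without overwriting earlier results.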

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Small Research Grants (R03)
Project #
3R03LM009261-02S1
Application #
7872681
Study Section
Special Emphasis Panel (ZLM1-ZH-R (J2))
Program Officer
Ye, Jane
Project Start
2009-08-01
Project End
2010-07-31
Budget Start
2009-08-01
Budget End
2010-07-31
Support Year
2
Fiscal Year
2009
Total Cost
$42,250
Indirect Cost
Name
Harvard University
Department
Miscellaneous
Type
Schools of Medicine
DUNS #
047006379
City
Boston
State
MA
Country
United States
Zip Code
02115
Publications
Jung, Jae-Yoon; DeLuca, Todd F; Nelson, Tristan H et al. (2014) A literature search tool for intelligent extraction of disease-associated genes. J Am Med Inform Assoc 21:399-405
Nelson, Tristan H; Jung, Jae-Yoon; Deluca, Todd F et al. (2012) Autworks: a cross-disease network biology application for Autism and related disorders. BMC Med Genomics 5:56
DeLuca, Todd F; Cui, Jike; Jung, Jae-Yoon et al. (2012) Roundup 2.0: enabling comparative genomics for over 1800 genomes. Bioinformatics 28:715-6
Cui, Jike; DeLuca, Todd F; Jung, Jae-Yoon et al. (2011) Phylogenetically informed logic relationships improve detection of biological network organization. BMC Bioinformatics 12:476
Cui, Jike; DeLuca, Todd F; Jung, Jae-Yoon et al. (2011) Detecting biological network organization and functional gene orthologs. Bioinformatics 27:2919-20
Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A et al. (2010) Cloud computing for comparative genomics. BMC Bioinformatics 11:259
Kudtarkar, Parul; Deluca, Todd F; Fusaro, Vincent A et al. (2010) Cost-effective cloud computing: a case study using the comparative genomics tool, roundup. Evol Bioinform Online 6:197-203