This research program focuses primarily on the computational analysis of the homeodomain group of proteins, which play a fundamental role in the specification of body plan, pattern formation, and cell fate determination during metazoan development. A variety of bioinformatic approaches are used to understand the evolution and function of these proteins and their ultimate role in human disease. Hox genes, a well-studied subset of homeobox genes, are organized in conserved genomic clusters across a range of phylogenetic taxa. Over evolutionary time, the functional diversification of these Hox genes has contributed to the diversification of animal body plans. Building upon prior work on the origin and early evolution of these Hox genes, our focus has turned to analyzing the genomes of early-branching metazoan phyla to better understand the relationship between genomic complexity and morphological complexity, as well as the molecular basis for the evolution of novel cell types. Given that Ctenophora was the last remaining non-bilaterian phylum lacking a species with a sequenced genome (and, therefore, a completely examined homeobox repertoire), we used next-generation sequencing approaches to sequence and assemble the 150-Mb genome of the lobate ctenophore, Mnemiopsis leidyi. Using these data, we were able to identify a set of 76 homeobox-containing genes in Mnemiopsis. We phylogenetically categorized this set into established gene families and classes, and were able to determine that several important classes and subclasses of homeodomains are absent from both Mnemiopsis and the poriferan Amphimedon queenslandica. As the first study to compare the complete homeobox catalog of species from all of the non-bilaterian phyla, along with that of the two major bilaterian lineages for which complete genomic sequence data are available, this work provides a major missing piece of evidence that is critical to understanding the makeup of the homeodomain family in early metazoan history.
These data, in turn, enable us to begin to decipher what role the expansion of the homeodomain superfamily has played in the evolution of animal phyla. This work has also enabled us to evaluate the congruence of the homeodomain data with recently proposed phylogenetic relationships of the early-branching phyla. Our results suggest that Porifera and Ctenophora were the first two extant lineages to diverge from the rest of the animals. We have taken advantage of having this high-quality sequence data in hand to investigate the evolution of additional protein families that play a critical role in human metabolic processes and development. First, with our collaborators at the University of Hawaii, we examined the Wnt/beta-catenin signaling pathway. Molecular phylogenetic analysis shows four distinct Wnt ligands, and most (but not all) components of the receptor and intracellular signaling pathways were detected. Notably absent from the Mnemiopsis genome are most major secreted antagonists, which suggests that complex regulation of this secreted signaling pathway likely evolved later in animal evolution. Second, with our collaborators at the Woods Hole Oceanographic Institution, we focused on nuclear receptors (NRs), which play key roles in the regulation of reproduction, development, and energetic homeostasis. Using phylogenomic approaches, we found that all ctenophore NRs lacked the highly conserved DNA-binding domain that has heretofore been characteristic of nuclear receptors. This may reflect an ancestral NR domain structure or a lineage-specific loss of this domain from an ancestral NR that contained the DNA-binding domain. Phylogenetic analyses of NRs support the idea that expansion of the NR superfamily occurred in a stepwise fashion. As an outgrowth of our studies on the homeodomain class of proteins, we have developed and continue to maintain the Homeodomain Resource.
The Homeodomain Resource is a curated collection of sequence, structure, interaction, genomic, and functional information on the homeodomain family. The current version builds upon previous versions by adding new, complete sets of homeodomain sequences from fully sequenced genomes, expanding existing curated homeodomain information, and improving data accessibility through better search tools and more complete data integration. The current release contains 1536 full-length homeodomain-containing sequences, 107 experimentally derived homeodomain structures, 101 homeodomain protein-protein interactions, 107 homeodomain DNA-binding sites, 53 homeodomain proteins with documented allelic variants, and 186 homeodomain proteins implicated in human genetic disorders. The Homeodomain Resource is freely available at http://research.nhgri.nih.gov/homeodomain/.