With the completion and public availability of the human genome sequence, it is now possible to perform large-scale, comprehensive genome analyses that were not feasible even a few years ago. As the sequence has progressed from a working draft to a finished state, many groups have developed tools to annotate it, making it even more useful to the scientific community. My research focuses on developing methodologies that automatically integrate these diverse sequence and annotation data with experimentally generated data, so that bench biologists can quickly and easily obtain results for their own large-scale, genome-wide experiments.

The goal of one of my research projects is to take advantage of the publicly available sequence and annotations to develop automated tools for the computational characterization of experimentally identified genomic sequences. We align each sequence to the reference human genome assembly to determine its genomic location, and then compare its coordinates to those of a variety of genome annotations. Using this approach, we can assign putative functions to the experimentally identified sequences based on their proximity to known sequence features. To provide statistical rigor for the analysis, we have developed a pipeline to characterize sequences picked at random from the genome.

We are applying this method to two NHGRI research projects that, although fundamentally different on a biological level, are identical from a computational perspective: both involve determining the chromosomal location of a genomic sequence fragment and then analyzing the genomic context of the region. Dr. Gregory Crawford, a postdoctoral fellow in Dr. Francis Collins' lab, is developing an experimental strategy to identify regulatory regions in the human genome. To achieve this goal, he clones and sequences DNase I hypersensitive (DNase HS) sites.
Our evidence suggests that these DNase HS sites occur frequently in regions thought to be involved in gene regulation, including regions upstream of genes, CpG islands, and regions of human/mouse conservation. Jaya Jagadeesh in Dr. Fabio Candotti's lab is cloning and sequencing retroviral integration sites in a patient treated in a retroviral gene therapy trial. We are in the process of determining whether any of these integrations could disrupt gene function and thereby affect the patient's health. We anticipate applying this method to other sets of experimental data from a variety of organisms.

Human genomic data can also be easily integrated with publicly available mRNA and protein sequences and annotations from human and other organisms. We have generated tools that use links between these data sources to output data suitable for large-scale analysis. For example, large-scale microarray experiments provide data about groups of genes that are co-transcribed under various conditions. Using our analysis tools, we can easily extract the sequences upstream of these genes and, in subsequent collaborations with NHGRI investigators, develop informatics methodologies to predict the sequence elements responsible for this co-regulation.

The completion of human genome sequencing also makes it possible to perform comprehensive analyses on small-scale projects. Previously, I discovered a novel gene family termed ADAM, for membrane proteins containing A Disintegrin And Metalloprotease domain. A total of 34 members of the ADAM family have been identified to date, and they are involved in many events, including fertilization, neurogenesis, and myogenesis, as well as in the process of ectodomain shedding.
I am carrying out a comprehensive search for ADAM and ADAM-like genes in the completely sequenced eukaryotic genomes, including human, mouse, Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana, and will extend the analysis to other organisms as their genomes near completion. This work will allow a more thorough understanding of the complex roles that the ADAM proteins play in these different organisms, as well as of the evolutionary events that gave rise to this large gene family.
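
The coordinate-comparison step described earlier (locating a mapped sequence, comparing its coordinates to annotation coordinates, and repeating the comparison for randomly chosen genomic positions) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline used in these projects: the `Interval` type, the function names, the 200-base window, and the toy coordinates are all hypothetical, and the alignment step that produces the mapped coordinates is assumed to have already been run.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval; start is 0-based inclusive, end is exclusive."""
    chrom: str
    start: int
    end: int
    name: str = ""


def distance(a, b):
    """Gap in bases between two same-chromosome intervals (0 if they overlap or abut)."""
    return max(0, a.start - b.end, b.start - a.end)


def features_within(query, features, window):
    """Annotation features on the query's chromosome within `window` bases of it."""
    return [f for f in features
            if f.chrom == query.chrom and distance(query, f) <= window]


def random_intervals(chrom_sizes, length, n, rng):
    """Draw n intervals of a fixed length uniformly over all valid start positions.

    Assumes every chromosome is at least `length` bases long.
    """
    chroms = list(chrom_sizes)
    # Weight each chromosome by its number of valid start positions.
    weights = [chrom_sizes[c] - length + 1 for c in chroms]
    out = []
    for _ in range(n):
        chrom = rng.choices(chroms, weights=weights)[0]
        start = rng.randrange(chrom_sizes[chrom] - length + 1)
        out.append(Interval(chrom, start, start + length))
    return out


def fraction_near(queries, features, window):
    """Fraction of query intervals with at least one feature within `window` bases."""
    hits = sum(1 for q in queries if features_within(q, features, window))
    return hits / len(queries)


# Toy data (hypothetical names and coordinates, for illustration only).
features = [
    Interval("chr1", 1000, 1500, "CpG_island_1"),
    Interval("chr1", 9000, 9500, "geneA_upstream"),
]
mapped = [
    Interval("chr1", 1600, 1700, "clone_1"),
    Interval("chr1", 5000, 5100, "clone_2"),
]

observed = fraction_near(mapped, features, window=200)

# Random background: same interval length, drawn from a made-up 20 kb chromosome.
rng = random.Random(0)
background = random_intervals({"chr1": 20000}, length=100, n=1000, rng=rng)
expected = fraction_near(background, features, window=200)
```

Comparing `observed` with `expected` gives a rough sense of whether the experimental sequences lie near a given class of annotation more often than randomly placed sequences do. A real analysis would need an indexed interval structure to handle genome-scale annotation sets, and a formal statistical test in place of a direct comparison of two fractions.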