With the completion and public availability of the human genome sequence, it is now possible to perform large-scale, comprehensive genome analyses that were not possible even a few years ago. As the sequence has progressed from a working draft to a finished state, many groups have developed tools to annotate it, making it even more useful to the scientific community. My research focuses on developing methods to integrate these diverse sequence and annotation data, in an automated manner, with experimentally generated data so that bench biologists can quickly and easily obtain results for their own large-scale, genome-wide experiments. One of my research projects takes advantage of the publicly available sequence and annotations to develop automated tools for the computational characterization of experimentally identified genomic sequences. We align each sequence to the reference human genome assembly to determine its genomic location, and then compare its coordinates to the coordinates of a variety of genome annotations. Using this approach, we can assign putative functions to the experimentally identified sequences based on their proximity to known sequence features. To provide statistical rigor for the analysis, we have also developed a pipeline that characterizes sequences picked at random from the genome in the same way, so that the observed results can be compared with those expected by chance.

We are applying this method to two types of research projects that, although fundamentally different at the biological level, are identical from a computational perspective: both involve determining the chromosomal location of a genomic sequence fragment and then analyzing the genomic context of that region. Dr. Gregory Crawford, a postdoctoral fellow in Dr. Francis Collins' lab, is developing an experimental strategy to identify regulatory regions in the human genome. To achieve this goal, he clones and sequences DNase I hypersensitive (DNase HS) sites.
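The coordinate comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline itself. It assumes each sequence has already been aligned and converted to 0-based, half-open (BED-style) coordinates. The `Interval` type, the `annotate` helper, the category names, and the 5 kb upstream window are all hypothetical choices. The linear scan is for clarity only. At genome scale one would more likely sort annotations by position or use an interval tree.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Interval:
    """A genomic interval in 0-based, half-open coordinates (BED-style)."""
    chrom: str
    start: int
    end: int
    strand: str = "+"
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def upstream_window(gene: Interval, size: int) -> Interval:
    """The `size` bp immediately 5' of a gene's start, taking strand into account."""
    if gene.strand == "+":
        return Interval(gene.chrom, max(0, gene.start - size), gene.start)
    return Interval(gene.chrom, gene.end, gene.end + size)


def annotate(site: Interval,
             genes: Iterable[Interval],
             cpg_islands: Iterable[Interval],
             conserved: Iterable[Interval],
             upstream_bp: int = 5000) -> Set[str]:
    """Label a mapped site with every (hypothetical) annotation category it overlaps."""
    genes = list(genes)  # iterated twice below
    tags: Set[str] = set()
    if any(overlaps(site, upstream_window(g, upstream_bp)) for g in genes):
        tags.add("upstream_of_gene")
    if any(overlaps(site, g) for g in genes):
        tags.add("within_gene")
    if any(overlaps(site, c) for c in cpg_islands):
        tags.add("cpg_island")
    if any(overlaps(site, c) for c in conserved):
        tags.add("conserved")
    return tags


# Made-up annotations for illustration only.
genes = [Interval("chr1", 10_000, 20_000, "+", "GENE_A"),
         Interval("chr1", 50_000, 60_000, "-", "GENE_B")]
cpg = [Interval("chr1", 9_500, 10_500)]
print(sorted(annotate(Interval("chr1", 9_800, 10_100), genes, cpg, [])))
# ['cpg_island', 'upstream_of_gene', 'within_gene']
print(sorted(annotate(Interval("chr1", 61_000, 61_200), genes, cpg, [])))
# ['upstream_of_gene']
```

The second example lies downstream of GENE_B's coordinates. Because GENE_B is on the minus strand, that region is its 5' flank, which is why the window is computed from `gene.end` for minus-strand genes.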
Our analysis of ~230,000 hypersensitive sites from CD4+ T cells suggests that these sites occur frequently in regions thought to be involved in gene regulation, including regions upstream of genes, CpG islands, and highly conserved sequences. We have applied similar techniques in collaborations with NIH researchers to determine whether retroviruses and retroviral vectors integrate randomly into the host genome during retroviral gene therapy. With Dr. Fabio Candotti's lab at NHGRI, we have determined the integration sites in a patient treated in a retroviral gene therapy trial. We are now determining whether any of these integrations could disrupt gene function and thereby affect the patient's health, and whether the pattern of integration sites changes in the years following gene therapy. Trainees in Dr. Jennifer Puck's lab at NHGRI have transduced CD34+ cells with an X-linked severe combined immunodeficiency (XSCID) gene therapy retroviral vector, and we are carrying out a computational characterization of the integration sites in CD34+ cells from different sources.

The completion of the human and other genome sequencing projects also makes it possible to perform comprehensive analyses of gene structure. With Dr. Lawrence Brody of NHGRI, we are exploring the role of exon size in protein evolution.
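Both the hypersensitive-site enrichment result and the question of whether retroviral integration is random come down to comparing observed sites with sequences picked at random from the genome. The Python sketch below shows one simple way to make that comparison. It draws length-matched random intervals, counts how many observed and random sites fall in a feature category, and applies a pooled two-proportion z-test.

This is an illustrative choice, not necessarily the statistic used in these projects. The helper names and the `(chrom, start, end)` tuples are hypothetical. A realistic background would probably also need to exclude assembly gaps and regions where sequences cannot be mapped uniquely.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical representation: (chrom, start, end), 0-based half-open.
Region = Tuple[str, int, int]


def random_sites(chrom_sizes: Dict[str, int], lengths: List[int],
                 rng: random.Random) -> List[Region]:
    """Draw one random interval per observed site, matched in length.

    Chromosomes are chosen with probability proportional to their size, so
    start positions are approximately uniform across the genome.
    """
    out: List[Region] = []
    for length in lengths:
        eligible = [c for c, size in chrom_sizes.items() if size >= length]
        if not eligible:
            raise ValueError(f"no chromosome is at least {length} bp long")
        chrom = rng.choices(eligible, weights=[chrom_sizes[c] for c in eligible])[0]
        start = rng.randrange(chrom_sizes[chrom] - length + 1)
        out.append((chrom, start, start + length))
    return out


def count_overlapping(sites: List[Region], features: List[Region]) -> int:
    """Number of sites that overlap at least one feature."""
    return sum(
        any(s[0] == f[0] and s[1] < f[2] and f[1] < s[2] for f in features)
        for s in sites
    )


def two_proportion_z(k1: int, n1: int, k2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both groups are all hits or all misses")
    z = (k1 / n1 - k2 / n2) / se
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


# Example with made-up counts: 60/100 observed vs 40/100 random sites in a category.
z, p = two_proportion_z(60, 100, 40, 100)
print(round(z, 2), round(p, 4))  # 2.83 0.0047
```

In practice, `k1` would come from `count_overlapping(observed_sites, features)` and `k2` from `count_overlapping(random_sites(...), features)`. Drawing many more random sites than observed ones tightens the background estimate.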