With the near completion of the human and mouse genome sequences, it is now possible to carry out large-scale, nearly comprehensive analyses that were not feasible only a few years ago. I am involved in two collaborations to determine the sequence elements that regulate gene expression. In one, conducted with Bill Pavan's laboratory at NHGRI, we use microarray data to generate a list of genes that are co-transcribed under a specific condition. Because the genomic positions of these genes have been annotated, their promoter sequences can be easily extracted from the public data and then searched computationally for both known and novel transcription factor binding sites. We view the genomic context of these binding sites in the mouse and human genomes using the Genome Browser developed by the University of California, Santa Cruz (UCSC). Our current work on a set of genes that are co-expressed across a set of melanoma-derived cell lines that differ in stages of differentiation should help to elucidate the transcriptional hierarchy involved in melanocyte development and melanoma progression.

A second project, carried out in collaboration with Francis Collins's laboratory at NHGRI, involves generating a genome-wide library of gene regulatory sequences. Gene expression is regulated by DNA binding proteins, including transcriptional activators and repressors. When these proteins are bound to DNA, local chromatin structure is altered, making the DNA sensitive to digestion by the enzyme DNase I. Francis Collins's laboratory is developing an experimental procedure to clone and sequence all DNase I hypersensitive sites in the human genome, and I am working on a multistep computational pipeline to determine whether these hypersensitive sites cluster around genes and might therefore represent binding sites for gene regulatory proteins.
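As a rough illustration of the proximity step in such a pipeline, the following Python sketch finds the nearest annotated transcription start site (TSS) for each mapped site and reports the fraction of sites that fall within a window upstream of a gene. It is a minimal sketch under stated assumptions, not the pipeline itself: the function names, the 2,000 bp window, the single-chromosome input, and the toy coordinates are all hypothetical.

```python
from bisect import bisect_left


def nearest_gene_offset(site, genes):
    """Return (offset, tss, strand) for the gene start nearest to `site`.

    `genes` is a list of (tss, strand) tuples on one chromosome, sorted by
    tss (hypothetical input format). A negative offset means the site lies
    upstream of the gene with respect to its direction of transcription.
    """
    if not genes:
        raise ValueError("gene list is empty")
    starts = [tss for tss, _ in genes]
    i = bisect_left(starts, site)
    # The nearest start is either just left or just right of the insertion point.
    candidates = genes[max(i - 1, 0): i + 1]
    tss, strand = min(candidates, key=lambda g: abs(g[0] - site))
    offset = site - tss if strand == "+" else tss - site
    return offset, tss, strand


def fraction_upstream(sites, genes, window=2000):
    """Fraction of sites within `window` bp upstream of their nearest gene start."""
    if not sites:
        return 0.0
    hits = sum(
        1 for site in sites
        if -window <= nearest_gene_offset(site, genes)[0] < 0
    )
    return hits / len(sites)


# Toy data (assumed, not real annotations): two genes and three mapped sites.
genes = [(1000, "+"), (5000, "-")]
sites = [900, 1500, 5200]
print(fraction_upstream(sites, genes))
```

Handling strand explicitly matters here, because a site at a higher coordinate than the start of a minus-strand gene is upstream of that gene, not downstream.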
In particular, using human genome sequence data provided by UCSC, I determine where these sites map within the human genome, and then calculate their proximity to known genes. A pilot experiment with several hundred DNase hypersensitive sites from quiescent human primary CD4+ T cells shows that these sites do occur frequently upstream of genes.

Another use for the draft human genome sequence is to catalog all members of a gene family, and to identify the orthologous relationships between these genes and genes in other completely sequenced organisms. The ADAMs are a family of cell surface proteins that contain A Disintegrin And Metalloprotease domain. The 33 known members have been cloned from a variety of mammals, as well as Caenorhabditis elegans and Drosophila melanogaster. The proteins are unique in that they can display both a cell adhesion domain (an integrin-binding disintegrin domain) and a metalloprotease domain to the cell exterior. ADAMs have been implicated in a number of developmental events, such as fertilization, neurogenesis, and myogenesis, as well as several disease states, including rheumatoid arthritis, Alzheimer's disease, and several cancers. We are developing a comprehensive catalog of all the ADAM and ADAM-like genes and pseudogenes in all fully sequenced eukaryotic genomes. This work will allow a more thorough understanding of the function of some ADAM genes, and of the evolutionary events that gave rise to this large gene family. Identification of orthologous ADAMs between species will help researchers to carry out experiments in model organisms and apply those results to humans.

As Associate Director of the Bioinformatics and Scientific Programming Core at NHGRI, I provide expertise and assistance in bioinformatics and computational analysis for genome research at NHGRI.
New software developed this year by the Core includes GeneLink, a web-accessible database that stores and manipulates genotypic data for linkage mapping efforts; an image database used to store and display in situ photos; and WebBLAST, a tool to organize and store sequence data and provide first-pass analysis in the form of BLAST searches.