With the near completion of the human and mouse genome sequences, it is now possible to carry out large-scale, nearly comprehensive analyses that were not feasible only a few years ago. I am involved in two collaborations to determine the sequence elements that regulate gene expression. In one, conducted with Bill Pavan's laboratory at NHGRI, we use microarray data to generate a list of genes that are co-transcribed under a specific condition. Because the genomic positions of these genes have been annotated, their promoter sequences can be easily extracted from the public data and then searched computationally for both known and novel transcription factor binding sites. We view the genomic context of these binding sites in the mouse and human genomes using the Genome Browser developed by the University of California, Santa Cruz (UCSC). Our current work on a set of genes that are co-expressed across melanoma-derived cell lines at different stages of differentiation should help to elucidate the transcriptional hierarchy involved in melanocyte development and melanoma progression.
A second project, carried out in collaboration with Francis Collins' laboratory at NHGRI, involves generating a genome-wide library of gene regulatory sequences. Gene expression is regulated by DNA-binding proteins, including transcriptional activators and repressors. When these proteins are bound to DNA, local chromatin structure is altered, making the DNA sensitive to digestion by the enzyme DNase I. Francis Collins' laboratory is developing an experimental procedure to clone and sequence all DNase I hypersensitive sites in the human genome, and I am working on a multistep computational pipeline to determine whether these hypersensitive sites cluster around genes and might therefore represent binding sites for gene regulatory proteins. In particular, using human genome sequence data provided by UCSC, I determine where these sites map within the human genome and then calculate their proximity to known genes. A pilot experiment with several hundred DNase I hypersensitive sites from quiescent human primary CD4+ T cells shows that these sites do occur frequently upstream of genes.
Another use for the draft human genome sequence is to catalog all members of a gene family and to identify the orthologous relationships between these genes and genes in other completely sequenced organisms. The ADAMs are a family of cell surface proteins that contain A Disintegrin And Metalloprotease domain. The 33 known members have been cloned from a variety of mammals, as well as Caenorhabditis elegans and Drosophila melanogaster. The proteins are unique in that they can display both a cell adhesion domain (an integrin-binding disintegrin domain) and a metalloprotease domain to the cell exterior. ADAMs have been implicated in a number of developmental events, such as fertilization, neurogenesis, and myogenesis, as well as several disease states, including rheumatoid arthritis, Alzheimer's disease, and several cancers. We are developing a comprehensive catalog of all the ADAM and ADAM-like genes and pseudogenes in all fully sequenced eukaryotic genomes. This work will allow a more thorough understanding of the function of some ADAM genes and of the evolutionary events that gave rise to this large gene family. Identification of orthologous ADAMs between species will help researchers to carry out experiments in model organisms and apply those results to humans.
As Associate Director of the Bioinformatics and Scientific Programming Core at NHGRI, I provide expertise and assistance in bioinformatics and computational analysis for genome research at NHGRI. New software developed this year by the Core includes GeneLink, a web-accessible database that stores and manipulates genotypic data for linkage mapping efforts; an image database used to store and display in situ photos; and WebBLAST, a tool to organize and store sequence data and provide first-pass analysis in the form of BLAST searches.
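The proximity step described above for the hypersensitive-site project (map each site to the genome, then measure its distance to known genes) can be illustrated with a minimal sketch. This is not the project's actual pipeline: the `Gene` record, the function names, and the coordinates are all hypothetical, and the sketch assumes each gene is represented only by its transcription start site (TSS) and strand. A negative distance means the site lies upstream of the gene on the gene's own strand.

```python
from bisect import bisect_left
from typing import Dict, List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    """Hypothetical minimal gene annotation (not a real UCSC table layout)."""
    name: str
    chrom: str
    tss: int      # transcription start site, genomic coordinate
    strand: str   # "+" or "-"


# Per chromosome: sorted TSS positions plus the genes in the same order.
GeneIndex = Dict[str, Tuple[List[int], List[Gene]]]


def build_index(genes: List[Gene]) -> GeneIndex:
    """Group genes by chromosome and sort them by TSS for binary search."""
    grouped: Dict[str, List[Gene]] = {}
    for gene in genes:
        grouped.setdefault(gene.chrom, []).append(gene)
    index: GeneIndex = {}
    for chrom, chrom_genes in grouped.items():
        chrom_genes.sort(key=lambda g: g.tss)
        index[chrom] = ([g.tss for g in chrom_genes], chrom_genes)
    return index


def signed_distance(site_pos: int, gene: Gene) -> int:
    """Distance from the TSS; negative means upstream on the gene's strand."""
    offset = site_pos - gene.tss
    return offset if gene.strand == "+" else -offset


def nearest_gene(chrom: str, pos: int,
                 index: GeneIndex) -> Optional[Tuple[Gene, int]]:
    """Return the gene with the closest TSS and the signed distance to it."""
    if chrom not in index:
        return None
    positions, genes = index[chrom]
    i = bisect_left(positions, pos)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = genes[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda g: abs(g.tss - pos))
    return best, signed_distance(pos, best)


if __name__ == "__main__":
    # Made-up annotations and site positions, for illustration only.
    index = build_index([
        Gene("geneA", "chr1", 1000, "+"),
        Gene("geneB", "chr1", 5000, "-"),
    ])
    for site in (800, 4000, 5300):
        gene, dist = nearest_gene("chr1", site, index)
        label = "upstream" if dist < 0 else "downstream"
        print(site, gene.name, dist, label)
```

Counting how many sites fall within a chosen window upstream of a TSS, and comparing that count with the one obtained for randomly placed sites, would be one way to ask whether the sites cluster near genes. The window size and the choice of null model are left open here.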

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Intramural Research (Z01)
Project #: 1Z01HG000185-02
Application #: 6681684
Study Section: (GTB)
Project Start:
Project End:
Budget Start:
Budget End:
Support Year: 2
Fiscal Year: 2002
Total Cost:
Indirect Cost:
Name: Human Genome Research
Department:
Type:
DUNS #:
City:
State:
Country: United States
Zip Code:
Hu, Jingqiong; Renaud, Gabriel; Gomes, Theotonius J et al. (2008) Reduced genotoxicity of avian sarcoma leukosis virus vectors in rhesus long-term repopulating cells compared to standard murine retrovirus vectors. Mol Ther 16:1617-23
Pike, B L; Greiner, T C; Wang, X et al. (2008) DNA methylation profiles in diffuse large B-cell lymphoma and their relationship to gene expression status. Leukemia 22:1035-43
Crawford, Gregory E; Holt, Ingeborg E; Whittle, James et al. (2006) Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res 16:123-31
Crawford, Gregory E; Davis, Sean; Scacheri, Peter C et al. (2006) DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nat Methods 3:503-9
Hematti, Peiman; Hong, Bum-Kee; Ferguson, Cole et al. (2004) Distinct genomic integration of MLV and SIV vectors in primate hematopoietic stem and progenitor cells. PLoS Biol 2:e423
Crawford, Gregory E; Holt, Ingeborg E; Mullikin, James C et al. (2004) Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc Natl Acad Sci U S A 101:992-7
Gillanders, Elizabeth M; Masiello, Anthony; Gildea, Derek et al. (2004) GeneLink: a database to facilitate genetic studies of complex traits. BMC Genomics 5:81
Scacheri, Peter C; Rozenblatt-Rosen, Orit; Caplen, Natasha J et al. (2004) Short interfering RNAs can induce unexpected and divergent changes in the levels of untargeted proteins in mammalian cells. Proc Natl Acad Sci U S A 101:1892-7