With the completion and public availability of the human genome sequence, it is now possible to perform large-scale, comprehensive genome analyses that were not possible even a few years ago. As the sequence has progressed from a working draft to a finished state, many groups have developed tools to annotate it, making it even more useful to the scientific community. My research focuses on developing methodologies that automatically integrate these diverse sequence and annotation data with experimentally generated data, so that bench biologists can quickly and easily obtain results for their own large-scale, genome-wide experiments.

The goal of one of my research projects is to use the publicly available sequence and annotations to develop automated tools for the computational characterization of experimentally identified genomic sequences. We align each sequence to the reference human genome assembly to determine its genomic location, and then compare the coordinates of the sequence to the coordinates of a variety of genome annotations. Using this approach, we can assign putative functions to the experimentally identified sequences based on their proximity to known sequence features. To provide statistical rigor for the analysis, we have developed a pipeline that characterizes sequences picked at random from the genome in the same way, giving a baseline against which the experimental sequences can be compared.

We are applying this method to two types of research projects. Although fundamentally different on a biological level, they are identical from a computational perspective: both involve determining the chromosomal location of a genomic sequence fragment and then analyzing the genomic context of the region. In the first, Dr. Gregory Crawford, a postdoctoral fellow in Dr. Francis Collins' lab, is developing an experimental strategy to identify regulatory regions in the human genome. To achieve this goal, he clones and sequences DNase I hypersensitive (DNase HS) sites. Our analysis of ~230,000 hypersensitive sites from CD4+ T cells suggests that the sites occur frequently in regions thought to be involved in gene regulation, including upstream of genes, within CpG islands, and within highly conserved sequences.

In the second, we have applied similar techniques in collaborations with NIH researchers to determine whether retroviruses and retroviral vectors integrate randomly into the host genome during retroviral gene therapy. With Dr. Fabio Candotti's lab at NHGRI, we have determined the integration sites in a patient treated in a retroviral gene therapy trial. We are in the process of determining whether any of these integrations could disrupt gene function and thereby affect the patient's health, as well as whether the pattern of integration sites changes in the years after gene therapy. Trainees in Dr. Jennifer Puck's lab at NHGRI have transduced CD34 cells with an X-linked severe combined immunodeficiency (XSCID) gene therapy retroviral vector. We are carrying out a computational characterization of the integration sites in different sources of CD34 cells.

The completion of the human and other genome sequencing projects also makes it possible to perform comprehensive analyses of gene structure. With Dr. Lawrence Brody of NHGRI, we are exploring the role of exon size in protein evolution.
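The coordinate comparison and random-baseline idea described above can be illustrated with a minimal Python sketch. This is not the project's actual pipeline: the function names (`build_index`, `distance_to_nearest`, `fraction_near`, `random_sites`, `empirical_p`), the 1-based closed coordinate convention, the toy features, and the 200 bp window are all assumptions made for illustration. It also assumes the sequences have already been aligned, so each site is simply a chromosome and a position.

```python
import random


def build_index(features):
    """Group (chrom, start, end) annotation features by chromosome."""
    index = {}
    for chrom, start, end in features:
        index.setdefault(chrom, []).append((start, end))
    return index


def distance_to_nearest(index, chrom, pos):
    """Distance in bp from pos to the closest feature on chrom.

    Returns 0 if pos falls inside a feature and None if the chromosome
    has no features. A linear scan keeps the sketch short; a real
    pipeline would likely use an interval tree or sorted lookup.
    """
    intervals = index.get(chrom)
    if not intervals:
        return None
    return min(max(start - pos, pos - end, 0) for start, end in intervals)


def fraction_near(index, sites, window):
    """Fraction of (chrom, pos) sites within `window` bp of any feature."""
    if not sites:
        return 0.0
    hits = 0
    for chrom, pos in sites:
        dist = distance_to_nearest(index, chrom, pos)
        if dist is not None and dist <= window:
            hits += 1
    return hits / len(sites)


def random_sites(chrom_sizes, n, rng):
    """Draw n positions uniformly from the genome (chromosomes weighted by length)."""
    chroms = list(chrom_sizes)
    weights = [chrom_sizes[c] for c in chroms]
    picks = rng.choices(chroms, weights=weights, k=n)
    return [(c, rng.randint(1, chrom_sizes[c])) for c in picks]


def empirical_p(observed, null_values):
    """One-sided empirical p-value with the usual +1 correction."""
    at_least = sum(1 for v in null_values if v >= observed)
    return (1 + at_least) / (1 + len(null_values))


if __name__ == "__main__":
    # Toy annotation features and experimental sites (hypothetical data).
    features = [("chr1", 1000, 2000), ("chr1", 5000, 5500)]
    sites = [("chr1", 1500), ("chr1", 2100), ("chr1", 9000)]
    chrom_sizes = {"chr1": 10000}

    index = build_index(features)
    observed = fraction_near(index, sites, window=200)

    # Build a null distribution from repeated random site sets.
    rng = random.Random(0)
    null = [
        fraction_near(index, random_sites(chrom_sizes, len(sites), rng), 200)
        for _ in range(1000)
    ]
    print("observed fraction near features:", observed)
    print("empirical p-value:", empirical_p(observed, null))
```

Drawing the random sets at the same size as the experimental set keeps the observed and null fractions directly comparable, which is the role the random-sequence pipeline plays in the analysis described above.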

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Intramural Research (Z01)
Project #: 1Z01HG000185-05
Application #: 7147958
Study Section: (GTB)
Project Start:
Project End:
Budget Start:
Budget End:
Support Year: 5
Fiscal Year: 2005
Total Cost:
Indirect Cost:

Name: Human Genome Research
Department:
Type:
DUNS #:
City:
State:
Country: United States
Zip Code:
Hu, Jingqiong; Renaud, Gabriel; Gomes, Theotonius J et al. (2008) Reduced genotoxicity of avian sarcoma leukosis virus vectors in rhesus long-term repopulating cells compared to standard murine retrovirus vectors. Mol Ther 16:1617-23
Pike, B L; Greiner, T C; Wang, X et al. (2008) DNA methylation profiles in diffuse large B-cell lymphoma and their relationship to gene expression status. Leukemia 22:1035-43
Crawford, Gregory E; Holt, Ingeborg E; Whittle, James et al. (2006) Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res 16:123-31
Crawford, Gregory E; Davis, Sean; Scacheri, Peter C et al. (2006) DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nat Methods 3:503-9
Hematti, Peiman; Hong, Bum-Kee; Ferguson, Cole et al. (2004) Distinct genomic integration of MLV and SIV vectors in primate hematopoietic stem and progenitor cells. PLoS Biol 2:e423
Crawford, Gregory E; Holt, Ingeborg E; Mullikin, James C et al. (2004) Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc Natl Acad Sci U S A 101:992-7
Gillanders, Elizabeth M; Masiello, Anthony; Gildea, Derek et al. (2004) GeneLink: a database to facilitate genetic studies of complex traits. BMC Genomics 5:81
Scacheri, Peter C; Rozenblatt-Rosen, Orit; Caplen, Natasha J et al. (2004) Short interfering RNAs can induce unexpected and divergent changes in the levels of untargeted proteins in mammalian cells. Proc Natl Acad Sci U S A 101:1892-7