The overall aim of this research proposal is to combine computational and functional methodologies to develop a set of algorithms with high positive predictive value for identifying and classifying candidate cis-regulatory sequences in the vicinity of any gene of interest. The underlying hypothesis is that functional non-coding sequences - particularly those governing a set of tissue-specific genes - will evince common features at the sequence level that can be identified computationally and modeled with sufficient precision to enable accurate de novo predictions. However, it is expected that the overall predictive value of computational approaches alone will be comparatively low. Even so, when employed as a screening tool in combination with a high-throughput functional validation methodology, computational approaches of even low (10-20%) predictive potential would be of enormous value, enabling rapid culling of tens of thousands of cis-regulatory sequences from the human genome. The strategy will commence with development of a catalogue of functional non-coding sequences for a set of tissue- and lineage-specific human genes. This will be achieved by precise localization of DNaseI hypersensitive sites (HSs) surrounding 100 erythroid-specific and 100 lymphoid lineage-restricted genes. Both tissues represent highly developed experimental systems, and a substantial amount of information has already come to light concerning both cis- and trans-regulatory mechanisms operative within these cell types. DNaseI hypersensitivity in vivo is the sine qua non of a diverse cast of transcriptional regulatory elements including enhancers, promoters, insulators, and locus control regions.
The utility of the nuclease hypersensitivity assay for identification of in vivo-functional regulatory sequences is unmatched: it is a mature, functionally based approach validated by a vast literature and decades of highly productive studies encompassing hundreds of human and other eukaryotic genes. A comprehensive catalogue of HSs surrounding any gene would therefore be expected to encompass the majority - if not all - of its cognate transcriptional control elements active in the tissues under study. Next, a significant data mining effort will be undertaken. This phase will involve (i) structural comparisons among identified functional elements; (ii) identification of candidate transcription factor binding sites within HS sequences using motif analysis methodologies; (iii) identification of correlations with ancillary genomic features such as transcriptional start sites, CpG islands, and certain classes of repetitive sequences; and (iv) structural comparisons between in vivo functional sequences and evolutionarily conserved sequences within the study regions. A major focus will be application of modeling techniques such as hidden Markov models, methods adapted from gene prediction programs, and kernel-based classifiers such as support vector machines. Based on these analyses, initial models for prospective detection of cis-regulatory regions will be developed. Finally, these models will be tested both in sample and out of sample for sensitivity and specificity. Feedback from successfully confirmed sites will be used to refine the information collected above, thereby enhancing the basic model. Predictive techniques will then be applied systematically to discover cis-regulatory sequences surrounding erythroid, lymphoid, and diverse other classes of human genes. The resulting database will be of incalculable value in furthering the study of the regulation of human genes and the computational methodologies employed therein.
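
The evaluation step described above (sensitivity, specificity, and positive predictive value of predicted sites against experimentally confirmed HSs) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not part of the proposal: the genome is treated as a set of numbered candidate windows, and the names `Confusion` and `confusion`, along with all counts, are hypothetical. The numbers are chosen so that the predictor lands in the 10-20% predictive-value range mentioned in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of windows by predicted vs. experimentally validated status."""
    tp: int  # predicted and validated
    fp: int  # predicted but not validated
    tn: int  # neither predicted nor validated
    fn: int  # validated but missed by the predictor

    # Note: these ratios raise ZeroDivisionError if a denominator is zero;
    # a real pipeline would need to handle empty classes explicitly.
    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)


def confusion(predicted: set, validated: set, universe: set) -> Confusion:
    """Compare predicted windows with validated windows over all windows."""
    tp = len(predicted & validated)
    fp = len(predicted - validated)
    fn = len(validated - predicted)
    tn = len(universe - predicted - validated)
    return Confusion(tp, fp, tn, fn)


# Hypothetical example: 1000 candidate windows, 50 of which are true HSs.
universe = set(range(1000))
validated = set(range(50))
# The hypothetical predictor flags 200 windows, 30 of them true HSs.
predicted = set(range(30)) | set(range(50, 220))

result = confusion(predicted, validated, universe)
prevalence = len(validated) / len(universe)  # hit rate without any screen
enrichment = result.ppv / prevalence         # gain from screening first

print(f"sensitivity={result.sensitivity:.2f}")
print(f"specificity={result.specificity:.2f}")
print(f"PPV={result.ppv:.2f}, enrichment over random={enrichment:.1f}x")
```

Under these made-up counts, validating only the flagged windows means testing 200 windows instead of 1000 while still recovering most true sites, which is the sense in which a low-PPV screen can remain useful when paired with high-throughput functional validation.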

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 7R01GM071923-02
Application #: 6944750
Study Section: Special Emphasis Panel (ZRG1-SSS-G (90))
Program Officer: Whitmarsh, John
Project Start: 2004-09-01
Project End: 2008-08-31
Budget Start: 2005-09-01
Budget End: 2006-08-31
Support Year: 2
Fiscal Year: 2005
Total Cost: $586,428
Indirect Cost:
Name: University of Washington
Department: Genetics
Type: Schools of Medicine
DUNS #: 605799469
City: Seattle
State: WA
Country: United States
Zip Code: 98195
Sexton, Brittany S; Avey, Denis; Druliner, Brooke R et al. (2014) The spring-loaded genome: nucleosome redistributions are widespread, transient, and DNA-directed. Genome Res 24:251-9
Ganis, Jared J; Hsia, Nelson; Trompouki, Eirini et al. (2012) Zebrafish globin switching occurs in two developmental stages and is controlled by the LCR. Dev Biol 366:185-94
Li, Xiao-Yong; Thomas, Sean; Sabo, Peter J et al. (2011) The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding. Genome Biol 12:R34
Kharchenko, Peter V; Alekseyenko, Artyom A; Schwartz, Yuri B et al. (2011) Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature 471:480-5
Thomas, Sean; Li, Xiao-Yong; Sabo, Peter J et al. (2011) Dynamic reprogramming of chromatin accessibility during Drosophila embryo development. Genome Biol 12:R43
Cuenca, Alex G; Delano, Matthew J; Kelly-Scumpia, Kindra M et al. (2010) Cecal ligation and puncture. Curr Protoc Immunol Chapter 19:Unit 19.13
Stamatoyannopoulos, John A; Adzhubei, Ivan; Thurman, Robert E et al. (2009) Human mutation rate associated with DNA replication timing. Nat Genet 41:393-5
Hesselberth, Jay R; Chen, Xiaoyu; Zhang, Zhihong et al. (2009) Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods 6:283-9
Sekimata, Masayuki; Pérez-Melgosa, Mercedes; Miller, Sara A et al. (2009) CCCTC-binding factor and the transcription factor T-bet orchestrate T helper 1 cell-specific structure and function at the interferon-gamma locus. Immunity 31:551-64
Mann, Tobias; Humbert, Richard; Dorschner, Michael et al. (2009) A thermodynamic approach to PCR primer design. Nucleic Acids Res 37:e95

Showing the most recent 10 out of 16 publications