Proper regulation of gene expression is essential to the normal development and health of organisms. Aberrant gene regulation is known to cause many genetic diseases, including some inherited anemias, and is thought to be a major contributor to complex phenotypes such as susceptibility to common diseases. Understanding the molecular mechanisms of gene regulation may provide novel candidates for therapeutic interventions. Our studies aim for a deeper molecular understanding of global aspects of gene regulation in an important biological process, the maturation of erythroid precursor cells into red blood cells. Building on our progress in using patterns in sequence alignments to predict cis-regulatory modules for erythroid genes and in deciphering functional correlations of their evolutionary history, we propose to acquire genome-wide information on biochemical features associated with regulation to reach a more complete understanding of gene regulation in erythroid cells. Specifically, we propose to use high-throughput biochemical assays, such as chromatin immunoprecipitation followed by hybridization to microarrays and deep re-sequencing, to acquire data on genomic DNA sequences (Aim 1) occupied in vivo by critical tissue-specific transcription factors, (Aim 2) bound by histones with modifications associated with gene activation or repression, (Aim 3) in chromatin with an altered structure, and (Aim 4) transcribed in a mouse erythroid cell model that undergoes maturation upon restoration of the critical transcription factor GATA-1. We will then (Aim 5) apply existing software and develop new data-processing algorithms to determine peaks of signals that are likely to represent the locations of the features targeted in Aims 1-4.
Aim 6 will mine the peak-calling results, along with raw data, multiple sequence alignments and other information to investigate their covariation structure and integrate them to predict cis-regulatory modules, classify the modules by function, identify motifs associated with specific protein occupancy, and deduce the phylogenetic depth of preservation of critical motifs in the regulatory modules.
Aim 7 will experimentally test biological hypotheses that arise from the analyses in Aims 5 and 6, determining the extent to which we can validate the locations of protein occupancy and transcripts, the predictions of both positive and negative cis-regulatory modules by gain-of-function cell transfection assays, and the role of motifs implicated in occupancy by directed mutagenesis and in vivo binding assays. We will test whether the motif-constraint hypothesis for protein-occupied DNA segments involved in enhancement applies to transcription factors in addition to GATA-1, and we will conduct additional experiments probing deeper biological issues. This research will not only provide global insights into mechanisms and effects of gene regulation during erythroid maturation but also yield techniques and analytical tools that can be applied to better understand the development and differentiation of any tissue.

Public Health Relevance

Proper regulation of gene expression is essential to the normal development and health of organisms, whereas aberrant gene regulation can cause genetic diseases, and it appears to be a major contributor to susceptibility to common diseases. Understanding the molecular mechanisms of gene regulation may provide novel candidates for therapeutic interventions. Our studies collecting genome-wide data on many biochemical features associated with gene regulation, mining the data deeply to predict functional DNA sequences, and experimentally testing those bioinformatic predictions will provide global insights into mechanisms and effects of gene regulation during erythroid maturation and provide techniques and analytical tools to better understand the development and differentiation of any tissue.

Agency: National Institutes of Health (NIH)
Institute: National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Type: Research Project (R01)
Project #: 5R01DK065806-10
Application #: 8423806
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Bishop, Terry Rogers
Project Start: 2004-02-01
Project End: 2015-01-31
Budget Start: 2013-02-01
Budget End: 2015-01-31
Support Year: 10
Fiscal Year: 2013
Total Cost: $555,662
Indirect Cost: $120,030

Name: Pennsylvania State University
Department: Biochemistry
Type: Schools of Arts and Sciences
DUNS #: 003403953
City: University Park
State: PA
Country: United States
Zip Code: 16802

Paralkar, Vikram R; Mishra, Tejaswini; Luan, Jing et al. (2014) Lineage and species-specific long noncoding RNAs during erythro-megakaryocytic development. Blood 123:1927-37
Yue, Feng; Cheng, Yong; Breschi, Alessandra et al. (2014) A comparative encyclopedia of DNA elements in the mouse genome. Nature 515:355-64
Giardine, Belinda; Borg, Joseph; Viennas, Emmanouil et al. (2014) Updates of the HbVar database of human hemoglobin variants and thalassemia mutations. Nucleic Acids Res 42:D1063-9
Cheng, Yong; Ma, Zhihai; Kim, Bong-Hyun et al. (2014) Principles of regulatory information conservation between mouse and human. Nature 515:371-5
Crispino, John D; Weiss, Mitchell J (2014) Erythro-megakaryocytic transcription factors associated with hereditary anemia. Blood 123:3080-8
Hardison, Ross C; Blobel, Gerd A (2013) Genetics. GWAS to therapy by genome edits? Science 342:206-7
Hoffman, Michael M; Ernst, Jason; Wilder, Steven P et al. (2013) Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res 41:827-41
Blobel, Gerd A; Hardison, Ross C (2013) A cluster to remember. Cell 154:718-20
Paralkar, Vikram R; Weiss, Mitchell J (2013) Long noncoding RNAs in biology and hematopoiesis. Blood 121:4842-6
Phillips-Cremins, Jennifer E; Sauria, Michael E G; Sanyal, Amartya et al. (2013) Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153:1281-95

Showing the most recent 10 out of 46 publications