The DNA sequence of the human genome informs us about the composition of the proteins that make up healthy cells, and also about the altered compositions that give rise to diseased cells. How protein production is controlled through regulation of the genes that encode those proteins is of critical importance in both healthy and diseased cells. Knowing precisely where gene regulatory proteins bind and how they are organized throughout the genome, including their interactions with each other, informs us as to how genes are regulated and mis-regulated. Because there are potentially thousands of different kinds of regulatory proteins, and thousands of different human cell types and environmental responses that are a product of various subsets of those proteins, the entire "universe" of gene regulatory events is substantial and consequently costly to identify. A subset of these events will likely be informative or diagnostic of disease states. Therefore, an important goal is to define informative interactions using cost-effective, highly accurate, and robust genome-wide assays. To this end, ChIP-exo was developed to map the genomic binding locations of gene regulatory proteins at near-single-base-pair resolution. This assay will be applied, in high throughput, to determine the genome-wide positional organization of factors within protein-DNA complexes, such as enhanceosomes. By broadly mapping the various classes of proteins that constitute much of the regulated epigenome, general rules about enhancer and repressor complex organization will be deduced.
Aim 1 involves collecting genome-wide ChIP-exo data in human cell lines for a wide variety of protein-DNA complexes.
Aim 2 will develop and implement computational approaches for pattern recognition and data distillation in ChIP-exo datasets. The results are expected to provide structural insights into the assembly of macromolecular protein complexes on a genomic scale, across various cell types and conditions.

Public Health Relevance

Proteins that bind throughout the human genome control the genes that govern human health. Precisely identifying the positional organization of these proteins within complexes will inform us about the mechanics of their action, and of their malfunction in disease. This project will provide that high-resolution view.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM125722-04
Application #
10078275
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Krasnewich, Donna M
Project Start
2018-01-19
Project End
2021-12-31
Budget Start
2021-01-01
Budget End
2021-12-31
Support Year
4
Fiscal Year
2021
Total Cost
Indirect Cost
Name
Pennsylvania State University
Department
Biochemistry
Type
Schools of Arts and Sciences
DUNS #
003403953
City
University Park
State
PA
Country
United States
Zip Code
16802