Almost a tenth of human genes code for proteins that interact with chromosomes in the nucleus. Most of these DNA-associated proteins (DAPs) are involved in regulating gene expression, whether as part of the basic transcriptional machinery, as transcription factors that regulate the spatial and temporal levels of transcription, or as chromatin state regulators. These proteins are key components of biology, because transcriptional regulation underlies fundamental processes in organismal development, in determining cell states during differentiation, and in directing physiological responses to the internal and external environment. Thus, a comprehensive and detailed assessment of the molecular actions of DAPs, that is, where they interact throughout the human genome, is a fundamental long-term goal of both basic and clinical research. In response to RFA-HG-16-002, "Expanding the Encyclopedia of DNA Elements (ENCODE) in the Human and Mouse (UM1)", this application proposes to use a recently established, shovel-ready pipeline for mapping DAPs in human cell lines that overcomes the very high failure rates of traditional chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq), a widely used approach that requires a specific antibody for each factor. The new approach, called CETCh-seq, involves adding an epitope tag at the endogenous locus encoding each protein, then performing ChIP-seq with a universal antibody against the epitope to identify DAP-DNA associations genome-wide. This production pipeline will be applied to each of 1,244 DAPs that are expressed in a set of human cell lines and have not yet been mapped by ENCODE. During the four-year project, the pipeline will be used to test each of these factors in one human cell line and, for 100 of the DAPs, in four human cell lines, allowing characterization of cell-type differences.
The project will also tag and assay multiple allelic versions of a small number of DAPs in which pathogenic or potentially pathogenic mutations have been identified. The project will produce genome-wide DAP maps and identify motifs for hundreds of human regulatory proteins, providing an important component for the next phase of the ENCODE Project. All data, as well as useful materials in the form of gene editing plasmids and tagged human cell lines, will be made freely available to the research community.
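The core idea of CETCh-seq described above is that every DAP receives the same epitope tag at its endogenous locus, so a single universal antibody can be used for all ChIP-seq experiments. The sketch below illustrates only the in-frame tagging step, under assumptions that are ours and not stated in this proposal: the tag is fused at the C-terminus immediately upstream of the native stop codon, the example tag is the FLAG peptide (DYKDDDDK), the repair donor is simply left homology arm + tag + right homology arm, and the function name `build_donor` and the 50 bp arm length are hypothetical. Real donor constructs may include additional elements that are omitted here.

```python
"""Hypothetical sketch of an in-frame epitope-tag knock-in donor design.

Assumptions (not from the proposal): C-terminal tagging immediately upstream
of the native stop codon, a FLAG example tag, and a donor consisting only of
left homology arm + tag + right homology arm.
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}

# DNA encoding the FLAG peptide DYKDDDDK, used here only as an example tag.
FLAG_TAG = "GACTACAAAGACGATGACGACAAG"


def build_donor(locus: str, stop_index: int, tag: str = FLAG_TAG,
                arm_len: int = 50) -> str:
    """Return a donor sequence that places `tag` in frame just before the stop codon.

    locus      -- sense-strand genomic sequence around the gene's 3' end
    stop_index -- 0-based position of the first base of the native stop codon
    tag        -- coding sequence of the epitope tag (no stop codon)
    arm_len    -- length of each homology arm (hypothetical default)
    """
    locus = locus.upper()
    tag = tag.upper()
    if locus[stop_index:stop_index + 3] not in STOP_CODONS:
        raise ValueError("stop_index does not point at a stop codon")
    if len(tag) % 3 != 0:
        raise ValueError("tag length must be a multiple of 3 to stay in frame")
    # The tag itself must not introduce a premature in-frame stop codon.
    if any(tag[i:i + 3] in STOP_CODONS for i in range(0, len(tag), 3)):
        raise ValueError("tag contains an in-frame stop codon")
    if stop_index < arm_len or stop_index + arm_len > len(locus):
        raise ValueError("locus too short for the requested homology arms")
    left_arm = locus[stop_index - arm_len:stop_index]   # ends with the last sense codon
    right_arm = locus[stop_index:stop_index + arm_len]  # begins with the native stop codon
    return left_arm + tag + right_arm


if __name__ == "__main__":
    # Toy locus: start codon, 20 alanine codons, stop codon, then 3' flank.
    toy_locus = "ATG" + "GCT" * 20 + "TAA" + "C" * 60
    donor = build_donor(toy_locus, stop_index=63)
    print(len(donor), donor[50:74] == FLAG_TAG)
```

Because the tag length is a multiple of three and it is inserted directly before the native stop codon, the tagged protein keeps its original reading frame and ends with the epitope, which is what allows one antibody to be reused across every factor in the pipeline.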
The ability to determine a person's entire genetic makeup (the genome sequence) has transformed the way we study human biology and is beginning to have a significant impact on predicting and treating human disease. However, interpreting genome sequences remains a challenge, leaving big gaps in our understanding of normal and pathological human biology. This ambitious proposal seeks to apply high-throughput genomic technologies to help overcome these limitations by improving the depth and breadth of functional element annotations of human genomes, in particular by mapping the locations throughout the genome where regulatory proteins interact with DNA in human cells.