The long-term objective of this proposal is to understand the key factors, DNA sequence elements, and molecular mechanisms that are involved in human TATA-less transcriptional networks. About 75% of human protein-coding gene promoters lack a TATA box or TATA-like sequence, but the process by which these promoters are transcribed has remained an unsolved mystery. It is of the utmost importance to fill this large and long-standing void in the knowledge of human gene networks. In humans, transcription via the TATA box can be mediated by the TATA box-binding protein (TBP) as well as by the vertebrate-specific TBP-related factor 3 (TRF3), both of which bind to the TATA box element. Bilaterally symmetric animals (bilaterians), such as humans, also contain a TBP paralog termed TBP-related factor 2 (TRF2). Unlike TBP and TRF3, TRF2 does not bind to the TATA box. Moreover, TRF2, but not TBP, has been found to be required for TATA-less transcription in fruit flies. It is possible that TRF2 or a TRF2-related factor mediates TATA-less transcription in humans. It has also recently been found that the 5' ends of many steady-state capped RNAs do not correspond to transcription start sites (TSSs). Instead, the 5' ends of many steady-state transcripts appear to have been processed and capped subsequent to initiation. This phenomenon has led to the incorrect assignment of the TSSs of many promoters. Fortunately, the mapping of the 5' ends of nascent transcripts now allows the identification of the correct TSSs. The TSS is one of the key landmarks of a gene, and it is therefore critically important that the TSSs of human genes are reassessed via analysis of their nascent transcripts. For this study, the identification of the correct TSSs would enable the productive analysis of the promoter regions. Hence, the two Specific Aims of this proposal are as follows.
Specific Aim 1 would be directed toward obtaining a much more comprehensive understanding of human transcription by generating new and accurate genome-wide TSS data with nascent transcripts from three human cell lines. The TSS patterns, promoter DNA sequences, expression in different cell types, and possible chromatin signatures would be analyzed. This information should enable the categorization of the TATA-less genes into distinct sets.
In Specific Aim 2, mechanistic analyses would be performed to identify the key factors and DNA sequence motifs that are used in human TATA-less transcription. Both biochemical and cell-based approaches would be used. If this work is successful, then the resulting knowledge would enable the discovery of the full range of factors, DNA sequence elements, and transcriptional networks that regulate the activity of about 75% of human genes.

Public Health Relevance

The process by which about 75% of human genes (termed TATA-less genes) are activated has remained an unsolved mystery. Fortunately, new discoveries as well as recent genome-wide technical advances provide promising new avenues for the solution of this intractable yet important problem. The resulting knowledge of human gene networks would enhance our understanding of nearly every aspect of human biology.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21HG008781-01
Application #: 9012650
Study Section: Molecular Genetics A Study Section (MGA)
Program Officer: Pazin, Michael J
Project Start: 2016-03-22
Project End: 2017-12-31
Budget Start: 2016-03-22
Budget End: 2016-12-31
Support Year: 1
Fiscal Year: 2016
Total Cost: $193,750
Indirect Cost: $68,750

Name: University of California San Diego
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 804355790
City: La Jolla
State: CA
Country: United States
Zip Code: 92093
Vo Ngoc, Long; Cassidy, California Jack; Huang, Cassidy Yunjing et al. (2017) The human initiator is a distinct and abundant element that is precisely positioned in focused core promoters. Genes Dev 31:6-11
Vo Ngoc, Long; Wang, Yuan-Liang; Kassavetis, George A et al. (2017) The punctilious RNA polymerase II core promoter. Genes Dev 31:1289-1301