Less than 2% of the human genome sequence consists of protein-coding genes. It has been shown that at least 80% of the non-coding sequences of the human genome are associated with certain biochemical chromatin modifications, and more than 70% of the genomic DNA can be transcribed into RNA at various stages of development. Accumulating evidence suggests that these non-coding regulatory sequences are critical for the spatial and temporal control of gene expression. However, it remains challenging to determine whether and how these non-coding regulatory DNA and RNA sequences play a causal role in a variety of biological processes, including disease. In particular, the questions of how enhancer activity is precisely controlled, and how non-coding RNAs recruit effector proteins to control gene expression and genome function, are largely unexplored. My overall hypothesis is that cells integrate effector proteins and regulatory non-coding DNA and RNA sequences to create a spectrum of functionalities for precise control of gene regulation. The rules governing these functionalities can then be derived by defining the key components and examining how each functions alone and in combination. To test this, we have developed robust, innovative multi-omics approaches that allow comprehensive analysis of the molecular composition associated with non-coding DNA and RNA sequences. My long-term goal is to develop a predictive and functional understanding of the non-coding genome, which will elucidate how these regions can be specifically targeted for genomic medicine. Toward this goal, we pursue three major aims: 1) Control enhancer activity through systematic and targeted recruitment of epigenetic effectors; 2) Define the regulome of lncRNA-mediated gene regulation; 3) Develop innovative mouse models to study the function and regulation of the non-coding genome in disease models in vivo.
Our work will broadly advance genomics research and genomic medicine by developing new approaches and new mouse models that deepen our knowledge of the non-coding regulatory genome.

Public Health Relevance

Less than 2% of the human genome sequence consists of protein-coding genes, while more than 98% of the genome consists of non-coding regulatory DNA and RNA sequences that play an essential role in biology and human disease. Targeted engineering of the function and activity of the non-coding regulatory genome could greatly expand the "druggable genome" and holds great promise for gene-regulation-based therapeutics and genomic medicine. Here, by taking advantage of our newly developed multi-omics approach, we seek to identify the regulators that determine the activity of non-coding DNA and RNA sequences and that can be used to modulate gene expression by engineering non-coding regulatory sequences.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Unknown (R35)
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Pazin, Michael J
Duke University
Anatomy/Cell Biology
Schools of Medicine
United States