The 3D architecture of the mammalian genome plays a key role in transcriptional regulation. Through DNA looping, non-coding cis-regulatory elements can regulate target genes from hundreds of kilobases away. Because of this complexity, a comprehensive map of long-range DNA looping interactions will greatly facilitate our understanding of genome function. Our previous work demonstrated for the first time that it is feasible to map the 3D genome in mammalian cells with 3-5 billion Hi-C reads, at a resolution of 5-10 kb. At this resolution, interactions between individual cis-regulatory elements can be revealed. Recently, single-cell Hi-C approaches have also been tested to reveal cell-to-cell variability in chromosome structure. The fast-growing field of 3D genome research calls for 3D genome maps in a variety of cell and tissue types under different physiological or pathogenic perturbations. To achieve broad applicability, 3D genome mapping technology must address the following challenges: (i) the ability to assay rare biosamples; (ii) the generation of high-quality libraries for deep sequencing at the level of several billion reads; (iii) the ability to analyze a large number of single cells in order to study complex tissues or cellular heterogeneity. However, the library quality from Hi-C and its derivatives is usually poor when the amount of starting material is small. The overall goal of this proposal is to develop a simple and efficient 3C-seq method (Circularized Ligation Products sequencing, or CLP-seq) that generates high-quality libraries suitable for ultra-deep sequencing from a small number of cells. In contrast to Hi-C and its derivatives, CLP-seq enriches ligation junction products through a series of enzymatic reactions, without the need for biotin labeling and pull-down.
From a pilot experiment, we estimate that this new method requires less than 1% of the cells that Hi-C needs as starting material to reach a sequencing depth of a billion reads (an over 100-fold improvement over Hi-C). Furthermore, because CLP-seq avoids biotin labeling and pull-down, it is amenable to the development of a one-tube single-cell CLP-seq protocol (scCLP-seq) for massively scalable single-cell analysis. In this project, we will establish and optimize these new technologies and, as proof of principle, produce substantial data resources with these methods in the following three aims.
In Aim 1, we will optimize the CLP-seq protocol to generate high-complexity libraries for ultra-deep sequencing from small cell populations or rare human tissues.
In Aim 2, we will develop a complete CLP-seq data analysis pipeline to detect and visualize DNA looping interactions at kilobase resolution. We will generate kilobase-resolution 3D genome maps in a few challenging human tissues and perform preliminary functional annotation of non-coding GWAS SNPs in relevant human diseases.
In Aim 3, we will further develop a one-tube scCLP-seq protocol for simultaneous analysis of hundreds of single cells. As test cases, we will generate 3D genome data from hundreds of single cells in human islet tissues, and explore strategies for subpopulation analysis using clustering methods. We believe this technological advance will expand the field of 3D genome research and ultimately improve our understanding of genome function and human disease.

Public Health Relevance

Through long-range DNA looping interactions, mutations and genetic variants in non-coding DNA can lead to misregulation of key genes governing the initiation and progression of human diseases, including cancer, diabetes, obesity, and autism. We will develop a simpler and cheaper method to map the 3D genome in mammalian cells, offering a much-needed tool accessible to a broader community of biomedical researchers interested in the functions of the non-coding genome. This new technology also offers an efficient way to map 3D genome architecture in hundreds of single cells simultaneously.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG009658-01
Application #
9364054
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Smith, Michael
Project Start
2017-08-09
Project End
2022-06-30
Budget Start
2017-08-09
Budget End
2018-06-30
Support Year
1
Fiscal Year
2017
Total Cost
Indirect Cost
Name
Case Western Reserve University
Department
Genetics
Type
Schools of Medicine
DUNS #
077758407
City
Cleveland
State
OH
Country
United States
Zip Code
44106